unicode for Debian
==================
This package is the Debian version of unicode, a C++ library for converting between and validating Unicode encodings.
CLI interface (package unicode-tools)
-------------------------------------
* unicode-recode
    Usage: recode <from-format> <from-file> <to-format> <to-file>
    Format:
      UTF-8        UTF-8
      UTF-16       UTF-16, native endian
      UTF-16LE     UTF-16, little endian
      UTF-16BE     UTF-16, big endian
      UTF-32       UTF-32, native endian
      UTF-32LE     UTF-32, little endian
      UTF-32BE     UTF-32, big endian
      ISO-8859-1   ISO-8859-1 (Latin-1)
      ISO-8859-15  ISO-8859-15 (Latin-9)
    Exit code: 0 if valid, 1 otherwise.
* unicode-validate
    Usage: validate <format> <file>
    Format:
      UTF-8        UTF-8
      UTF-16       UTF-16, big or little endian
      UTF-16LE     UTF-16, little endian
      UTF-16BE     UTF-16, big endian
      UTF-32       UTF-32, big or little endian
      UTF-32LE     UTF-32, little endian
      UTF-32BE     UTF-32, big endian
    Exit code: 0 if valid, 1 otherwise.
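The LE and BE variants listed above differ only in the byte order of each code unit. As a minimal sketch that is independent of this package (the function names are hypothetical and not part of unicode-tools), a single UTF-16 code unit can be serialized like this:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers for illustration; not part of the unicode package.

// Serialize one UTF-16 code unit in little-endian order (UTF-16LE):
// low byte first, high byte second.
constexpr std::array<std::uint8_t, 2> to_utf16le(char16_t unit)
{
    return {static_cast<std::uint8_t>(unit & 0xFF),
            static_cast<std::uint8_t>(unit >> 8)};
}

// Serialize one UTF-16 code unit in big-endian order (UTF-16BE):
// high byte first, low byte second.
constexpr std::array<std::uint8_t, 2> to_utf16be(char16_t unit)
{
    return {static_cast<std::uint8_t>(unit >> 8),
            static_cast<std::uint8_t>(unit & 0xFF)};
}
```

For U+20AC (EURO SIGN), the little-endian form yields the bytes AC 20 and the big-endian form yields 20 AC.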
C++ interface (package libunicode-dev)
--------------------------------------
Example:

    #include <unicode.h>
    ...
    std::string utf8_value {u8"äöü"};
    std::u16string utf16_value{unicode::convert<char, char16_t>(utf8_value)};

And for C++20:

    std::u8string utf8_value {u8"äöü"};
    std::u16string utf16_value{unicode::convert<char8_t, char16_t>(utf8_value)};

The following encodings are implicitly deduced from the types:
* char or char8_t (C++20): UTF-8
* char16_t: UTF-16
* char32_t: UTF-32
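One way such a deduction could work is a type trait keyed on the size of the code unit type. The following is only a sketch under that assumption, for platforms with 8-bit bytes; `Encoding`, `encoding_by_size` and `encoding_of` are hypothetical names and do not reflect the library's actual implementation:

```cpp
#include <cstddef>

// Hypothetical illustration; these names are not part of the unicode library.
enum class Encoding { UTF_8, UTF_16, UTF_32 };

// Map the size of a code unit type (in bytes) to the UTF encoding it carries.
template <std::size_t Bytes> struct encoding_by_size;
template <> struct encoding_by_size<1> { static constexpr Encoding value = Encoding::UTF_8; };
template <> struct encoding_by_size<2> { static constexpr Encoding value = Encoding::UTF_16; };
template <> struct encoding_by_size<4> { static constexpr Encoding value = Encoding::UTF_32; };

// Deduce the encoding from a character type such as char, char16_t or char32_t.
template <typename CharT>
constexpr Encoding encoding_of()
{
    return encoding_by_size<sizeof(CharT)>::value;
}

static_assert(encoding_of<char>() == Encoding::UTF_8, "char carries UTF-8");
static_assert(encoding_of<char16_t>() == Encoding::UTF_16, "char16_t carries UTF-16");
static_assert(encoding_of<char32_t>() == Encoding::UTF_32, "char32_t carries UTF-32");
```

A size-based trait like this would also cover wchar_t, whose size differs between platforms.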
You can specify different container types directly:

    std::deque<char> utf8_value {...};
    std::list<wchar_t> wide_value{unicode::convert<std::deque<char>, std::list<wchar_t>>(utf8_value)};

Explicit encoding specification is also possible:

    std::string value {"\xe4\xf6\xfc"}; // "äöü" encoded as ISO-8859-1
    std::u32string utf32_value{unicode::convert<unicode::ISO_8859_1, unicode::UTF_32>(value)};

Supported encodings are:
* unicode::UTF_8
* unicode::UTF_16
* unicode::UTF_32
* unicode::ISO_8859_1
* unicode::ISO_8859_15
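To illustrate why the two ISO encodings need to be named explicitly: ISO-8859-1 bytes map one-to-one to the first 256 Unicode code points, while ISO-8859-15 replaces eight of those positions. A minimal sketch, independent of the library (all function names are hypothetical):

```cpp
#include <string>

// Hypothetical helpers for illustration; not the library's implementation.

// ISO-8859-1: every byte value equals its Unicode code point.
inline char32_t latin1_to_code_point(unsigned char byte)
{
    return static_cast<char32_t>(byte);
}

// ISO-8859-15: identical to ISO-8859-1 except for eight positions.
inline char32_t latin9_to_code_point(unsigned char byte)
{
    switch (byte) {
    case 0xA4: return 0x20AC; // EURO SIGN
    case 0xA6: return 0x0160; // LATIN CAPITAL LETTER S WITH CARON
    case 0xA8: return 0x0161; // LATIN SMALL LETTER S WITH CARON
    case 0xB4: return 0x017D; // LATIN CAPITAL LETTER Z WITH CARON
    case 0xB8: return 0x017E; // LATIN SMALL LETTER Z WITH CARON
    case 0xBC: return 0x0152; // LATIN CAPITAL LIGATURE OE
    case 0xBD: return 0x0153; // LATIN SMALL LIGATURE OE
    case 0xBE: return 0x0178; // LATIN CAPITAL LETTER Y WITH DIAERESIS
    default:   return static_cast<char32_t>(byte);
    }
}

// Convert a whole ISO-8859-15 byte string to UTF-32.
inline std::u32string latin9_to_utf32(const std::string& input)
{
    std::u32string result;
    result.reserve(input.size());
    for (char c : input) {
        result.push_back(latin9_to_code_point(static_cast<unsigned char>(c)));
    }
    return result;
}
```

The byte 0xA4 therefore decodes to U+00A4 (CURRENCY SIGN) as ISO-8859-1 but to U+20AC (EURO SIGN) as ISO-8859-15.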
Supported basic types:
* char
* char8_t (C++20)
* wchar_t (UTF-16 on Windows, UTF-32 on Linux)
* char16_t
* char32_t
* uint8_t, int8_t
* uint16_t, int16_t
* uint32_t, int32_t
* in general, any basic 8-bit, 16-bit or 32-bit type, which is used to
  encode UTF-8, UTF-16 or UTF-32, respectively
Supported container types:
* All std container types that can be iterated (vector, list, deque, array)
* Source and target containers can be different container types
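Container independence of this kind can be achieved by relying only on iteration over the source and push_back() on the target. The following sketch assumes exactly that; the template `encode_utf16` and the function `example` are hypothetical, and the input is not validated:

```cpp
#include <cstdint>
#include <list>
#include <string>

// Hypothetical sketch; not the library's implementation.
// Encode UTF-32 code points from any iterable container into any target
// container that offers push_back(). The input is assumed to be valid UTF-32.
template <typename To, typename From>
To encode_utf16(const From& code_points)
{
    using unit_type = typename To::value_type;
    To result;
    for (const auto cp : code_points) {
        const auto value = static_cast<std::uint32_t>(cp);
        if (value < 0x10000) {
            // Basic Multilingual Plane: a single code unit
            result.push_back(static_cast<unit_type>(value));
        } else {
            // Supplementary planes: high surrogate followed by low surrogate
            const std::uint32_t offset = value - 0x10000;
            result.push_back(static_cast<unit_type>(0xD800 + (offset >> 10)));
            result.push_back(static_cast<unit_type>(0xDC00 + (offset & 0x3FF)));
        }
    }
    return result;
}

// Usage: a std::u32string as source, a std::list<char16_t> as target.
inline std::list<char16_t> example()
{
    return encode_utf16<std::list<char16_t>>(std::u32string{U"a\U0001F600"});
}
```

Here the source and target are different container types, and U+1F600 is split into the surrogate pair D83D DE00.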
Validation can be done like this:

    bool valid{unicode::is_valid_utf<char16_t>(utf16_value)};

Or via explicit encoding specification:

    bool valid{unicode::is_valid_utf<unicode::UTF_8>(utf8_value)};

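As a rough idea of what validating UTF-16 involves, the sketch below checks surrogate pairing. It is independent of the library; `is_well_formed_utf16` is a hypothetical name, and the library's own checks may differ:

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch; not the library's implementation.
// UTF-16 is well-formed if every high surrogate (0xD800..0xDBFF) is
// immediately followed by a low surrogate (0xDC00..0xDFFF) and no low
// surrogate appears on its own.
inline bool is_well_formed_utf16(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: the next unit must be a low surrogate.
            if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF) {
                return false;
            }
            ++i; // skip the low surrogate that completes the pair
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            return false; // lone low surrogate
        }
    }
    return true;
}
```

A string that ends in a high surrogate, or that contains a low surrogate without a preceding high surrogate, is rejected.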
Contact
-------
Reichwein IT <mail@reichwein.it>