Unicode Library
Unicode encoding, conversion and validation are well understood and supported in C++ and Linux. So why provide another Unicode library for C++?
Currently (C++20 being the latest C++ standard), Unicode conversion is provided in the C++ standard library, but is marked as deprecated. boost::locale also provides Unicode conversion but, as the name suggests, it is locale dependent. Using boost::locale can add dozens of megabytes to a simple executable for Unicode conversion alone, a task that should not directly depend on locales.
Therefore, this library provides a C++17 and C++20 conformant way to perform the basic task of converting between UTF-8 (default encoding under Linux), UTF-16 (default encoding under Windows) and UTF-32 (commonly used in GUI and typesetting libraries such as FreeType).
The command line interface is just a runtime application of the provided library. There are other tools available that offer the same functionality, see below.
Usage: unicode-recode <from-format> <from-file> <to-format> <to-file>
Format:
    UTF-8        UTF-8
    UTF-16       UTF-16, native endian
    UTF-16LE     UTF-16, little endian
    UTF-16BE     UTF-16, big endian
    UTF-32       UTF-32, native endian
    UTF-32LE     UTF-32, little endian
    UTF-32BE     UTF-32, big endian
    ISO-8859-1   ISO-8859-1 (Latin-1)
    ISO-8859-15  ISO-8859-15 (Latin-9)
Exit code: 0 if valid, 1 otherwise.
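The LE and BE format names differ only in the order in which the bytes of each code unit are written to the file. The following is a minimal sketch of that difference for UTF-16, not the tool's actual implementation; `ByteOrder` and `utf16_to_bytes` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: serialize UTF-16 code units
// into bytes in the requested byte order.
enum class ByteOrder { Little, Big };

std::vector<std::uint8_t> utf16_to_bytes(const std::u16string& units, ByteOrder order)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t unit : units) {
        const auto low  = static_cast<std::uint8_t>(unit & 0xFF);
        const auto high = static_cast<std::uint8_t>((unit >> 8) & 0xFF);
        if (order == ByteOrder::Little) {
            bytes.push_back(low);   // least significant byte first
            bytes.push_back(high);
        } else {
            bytes.push_back(high);  // most significant byte first
            bytes.push_back(low);
        }
    }
    return bytes;
}
```

For U+00E4 ("ä"), this yields the bytes E4 00 in little endian order and 00 E4 in big endian order.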
Usage: unicode-validate <format> <file>
Format:
    UTF-8        UTF-8
    UTF-16       UTF-16, big or little endian
    UTF-16LE     UTF-16, little endian
    UTF-16BE     UTF-16, big endian
    UTF-32       UTF-32, big or little endian
    UTF-32LE     UTF-32, little endian
    UTF-32BE     UTF-32, big endian
Exit code: 0 if valid, 1 otherwise.
Example:
#include <unicode.h>
...
std::string utf8_value {"äöü"};
std::u16string utf16_value{unicode::convert<char, char16_t>(utf8_value)};
And for C++20:
std::u8string utf8_value {u8"äöü"};
std::u16string utf16_value{unicode::convert<char8_t, char16_t>(utf8_value)};
The following encodings are implicitly deduced from types:
* char or char8_t (C++20): UTF-8
* char16_t: UTF-16
* char32_t: UTF-32
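To make concrete what a char to char16_t conversion computes, here is a self-contained sketch that decodes UTF-8 and re-encodes it as UTF-16. It is an illustration under simplifying assumptions, not the library's code: `utf8_to_utf16_sketch` is a hypothetical name, and checks for overlong forms, surrogates and out-of-range values are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for illustration only. It checks the byte structure
// but skips full validation (overlong forms, surrogates, range).
std::u16string utf8_to_utf16_sketch(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const auto byte = static_cast<unsigned char>(in[i + k]);
            if ((byte & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (byte & 0x3F);
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // one code unit
        } else {
            cp -= 0x10000;                             // surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this sketch, the six UTF-8 bytes of "äöü" (C3 A4 C3 B6 C3 BC) become the three UTF-16 code units 00E4 00F6 00FC.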
You can specify different container types directly:
std::deque<char> utf8_value {...};
std::list<wchar_t> utf16_value{unicode::convert<std::deque<char>, std::list<wchar_t>>(utf8_value)};
Explicit encoding specification is also possible:
std::u8string value {u8"äöü"};
std::u16string utf16_value{unicode::convert<unicode::UTF_8, unicode::UTF_16>(value)};
std::string value {"äöü"};
std::u32string utf32_value{unicode::convert<unicode::ISO_8859_1, unicode::UTF_32>(value)};
Supported encodings are:
* unicode::UTF_8
* unicode::UTF_16
* unicode::UTF_32
* unicode::ISO_8859_1
* unicode::ISO_8859_15
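The two ISO encodings are single-byte encodings, so converting them to UTF-32 is a per-byte mapping. The sketch below shows that mapping with hypothetical helper names (`latin1_to_utf32`, `latin9_to_code_point`); it is not the library's API. It relies on two facts: ISO-8859-1 byte values equal their Unicode code points, and ISO-8859-15 differs from ISO-8859-1 in eight positions.

```cpp
#include <string>

// Hypothetical helpers for illustration; not the library's API.

// ISO-8859-1: every byte value equals its Unicode code point.
std::u32string latin1_to_utf32(const std::string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (char c : in)
        out.push_back(static_cast<char32_t>(static_cast<unsigned char>(c)));
    return out;
}

// ISO-8859-15: identical to ISO-8859-1 except for eight byte values.
char32_t latin9_to_code_point(unsigned char byte)
{
    switch (byte) {
        case 0xA4: return 0x20AC; // euro sign
        case 0xA6: return 0x0160; // S with caron
        case 0xA8: return 0x0161; // s with caron
        case 0xB4: return 0x017D; // Z with caron
        case 0xB8: return 0x017E; // z with caron
        case 0xBC: return 0x0152; // OE ligature
        case 0xBD: return 0x0153; // oe ligature
        case 0xBE: return 0x0178; // Y with diaeresis
        default:   return byte;   // same as ISO-8859-1
    }
}
```

The byte 0xA4 is the clearest example of the difference: it is the currency sign U+00A4 in ISO-8859-1 and the euro sign U+20AC in ISO-8859-15.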
Supported basic types:
* char
* char8_t (C++20)
* wchar_t (UTF-16 on Windows, UTF-32 on Linux)
* char16_t
* char32_t
* uint8_t, int8_t
* uint16_t, int16_t
* uint32_t, int32_t
* basically, all basic 8-bit, 16-bit and 32-bit types that can encode UTF-8, UTF-16 and UTF-32, respectively.
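One way to picture how an encoding can follow from the width of a basic type is a small compile-time trait. The sketch below is an assumption about how such a deduction could be written, not the library's mechanism; `Encoding` and `encoding_for` are hypothetical names, and 8-bit bytes are assumed.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait, shown only to illustrate width-based deduction.
// Assumes 8-bit bytes, so sizeof 1, 2 and 4 mean 8, 16 and 32 bits.
enum class Encoding { UTF_8, UTF_16, UTF_32 };

template <typename T>
constexpr Encoding encoding_for()
{
    static_assert(std::is_integral<T>::value,
                  "code unit type must be an integral type");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4,
                  "unsupported code unit width");
    return sizeof(T) == 1 ? Encoding::UTF_8
         : sizeof(T) == 2 ? Encoding::UTF_16
                          : Encoding::UTF_32;
}

// Checked at compile time:
static_assert(encoding_for<char>() == Encoding::UTF_8, "");
static_assert(encoding_for<std::uint16_t>() == Encoding::UTF_16, "");
static_assert(encoding_for<char32_t>() == Encoding::UTF_32, "");
```

Under this scheme wchar_t needs no special case: it maps to UTF-16 where it is 2 bytes wide and to UTF-32 where it is 4 bytes wide.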
Supported container types:
* All std container types that can be iterated (vector, list, deque)
* Source and target containers can be different container types
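Independence from the container type can be achieved by relying only on iteration for the source and on push_back for the target. The following sketch encodes UTF-32 to UTF-8 in that style. It is an illustration of the idea, not the library's code; `utf32_to_utf8_sketch` and `encode_example` are hypothetical names.

```cpp
#include <cstdint>
#include <deque>
#include <list>
#include <stdexcept>

// Hypothetical generic encoder: reads UTF-32 code points from any iterable
// container and appends UTF-8 bytes to any container offering push_back.
template <typename To, typename From>
To utf32_to_utf8_sketch(const From& in)
{
    using Byte = typename To::value_type;
    To out;
    for (auto unit : in) {
        const auto cp = static_cast<std::uint32_t>(unit);
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {                       // 1 byte
            out.push_back(static_cast<Byte>(cp));
        } else if (cp < 0x800) {               // 2 bytes
            out.push_back(static_cast<Byte>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<Byte>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {             // 3 bytes
            out.push_back(static_cast<Byte>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<Byte>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<Byte>(0x80 | (cp & 0x3F)));
        } else {                               // 4 bytes
            out.push_back(static_cast<Byte>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<Byte>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<Byte>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<Byte>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Usage: a std::deque of code points in, a std::list of bytes out.
std::list<std::uint8_t> encode_example()
{
    const std::deque<char32_t> code_points{0xE4, 0x20AC}; // U+00E4, U+20AC
    return utf32_to_utf8_sketch<std::list<std::uint8_t>>(code_points);
}
```

Only the target type is named explicitly; the source type is deduced from the argument, so any combination of iterable source and push_back-capable target works with the same function.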
Validation can be done like this:
bool valid{unicode::is_valid_utf<char16_t>(utf16_value)};
Or via explicit encoding specification:
bool valid{unicode::is_valid_utf<unicode::UTF_8>(utf8_value)};
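For readers wondering what "valid" means for UTF-8, the sketch below lists the usual well-formedness checks: correct byte structure, no truncated sequences, no overlong forms, no surrogates, and no values above U+10FFFF. It is written independently of the library, so `is_valid_utf8_sketch` is a hypothetical name and the library's own checks may differ in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical checker, independent of the library's implementation.
bool is_valid_utf8_sketch(const std::string& in)
{
    std::size_t i = 0;
    while (i < in.size()) {
        const auto lead = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;    // continuation bytes expected
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min = 0;    // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;        // stray continuation or invalid lead byte

        if (in.size() - i <= extra) return false;        // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const auto byte = static_cast<unsigned char>(in[i + k]);
            if ((byte & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (byte & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

For example, the two bytes C3 A4 ("ä") pass, while a lone C3, the overlong pair C0 80, and the encoded surrogate ED A0 80 are all rejected.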
Download is available from https://www.reichwein.it/download
Installation via Debian's APT mechanism is supported for the following operating systems. Add the respective line to /etc/apt/sources.list:
# For Debian 11:
deb http://www.reichwein.it/debian/ stable debian11
# For Ubuntu 21.04:
deb http://www.reichwein.it/debian/ stable ubuntu2104
# For Ubuntu 21.10:
deb http://www.reichwein.it/debian/ stable ubuntu2110
The package reichwein-keyring lets apt verify the cryptographic signatures of the packages. It can be installed manually from the above sources.
Install unicode-tools (command line interface, CLI) and libunicode-dev (C++ development files) via the operating system's package mechanism:
# apt-get update
# apt-get install unicode-tools libunicode-dev
Source code is available at https://www.reichwein.it/download
The git repository can be browsed at https://www.reichwein.it/cgit/unicode.git/ and cloned via:
$ git clone http://reichwein.it/git/unicode
For Debian-like systems, you can use the following APT configuration. Add the respective line from the following choices to /etc/apt/sources.list:
# For Debian 11:
deb-src http://www.reichwein.it/debian/ stable debian11
# For Ubuntu 21.04:
deb-src http://www.reichwein.it/debian/ stable ubuntu2104
# For Ubuntu 21.10:
deb-src http://www.reichwein.it/debian/ stable ubuntu2110