Encoding examples

Tom Honermann edited this page Jul 2, 2017 · 2 revisions


Basic encoding

Text_view output iterators encode characters into a code unit sequence. A character is a class object with an associated character set and code point value; assigning one through the iterator writes its encoded code units to the underlying output. For example:

using CT = utf8_encoding::character_type;
std::string s;
auto it = make_otext_iterator<utf8_encoding>(std::back_insert_iterator<std::string>{s});
*it = CT{0x00F8}; // Encodes U+00F8 as \xC3\xB8.
assert(s[0] == '\xC3');
assert(s[1] == '\xB8');
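
To make the byte values above less opaque, here is a minimal sketch of the UTF-8 bit arithmetic itself. The `encode_utf8` helper is hypothetical and is not part of Text_view; it only illustrates how a code point such as U+00F8 maps to the code units `\xC3\xB8`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Text_view API): encodes one Unicode scalar
// value as a sequence of UTF-8 code units.
std::string encode_utf8(char32_t cp) {
    // Precondition: cp is a scalar value (no surrogates, at most U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // 1 code unit: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 code units: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 code units: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 code units: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00F8 the two-unit branch applies: `0xF8 >> 6` is `0x03`, giving the lead unit `0xC3`, and `0xF8 & 0x3F` is `0x38`, giving the trailing unit `0xB8`.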

Error handling with exceptions

By default, a text_encode_error exception is thrown when an error occurs during an encoding operation. For example:

using CT = utf8_encoding::character_type;
std::string s;
auto it = make_otext_iterator<utf8_encoding>(std::back_insert_iterator<std::string>{s});
*it = CT{0xD800}; // UTF-16 high surrogate; not a valid character. Throws text_encode_error.
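
The reason U+D800 is rejected can be sketched independently of the library. The names below (`encode_error`, `is_scalar_value`, `require_scalar_value`) are hypothetical stand-ins chosen for illustration and are not Text_view APIs; the sketch assumes only that a strict policy validates the code point and throws when validation fails.

```cpp
#include <stdexcept>

// Hypothetical stand-in for an encoding error exception type.
struct encode_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// A Unicode scalar value is any code point up to U+10FFFF that is not in
// the surrogate range U+D800..U+DFFF.
constexpr bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Strict behavior: return the code point unchanged if it can be encoded,
// otherwise throw.
char32_t require_scalar_value(char32_t cp) {
    if (!is_scalar_value(cp)) {
        throw encode_error("code point is not a Unicode scalar value");
    }
    return cp;
}
```

Surrogates are reserved for UTF-16 pairs and have no UTF-8 encoding of their own, which is why a strict encoder has nothing valid to write for them.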

Error handling with substitutions

Text_view's error policies allow creating output iterators that substitute a character-set-specific substitution character when an error is encountered during an encoding operation. For example:

using CT = utf8_encoding::character_type;
std::string s;
auto it = make_otext_iterator<utf8_encoding, text_permissive_error_policy>(std::back_insert_iterator<std::string>{s});
*it = CT{0xD800}; // UTF-16 high surrogate; not a valid character.
assert(s[0] == '\xEF');  // The UTF-8 code unit sequence 0xEF 0xBF 0xBD is the
assert(s[1] == '\xBF');  // encoding of U+FFFD, the Unicode substitution character.
assert(s[2] == '\xBD');
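
As a rough illustration of what substitution amounts to, the sketch below replaces an unencodable code point with U+FFFD and then applies the three-unit UTF-8 pattern. Both helpers (`substitute_if_invalid`, `encode_utf8_3unit`) are hypothetical and are not Text_view APIs; how the library implements its policies internally is not shown here.

```cpp
#include <string>

// Hypothetical helper: permissive behavior maps anything that is not a
// Unicode scalar value to U+FFFD REPLACEMENT CHARACTER.
char32_t substitute_if_invalid(char32_t cp) {
    const bool valid = cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    return valid ? cp : char32_t{0xFFFD};
}

// Hypothetical helper: encodes a code point in the range U+0800..U+FFFF
// using the three-unit pattern 1110xxxx 10xxxxxx 10xxxxxx.
std::string encode_utf8_3unit(char32_t cp) {
    std::string out;
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}
```

With these helpers, `encode_utf8_3unit(substitute_if_invalid(0xD800))` yields the same three code units as the assertions above: `0xEF`, `0xBF`, `0xBD`.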