Add support for UTF-16 #30
There is no great need for a converter, since … The plain UTF-16 converter should not be an …
There is a difference between the … The former could decode the bytes …
Exactly, that's what I was alluding to with a UTF-16 little/big-endian converter, which is a byte-to-string converter, not a code-unit-to-string converter. Your example appears to be big-endian (a.k.a. network order). I would expect a plain …
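For illustration, a minimal sketch of the byte-to-string direction: the bytes are grouped into 16-bit code units using a caller-chosen byte order, and the resulting code units are handed to `String.fromCharCodes`. The function name `decodeUtf16Bytes` is hypothetical, and the sketch assumes no BOM detection (the caller picks the endianness).

```dart
import 'dart:typed_data';

/// Decodes UTF-16 [bytes] into a String, reading each 16-bit code unit
/// with the given byte order (big-endian by default).
///
/// Hypothetical helper for illustration; not an existing API.
String decodeUtf16Bytes(List<int> bytes, {Endian endian = Endian.big}) {
  if (bytes.length.isOdd) {
    throw const FormatException(
        'UTF-16 input must have an even number of bytes');
  }
  // View the bytes as a ByteData so getUint16 can apply the byte order.
  final data = ByteData.sublistView(Uint8List.fromList(bytes));
  final units = List<int>.generate(
      bytes.length ~/ 2, (i) => data.getUint16(2 * i, endian));
  // Dart strings are sequences of UTF-16 code units, so surrogate pairs
  // (e.g. emoji) are kept intact by String.fromCharCodes.
  return String.fromCharCodes(units);
}
```

With the default big-endian order, `[0x00, 0x48, 0x00, 0x69]` decodes to `'Hi'`; the same text in little-endian order is `[0x48, 0x00, 0x69, 0x00]`.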
It seems that we do have a few uses of UTF-16 decoding internally. It's definitely not a high priority (there are only about four uses of this), but we'll probably want to have something in …
This would be very useful for emoji support. A simple task such as highlighting a searched-for emoji in a text takes too much time and too many lines of code without UTF-16 support.
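Dart strings are indexed by UTF-16 code unit, so an emoji outside the Basic Multilingual Plane, such as U+1F600, has a `length` of 2. A minimal sketch, using the hypothetical names `RuneSpan` and `runeSpans`, that maps each code point to the code-unit range a highlighter could pass to `substring`:

```dart
/// A code point together with the range of UTF-16 code units it occupies
/// in the original string. Hypothetical type for illustration.
class RuneSpan {
  final int codePoint;
  final int start; // inclusive code-unit index
  final int end; // exclusive code-unit index
  const RuneSpan(this.codePoint, this.start, this.end);
}

/// Walks [text] by code point and records each one's code-unit range.
/// Code points above U+FFFF (most emoji) are stored as a surrogate pair,
/// so they span two code units; everything else spans one.
List<RuneSpan> runeSpans(String text) {
  final spans = <RuneSpan>[];
  var offset = 0;
  for (final cp in text.runes) {
    final width = cp > 0xFFFF ? 2 : 1;
    spans.add(RuneSpan(cp, offset, offset + width));
    offset += width;
  }
  return spans;
}
```

For `'a\u{1F600}b'` the emoji occupies code units 1 to 3 (exclusive). Emoji built from several code points, such as ZWJ sequences or skin-tone modifiers, produce several spans; grouping those into user-perceived characters needs grapheme segmentation, which `package:characters` provides.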
I extracted these lines of code from …:

```dart
/// Invalid codepoints or encodings may be substituted with the value U+fffd.
const int _UNICODE_REPLACEMENT_CHARACTER_CODEPOINT = 0xfffd;
const int _UNICODE_BYTE_ZERO_MASK = 0xff;
const int _UNICODE_BYTE_ONE_MASK = 0xff00;
const int _UNICODE_VALID_RANGE_MAX = 0x10ffff;
// Highest code point in the Basic Multilingual Plane (plane 0).
const int _UNICODE_PLANE_ONE_MAX = 0xffff;
// Code points reserved for UTF-16 surrogates; never valid on their own.
const int _UNICODE_UTF16_RESERVED_LO = 0xd800;
const int _UNICODE_UTF16_RESERVED_HI = 0xdfff;
const int _UNICODE_UTF16_OFFSET = 0x10000;
const int _UNICODE_UTF16_SURROGATE_UNIT_0_BASE = 0xd800;
const int _UNICODE_UTF16_SURROGATE_UNIT_1_BASE = 0xdc00;
const int _UNICODE_UTF16_HI_MASK = 0xffc00;
const int _UNICODE_UTF16_LO_MASK = 0x3ff;

/// Produce a list of UTF-16LE encoded bytes. This method produces UTF-16LE
/// bytes with no BOM.
List<int> encodeUtf16le(String str) {
  final utf16CodeUnits = _stringToUtf16CodeUnits(str);
  final encoding = List<int>.filled(2 * utf16CodeUnits.length, -1);
  var i = 0;
  for (final unit in utf16CodeUnits) {
    encoding[i++] = unit & _UNICODE_BYTE_ZERO_MASK; // low byte first
    encoding[i++] = (unit & _UNICODE_BYTE_ONE_MASK) >> 8; // then high byte
  }
  return encoding;
}

List<int> _stringToUtf16CodeUnits(String str) {
  // Convert to code points first: surrogate code units on their own would
  // be treated as invalid below and replaced with U+FFFD.
  return codepointsToUtf16CodeUnits(str.runes.toList());
}

/// Encode code points as UTF-16 code units.
List<int> codepointsToUtf16CodeUnits(List<int> codepoints,
    {int offset = 0,
    int? length,
    int replacementCodepoint = _UNICODE_REPLACEMENT_CHARACTER_CODEPOINT}) {
  // Restrict encoding to the requested [offset, offset + length) range.
  final listRange =
      codepoints.sublist(offset, length == null ? null : offset + length);
  var encodedLength = 0;
  for (final value in listRange) {
    if (value > _UNICODE_PLANE_ONE_MAX && value <= _UNICODE_VALID_RANGE_MAX) {
      encodedLength += 2; // supplementary code point: surrogate pair
    } else {
      encodedLength++; // BMP code point or a single replacement unit
    }
  }
  final codeUnitsBuffer = List<int>.filled(encodedLength, -1);
  var j = 0;
  for (final value in listRange) {
    if ((value >= 0 && value < _UNICODE_UTF16_RESERVED_LO) ||
        (value > _UNICODE_UTF16_RESERVED_HI && value <= _UNICODE_PLANE_ONE_MAX)) {
      // BMP code point outside the surrogate range: one code unit.
      codeUnitsBuffer[j++] = value;
    } else if (value > _UNICODE_PLANE_ONE_MAX &&
        value <= _UNICODE_VALID_RANGE_MAX) {
      // Supplementary code point: split the 20-bit offset into a
      // high (lead) and low (trail) surrogate.
      final base = value - _UNICODE_UTF16_OFFSET;
      codeUnitsBuffer[j++] = _UNICODE_UTF16_SURROGATE_UNIT_0_BASE +
          ((base & _UNICODE_UTF16_HI_MASK) >> 10);
      codeUnitsBuffer[j++] =
          _UNICODE_UTF16_SURROGATE_UNIT_1_BASE + (base & _UNICODE_UTF16_LO_MASK);
    } else {
      // Lone surrogate or out-of-range value.
      codeUnitsBuffer[j++] = replacementCodepoint;
    }
  }
  return codeUnitsBuffer;
}
```
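The surrogate-pair branch above is the standard UTF-16 mapping. Worked through for U+1F600 (😀), with $c$ the code point and $b$ its offset from U+10000:

```math
\begin{aligned}
b &= c - \mathtt{0x10000} = \mathtt{0x1F600} - \mathtt{0x10000} = \mathtt{0xF600} \\
\text{high} &= \mathtt{0xD800} + \lfloor b / 2^{10} \rfloor = \mathtt{0xD800} + \mathtt{0x3D} = \mathtt{0xD83D} \\
\text{low} &= \mathtt{0xDC00} + (b \bmod 2^{10}) = \mathtt{0xDC00} + \mathtt{0x200} = \mathtt{0xDE00}
\end{aligned}
```

In UTF-16LE, with the low byte of each unit first, this pair is serialized as the bytes `3D D8 00 DE`.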
The only way we had to decode UTF-16 previously was `package:utf`, which has been discontinued. We should add a `utf16` encoder and decoder here.
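One possible shape for such an encoder and decoder is a codec built on `dart:convert`'s `Codec` and `Converter` base classes, so it can be used wherever a `Codec<String, List<int>>` such as `utf8` is expected. This is a minimal sketch that assumes UTF-16LE only, no BOM handling, and no chunked conversion; the class names `Utf16LeCodec`, `Utf16LeEncoder`, and `Utf16LeDecoder` are hypothetical.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical UTF-16LE codec sketch (no BOM, no chunked conversion).
class Utf16LeCodec extends Codec<String, List<int>> {
  const Utf16LeCodec();

  @override
  Converter<String, List<int>> get encoder => const Utf16LeEncoder();

  @override
  Converter<List<int>, String> get decoder => const Utf16LeDecoder();
}

class Utf16LeEncoder extends Converter<String, List<int>> {
  const Utf16LeEncoder();

  @override
  List<int> convert(String input) {
    // Dart strings already hold UTF-16 code units; split each into bytes.
    final units = input.codeUnits;
    final bytes = Uint8List(units.length * 2);
    for (var i = 0; i < units.length; i++) {
      bytes[2 * i] = units[i] & 0xff; // low byte first
      bytes[2 * i + 1] = units[i] >> 8; // high byte second
    }
    return bytes;
  }
}

class Utf16LeDecoder extends Converter<List<int>, String> {
  const Utf16LeDecoder();

  @override
  String convert(List<int> input) {
    if (input.length.isOdd) {
      throw const FormatException('Odd number of bytes in UTF-16 input');
    }
    // Reassemble little-endian byte pairs into 16-bit code units.
    final units = List<int>.generate(input.length ~/ 2,
        (i) => (input[2 * i] & 0xff) | ((input[2 * i + 1] & 0xff) << 8));
    return String.fromCharCodes(units);
  }
}
```

Because the encoder works directly on code units, unpaired surrogates pass through unchanged rather than being replaced with U+FFFD; a stricter variant could validate pairs before emitting bytes.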