Improve sorting / comparison code so it uses Unicode code point order #52
This may be of some use:

```js
/* chosen char examples from:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort
*/
// expected unicode code point order
const expected = ['a', '\uFF3A', '\uD855\uDE51'];
console.log('expected', expected);
// use native Intl.Collator (locale-aware collation, which is not
// guaranteed to match code point order)
const codePoint = ['\uFF3A', 'a', '\uD855\uDE51'];
const collator = new Intl.Collator();
codePoint.sort(collator.compare);
console.log('codePoint', codePoint);
// default native sort is by UTF-16 code units
const utf16 = ['\uFF3A', 'a', '\uD855\uDE51'];
utf16.sort();
console.log('utf16', utf16);
// use 'en' locale to sort
const en = ['\uFF3A', 'a', '\uD855\uDE51'];
const enCollator = new Intl.Collator('en');
en.sort(enCollator.compare);
console.log('en', en);
```

Output from node 16:
It could be that my default locale, which is

More test values should be added and specs checked to make sure that using
Yes, it is the case that not passing any options to
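Because `Intl.Collator` applies locale-aware collation rather than comparing code points, it may only coincidentally agree with code point order on a given set of values. For contrast, here is a minimal sketch of a comparator that relies only on the string iterator, which yields whole code points (a surrogate pair stays together). The name `compareByCodePoint` is hypothetical and not part of this project:

```javascript
// Hypothetical helper: order two strings by Unicode code point.
// The string iterator yields one code point per step, so a surrogate
// pair such as '\uD855\uDE51' is compared as the single value 0x25651.
function compareByCodePoint(a, b) {
  const ia = a[Symbol.iterator]();
  const ib = b[Symbol.iterator]();
  for (;;) {
    const x = ia.next();
    const y = ib.next();
    if (x.done || y.done) {
      // Equal so far: the string that ran out first (a prefix) sorts first.
      return (x.done ? 0 : 1) - (y.done ? 0 : 1);
    }
    const diff = x.value.codePointAt(0) - y.value.codePointAt(0);
    if (diff !== 0) {
      return diff;
    }
  }
}

const values = ['\uFF3A', 'a', '\uD855\uDE51'];
console.log(values.slice().sort(compareByCodePoint));
```

Unlike a collator, this puts uppercase ASCII before lowercase: `['a', 'B'].sort(compareByCodePoint)` yields `['B', 'a']` (U+0042 < U+0061), whereas `new Intl.Collator('en').compare` orders `'a'` before `'B'`.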
Other comparator functions to consider: https://stackoverflow.com/a/70137366/399274

```js
function compareCodePoints(s1, s2) {
  const len = Math.min(s1.length, s2.length);
  let i = 0;
  // note: `c1` unused; iteration is constructed this
  // way to ensure proper character enumeration
  for (const c1 of s1) {
    if (i >= len) {
      break;
    }
    const cp1 = s1.codePointAt(i);
    const cp2 = s2.codePointAt(i);
    const order = cp1 - cp2;
    if (order !== 0) {
      return order;
    }
    i++;
    if (cp1 > 0xFFFF) {
      i++;
    }
  }
  return s1.length - s2.length;
}
```

Other algorithms to consider here: https://icu-project.org/docs/papers/utf16_code_point_order.html
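The ICU paper linked above discusses getting code point order while still comparing raw UTF-16 code units, by applying a small fix-up only at the first differing unit so that no surrogate pairs need to be decoded. Below is a minimal JavaScript sketch along those lines, not a port of ICU's code. It assumes well-formed UTF-16: a lone surrogate would sort after U+E000 to U+FFFF instead of at its own code point value. The names `compareCodePointOrder` and `fixup` are hypothetical:

```javascript
// Hypothetical helper: remap a code unit >= 0xD800 so that surrogates
// (0xD800-0xDFFF) rank above 0xE000-0xFFFF, as they do in code point order.
function fixup(c) {
  // 0xE000-0xFFFF -> 0xD800-0xF7FF; 0xD800-0xDFFF -> 0xF800-0xFFFF
  return c >= 0xE000 ? c - 0x800 : c + 0x2000;
}

// Compare UTF-16 code units, fixing up only the first differing pair.
// Assumes well-formed UTF-16 (no lone surrogates).
function compareCodePointOrder(s1, s2) {
  const len = Math.min(s1.length, s2.length);
  for (let i = 0; i < len; i++) {
    let c1 = s1.charCodeAt(i);
    let c2 = s2.charCodeAt(i);
    if (c1 !== c2) {
      // Units below 0xD800 already compare correctly, so the fix-up is
      // needed only when both units are in or above the surrogate range.
      if (c1 >= 0xD800 && c2 >= 0xD800) {
        c1 = fixup(c1);
        c2 = fixup(c2);
      }
      return c1 - c2;
    }
  }
  // One string is a prefix of the other: the shorter one sorts first.
  return s1.length - s2.length;
}

console.log(['\uFF3A', 'a', '\uD855\uDE51'].sort(compareCodePointOrder));
```

Compared with `compareCodePoints`, this does no per-character decoding; the only extra work happens once, at the first code unit where the strings differ.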
See comments here: w3c/rdf-canon#17 (comment)