Add support for non-ASCII digits #74
Comments
Good point, thanks for contributing. If I use these Unicode digits, I'll need to find some way to compare the string segments that are composed of Unicode digits, essentially "parsing" the Unicode digits into numbers and comparing them. Currently, since I only consider 0-9, the comparison is trivial and fast, and the number parsing doesn't even occur. I'm not sure if the current simple comparison of digit values would work well enough, but I suppose it might work better than just treating Unicode digits as "other characters". I like your example comparing results to
I see that the Windows comparison treats ੨ as something between 2 and 3. It would be interesting to find some simple mechanism that would let me do the same thing fast: A
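One possible way to get that "between 2 and 3" effect without building full numbers is sketched below in Python (not the library's C# code, and not a claim about how StrCmpLogicalW works internally). The helper names `_values` and `compare_digit_runs` are made up for this illustration. The idea: strip leading zeros, let the longer run win, compare digit values position by position, and only when the numeric values tie fall back to code point order, so that "2" < "੨" < "3".

```python
import unicodedata


def _values(run: str) -> list[int]:
    """Map each decimal digit to its value and drop leading zeros.

    unicodedata.decimal raises ValueError for anything that is not a
    decimal digit, so the caller must pass a pure digit run.
    """
    values = [unicodedata.decimal(ch) for ch in run]
    start = 0
    while start < len(values) - 1 and values[start] == 0:
        start += 1
    return values[start:]


def compare_digit_runs(a: str, b: str) -> int:
    """Return -1, 0 or 1 for two runs of Unicode decimal digits."""
    va, vb = _values(a), _values(b)
    # Without leading zeros, the run with more digits is the larger number.
    if len(va) != len(vb):
        return -1 if len(va) < len(vb) else 1
    # Same length: the first differing digit value decides.
    if va != vb:
        return -1 if va < vb else 1
    # Same numeric value: fall back to code point order as a tie-break
    # (an assumption of this sketch), which puts "2" before "੨".
    return (a > b) - (a < b)
```

With this tie-break, ੨ compares greater than 2 but less than 3, and ੨੨ compares greater than 11.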
Feel free to do so. The code is from https://github.com/icsharpcode/ILSpy/blob/master/ILSpy/TreeNodes/NaturalStringComparer.cs - we were looking for options to no longer use a native import. That is when we were like "Wait, Unicode is more than 0-9".
This is now implemented in c303896. It is released as version 4.3.0 (https://github.com/tompazourek/NaturalSort.Extension/releases/tag/4.3.0). In case you find discrepancies, please file new issues. Thank you again for contributing this idea; it wouldn't have happened without you.
NaturalSort.Extension/src/NaturalSort.Extension/NaturalSortComparer.cs
Lines 179 to 181 in 6ec645d
There are many more Unicode code points that can be used as digits (see https://www.compart.com/en/unicode/category/Nd). Each of these has a numeric value assigned; for example, https://www.compart.com/en/unicode/U+0A68 has the value 2.
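To make the "numeric value assigned" point concrete, here is a small Python sketch (an illustration only, the library itself is C#). It uses the standard `unicodedata` module; the helper name `digit_value` is hypothetical. It returns the decimal value of any character in category Nd and `None` for everything else.

```python
import unicodedata


def digit_value(ch: str):
    """Return the decimal value of a Unicode Nd character, else None."""
    if unicodedata.category(ch) != "Nd":
        return None
    return unicodedata.decimal(ch)


# ASCII 2, Gurmukhi digit two (U+0A68), Arabic-Indic digit three (U+0663),
# superscript two (category No, so not a decimal digit), and a letter.
for ch in ["2", "\u0a68", "\u0663", "\u00b2", "A"]:
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)} value={digit_value(ch)}")
```

Restricting the check to category Nd is a deliberate choice here: characters such as superscript digits carry a digit value but are not decimal digits, and treating them as part of a number would likely surprise users.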
I suggest using char.IsDigit instead to handle this correctly. See https://github.com/christophwille/poc-oh/blob/main/src/NaturalSortTests/Program.cs for a comparison with StrCmpLogicalW:
Input: A, A10, A11, Z, A੨, A੨੨
NaturalSort.Extension: A, A੨, A੨੨, A10, A11, Z
StrCmpLogicalW: A, A੨, A10, A11, A੨੨, Z
The sort order of StrCmpLogicalW makes perfect sense if you replace ੨ with 2.
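The "replace ੨ with 2" reading can be checked with a short Python sketch (again only an illustration, not the library's C# implementation and not a reimplementation of StrCmpLogicalW). The function name `natural_key` is made up for this example. It relies on two documented Python behaviors: `\d` in a `str` regex matches any Unicode decimal digit, and `int()` accepts such digits.

```python
import re


def natural_key(text: str):
    """Split text into digit and non-digit chunks for natural sorting.

    Digit chunks become (0, number, "") and text chunks become
    (1, 0, chunk), so numbers compare by value and sort before text.
    """
    key = []
    for match in re.finditer(r"\d+|\D+", text):
        chunk = match.group()
        if chunk[0].isdecimal():
            key.append((0, int(chunk), ""))
        else:
            key.append((1, 0, chunk))
    return key


items = ["A", "A10", "A11", "Z", "A੨", "A੨੨"]
print(sorted(items, key=natural_key))
# prints ['A', 'A੨', 'A10', 'A11', 'A੨੨', 'Z']
```

For this input the result matches the StrCmpLogicalW order quoted above. The sketch does not break ties between digits of equal value from different scripts, so it says nothing about how "A2" and "A੨" should be ordered relative to each other.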