Normalize Unicode characters #340
base: master
6f24f26 to e925dbc
Test plan:
macOS:
Windows:
Linux:
Testing on macOS, the first test case passes but the second one fails 😞
Hmmmmm ... now I can't get either one to pass 😡
Had a chance to test on Linux (Ubuntu 16.04). On this branch, both tests passed for me; on fuzzy-finder master, the first test passed and the second test failed.
Tested this with the help of @50Wliu. Confirmed that on macOS 10.13.3 using the APFS file system, both tests fail. According to our research, APFS does not do Unicode file name normalization at all.
@gjtorikian Can you confirm which OS version and filesystem you tested this on?
Another helpful article on APFS file system normalization.
I'm also on 10.13.3 and APFS.
// Normalize to backslashes on Windows
if (process.platform === 'win32') {
Paste in erstiebegrüßung. The newly-added file should appear.
Paste in erstiebegrüßung. The newly-added file should appear.
Requirements
Description of the Change
Background: different Unicode sequences can be regarded as equivalent. On Windows and Linux, filenames are typically stored in NFC, where characters like ñ are encoded as a single code point. On macOS, filenames are converted to NFD, where ñ is encoded as n + ◌̃ (a base letter followed by a combining tilde). Both forms are considered to refer to the same character.
Fuzzy Finder was not taking Unicode normalization into account when searching for files. With this PR, the filter query is normalized to NFD on macOS and to NFC on all other platforms. This should allow filenames containing Unicode characters to be found correctly.
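The per-platform normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `normalizeQuery` is a hypothetical helper name, and the only API it relies on is the standard `String.prototype.normalize`.

```javascript
// Hypothetical helper (not taken from the PR): pick the normalization
// form from the platform, then normalize the filter query.
function normalizeQuery(query, platform = process.platform) {
  // Assumption from the PR description: NFD on macOS, NFC elsewhere.
  const form = platform === 'darwin' ? 'NFD' : 'NFC';
  return query.normalize(form);
}

const composed = '\u00F1';    // ñ as a single code point (NFC)
const decomposed = 'n\u0303'; // n followed by a combining tilde (NFD)

// The two spellings render identically but are different strings.
console.log(composed === decomposed);

// After normalization, the query matches the form expected on each platform.
console.log(normalizeQuery(composed, 'darwin') === decomposed);
console.log(normalizeQuery(decomposed, 'linux') === composed);
```

As the conversation notes, APFS may not normalize filenames at all, so a platform check like this one may not be enough on macOS; normalizing both the query and the candidate paths to the same form would be one alternative to consider.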
Alternate Designs
NFKD and NFKC could be used instead, which would mean that, in addition to the above, compatibility characters such as the ligature ﬀ would also be treated as identical to ff.
Benefits
Fixes edge cases when searching for filenames with Unicode characters.
Possible Drawbacks
None.
Applicable Issues
Fixes #69.
/cc @gjtorikian I think this should fix the issue you're seeing. If you'd like to test this PR but don't know how, I can provide guidance.