Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Normalize Unicode characters #340

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

winstliu
Copy link
Contributor

@winstliu winstliu commented Feb 19, 2018

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • All new code requires tests to ensure against regressions

Description of the Change

Background: different Unicode sequences can be regarded as equivalent. On Windows and Linux, filenames are converted to NFC, or where characters like ñ are coded as one character. On macOS, filenames are converted to NFD, where ñ is coded as n + ◌̃. Both forms are considered to refer to the same character.

Fuzzy Finder was not taking Unicode normalization into account when searching for files. With this PR, NFD normalization for the filter query is now performed on macOS, while NFC normalization is performed for other platforms. This should allow for proper searching of filenames containing unicode characters.

Alternate Designs

NFKD and NFKC could be used instead, which would mean in addition the above, would also be treated as identical to ff.

Benefits

Fixes edge cases when searching for filenames with unicode characters.

Possible Drawbacks

None.

Applicable Issues

Fixes #69.

/cc @gjtorikian I think this should fix the issue you're seeing. If you'd like to test this PR but don't know how, I can provide guidance.

@winstliu
Copy link
Contributor Author

winstliu commented Feb 19, 2018

Test plan:

  1. Create a new file called erstiebegrüßung.html.
  2. Open Fuzzy Finder using the fuzzy-finder:toggle-file-finder command.

macOS:

  • Paste in erstiebegrüßung. The newly-added file should appear.
  • Paste in erstiebegrüßung. The newly-added file should appear.

Windows:

  • Paste in erstiebegrüßung. The newly-added file should appear.
  • Paste in erstiebegrüßung. The newly-added file should appear.

Linux:

  • Paste in erstiebegrüßung. The newly-added file should appear.
  • Paste in erstiebegrüßung. The newly-added file should appear.

@lee-dohm
Copy link
Contributor

lee-dohm commented Mar 8, 2018

Testing on macOS the first test case passes, the second one fails 😞

@lee-dohm
Copy link
Contributor

lee-dohm commented Mar 8, 2018

Hmmmmm ... now I can't get either one to pass 😡

@rsese
Copy link

rsese commented Mar 8, 2018

Had a chance to test on Linux (Ubuntu 16.04) - off of this branch, both tests passed for me (off of fuzzy-finder master, the first test passed and the second test failed).

@lee-dohm
Copy link
Contributor

lee-dohm commented Mar 9, 2018

Tested this with the help of @50Wliu. Confirmed on Mac OS X 10.13.3 using the APFS file system, both tests fail. According to our research, APFS does not do Unicode file name normalization at all.

@lee-dohm
Copy link
Contributor

lee-dohm commented Mar 9, 2018

@gjtorikian Can you confirm which OS version and filesystem you tested this on?

@lee-dohm
Copy link
Contributor

lee-dohm commented Mar 9, 2018

Another helpful article on APFS file system normalization.

@gjtorikian
Copy link
Contributor

I'm also on 10.13.3 and APFS.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    // Normalize to backslashes on Windows	
    if (process.platform === 'win32') {	

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    // Normalize to backslashes on Windows	
    if (process.platform === 'win32') {	

Commit 05b0b51
#340: Normalize Unicode characters

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paste in erstiebegrüßung. The newly-added file should appear.
Paste in erstiebegrüßung. The newly-added file should appear.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filenames with UTF-8 characters are not found
4 participants