[Bug] unicode normalization problem #444

Closed
GomPam opened this issue Oct 24, 2018 · 27 comments

@GomPam

GomPam commented Oct 24, 2018

macOS uses NFD and Windows uses NFC, which causes this normalization problem.
User text (commit messages, author names, and so on) is displayed differently on Mac and Windows when it contains characters outside the Latin alphabet.
Can you fix it?

Link: https://en.wikipedia.org/wiki/Unicode_equivalence
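
To make the difference concrete, here is a minimal Swift sketch (an illustration only, not taken from Fork) showing that one Korean syllable has different code points and UTF-8 bytes in NFC and NFD. The sample syllable and the names `sample`, `nfcForm`, `nfdForm`, and `codePoints` are chosen for this example.

```swift
import Foundation

// "한" written as a single precomposed code point, U+D55C.
let sample = "\u{D55C}"

// NFC keeps one code point; NFD splits it into three conjoining jamo.
let nfcForm = sample.precomposedStringWithCanonicalMapping
let nfdForm = sample.decomposedStringWithCanonicalMapping

// Format each Unicode scalar as U+XXXX for inspection.
func codePoints(_ s: String) -> [String] {
    return s.unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
}

print(codePoints(nfcForm), nfcForm.utf8.count) // ["U+D55C"] 3
print(codePoints(nfdForm), nfdForm.utf8.count) // ["U+1112", "U+1161", "U+11AB"] 9

// Swift's String == compares by canonical equivalence, so the two forms
// compare equal even though the bytes handed to another program differ.
print(nfcForm == nfdForm) // true
```

Because the two forms compare equal inside Swift, the mismatch tends to show up only once the raw bytes reach another tool or platform.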

@GomPam GomPam changed the title unicode normalization problem [Bug] unicode normalization problem Oct 24, 2018
@alkee

alkee commented Jan 25, 2019

Voting this up.
As a Korean user, I really need this for file names and commit messages.

@kueecc

kueecc commented Feb 7, 2019

+1

@sckimos

sckimos commented Mar 20, 2019

+1

@kueecc

kueecc commented May 2, 2019

Any updates?

@DanPristupov
Contributor

DanPristupov commented May 2, 2019

How is that supposed to be fixed? Do other MacOS applications work properly?

@GomPam
Author

GomPam commented May 3, 2019

@DanPristupov
As far as I know, SourceTree handles this correctly.

I found a similar issue in another repository, nodejs:

nodejs/node#2165
https://nodejs.org/en/docs/guides/working-with-different-filesystems/

@alkee

alkee commented May 10, 2019

In Swift, simply make sure that user input such as commit messages and file paths is always converted to NFC:

   import Foundation // provides precomposedStringWithCanonicalMapping

   let nfd = "맥과 윈도우즈는 한글 저장방식이 다릅니다.ext" // string from user input (may be NFD)
   let nfc = nfd.precomposedStringWithCanonicalMapping // convert to NFC
   // From here on, use `nfc` instead of `nfd`.

little more detailed in

@GomPam
Author

GomPam commented May 10, 2019

@alkee good information 👍
@DanPristupov do you need more information?
We need feedback :)

@DanPristupov
Contributor

DanPristupov commented May 11, 2019

I'm still not really sure why the problem happens or what the correct solution would be (from Apple's point of view). Are developers supposed to convert every input string? What will happen with other languages after that?

However, I ran a small test.

Made a "맥과 윈도우즈는 한글 저장방식이 다릅니다.ext" commit in Mac, and in Windows it became unreadable as you said.
Made a "맥과 윈도우즈는 한글 저장방식이 다릅니다.ext" commit in Windows. Everything seems to be correct on both platforms.

However, passing strings in NFC form (using precomposedStringWithCanonicalMapping) to the git process doesn't change anything (I used Process().arguments, i.e. NSTask). The string is still corrupted.

Docs (https://developer.apple.com/documentation/foundation/nstask/1414375-launchedtaskwithlaunchpath).

The NSTask object converts both path and the strings in arguments to appropriate C-style strings (using fileSystemRepresentation) before passing them to the task via argv[] .
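
The conversion described in the quoted docs can be observed directly. The sketch below is an illustration under assumptions, not Fork's code: it uses NSString's getFileSystemRepresentation(_:maxLength:) to obtain the bytes of the file-system representation of an NFC string. On macOS this representation is generally expected to be decomposed, so the byte count grows; on other platforms it may stay the same. The helper name `fileSystemBytes` and the 4096-byte buffer size are hypothetical choices.

```swift
import Foundation

// Hypothetical helper: returns the file-system representation of `text`
// as raw bytes, or nil if the conversion fails or does not fit the buffer.
func fileSystemBytes(of text: String) -> [UInt8]? {
    var buffer = [Int8](repeating: 0, count: 4096) // assumed to be large enough
    let ok = NSString(string: text).getFileSystemRepresentation(&buffer, maxLength: buffer.count)
    guard ok else { return nil }
    // Keep everything before the terminating NUL byte.
    let end = buffer.firstIndex(of: 0) ?? buffer.count
    return buffer[..<end].map { UInt8(truncatingIfNeeded: $0) }
}

let composed = "한글".precomposedStringWithCanonicalMapping
let fsBytes = fileSystemBytes(of: composed)

print("NFC UTF-8 byte count:", composed.utf8.count)
print("file-system byte count:", fsBytes?.count ?? -1)
```

If the second count is larger than the first, the string was decomposed on its way to the C-style representation, which would explain why normalizing to NFC before setting the arguments has no visible effect.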

@alkee

alkee commented May 13, 2019

Sorry that my suggestion didn't work. This is OS-specific Unicode behavior, so there is no single correct solution. On the git side (as opposed to Apple's), there is a configuration option for this issue.

Can you test one more thing, the git config core.precomposeunicode?

git config --global core.precomposeunicode true

It worked well with git in the terminal and plays well with other platforms.

@GomPam
Author

GomPam commented May 13, 2019

@DanPristupov
I made a sample repository:
https://github.com/GomPam/UnicodeNormalizeTest

  • MacOSX
    [Screenshot, 2019-05-13 10:25:46 AM]

  • Windows
    [Screenshot, 2019-05-13 10:26:19 AM]

I used the same sentence on both:

유니코드 정규화 확인을 위한 문장. And Check With Alphabet.

@GomPam
Author

GomPam commented May 13, 2019

@alkee The "core.precomposeunicode" setting only affects filenames.

https://git-scm.com/docs/git-config#Documentation/git-config.txt-coreprecomposeUnicode
... When core.precomposeUnicode=true, Git reverts the unicode decomposition of filenames done by Mac OS.

So I don't think it will work for commit messages. :(

@GomPam
Author

GomPam commented May 15, 2019

@DanPristupov
I wrote a small program that just prints each argument followed by its hex string,
and now I understand what you said:

However, passing strings in NFC form (using precomposedStringWithCanonicalMapping) to the git process doesn't change anything (I used Process().arguments, i.e. NSTask). The string is still corrupted.
(#444 (comment))
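
A small argument-dumping program of the kind mentioned here could look roughly like the following. This is a guess at its shape, not the author's actual program; the function name `hexString` and the output format are assumptions.

```swift
import Foundation

// Hex-encode the UTF-8 bytes of one argument, e.g. "A" -> "41".
func hexString(of argument: String) -> String {
    return argument.utf8.map { byte -> String in
        let hex = String(byte, radix: 16, uppercase: true)
        return hex.count < 2 ? "0" + hex : hex // pad to two digits
    }.joined(separator: " ")
}

// Print every argument the process received, followed by its bytes.
for argument in CommandLine.arguments.dropFirst() {
    print("\(argument) + \(hexString(of: argument))")
}
```

Launched from an app with an NFC argument, a tool like this shows whether the bytes that actually arrive are still precomposed (for "한", ED 95 9C) or have been decomposed (E1 84 92 E1 85 A1 E1 86 AB).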

So I thought of a trick:
create a temporary shell script and then execute it.
I think this might work.

===
Update:
I tested it, and it works.
https://gist.github.com/GomPam/3986e317774085e3fb25cb5c00be0dd0

@GomPam
Author

GomPam commented May 28, 2019

@DanPristupov
Can we use this approach or not?
Please give us an update :)

@DanPristupov
Contributor

I will not wrap every git request into an sh script.

However, I want to try passing the commit message with the git commit --file=... command. Maybe it will be encoded properly that way.
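
A rough Swift sketch of that idea, not Fork's actual implementation: normalize the message to NFC, write it to a temporary file as UTF-8, and pass that file to git commit --file= so the message itself never travels through argv. The function names, the temporary file name, and the /usr/bin/git path are assumptions made for illustration.

```swift
import Foundation

// Write the commit message to a temporary file as NFC-normalized UTF-8.
func writeCommitMessage(_ message: String) throws -> URL {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("commit-message-\(UUID().uuidString).txt")
    let nfc = message.precomposedStringWithCanonicalMapping
    try Data(nfc.utf8).write(to: url)
    return url
}

// Hypothetical helper: run `git commit --file=<path>` in the given repository
// and return git's exit status.
func commit(message: String, repository: URL) throws -> Int32 {
    let messageFile = try writeCommitMessage(message)
    defer { try? FileManager.default.removeItem(at: messageFile) }

    let git = Process()
    git.executableURL = URL(fileURLWithPath: "/usr/bin/git") // assumed location
    git.currentDirectoryURL = repository
    git.arguments = ["commit", "--file=\(messageFile.path)"]
    try git.run()
    git.waitUntilExit()
    return git.terminationStatus
}
```

Only the temporary path goes through the argument conversion, and that path is expected to be plain ASCII, so decomposition should not alter it. The message bytes are read by git straight from the file.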

@GomPam
Author

GomPam commented May 29, 2019

@DanPristupov
That approach is better than mine, and it works correctly!
When can we expect it to be applied?

====

https://gist.github.com/GomPam/a801af883073f2cbad5747de5eca6443
https://github.com/GomPam/UnicodeNormalizeTest

[Screenshot]

@hrs-o

hrs-o commented Jun 25, 2019

Is #656 the same problem?

@GomPam
Author

GomPam commented Jun 25, 2019

@hrs-o
I think it is probably the same problem.

@GomPam
Author

GomPam commented Jul 26, 2019

@DanPristupov when do you plan to include this issue in a milestone?
1.0.82? 1.0.83? Or later?

@jun0683

jun0683 commented Nov 21, 2019

+1

@DanPristupov
Contributor

Hi. Could you check whether the problem is fixed in this version (1.0.87.5)? https://fork.dev/update/files/Fork.dmg

It's not available through the updater yet.

@GomPam
Author

GomPam commented Dec 11, 2019

@DanPristupov
It's been a while since your last comment :)

It appears to be working!

Commit Message:

[Mac OSX] TestCommit 1.0.87.5 - 한글, ひらがな, 你好吗, Alphabet
한글,
ひらがな,
你好吗,
Alphabet

[Screenshot, 2019-12-11 9:29:43 AM]

On Windows:
[Screenshot, 2019-12-11 9:32:48 AM]

Thanks, @DanPristupov

@GomPam GomPam closed this as completed Dec 11, 2019
@DanPristupov DanPristupov reopened this Dec 11, 2019
@DanPristupov
Contributor

DanPristupov commented Dec 11, 2019

Let's wait for the 1.0.88 update. I'm not sure whether this change might cause problems for other users.

However, the Unicode docs (http://www.unicode.org/faq/normalization.html#2) say it should be OK:

Q: Which forms of normalization should I support?

A: The choice of which to use depends on the particular program or system. NFC is the best form for general text, since it is more compatible with strings converted from legacy encodings. NFKC is the preferred form for identifiers, especially where there are security concerns (see UTR #36). NFD and NFKD are most useful for internal processing.
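
For reference, the four forms named in the FAQ map onto four Foundation string properties in Swift. The sketch below is illustrative only; the sample input (a compatibility ligature followed by a decomposed Korean syllable) and the names `input` and `forms` are assumptions chosen to make the differences visible.

```swift
import Foundation

// "ﬁ" (U+FB01, a compatibility ligature) followed by "한" in decomposed form.
let input = "\u{FB01}\u{1112}\u{1161}\u{11AB}"

let forms: [(name: String, value: String)] = [
    ("NFC",  input.precomposedStringWithCanonicalMapping),
    ("NFD",  input.decomposedStringWithCanonicalMapping),
    ("NFKC", input.precomposedStringWithCompatibilityMapping),
    ("NFKD", input.decomposedStringWithCompatibilityMapping),
]

// Print the code points of each form in hexadecimal.
for form in forms {
    let scalars = form.value.unicodeScalars
        .map { String($0.value, radix: 16, uppercase: true) }
        .joined(separator: " ")
    print(form.name, scalars)
}
// NFC  FB01 D55C
// NFD  FB01 1112 1161 11AB
// NFKC 66 69 D55C
// NFKD 66 69 1112 1161 11AB
```

NFC composes the syllable but leaves the ligature alone, which is why it is the gentlest choice for user-visible text such as commit messages. The compatibility forms also rewrite the ligature into plain "fi", which changes what the user typed.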

@DanPristupov DanPristupov added this to the 1.0.88 milestone Dec 13, 2019
@DanPristupov
Contributor

I just released 1.0.88. Could you double check that everything still works please?

@jun0683

jun0683 commented Dec 16, 2019

yes~ it is fixed. awesome~

@GomPam
Author

GomPam commented Dec 16, 2019

@DanPristupov
Hi. I think it is working great.
Unicode commit messages in all of the Google-supported languages I tested display correctly.

Committed on Mac (Fork 1.0.88):
GomPam/UnicodeNormalizeTest@df02f40

And then viewed on Windows:
[Screenshot, 2019-12-16 10:51:42 AM]

@kueecc

kueecc commented Dec 17, 2019

It works perfectly! Thank you very much!
