Normalisation bug with Hangul / symbols sequence #10

Closed
stedolan opened this issue May 26, 2017 · 4 comments

@stedolan

I re-ran the fuzzing job after #8 was fixed, and it found another case:

                   s: [U+c100 U+20d2 U+11c1 U+11c1 "섀⃒ᇁᇁ"]
            toNFC(s): [U+c11a U+20d2 U+11c1 "섚⃒ᇁ"]
            toNFD(s): [U+1109 U+1164 U+20d2 U+11c1 U+11c1 "섀⃒ᇁᇁ"]
     toNFD(toNFC(s)): [U+1109 U+1164 U+11c1 U+20d2 U+11c1 "섚⃒ᇁ"]

The last two lines should be equal.

@dbuenzli dbuenzli added the bug label May 26, 2017
@dbuenzli
Owner

Thanks. It seems that I broke something in the 9459c90 fix since this was previously correct:

> unftrip -a --nfc
섀⃒ᇁᇁ
U+C100
U+20D2
U+11C1
U+11C1
U+000A

@dbuenzli
Owner

dbuenzli commented May 26, 2017

The bug I introduced was that I would combine two characters with ccc=0 even if there was a character with ccc<>0 between them, which is not what the composition algorithm mandates. In this case I would compose U+C100 with U+11C1, which yields U+C11A, but the U+20D2 between the two prevents this.
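
For readers following along, the blocking rule can be sketched roughly as below. This is an illustration in Python using the standard `unicodedata` module, not the code from the fix; the helper name `is_blocked` is hypothetical. It assumes the usual statement of the rule: a candidate character is blocked from a preceding starter if some character between them has ccc=0 or a ccc greater than or equal to the candidate's.

```python
import unicodedata

def is_blocked(seq, starter_idx, cand_idx):
    """Hypothetical helper: True if seq[cand_idx] is blocked from seq[starter_idx]."""
    cand_ccc = unicodedata.combining(seq[cand_idx])
    for ch in seq[starter_idx + 1:cand_idx]:
        ccc = unicodedata.combining(ch)
        # An intervening starter (ccc = 0), or a mark whose combining class
        # is >= the candidate's, prevents composition with the starter.
        if ccc == 0 or ccc >= cand_ccc:
            return True
    return False

# The sequence from the report: U+20D2 sits between U+C100 and U+11C1.
s = "\uC100\u20D2\u11C1\u11C1"
print(is_blocked(s, 0, 2))                   # U+11C1 vs U+C100, with U+20D2 between
print(is_blocked("\uC100\u11C1", 0, 1))      # adjacent: nothing in between
print(unicodedata.normalize("NFC", s) == s)  # reference result from Python's stdlib
```

Since U+11C1 itself has ccc=0, any character between it and U+C100 blocks the composition, so the trailing consonant can only combine when it is directly adjacent.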

Further testing and breakage welcome.

@stedolan
Author

For the record, further fuzzing revealed no more bugs. I ran afl-fuzz for a few days (at 10k tests/sec), and it got to pending=0 with no new paths found for more than a day (which is the closest that afl-fuzz ever gets to saying that it's "done"). The test was to check that these equations hold on arbitrary input sequences.
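
A rough sketch of that kind of check, for anyone who wants to reproduce the idea without afl-fuzz. The assumptions are significant: it uses Python's `unicodedata` as a stand-in rather than the library under test, plain random sampling rather than coverage-guided fuzzing, and a set of invariants I am assuming here (only `toNFD(toNFC(s)) = toNFD(s)` is taken directly from the report above). The names `check`, `run_random_checks` and `POOL` are illustrative.

```python
import random
import unicodedata

def nfc(s):
    return unicodedata.normalize("NFC", s)

def nfd(s):
    return unicodedata.normalize("NFD", s)

def check(s):
    # The equation from the original report.
    assert nfd(nfc(s)) == nfd(s)
    # Assumed companion invariants (not quoted from the thread).
    assert nfc(nfd(s)) == nfc(s)
    assert nfc(nfc(s)) == nfc(s)
    assert nfd(nfd(s)) == nfd(s)

# Small pool built around the code points in the report, plus a few
# ordinary combining cases, so short random sequences hit interesting mixes.
POOL = [0xC100, 0x20D2, 0x11C1, 0x1109, 0x1164, 0xC11A, 0x0301, 0x0065, 0x00E9]

def run_random_checks(n, seed=0):
    rng = random.Random(seed)
    for _ in range(n):
        length = rng.randint(0, 6)
        check("".join(chr(rng.choice(POOL)) for _ in range(length)))
    return n

check("\uC100\u20D2\u11C1\u11C1")  # the input from this issue
run_random_checks(10_000)
```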

@dbuenzli
Owner

Cool, thanks for the report!
