Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash while generating excerpts #1064

Merged
merged 3 commits into from
Dec 29, 2020
Merged

Conversation

eshurakov
Copy link
Contributor

Closes #1055

Fix

In this PR we're fixing the crash that happens while generating excerpts. The root cause if unknown, but related to the unicode form of the string.

The fix is to convert the string to normalised form using precomposedStringWithCanonicalMapping

Test

  1. Checkout a develop branch and run the app.
  2. Create a note with the any title and the content from the following file: crash.txt
  3. Search for "t", the app should crash
  4. Update the branch to issue/1055-fix-excerpt-crash and run the app
  5. Search for "t", the app should not crash

Review

Only one developer is required to review these changes, but anyone can perform the review.

Release

RELEASE-NOTES.txt was updated in c967202 with:

Fixed a bug that caused a crash while searching #1055

@eshurakov eshurakov added bug Something isn't working. [feature] search Anything related to searching. labels Dec 29, 2020
@eshurakov eshurakov added this to the 4.30 ❄️ milestone Dec 29, 2020
Copy link
Contributor

@jleandroperez jleandroperez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful fix!!

:shipit:

@peril-automattic
Copy link

You can trigger an installable build for these changes by visiting CircleCI here.

@eshurakov
Copy link
Contributor Author

Thank you @jleandroperez!

@eshurakov eshurakov merged commit f68b4c5 into develop Dec 29, 2020
@eshurakov eshurakov deleted the issue/1055-fix-excerpt-crash branch December 29, 2020 19:18
Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a failing test case? Was it the one given in the test?

As given we have the original body as these code points (as hex numbers):

["74", "a", "30bf", "a", "67", "6c", "6f", "300", "62"]

After normalization we get:

["74", "a", "30bf", "a", "67", "6c", "f2", "62"]

That dangling 0x300 is suspicious because if we were to cut off the string between it and the preceding 0x6f then we'd have an invalid Unicode sequence, which could crash the app. (though the browser is handling it fine )

Can we look at how the excerpt was being generated, how it was cutting up the body? It seems incredibly likely that the crash is due to splitting a string at non-boundaries and this fix may be a patch that leaves a lot open, things that normalization wouldn't prevent (in this case it might be that the fact that it was able to eliminate the combining grave eliminated the appearance of the bug).

Maybe we can try and recreate with some bad combinations of zero-width joiners or with characters for which normalization cannot remove the combining marks, such as with नि - \u{0928}\u{093f}.

@eshurakov
Copy link
Contributor Author

eshurakov commented Dec 30, 2020

Thank you for the comment @dmsnell!

The failing string indeed is in the test, we managed to shortened it to t\n\u{30bf}\nglo\u{0300}b. Interesting enough if we use t\n\nglo\u{0300}b as a test string, the crash doesn't happen.
We tracked the issue down to the enumerateSubstrings(in:options:using:) method of NSString (docs), which is indeed was incorrectly splitting text and causing a crash down the road.
For example running it on the following strings produce the following results:
t\nglo\u{0300}b => ["t", "glòb"]
t\n\u{30bf}\nglo\u{0300}b => ["t", "タ", "glo", "b"]

So I wouldn't say that combining marks are causing this, but rather some combination of characters in the same string that is causing it. I will file a bug with Apple after I create a new test project to show the issue.

I tried to test a string containing \u{0928}\u{093f} and it worked fine. Let's see if we get more crashes after using normalization.

@dmsnell
Copy link
Member

dmsnell commented Dec 30, 2020

Fascinating. It's surprising to me that the same suffix produced different splits - actually when I ran a test for it I got the same results, mismatching what you found here.

I'm using this simple code:

import Foundation

let s = "t\n\u{30bf}\nglo\u{0300}b"
// let s = "t\nglo\u{0300}b"
let range = s.startIndex..<s.endIndex

s.enumerateSubstrings(in:range, options: [.byWords, .localized, .substringNotRequired]) { (word, wordRange, enclosingRange, stop) in
    debugPrint(s[wordRange])
    debugPrint(s[enclosingRange])
}

Maybe we're destroying that combining grave somehow and messing it up? Since it's in the middle of the string I don't know how it could be removed.

@eshurakov
Copy link
Contributor Author

@dmsnell

actually when I ran a test for it I got the same results, mismatching what you found here

Oh, that's interesting. I was always running this test on iOS devices, but running the same test code for macOS project or straight from the command line doesn't show any issues and produces correct results.

So my guess is that the issue is caused by linked iOS frameworks 🤔

@pachlava
Copy link
Contributor

Verified on 4.30

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. [feature] search Anything related to searching.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crashes while generating excerpts for ja, zh, ko locales
4 participants