Fix crash while generating excerpts #1064

eshurakov · 2020-12-29T15:53:06Z

Fix

In this PR we're fixing the crash that happens while generating excerpts. The root cause is unknown, but it is related to the Unicode form of the string.

The fix is to convert the string to its normalised form using precomposedStringWithCanonicalMapping.
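
A minimal sketch of what the normalisation step changes. The input `word` and the other names are illustrative assumptions, not taken from the PR's diff; only `precomposedStringWithCanonicalMapping` comes from the description above.

```swift
import Foundation

// Illustrative input: "glob" with a decomposed "ò" (o + U+0300 combining grave).
let word = "glo\u{0300}b"

// NFC normalisation folds "o" + U+0300 into the single scalar U+00F2.
let normalised = word.precomposedStringWithCanonicalMapping

// Swift compares strings by canonical equivalence, so the text itself is unchanged.
print(word == normalised)                         // true

// The UTF-16 length, which NSString-based APIs count in, shrinks by one.
print(word.utf16.count, normalised.utf16.count)   // 5 4

// The number of user-visible characters is the same either way.
print(word.count, normalised.count)               // 4 4
```

The point of the sketch is that normalisation does not alter what the user sees; it only removes the case where one visible character spans two code units.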

Test

Check out the develop branch and run the app.
Create a note with any title and the content from the following file: crash.txt
Search for "t"; the app should crash.
Update the branch to issue/1055-fix-excerpt-crash and run the app.
Search for "t"; the app should not crash.

Review

Only one developer is required to review these changes, but anyone can perform the review.

Release

RELEASE-NOTES.txt was updated in c967202 with:

Fixed a bug that caused a crash while searching #1055

jleandroperez

Beautiful fix!!

peril-automattic · 2020-12-29T16:35:08Z

You can trigger an installable build for these changes by visiting CircleCI here.

eshurakov · 2020-12-29T19:18:01Z

Thank you @jleandroperez!

dmsnell

Do we have a failing test case? Was it the one given in the test?

As given we have the original body as these code points (as hex numbers):

["74", "a", "30bf", "a", "67", "6c", "6f", "300", "62"]

After normalization we get:

["74", "a", "30bf", "a", "67", "6c", "f2", "62"]
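
The two listings above can be reproduced with a short sketch like the following. The helper `hexScalars` is a hypothetical name introduced here for illustration; the input is the shortened failing string quoted later in this thread.

```swift
import Foundation

/// Returns each Unicode scalar of `text` as a lowercase hex string.
func hexScalars(_ text: String) -> [String] {
    return text.unicodeScalars.map { String($0.value, radix: 16) }
}

let original = "t\n\u{30bf}\nglo\u{0300}b"
let normalized = original.precomposedStringWithCanonicalMapping

// Before normalization: "o" (6f) is followed by the combining grave (300).
print(hexScalars(original))

// After normalization: the pair is replaced by the precomposed "ò" (f2).
print(hexScalars(normalized))
```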

That dangling 0x300 is suspicious because if we were to cut off the string between it and the preceding 0x6f then we'd have an invalid Unicode sequence, which could crash the app. (though the browser is handling it fine l̀)

Can we look at how the excerpt was being generated and how it was cutting up the body? It seems very likely that the crash is due to splitting a string at non-boundaries. If so, this fix may be a patch that leaves a lot open: cases that normalization wouldn't prevent. Here, the bug may have stopped appearing only because normalization happened to eliminate the combining grave.
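
To illustrate the concern about splitting at non-boundaries, here is a sketch comparing a cut made by Unicode scalars with a cut made by Characters (grapheme clusters). It is not how the app's excerpt code works, which this thread has not yet established; `word`, `scalarCut` and `characterCut` are illustrative names.

```swift
// Illustrative input: "glob" with a decomposed "ò" (o + U+0300).
let word = "glo\u{0300}b"

// Cutting after three Unicode scalars lands between "o" and U+0300,
// so the combining grave is separated from its base letter.
var scalarCut = ""
for scalar in word.unicodeScalars.prefix(3) {
    scalarCut.unicodeScalars.append(scalar)
}

// Cutting after three Characters keeps "o" + U+0300 together.
let characterCut = String(word.prefix(3))

print(scalarCut.unicodeScalars.count)     // 3
print(characterCut.unicodeScalars.count)  // 4
print(scalarCut == "glo")                 // true
```

Working in Characters, or in String.Index values obtained from the string itself, avoids landing inside a cluster regardless of whether the text was normalized first.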

Maybe we can try and recreate with some bad combinations of zero-width joiners or with characters for which normalization cannot remove the combining marks, such as with नि - \u{0928}\u{093f}.
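
As a quick check of that last point, this sketch shows that canonical composition leaves नि as two scalars, while the "o" + U+0300 pair collapses to one. The variable names are illustrative.

```swift
import Foundation

// नि: DEVANAGARI LETTER NA followed by DEVANAGARI VOWEL SIGN I.
let ni = "\u{0928}\u{093f}"
let composedNi = ni.precomposedStringWithCanonicalMapping

// "ò" in decomposed form, for contrast.
let oGrave = "o\u{0300}"
let composedOGrave = oGrave.precomposedStringWithCanonicalMapping

// One user-visible character, but no precomposed form exists for it,
// so normalization cannot remove the second scalar.
print(ni.count)                              // 1
print(composedNi.unicodeScalars.count)       // 2

// The Latin pair does have a precomposed form (U+00F2).
print(composedOGrave.unicodeScalars.count)   // 1
```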

eshurakov · 2020-12-30T07:20:05Z

Thank you for the comment @dmsnell!

The failing string is indeed the one in the test; we managed to shorten it to t\n\u{30bf}\nglo\u{0300}b. Interestingly enough, if we use t\n\nglo\u{0300}b as a test string, the crash doesn't happen.
We tracked the issue down to the enumerateSubstrings(in:options:using:) method of NSString (docs), which was indeed splitting the text incorrectly and causing a crash down the road.
For example, running it on the following strings produces the following results:
t\nglo\u{0300}b => ["t", "glòb"]
t\n\u{30bf}\nglo\u{0300}b => ["t", "タ", "glo", "b"]

So I wouldn't say that combining marks are causing this, but rather some combination of characters in the same string that is causing it. I will file a bug with Apple after I create a new test project to show the issue.

I tried to test a string containing \u{0928}\u{093f} and it worked fine. Let's see if we get more crashes after using normalization.

dmsnell · 2020-12-30T22:35:18Z

Fascinating. It's surprising to me that the same suffix produced different splits - actually when I ran a test for it I got the same results, mismatching what you found here.

I'm using this simple code:

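import Foundation

// Shortened failing string from the report; swap in the second line to compare.
let s = "t\n\u{30bf}\nglo\u{0300}b"
// let s = "t\nglo\u{0300}b"
let range = s.startIndex..<s.endIndex

// Enumerate word by word. With .substringNotRequired the `word` argument is nil,
// so the text is read back through the reported ranges instead.
s.enumerateSubstrings(in: range, options: [.byWords, .localized, .substringNotRequired]) { (word, wordRange, enclosingRange, stop) in
    debugPrint(s[wordRange])
    debugPrint(s[enclosingRange])
}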

Maybe we're destroying that combining grave somehow and messing it up? Since it's in the middle of the string I don't know how it could be removed.

eshurakov · 2021-01-04T11:26:07Z

@dmsnell

actually when I ran a test for it I got the same results, mismatching what you found here

Oh, that's interesting. I was always running this test on iOS devices, but running the same test code in a macOS project or straight from the command line doesn't show any issues and produces correct results.

So my guess is that the issue is caused by linked iOS frameworks 🤔

pachlava · 2021-01-19T11:51:39Z

Verified on 4.30

eshurakov added 2 commits December 29, 2020 16:33

Use precomposedStringWithCanonicalMapping to avoid crash calculating …

c3816c8

…distance in string

Update RELEASE-NOTES.txt

c967202

eshurakov added the bug and [feature] search labels Dec 29, 2020

eshurakov added this to the 4.30 ❄️ milestone Dec 29, 2020

eshurakov requested a review from jleandroperez December 29, 2020 15:53

jleandroperez approved these changes Dec 29, 2020

Update NoteBodyExcerptTests.swift

b22ec32

eshurakov merged commit f68b4c5 into develop Dec 29, 2020

eshurakov deleted the issue/1055-fix-excerpt-crash branch December 29, 2020 19:18

dmsnell reviewed Dec 29, 2020

eshurakov mentioned this pull request May 26, 2021

Global search: some Chinese character cause CRASH Automattic/simplenote-macos#958

Open

eshurakov mentioned this pull request Nov 15, 2022

Searching UTF-8 Characters "太陽", "平均", "赤道", "太", "均", etc. Cause App to Crash #1488

Open
