Fix crash while generating excerpts #1064
Beautiful fix!!
You can trigger an installable build for these changes by visiting CircleCI here.
Thank you @jleandroperez!
Do we have a failing test case? Was it the one given in the test?
As given, we have the original body as these code points (as hex numbers):
["74", "a", "30bf", "a", "67", "6c", "6f", "300", "62"]
After normalization we get:
["74", "a", "30bf", "a", "67", "6c", "f2", "62"]
That dangling 0x300 is suspicious because if we were to cut off the string between it and the preceding 0x6f then we'd have an invalid Unicode sequence, which could crash the app (though the browser is handling it fine: l̀).
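To make the two code point lists above reproducible, here is a minimal sketch that dumps the scalars before and after `precomposedStringWithCanonicalMapping`. The helper name `hexScalars` is made up for illustration and is not part of the app.

```swift
import Foundation

// The body from the discussion: "t", LF, U+30BF, LF, "glo", U+0300, "b".
let body = "t\n\u{30bf}\nglo\u{0300}b"

// Hypothetical helper: hex code point of every Unicode scalar in a string.
func hexScalars(_ s: String) -> [String] {
    return s.unicodeScalars.map { String($0.value, radix: 16) }
}

// NFC normalization, the same property the fix relies on.
let normalized = body.precomposedStringWithCanonicalMapping

print(hexScalars(body))       // ["74", "a", "30bf", "a", "67", "6c", "6f", "300", "62"]
print(hexScalars(normalized)) // ["74", "a", "30bf", "a", "67", "6c", "f2", "62"]
```

The only change is that `6f` followed by `300` collapses into the precomposed `f2`; every other scalar is untouched.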
Can we look at how the excerpt was being generated, how it was cutting up the body? It seems incredibly likely that the crash is due to splitting a string at non-boundaries and this fix may be a patch that leaves a lot open, things that normalization wouldn't prevent (in this case it might be that the fact that it was able to eliminate the combining grave eliminated the appearance of the bug).
Maybe we can try and recreate with some bad combinations of zero-width joiners or with characters for which normalization cannot remove the combining marks, such as with नि - \u{0928}\u{093f}.
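A rough sketch of that last case, assuming nothing about how the app actually cuts the body: normalization leaves नि as two scalars, so the combining mark survives, while cutting on `Character` boundaries keeps a base and its mark together. The helper `truncated(_:toCharacters:)` is hypothetical.

```swift
import Foundation

// नि has no precomposed form, so NFC cannot fold the vowel sign into the base.
let ni = "\u{0928}\u{093f}"
let normalizedNi = ni.precomposedStringWithCanonicalMapping

print(normalizedNi.unicodeScalars.count) // 2: the combining mark is still there
print(normalizedNi.count)                // 1: a single Character (grapheme cluster)

// Hypothetical helper: truncate on Character boundaries, which never
// separates a base scalar from the combining marks attached to it.
func truncated(_ s: String, toCharacters limit: Int) -> String {
    return String(s.prefix(limit))
}

// "glo" + U+0300 + "b": the third Character is "o" plus the combining grave.
let sample = "glo\u{0300}b"
let cut = truncated(sample, toCharacters: 3)
print(cut.unicodeScalars.count)          // 4: the grave stays with its "o"
```

A string like this would be a reasonable extra test input, since the normalization fix alone cannot make the mark disappear.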
Thank you for the comment @dmsnell! The failing string is indeed in the test, we managed to shorten it to So I wouldn't say that combining marks are causing this, but rather some combination of characters in the same string that is causing it. I will file a bug with Apple after I create a new test project to show the issue. I tried to test a string containing
Fascinating. It's surprising to me that the same suffix produced different splits. Actually, when I ran a test for it I got the same results, mismatching what you found here. I'm using this simple code:
import Foundation
// Full failing body; the commented-out line is the shortened variant.
let s = "t\n\u{30bf}\nglo\u{0300}b"
// let s = "t\nglo\u{0300}b"
let range = s.startIndex..<s.endIndex
// Print each word range and its enclosing range as reported by Foundation.
s.enumerateSubstrings(in: range, options: [.byWords, .localized, .substringNotRequired]) { (word, wordRange, enclosingRange, stop) in
    debugPrint(s[wordRange])
    debugPrint(s[enclosingRange])
}
Maybe we're destroying that combining grave somehow and messing it up? Since it's in the middle of the string I don't know how it could be removed.
Oh, that's interesting. I was always running this test on iOS devices, but running the same test code in a macOS project or straight from the command line doesn't show any issues and produces correct results. So my guess is that the issue is caused by linked iOS frameworks 🤔
Verified on 4.30
Closes #1055
Fix
In this PR we're fixing the crash that happens while generating excerpts. The root cause is unknown, but it is related to the Unicode form of the string.
The fix is to convert the string to normalised form using precomposedStringWithCanonicalMapping.
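As a rough illustration of the idea (not the app's actual excerpt code, which is not shown here), the sketch below normalizes the body first and only then enumerates words. The function name `excerpt(from:maxWords:)` and the word limit are assumptions made for the example.

```swift
import Foundation

// Hypothetical helper: returns the first `maxWords` words of `body`.
// The body is converted to NFC before any word ranges are computed.
func excerpt(from body: String, maxWords: Int) -> String {
    let normalized = body.precomposedStringWithCanonicalMapping
    var words: [String] = []
    let fullRange = normalized.startIndex..<normalized.endIndex
    normalized.enumerateSubstrings(in: fullRange, options: [.byWords, .localized]) { word, _, _, stop in
        if let word = word {
            words.append(word)
        }
        // Stop enumerating once enough words have been collected.
        if words.count >= maxWords {
            stop = true
        }
    }
    return words.joined(separator: " ")
}

// The string from the failing test case discussed in the conversation.
print(excerpt(from: "t\n\u{30bf}\nglo\u{0300}b", maxWords: 3))
```

Because the word ranges are computed on the normalized string and applied to that same string, the `o` and its grave are already a single scalar by the time any splitting happens.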
Test
Check out issue/1055-fix-excerpt-crash and run the app.
Review
Only one developer is required to review these changes, but anyone can perform the review.
Release
RELEASE-NOTES.txt was updated in c967202 with: