Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Android Invalid Span Crashes #2279

Merged
merged 11 commits into from
May 25, 2020

Conversation

mchowning
Copy link
Contributor

@mchowning mchowning commented May 20, 2020

This addresses wordpress-mobile/WordPress-Android#9832.

Related PRs:
wordpress-mobile/WordPress-Android#11965
WordPress/gutenberg#22500

Description

Android has a too-long history of Span-based crashes

Android has had a number of crashes due to the attempt to apply invalid selection indices. These can arise for different reasons, and we have written a fair bit of JS code trying to predict the specific circumstances where these errors will occur so we can avoid them. We have not been as successful as we need to be because over the last 90 days this crash has caused almost 140,000 crashes affecting 20,000 users. 😬

One of the causes of this crash is the fact that AztecAndroid removes whitespace, which would cause the JS to think the text was longer than it was on the native side. As a result, JS would sometimes try to set an index that was too large. I believe the code we had written to avoid this issue was generally avoiding the crashes here, but this specific cause is no longer an issue because a change was recently merged that stops the native side from removing the whitespace. For that reason, I actually had a PR removing those whitespace checks, but I have closed that in preference for this PR (which also removes those whitespace checks). You can read that PR description for more detail on that specific issue though.

A second cause of these crashes is that Android renders unknown HTML tags as an image span, and does not use the text in the span in the length of the text in the view (for example, <mark>some text</mark> is treated kind of like a single character on the native side, but is treated as multiple characters on the javascript side).

Approach Taken In This PR

One approach to addressing this would be to continue to try to anticipate the circumstances leading to the crash based on the text itself (i.e., check if there's an unknown tag) and calculate the proper selection indices based on the differences in the text on the JS and javascript side. This would be a tricky task with any mistakes likely resulting in many crashes for our users (as shown by the fact that we're still seeing many of these crashes despite our repeated previous attempts to fix this). A safer approach is to just check the selection index on the native side and avoid using any selection index that is outside the bounds of the text on the native side. I believe this will completely avoid any crashes due to an invalid selection index.

One concern with this approach is that it could leave the user in a bad state because of the mismatch between what JS thinks the state of the view is in and the actual state of the view on the native side. I do not think this is a concern that should stop us from using this approach, however. First, we're currently crashing the app anytime this happens, so it will be pretty difficult for us to get the user into a worse state by failing to update the selection indices. Second, not updating the selection indices when they are invalid is actually the approach that iOS uses, and it seems to have not caused any issues for them.

But We Don't Want to Just Ignore This

This is still an issue that we want to keep track of and address more robustly though. Among other things, because of the mismatched text between native and JS, it is possible to have blocks not split in the correct place (i.e., pressing enter splits the block somewhere other than where the cursor is located). So we want to still track this issue and work on addressing it more robustly even once we get the crashes to stop.

For that reason, I am also sending off an exception to Sentry that includes the information about text size, and the selection indices. In addition, I am also including the tags in any UnknownHtmlSpan classes on the text view (so for example, text with <mark>abc</mark>alksdj would send "mark"). This will both (1) let us know if this is happening in cases where there are not UnknownHtmlSpans (i.e., maybe there's another cause that I haven't uncovered) and (2) help us track the usage of different tags so we can see which tags have high usage and we may want to support in the future.

I do not believe there is any privacy concern with sending the html tag without any of the related content contained inside that tag to Sentry, but I want to call that out specifically so it does not get overlooked.

Here is an example of the exception in Sentry.

Proposing This as a Hotfix

Because I believe this is a pretty safe fix, we're relatively early in the 14.9 release cycle, and the high frequency of this crash, I think it makes sense to do this as a 1.28.1 hotfix so we can include it in the 14.9 release. These PRs are currently set up with that in mind, so if we decide to not make this a hotfix, they'll need to be retargeted and updated.

E2E Tests

I have added E2E tests for the scenarios where this was causing crashes. On one hand, these seem like a perfect case for E2E tests because they are edge cases that will not be covered much by regular testing. On the other hand, these are edge cases and if we covered every edge case with an E2E test, our test suite would take forever. I decided to go ahead and add these for now since our test suite still has a long way to go before it gets too large and if/when that happens we can always remove these tests. Definitely open to other thoughts though.

Testing

In testing this, note that even though the exception is no longer causing a crash, we are still printing the stacktrace, so you can still identify when the exception occurred. In WPAndroid release builds, the exception is also sent to Sentry.

Unknown HTML tags

Based on wordpress-mobile/WordPress-Android#9832 (comment)

Covered in a new E2E test

Create a post with the following HTML:

<!-- wp:paragraph -->
<p><unknownhtmlelement>abc</unknownhtmlelement>D</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>E</p>
<!-- /wp:paragraph -->
  1. Put your cursor at the beginning of the second paragraph block with the single E
  2. Tap the backspace key to merge the two blocks
  3. Confirm no crash and no red screen.

Multiline Paragraph Blocks From Web

From #1507
Covered by a new E2E test

  1. Create a post on Gutenberg web with a para block that ends with one or more empty line(s) (Shift+Enter)
  2. Open this new post in GB mobile
  3. Tap on the Plus symbol to add a new block below the para
  4. Put the focus into this new block and taps on Backspace to merge this new block with the one above.
  5. Confirm no crash or red screen, and the two blocks are merged

Handling spaces with Lists

From #885

Two spaces + Letter

Covered by a new E2E test

  1. Begin editing new list
  2. Add two spaces, then any letter to the first list item
  3. Tap enter with cursor at the end of the first list item to add a second list item
  4. The list should create a new list item
  5. Pressing either undo or backspace should remove the new line
  6. Confirm no crash or red screen

Three spaces

  1. Begin editing new list
  2. Add three spaces to the first list item
  3. Tap enter with cursor at the end of the first list item to add a second list item
  4. The list should create a new list item
  5. Press backspace should remove the new list item
  6. Press enter to create a new list item again
  7. Press undo to remove the new list item
  8. Confirm no crash or red screen

Letter + Three Spaces

  1. Begin editing new list
  2. Add any letter, then add three spaces (second space creates a period) to the first list item
  3. Tap enter with cursor at the end of the first list item to add a second list item
  4. The list should create a new list item
  5. Press backspace should remove the new list item
  6. Press enter to create a new list item again
  7. Press undo to remove the new list item
  8. Confirm no crash or red screen

PR submission checklist:

  • I have considered adding unit tests where possible.
  • I have considered if this change warrants user-facing release notes and have added them to RELEASE-NOTES.txt if necessary.

@mchowning
Copy link
Contributor Author

mchowning commented May 20, 2020

👋 @hypest ! I'm pretty confident that you'll have some thoughts on this, so wanted to give you a heads up that you may want to give it a look, particularly since I think it may make sense to do this as a hotfix to include it in the 14.9 release.

Also, I would appreciate if if you could confirm that you don't see any problems with sending "unknown" HTML tags to Sentry (additional context in the "But We Don't Want to Just Ignore This" section of my PR novel description)

@mchowning mchowning force-pushed the issue_9832/span_index_exceptions branch from 53d1f4d to 0cc01d6 Compare May 20, 2020 19:35
@mchowning mchowning force-pushed the issue_9832/span_index_exceptions branch 4 times, most recently from 2b798a2 to 29dd24a Compare May 20, 2020 21:39
@mchowning mchowning force-pushed the issue_9832/span_index_exceptions branch from 29dd24a to 63af455 Compare May 20, 2020 21:43
@mchowning
Copy link
Contributor Author

I just added an "Android Native Unit Tests" job to circle ci to run the HTML parsing unit test I added (hopefully we'll add more tests in the future). My intention is to make this a Required check once this gets merged to develop. Let me know if you think we may not want to do that.

Copy link
Contributor

@geriux geriux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! I think that adding that exception will help us a lot to debug issues.

Cool to see more tests added for this 👏

One thing I noticed by testing some of the cases:

Three spaces

The list should create a new list item and the spaces in the item above it will be trimmed

Spaces aren't trimmed in the first line, they're kept. I think that wouldn't be an issue but figured I'd report it.

Letter + Three Spaces

The list should create a new list item and the spaces in the item above it will be trimmed

Same as above

I also did some writing flow test cases and all good 👍

I'd still wait for @hypest thoughts on this PR since I might miss some things.

@mchowning
Copy link
Contributor Author

mchowning commented May 21, 2020

Thanks @geriux ! Regarding the spaces not being trimmed: that was actually a mistake on my part in writing the test cases. Those spaces used to be trimmed, but a few weeks ago a change was made to no longer trim whitespace. I copied those test cases from before that change, and I failed to notice the references to trimming whitespace needed to be removed. Nice job catching that! I have now updated the test cases to not state that those spaces would be trimmed. 🙂

@geriux
Copy link
Contributor

geriux commented May 21, 2020

Thanks @geriux ! Regarding the spaces not being trimmed: that was actually a mistake on my part in writing the test cases. Those spaces used to be trimmed, but a few weeks ago a change was made to no longer trims whitespace. I copied those test cases from before that change, and I failed to notice the references to trimming whitespace needed to be removed. Nice job catching that! I have now updated the test cases to not state that those spaces would be trimmed. 🙂

Awesome! Thanks for clarifying, all working then 🎉

@hypest
Copy link
Contributor

hypest commented May 22, 2020

👋 @hypest ! I'm pretty confident that you'll have some thoughts on this, so wanted to give you a heads up that you may want to give it a look

Thank you Matt! Here are my thoughts just by reading the PR description. I'll have a look at the code afterwards.

One concern with this approach is that it could leave the user in a bad state because of the mismatch between what JS thinks the state of the view is in and the actual state of the view on the native side. I do not think this is a concern that should stop us from using this approach, however.

I agree that it's time to be more defensive at this point since the issue has been around for long and happening a lot.

...mismatched text between native and JS...

Yes, that was always my concern with "swallowing/avoid" the crash: Instead of a single place where the mismatch issue would appear, swallowing would spread it around and manifest it as random and seemingly unrelated writing flow issues. I'm still not 100% sure to be honest. We'll have to keep an eye open to see if we start to have more "random writing flow issues" and assess.

For that reason, I am also sending off an exception to Sentry that includes the information about text size, and the selection indices.

Let's remember though that even with that logging, we will probably not be able to connect the dots when a writing flow issue happens. The issue might be correlated to the logged entry, or not.

I do not believe there is any privacy concern with sending the html tag without any of the related content contained inside that tag to Sentry

I don't think there's a concern here. The logging is quite generic and as you mention too, it can help prioritize adding better support to the tags that matter the most.

Because I believe this is a pretty safe fix, we're relatively early in the 14.9 release cycle, and the high frequency of this crash, I think it makes sense to do this as a 1.28.1 hotfix so we can include it in the 14.9 release.

+1

I decided to go ahead and add these for now since our test suite still has a long way to go before it gets too large and if/when that happens we can always remove these tests.

+1

RELEASE-NOTES.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@hypest hypest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished a pass Matt and this looks good. I've left a few non-major comments.

I wonder though, can you check if the swallowed exception will leave any Sentry breadcrumbs? It might be useful 🤞 if it does so we can connect the dots between seemingly unrelated issues.

@mchowning
Copy link
Contributor Author

mchowning commented May 22, 2020

can you check if the swallowed exception will leave any Sentry breadcrumbs? It might be useful 🤞 if it does so we can connect the dots between seemingly unrelated issues.

Good thought @hypest . I assumed that having Sentry capture the exception would be enough for Sentry to also use it as a breadcrumb if there was a subsequent crash. I tested this, and I was... what's the right word... oh yeah... wrong. (Relevant discussion: p1590156889259100-slack-platform9)

Updated to also send the exception message as a breadcrumb in 999672f. Verified that it is showing in subsequent crashes like this("Illegal selection index for text..."):

image

@mchowning mchowning force-pushed the issue_9832/span_index_exceptions branch from 473e373 to 5065bce Compare May 25, 2020 13:50
Copy link
Contributor

@hypest hypest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me @mchowning from the code side of thing. By the way, thanks for adding tests!

I'll leave it to @geriux 's testing to sign off the testing part.

@mchowning mchowning changed the base branch from master to release/1.28.1 May 25, 2020 14:20
@mchowning
Copy link
Contributor Author

It looks like we're going to have 2 fixes in 1.28.1 (maybe 3), so I created a release/1.28.1 branch off of release/14.9 for us to consolidate those changes while we create the 1.28.1 release and retargeted this PR from master to that branch. cc: @SergioEstevao

Copy link
Contributor

@geriux geriux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All working after the latest changes! LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants