Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race conditions by collecting on main thread #12912

Merged
merged 8 commits into from
Nov 22, 2024

Conversation

malinajirka
Copy link
Contributor

@malinajirka malinajirka commented Nov 12, 2024

Closes: #12565

I haven't tested all the flows yet, I'm looking for initial thoughts about this solution @samiuelson @kidinov @AnirudhBhat. Do you think this change is safe for a beta fix or would be better to include it in the regular cycle?

Description

The app currently incorrectly handles scenario when a required card reader update fails due to low battery during the connection flow.

This issue is caused by a race condition between the Connect VM <-> SW Update VM <-> Stripe SDK updates.

The connect VM emits navigation event to move to the SW Update VM when the Stripe SDK starts na update. However, the app also asynchronously listens to Card Reader connection status here. Before the app gets a chance to navigate to the SW Update Dialog/VM, the Stripe SDK emits NotConnected event which resets the UpdateStatus to Unknown => when the SW Update Dialog/VM finally loads and starts listening to updates, the only status update it receives is Unknown since the software update is no longer in progress - this results in an empty/white dialog being displayed (we don't display any UI for unknown state).

The solution I propose is simplify the code and remove the race condition by listening to Card Reader updates on the main thread - this method doesn't do any heavy operations and reasoning about events that are performed in the order that they were emitted is much easier.

Unfortunately, this solution doesn't work in 100% of cases. The navigation component is asynchronous and sometimes the status changes to Unknown even with the above "fix". I implemented a workaround in the follow up commit that dismisses the empty/white dialog in such case and adds a flag (uiLoaded) to the tracking event the app has been recording to indicate whether the dialog was dismissed before the UI was loaded. This avoids the UI glitch but users won't learn about the reason of the failure - we plan to keep an eye on the tracking event to understand how common is such scenario in production. In the end, we decided to add an artificial delay that propagates the state to the UI after 500ms, "ensuring" the navigation component gets a chance to navigate to the Update flow in time.

Steps to reproduce

  1. Enable Simulated reader in the dev settings.
  2. Set the Update behavior to LOW_BATTERY_ERROR.
  3. Attempt to collect a payment for on order.
  4. Notice an empty dialog appears - not always but vast majority of times.

Testing information

Repeat the steps to reproduce, and notice the user is informed their card reader needs to be charged.

Also test connection to both HW readers as well as TTP reader to verify there isn't any regression. Last but not least, test if the successful SW update flow works as expected.

The tests that have been performed

I tested the connect/payment/update flow on the simulated readers, including TTP. I also tested connection/payment flows on a HW reader.

Images/gif

Before After
Screenshot 2024-11-13 at 15 00 34 Screenshot_20241113-145307
  • I have considered if this change warrants release notes and have added them to RELEASE-NOTES.txt if necessary. Use the "[Internal]" label for non-user-facing changes.

Reviewer (or Author, in the case of optional code reviews):

Please make sure these conditions are met before approving the PR, or request changes if the PR needs improvement:

  • The PR is small and has a clear, single focus, or a valid explanation is provided in the description. If needed, please request to split it into smaller PRs.
  • Ensure Adequate Unit Test Coverage: The changes are reasonably covered by unit tests or an explanation is provided in the PR description.
  • Manual Testing: The author listed all the tests they ran, including smoke tests when needed (e.g., for refactorings). The reviewer confirmed that the PR works as expected on big (tablet) and small (phone) in case of UI changes, and no regressions are added.

@malinajirka malinajirka added the feature: mobile payments Related to mobile payments / card present payments / Woo Payments. label Nov 12, 2024
@wpmobilebot
Copy link
Collaborator

wpmobilebot commented Nov 12, 2024

📲 You can test the changes from this Pull Request in WooCommerce-Wear Android by scanning the QR code below to install the corresponding build.
App Name WooCommerce-Wear Android
Platform⌚️ Wear OS
FlavorJalapeno
Build TypeDebug
Commit99fe759
Direct Downloadwoocommerce-wear-prototype-build-pr12912-99fe759.apk

@wpmobilebot
Copy link
Collaborator

wpmobilebot commented Nov 12, 2024

📲 You can test the changes from this Pull Request in WooCommerce Android by scanning the QR code below to install the corresponding build.

App Name WooCommerce Android
Platform📱 Mobile
FlavorJalapeno
Build TypeDebug
Commit99fe759
Direct Downloadwoocommerce-prototype-build-pr12912-99fe759.apk

@codecov-commenter
Copy link

codecov-commenter commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.73%. Comparing base (1acd3ef) to head (99fe759).
Report is 9 commits behind head on release/21.2.

Additional details and impacted files
@@               Coverage Diff               @@
##             release/21.2   #12912   +/-   ##
===============================================
  Coverage           39.73%   39.73%           
  Complexity           5983     5983           
===============================================
  Files                1267     1267           
  Lines               73227    73228    +1     
  Branches            10042    10042           
===============================================
+ Hits                29094    29096    +2     
+ Misses              41567    41566    -1     
  Partials             2566     2566           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@kidinov
Copy link
Contributor

kidinov commented Nov 13, 2024

@malinajirka I still can reproduce this issue. Rarely but it still there. Here are the logs

2024-11-13 10:31:10.320 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Software update status: com.woocommerce.android.cardreader.connection.event.SoftwareUpdateStatus$InstallationStarted@a1e1365, thread: main
2024-11-13 10:31:10.326 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Software update status: Failed(errorType=BatteryLow(currentBatteryLevel=0.15), message=Update failed due to low battery), thread: main
2024-11-13 10:31:10.328 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Connection status in the VM: NotConnected(errorMessage=Update failed due to low battery), thread: main
2024-11-13 10:31:10.340 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Connection status: NotConnected(errorMessage=Update failed due to low battery) thread: main
2024-11-13 10:31:10.341 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Software update status: com.woocommerce.android.cardreader.connection.event.SoftwareUpdateStatus$Unknown@a88c7ec, thread: main

Logs.patch
Where the logs were added ^

My understanding is that as navigation itself is async, there are still cases when updated Failed is still delivered to the ConnectionVM, not CardReaderUpdateViewModel.

Not sure what will be a good solution here; Maybe to unsubscribe from softwareUpdateStatus after ShowUpdateInProgress triggered?

@malinajirka
Copy link
Contributor Author

malinajirka commented Nov 13, 2024

I still can reproduce this issue. Rear but it still there. Here is the logs

Nice catch! The asynchronous aspect of the navigation component keeps biting us 😢.

Not sure what will be a good solution here; Maybe to unsubscribe from softwareUpdateStatus after ShowUpdateInProgress triggered?

Hmm, this won't help I believe. We change the status to Unknown within the CardReaderModule not the VM.

I'll keep thinking about a proper solution. In the meantime, perhaps, we could merge this PR + dismiss the dialog if the only state is Unknown - to avoid this broken looking behavior => and add tracking to learn about how common the scenario is. Wdyt @samiuelson @kidinov ?

An ugly workaround could be to keep the last two states in memory - if the last one is unknown, check the one before that. Having said that, I don't like that.

@kidinov
Copy link
Contributor

kidinov commented Nov 13, 2024

@malinajirka

Hmm, this won't help I believe. We change the status to Unknown within the CardReaderModule not the VM.

My understanding is that it doesn't matter that we change it unless we change it before the failed status is received by CardReaderUpdateViewModel. Having said that, race conditions are possible with this approach, too, as we don't have any guarantee what comes first

I'll keep thinking about a proper solution. In the meantime, perhaps, we could merge this PR + dismiss the dialog if the only state is Unknown - to avoid this broken looking behavior => and add tracking to learn about how common the scenario is. Wdyt @samiuelson @kidinov ?

If you believe that finding a proper solution now requires more effort than adding tracking and then analyzing the results, then it makes sense.

@malinajirka malinajirka marked this pull request as ready for review November 13, 2024 14:01
@malinajirka malinajirka added this to the 21.2 milestone Nov 13, 2024
@malinajirka
Copy link
Contributor Author

malinajirka commented Nov 13, 2024

My understanding is that it doesn't matter that we change it unless we change it before the failed status is received by CardReaderUpdateViewModel. Having said that, race conditions are possible with this approach, too, as we don't have any guarantee what comes first

That is correct, but unsubscribing from softwareUpdateStatus doesn't impact the order or speed of this at all so AFAICT such change won't make any difference. Am I missing something?

If you believe that finding a proper solution now requires more effort than adding tracking and then analyzing the results, then it makes sense.

Considering, the swUpdate flow is triggered automatically by Stripe as part of the Connection flow, I can think of only two proper solutions:

  1. Integrate SW Update flow as part of the connection flow even on the VM+UI layer => so we won't need to navigate anywhere.
  2. Implement a workaround - eg. buffer previous states or delay reset in the card reader module.

The first one is quite time consuming and the second one feels like an ugly workaround. So I went with the option of dismissing the dialog and tracking the state - I'll dive into the data after a couple months and based on the numbers, we can decide if we want to implement a proper solution (unless we come up with a better solution before that).

@kidinov
Copy link
Contributor

kidinov commented Nov 13, 2024

That is correct, but unsubscribing from softwareUpdateStatus doesn't impact the order or speed of this at all so AFAICT such change won't make any difference. Am I missing something?

If we unsubscribe from it before we consume the "failed" event, then unless it's replaced by producers with "unknown" before "failed" event consumed in the Update VM, then it should work, IIC

The first one is quite time consuming and the second one feels like an ugly workaround. So I went with the option of dismissing the dialog and tracking the state - I'll dive into the data after a couple months and based on the numbers, we can decide if we want to implement a proper solution (unless we come up with a better solution before that).

👍

Copy link
Collaborator

@samiuelson samiuelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@malinajirka
Copy link
Contributor Author

Thanks for the review @samiuelson !

After follow up conversation on Slack we decided to add a delay to the state reset instead of just dismissing the dialog. The benefit of using the delay is that all users will understand why the update failed (due to low battery). It's a hacky workaround, but the value/effort ration is good.

@kidinov Since you were able to reproduce the issue after the first "fix" (switch to the main dispatcher). Would you please mind reviewing the current state of the PR?

@wpmobilebot wpmobilebot modified the milestones: 21.2, 21.3 Nov 22, 2024
@wpmobilebot
Copy link
Collaborator

Version 21.2 has now entered code-freeze, so the milestone of this PR has been updated to 21.3.

@kidinov
Copy link
Contributor

kidinov commented Nov 22, 2024

@malinajirka I missed the comment 😮‍💨

I'll retest it soon and update release notes too

@malinajirka
Copy link
Contributor Author

@kidinov Sorry, my bad, I should have followed up - it slipped my mind that this hasn't been merged yet.

@kidinov kidinov changed the base branch from trunk to release/21.2 November 22, 2024 13:57
@dangermattic
Copy link
Collaborator

1 Message
📖 This PR contains changes to RELEASE-NOTES.txt.
Note that these changes won't affect the final version of the release notes as this version is in code freeze.
Please, get in touch with a release manager if you want to update the final release notes.

Generated by 🚫 Danger

Copy link
Contributor

@kidinov kidinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retested that on a real device and emulator - works fine. Merging to the release branch as a new beta will be cut off anyway

@kidinov kidinov merged commit 10c81f8 into release/21.2 Nov 22, 2024
14 of 15 checks passed
@kidinov kidinov deleted the issue/12565-fix-sw-failed-due-to-low-battery branch November 22, 2024 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature: mobile payments Related to mobile payments / card present payments / Woo Payments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Woo POS] We don’t handle the low battery event during IPP flow properly
6 participants