-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NoQA] Fix reauthentication #52727
[NoQA] Fix reauthentication #52727
Conversation
I see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this quick work, please let me know when this one is ready for review
@allgandalf it seems the |
wow, that's awesome, I also noticed that your branch is 1000+ commits behind main, it's better we merge main, can you do that please |
@neil-marcellini Could you please share how to reproduce/test this reauth issue? I tried to reproduce this locally by having a bash script running which would periodically turn off and on my Mac's network. While doing that the local proxy server (npm run web) started to crash. Later I tried to do the same in staging env but without any luck, I didn't get logged out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress so far. Let's make sure this is rock solid before we ship it. I'll start doing some manual testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also let's please examine the other re-authentication test in this file, and make sure it's failing before any fixes from this PR. For example, I don't understand why we mock it to expect another call after failed re-authentication using the old authToken. It seems to me re-auth should fail to fetch, then get retried and succeed.
Lines 102 to 110 in c9cb455
// Fail the call to re-authenticate | |
.mockImplementationOnce(actualXhr) | |
// The next call should still be using the old authToken | |
.mockImplementationOnce(() => | |
Promise.resolve({ | |
jsonCode: CONST.JSON_CODE.NOT_AUTHENTICATED, | |
}), | |
) |
It's also odd that we mock several responses but then only assert about two.
Lines 145 to 146 in c9cb455
expect(callsToOpenPublicProfilePage.length).toBe(1); | |
expect(callsToAuthenticate.length).toBe(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neil-marcellini I didn't touch this test but this one is quite interesting. If we run this on main
it will succeed but if we look at the logs it actually signs out the user, giving us a false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly. The test doesn't really make sense. That's why I modified it in my PR. It still wasn't working quite right for me, but maybe with your latest changes it will.
Let's update the test, verify it's passing with the fix, then cherry pick the test to main and make sure it's failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neil-marcellini I can confirm that both added tests are failing on main
:
Sure, yes it's a bit tricky. I explained in this comment how I was able to manually reproduce the issue. I used a backend change to mark the auth token as expired when I add a comment, but you might be able to modify the app's network logic to mock that on the frontend. I modified app to fail to fetch once when re-authenticating. I'll see if I can get that working and write manual test steps into this PR description. |
You can merge this PR Manual test re-auth failing to fetch into this one with the fix locally, and verify if it's working. That manual test PR currently fails on main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your latest changes look good. Still a few more things to update as mentioned in Slack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great, thanks! I'll finish manually testing and then we should be good to go. Lmk if you want to address the non-blocking comments now, or in a follow up PR.
Yes will do that in this PR |
@ Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button] |
One more thing, maybe we should have a special max retry count for re-authentication? We reach the 10 retry limit pretty quickly, and since it's quite important that we don't sign people out, I think it would be good if the re-auth max throttle retry time was about a couple minutes. Also, it would be smart to manually test that we don't sent reauth retries while offline, only once the app is back online, otherwise it's likely they will keep failing to fetch and hit the max retry count. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all of Neil's NAB
comments, and I actually don't think they are NABs. I would like to have them be done. If not in this PR, then in a separate PR that comes right after this.
I also think we should have the separate retry limit as Neil suggests.
@tgolen I think this is out of scope for this PR, but I'm happy to tackle this in the next PR |
@zirgulis when i merged https://github.com/Expensify/App/pull/52281/files and tested it as mentioned in testing steps, I do get logged out: Screen.Recording.2024-11-22.at.1.36.17.AM.movI even got logged out when i sent a message: Screen.Recording.2024-11-22.at.1.38.06.AM.mov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your latest changes, it looks really solid to me now. 10 retries ends up with a pretty long delay, and I also tested it while offline and verified that the retries are paused which is great because I think it would be pretty unlikely for this to actually fail now. I updated the PR description for you and I think this is good to go.
@allgandalf it looks like maybe you didn't pull the latest changes from remote before testing, because the log lines are different now.
Trying again now |
Reviewer Checklist
Screenshots/VideosMacOS: Chrome / SafariOnline: Screen.Recording.2024-11-22.at.7.03.04.PM.movOffline: Screen.Recording.2024-11-22.at.7.23.25.PM.movMacOS: DesktopOnline:Screen.Recording.2024-11-22.at.7.26.38.PM.movOffline: Screen.Recording.2024-11-22.at.7.38.52.PM.movAndroid: NativeScreen.Recording.2024-11-22.at.7.43.50.PM.movAndroid: mWeb ChromeScreen.Recording.2024-11-22.at.7.47.48.PM.moviOS: NativeScreen.Recording.2024-11-22.at.7.50.39.PM.moviOS: mWeb SafariScreen.Recording.2024-11-22.at.7.52.00.PM.mov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests well in both online and offline mode:
-
Verified that the re-authentication count stops at 10 and it slows down till it reached 10.
-
Verified that in offline mode, there is no call made, which means that the calls are paused until the user is online
-
Verified that we are not logged out in both the cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go! Thanks guys
✋ This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release. |
🚀 Deployed to staging by https://github.com/neil-marcellini in version: 9.0.66-0 🚀
|
🚀 Deployed to production by https://github.com/mountiny in version: 9.0.66-8 🚀
|
Explanation of Change
We were seeing a problem where the user's Auth token would expire, which triggers an Authenticate request to re-authenticate the user. However, if the user is on a bad connection it's possible this request fails to fetch. In that case we were catching the error and signing the user out, which means they could lose data in queued write requests, and it's annoying to have to sign in again.
To fix this, we add a retry mechanism with exponential backoff, using the same throttling mechanism we already have for the SequentialQueue. The throttle is now a class so we can have separate instances. If re-auth is still failing after the maximum number of retries we'll log them out, but that should be extremely rare. We verified that if the user is offline when re-authenticating that the retries are paused until they come back online, so failed to fetch errors are very unlikely to cause a log out now.
Finally, we fixed the re-auth tests and added one for this exact flow, verifying that they fail on main and only pass with the fix. Previously, the tests were invalid, false positives.
Fixed Issues
$ #51707
PROPOSAL: N/A
Tests
Ndebug failing to fetch in Authenticate
and verify that it's logged about 10 times and that Authenticate requests are made more and more rarelyOffline tests
Ndebug failing to fetch in Authenticate
and verify that it's not logged repeatedly, signaling that the retry is paused while offlineNdebug failing to fetch in Authenticate
and verify that it's logged about 10 times and that Authenticate requests are made more and more rarelyQA Steps
No QA, this is a very specific situation that is very hard to reproduce.
PR Author Checklist
### Fixed Issues
section aboveTests
sectionOffline steps
sectionQA steps
sectiontoggleReport
and notonIconClick
)src/languages/*
files and using the translation methodSTYLE.md
) were followedAvatar
, I verified the components usingAvatar
are working as expected)StyleUtils.getBackgroundAndBorderStyle(theme.componentBG)
)Avatar
is modified, I verified thatAvatar
is working as expected in all cases)Design
label and/or tagged@Expensify/design
so the design team can review the changes.ScrollView
component to make it scrollable when more elements are added to the page.main
branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to theTest
steps.Screenshots/Videos
Tests were only run on web since changes are platform independent.
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
Online
OnlineReauth2024-11-21_18-29-01.mp4
Offline
OfflineReAuth2024-11-21_18-33-46.mp4
MacOS: Desktop