[HOLD for payment 2024-07-24] [$250] ping and ReconnectApp are called back to back on a bad wifi network #44269

Closed
1 of 6 tasks
m-natarajan opened this issue Jun 24, 2024 · 22 comments
Assignees
Labels
  • AutoAssignerNewDotQuality: Used to assign quality issues to engineers
  • Awaiting Payment: Auto-added when associated PR is deployed to production
  • Bug: Something is broken. Auto assigns a BugZero manager.
  • Daily KSv2
  • External: Added to denote the issue can be worked on by a contributor

Comments

@m-natarajan

m-natarajan commented Jun 24, 2024

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Version Number:
Reproducible in staging?: needs reproduction
Reproducible in production?: needs reproduction
If this was caught during regression testing, add the test name, ID and link from TestRail:
Email or phone of affected tester (no customers):
Logs: https://stackoverflow.com/c/expensify/questions/4856
Expensify/Expensify Issue URL:
Issue reported by: @quinthar
Slack conversation: https://expensify.slack.com/archives/C05LX9D6E07/p1719023935665339

Action Performed:

  1. Be on a bad WiFi network
  2. Open the app

Expected Result:

The app shouldn't call Ping and ReconnectApp several times in a row.

Actual Result:

On a really bad WiFi network, where the app concludes it's online but for some reason can't contact the server, it hammers Ping and ReconnectApp back to back, filling the network queue with many parallel, unfinished commands.

Workaround:

unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • Android: Native
  • Android: mWeb Chrome
  • iOS: Native
  • iOS: mWeb Safari
  • MacOS: Chrome / Safari
  • MacOS: Desktop

Screenshots/Videos

image (16)

bugd.txt

image (17)


Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~01f591c76409c7f4d0
  • Upwork Job ID: 1805714893198480172
  • Last Price Increase: 2024-06-25
Current Issue Owner: @kadiealexander
@m-natarajan m-natarajan added Daily KSv2 Bug Something is broken. Auto assigns a BugZero manager. AutoAssignerNewDotQuality Used to assign quality issues to engineers labels Jun 24, 2024

melvin-bot bot commented Jun 24, 2024

Triggered auto assignment to @kadiealexander (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.


melvin-bot bot commented Jun 24, 2024

Triggered auto assignment to @srikarparsi (AutoAssignerNewDotQuality)

@melvin-bot melvin-bot bot added Weekly KSv2 and removed Weekly KSv2 labels Jun 24, 2024
@srikarparsi
Contributor

Making this external to see if there's a reliable way to reproduce, the root cause, and proposals to fix.

@srikarparsi srikarparsi added the External Added to denote the issue can be worked on by a contributor label Jun 25, 2024
@melvin-bot melvin-bot bot changed the title ping and ReconnectApp are called back to back on a bad wifi network [$250] ping and ReconnectApp are called back to back on a bad wifi network Jun 25, 2024

melvin-bot bot commented Jun 25, 2024

Job added to Upwork: https://www.upwork.com/jobs/~01f591c76409c7f4d0

@melvin-bot melvin-bot bot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Jun 25, 2024

melvin-bot bot commented Jun 25, 2024

Triggered auto assignment to Contributor-plus team member for initial proposal review - @rushatgabhane (External)

@quinthar

This feels extremely easy to reproduce: just close your laptop lid for a few minutes, and reopen.

@srikarparsi
Contributor

srikarparsi commented Jun 27, 2024

I just tried this (closing laptop and reopen) and this was my network tab:

image

2 pings were called which isn't as many as your screenshot here but it's still back to back pings which shouldn't happen.

We already have a check to make sure that we don't send a Ping command when one is pending and this seems to be working because I don't see [NetworkConnection] recheck NetInfo in the console.

So I have two theories:

  1. The component is being re-rendered and the state of isOffline or hasPendingNetworkCheck is getting reset, so NetInfo is calling Ping again. I don't think this is the cause, but I think these variables should be wrapped in a useRef since they remain for the lifetime of the component.
  2. We don't set or check hasPendingNetworkCheck inside of NetInfo, so duplicate calls could be made there. I think this one is more likely: a network check fails, so isOffline is being set to false. Then NetInfo checks again, but so does recheckNetworkConnection, since hasPendingNetworkCheck isn't set by NetInfo.
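
The second theory can be illustrated with a minimal TypeScript sketch. It assumes that both entry points (the NetInfo-driven check and recheckNetworkConnection) are routed through one guarded function. The names `sendPing`, `checkNetwork`, `onNetInfoChange`, and `pingCount` are hypothetical and exist only for illustration; this is not the app's actual code.

```typescript
// Hypothetical stand-in for the real Ping request; it only counts how often it is sent.
let pingCount = 0;

// Shared flag: true while a network check is still in flight.
let hasPendingNetworkCheck = false;

function sendPing(): Promise<void> {
    pingCount += 1;
    // Simulate a slow network round trip.
    return new Promise<void>((resolve) => {
        setTimeout(resolve, 10);
    });
}

// Every caller goes through this guard, so the flag is set no matter
// which code path started the check.
async function checkNetwork(): Promise<boolean> {
    if (hasPendingNetworkCheck) {
        // A check is already running; skip instead of queueing another Ping.
        return false;
    }
    hasPendingNetworkCheck = true;
    try {
        await sendPing();
        return true;
    } finally {
        hasPendingNetworkCheck = false;
    }
}

// Both entry points share the same guarded function (assumption for this sketch).
const onNetInfoChange = checkNetwork;
const recheckNetworkConnection = checkNetwork;
```

If only one of the two entry points set the flag, calling both back to back would send two Pings, which matches the duplicate requests seen in the network tab.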

I'm still looking into these but they are my initial thoughts based on the code. cc @roryabraham and @adhorodyski if you have any additional thoughts since you guys worked on these PRs to introduce NetInfo and periodic checks.

@adhorodyski
Contributor

@srikarparsi you're correct about the periodic check.

The call itself feels solid, as it should bail out as long as the function's early return kicks in (which looks fine from the logs, with no subsequent recheck NetInfo).

If hasPendingNetworkCheck is reliable, this periodic check should cause us no harm (but that's an assumption).

@adhorodyski
Contributor

One higher-level problem I see with this implementation is that it's really, really imperative, so it's easy to make a mistake and cause this kind of behaviour over time. Declarative APIs work better, especially with React codebases, and there are open source libraries to solve just that.

@srikarparsi
Contributor

I created this PR to check if a network check is pending before starting a new one. Still need to test but I think this would be a quick way to stop repetitive calls. @adhorodyski if you could take a look at it as well that would be appreciated.

I also agree that our current implementation might not be the best way of doing it. NetInfo has parameters that we seem to be implementing in a custom way. For example, NetInfo already has reachabilityShortTimeout and reachabilityLongTimeout, which default to 5s and 60s. So when the internet is not detected, it should be rechecking for connection every 5s. And when it is detected, it should be rechecking every 60s. But we had to re-add the 60s check in this PR, so I think there might just be something wrong with our current implementation which we need to fix.
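
As a rough model of the behaviour described above, the following TypeScript sketch picks the next recheck delay from the two timeouts. The option names mirror the NetInfo parameters mentioned in the comment and the values are the 5s and 60s defaults quoted there; `nextRecheckDelay` and `ReachabilityTimeouts` are hypothetical helpers, not NetInfo's internal implementation.

```typescript
// Timeouts in milliseconds, named after the NetInfo options discussed above.
interface ReachabilityTimeouts {
    reachabilityShortTimeout: number;
    reachabilityLongTimeout: number;
}

// Defaults as quoted in the comment: 5s while unreachable, 60s while reachable.
const DEFAULT_TIMEOUTS: ReachabilityTimeouts = {
    reachabilityShortTimeout: 5 * 1000,
    reachabilityLongTimeout: 60 * 1000,
};

// Hypothetical helper: how long to wait before the next reachability check.
function nextRecheckDelay(
    isInternetReachable: boolean,
    timeouts: ReachabilityTimeouts = DEFAULT_TIMEOUTS,
): number {
    return isInternetReachable
        ? timeouts.reachabilityLongTimeout
        : timeouts.reachabilityShortTimeout;
}
```

Under this model a custom 60s interval on top of the library's own schedule would be redundant, which is why needing one suggests a problem elsewhere in the setup.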

@OlimpiaZurek
Contributor

I wasn’t able to reproduce this issue by closing and opening the laptop lid. Every time I tried, the Ping and ReconnectApp methods were only called once.
However, based on the code and description provided, it appears that the problem is related to the way the app handles network checks and reconnections. When an app determines that it is online but cannot connect to the server, it initiates multiple Ping and ReconnectApp requests simultaneously. This leads to high amounts of network traffic and unfinished commands. Reconnect logic does not control or limit the number of reconnect attempts. This can be problematic in environments with poor network conditions, leading to a constant flood of network activity.

Given this, I think adding this additional check makes sense as it ensures that a new network check only starts if there isn't already one in progress.

@melvin-bot melvin-bot bot added Overdue Reviewing Has a PR in review and removed Daily KSv2 Overdue labels Jun 28, 2024
@melvin-bot melvin-bot bot removed the Help Wanted Apply this label when an issue is open to proposals by contributors label Jul 1, 2024
@OlimpiaZurek
Contributor

The change from this PR seems to cause a regression.

Overall, I agree with Adam that we should adopt a more declarative approach to handling network connections. Currently, we are using an imperative approach, which seems error-prone. For example, the recheckNetworkConnection function is used both as middleware and in an interval, which risks errors and duplicate calls.

NetInfo provides built-in functions for re-checking the connection, such as reachabilityShortTimeout, which runs every 5 seconds if the Internet is not detected, and reachabilityLongTimeout, which runs every 60 seconds when the Internet is connected. These built-in mechanisms are designed to handle network rechecks reliably.

Given the complexity of our custom implementation, it's challenging to determine if the root cause of this issue is due to NetInfo or our custom logic. Therefore, maybe we should consider removing the custom recheckNetworkConnection solution and relying solely on NetInfo's built-in functionality? This approach simplifies our codebase and leverages the library's tested and optimized features.

To ensure this change meets our needs, I'd suggest double-checking that it provides the required functionality.

Given the difficulty in reproducing the issue, I believe we should conduct thorough testing to ensure that NetInfo's built-in mechanisms handle all necessary scenarios and edge cases.

To achieve this, we need to confirm which specific functionalities we want to test and verify.

Here are some examples:

  • Detection of online and offline states.
  • Handling intermittent connectivity and redundant checks.
  • Managing and processing sequential queues.

@srikarparsi
Contributor

> Therefore, maybe we should consider removing the custom recheckNetworkConnection solution and relying solely on NetInfo's built-in functionality?

I agree with this. And if it doesn't work and we verify that it's not a problem with our implementation, then I think it's better to make the fix upstream in NetInfo.

I think this should be the first step so I'll close this PR. @OlimpiaZurek let me know if I can do anything to help you with this.

@OlimpiaZurek
Contributor

I prepared a PR to remove the hasPendingNetworkCheck flag.

I also prepared a PR with a fix to the NetInfo library.

@muttmuure
Contributor

Thanks for the update!

@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Weekly KSv2 labels Jul 15, 2024
@melvin-bot melvin-bot bot changed the title [$250] ping and ReconnectApp are called back to back on a bad wifi network [HOLD for payment 2024-07-24] [$250] ping and ReconnectApp are called back to back on a bad wifi network Jul 17, 2024
@melvin-bot melvin-bot bot removed the Reviewing Has a PR in review label Jul 17, 2024

melvin-bot bot commented Jul 17, 2024

Reviewing label has been removed, please complete the "BugZero Checklist".


melvin-bot bot commented Jul 17, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 9.0.7-8 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-07-24. 🎊

For reference, here are some details about the assignees on this issue:


melvin-bot bot commented Jul 17, 2024

BugZero Checklist: The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed:

@rushatgabhane
Member

  1. The PR that introduced the bug has been identified. Link to the PR: N.A. This was always there

  2. The offending PR has been commented on, pointing out the bug it caused and why, so the author and reviewers can learn from the mistake. Link to comment: N.A.

  3. A discussion in #expensify-bugs has been started about whether any other steps should be taken (e.g. updating the PR review checklist) in order to catch this type of bug sooner. Link to discussion: N.A.

  4. Determine if we should create a regression test for this bug. Yes!

  5. If we decide to create a regression test for the bug, please propose the regression test steps to ensure the same bug will not reach production again

     1. Go offline
     2. Go to network tab in browser
     3. Verify that `openApp` isn't repeatedly called
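
The verification step could in principle be checked programmatically. A rough TypeScript sketch, assuming the requests seen in the network tab are available as a list of entries; the `LoggedRequest` shape, both helper names, and the sample data are hypothetical:

```typescript
// Hypothetical shape of one entry captured from the network tab.
interface LoggedRequest {
    command: string;
}

// Count how many times a given command appears in the captured log.
function countCommand(log: LoggedRequest[], command: string): number {
    return log.filter((entry) => entry.command === command).length;
}

// Report whether a command was sent more often than allowed (default: once).
function isCalledRepeatedly(log: LoggedRequest[], command: string, maxCalls = 1): boolean {
    return countCommand(log, command) > maxCalls;
}

// Made-up sample log for illustration only.
const sampleLog: LoggedRequest[] = [
    {command: 'openApp'},
    {command: 'openApp'},
    {command: 'Ping'},
];
```

With the sample log, `isCalledRepeatedly(sampleLog, 'openApp')` flags the duplicate, while the same check for `'Ping'` passes.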
    

@melvin-bot melvin-bot bot added Daily KSv2 and removed Weekly KSv2 labels Jul 23, 2024
@kadiealexander
Contributor

kadiealexander commented Jul 24, 2024

Payouts due:

@muttmuure
Contributor

Woohoo! Great work

@JmillsExpensify

$250 approved for @rushatgabhane


9 participants