Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

mseroczynski · 2020-06-01T12:52:44Z

DownloadManager has Requirements.NETWORK_UNMETERED, have for example 10 elements queued, and then wi-fi range is lost. Then, when wi-fi range is back and phone is able to use connection ExoPlayer still thinks there is no network and keeps all the queue elements in state QUEUED with DownloadManager.notMetRequirements = 3.

I've done some microwave kitchen testing today (2,4ghz Faraday cage) and been able to reproduce this issue repeatedly.

andrewlewis · 2020-06-01T13:01:13Z

Which ExoPlayer version are you using?

mseroczynski · 2020-06-01T13:35:11Z

2.11.4

andrewlewis · 2020-06-01T13:38:48Z

@ojw28 Please take a look.

ojw28 · 2020-06-01T14:42:24Z

@mseroczynski - What is your DownloadService implementation returning from getScheduler? Do you have any idea whether your process is being killed between connectivity being lost and becoming available again, when you reproduce this issue? Thanks!

mseroczynski · 2020-06-01T18:05:12Z

I've followed example code in this matter:

  override fun getScheduler(): Scheduler? {
    return if (Util.SDK_INT >= 21) PlatformScheduler(this, JOB_ID) else null
  }

I assume process wasn't killed but will try to determine it sometime later (currently I'm busy working on workaround fix to recover from stuck download by modifying current requirements by adding 'or DEVICE_IDLE' flag, just to set it back like it was a line later - it allows me to reinstantiate RequirementsWatcher, hoping for it to work, isn't confirmed yet). It's not super easy to reproduce (happens 1 out of ~50 tries when microwaving Xiaomi Mi5 7.0 and 2 times when walking away from Wi-Fi on Motorola Nexus 6 7.1.1).

Below is state of DownloadManager.currentDownloads after onIdle call when it happened:

ojw28 · 2020-06-01T18:25:57Z

It might be worth trying to use WorkManagerScheduler instead of PlatformScheduler, just in case the underlying implementation has some extra fixes that didn't make it into the platform for Android N.

mseroczynski · 2020-06-02T09:30:26Z

FYI reinstating RequirementsWatcher did not help (my code thats using https://github.com/pwittchen/ReactiveNetwork/ could correctly detect that there is wrong state in ExoPlayer not met requirements yet still workaround fix described in last post failed to work), I'm trying out WorkSchedulerManager now.

mseroczynski · 2020-06-02T12:22:01Z

There is surely something wrong elsewhere, process is not killed and issue is reproducible on other versions of Android (happened on 10 now). (Scheduler implementation doesn't seem to play a role). I think it's a critical bug and I'm having trouble fixing it outside of ExoPlayer. Try to take 10+ devices for a 20 minute walk like me and you'll supposedly reproduce it. Also, FYI I'm triggering resumeAll on queue on every app-in-foreground event, it does nothing because ExoPlayer keeps thinking there is no NETWORK_UNMETERED.

Please let me know, if this issue will be ignored in 2.11.5 as I'll have to fork ExoPlayer to base RequirementsWatcher on verified working implementations like these: https://github.com/pwittchen/ReactiveNetwork/tree/RxJava2.x/library/src/main/java/com/github/pwittchen/reactivenetwork/library/rx2/network/observing/strategy

ojw28 · 2020-06-03T14:07:18Z

I've been able to reproduce this by walking repeatedly outside and back into WiFi connectivity (I'm not able to reproduce by simulating the same thing with a microwave; perhaps the way the signal strength drops off and comes back differently in the two cases makes a difference).

When the issue reproduces, RequirementsWatcher.NetworkCallback receives a capabilities changed event indicating that the network is validated (NET_CAPABILITY_VALIDATED). But immediately after this when we evaluate the network in Requirements.getNotMetNetworkRequirements, the platform tells us that the network is not connected, or connecting, or validated, and that it is metered.

It's unclear whether we should be listening to a different capabilities change in RequirementsWatcher.NetworkCallback, or whether there's an issue in the underlying platform where it doesn't make the change in the network status visible quickly enough. In the latter case we may need to work around the problem by adding a small delay before we evaluate the network. Further investigation is needed into these points.

As an aside, there's really no need to ping an issue three times in under an hour with no useful new information. All it does is spam a lot of people with email. We obviously can't commit to fixing it in a given release prior to understanding what the problem is and what the correct fix is, and at this point we're still figuring that out.

Issue: #7453 PiperOrigin-RevId: 314710328

ojw28 · 2020-06-04T16:44:41Z

@mseroczynski - The commits referenced above should largely mitigate this issue (one is into dev-v2 and the other is the same change cherry-picked into dev-v2-r2.11.5).

There's still something weird going on in the problematic case where the events that the platform is firing seem to be inconsistent with what we get when we then query the platform through its getters. This is described in a little more detail in my post above. Interestingly, when I tried to create a minimal example that listens to the same events and queries the same APIs, I could not make the issue occur. It's possible that the network requests the Downloader instances are making are important for triggering the problem, perhaps because if they fail at around the time the WiFi network is lost, it causes the network to be declassified as the "active" one by the underlying platform. This is just a guess, but would explain why my minimal reproduction attempt didn't reproduce the same problem. It may also imply that the working alternatives you found may also not work properly once fully integrated into DownloadService/DownloadManager workflow.

The way the commits referenced above mitigate this issue is by re-evaluating connectivity when we receive additional events that get fired every now and again and imply continued network connectivity. By doing this, even if we do hit the problematic case, it should resolve itself when one of these subsequent events gets fired and the network is classified as being active again by the underlying platform. It appears that turning the display on is sufficient for this to happen, at least on a Pixel test device. Most likely any other network activity that the device does (e.g. push notifications) will also cause things to return to their correct state, so in practice this mitigation is probably going to be good enough to resolve the issue pretty quickly in nearly all cases, and in the worst case will resolve the issue as soon as the user looks at their device. The events we're listening to and the way in which we query connectivity looks to be broadly inline with how WorkManager does things.

Marking as fixed because, in practice, the issue is probably fixed well enough. We're having some discussions with the Android core networking team to try and work out exactly what's happening, and we may come up with a "better" fix in due course, if we manage to figure it out.

ojw28 · 2020-06-05T14:47:38Z

Subsequent fix in 226583f / 46d29b2.

mseroczynski added bug needs triage labels Jun 1, 2020

andrewlewis self-assigned this Jun 1, 2020

andrewlewis added need more info and removed bug needs triage labels Jun 1, 2020

google-oss-bot added needs triage and removed need more info labels Jun 1, 2020

andrewlewis assigned ojw28 and unassigned andrewlewis Jun 1, 2020

ojw28 added bug and removed needs triage labels Jun 3, 2020

ojw28 mentioned this issue Jun 3, 2020

Stuck downloads when changing network conditions #6798

Closed

ojw28 added a commit that referenced this issue Jun 4, 2020

Fix more cases of downloads not being resumed

9b5cab0

Issue: #7453 PiperOrigin-RevId: 314710328

ojw28 added a commit that referenced this issue Jun 4, 2020

Fix more cases of downloads not being resumed

1347a22

Issue: #7453 PiperOrigin-RevId: 314710328

ojw28 closed this as completed Jun 4, 2020

Tolriq mentioned this issue Jun 5, 2020

use new API to learn about connectivity changes bumptech/glide#4222

Closed

google locked and limited conversation to collaborators Aug 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

mseroczynski commented Jun 1, 2020

andrewlewis commented Jun 1, 2020

mseroczynski commented Jun 1, 2020

andrewlewis commented Jun 1, 2020

ojw28 commented Jun 1, 2020

mseroczynski commented Jun 1, 2020 •

edited

Loading

ojw28 commented Jun 1, 2020

mseroczynski commented Jun 2, 2020 •

edited

Loading

mseroczynski commented Jun 2, 2020 •

edited

Loading

ojw28 commented Jun 3, 2020 •

edited

Loading

ojw28 commented Jun 4, 2020 •

edited

Loading

ojw28 commented Jun 5, 2020

Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

Comments

mseroczynski commented Jun 1, 2020

andrewlewis commented Jun 1, 2020

mseroczynski commented Jun 1, 2020

andrewlewis commented Jun 1, 2020

ojw28 commented Jun 1, 2020

mseroczynski commented Jun 1, 2020 • edited Loading

ojw28 commented Jun 1, 2020

mseroczynski commented Jun 2, 2020 • edited Loading

mseroczynski commented Jun 2, 2020 • edited Loading

ojw28 commented Jun 3, 2020 • edited Loading

ojw28 commented Jun 4, 2020 • edited Loading

ojw28 commented Jun 5, 2020

mseroczynski commented Jun 1, 2020 •

edited

Loading

mseroczynski commented Jun 2, 2020 •

edited

Loading

mseroczynski commented Jun 2, 2020 •

edited

Loading

ojw28 commented Jun 3, 2020 •

edited

Loading

ojw28 commented Jun 4, 2020 •

edited

Loading