Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network calls hang after coming back online after being offline #3278

Open
arstk8 opened this issue Apr 11, 2017 · 18 comments
Open

Network calls hang after coming back online after being offline #3278

arstk8 opened this issue Apr 11, 2017 · 18 comments
Labels
android Relates to usage specifically on Android needs info More information needed from reporter
Milestone

Comments

@arstk8
Copy link

arstk8 commented Apr 11, 2017

I have an interesting issue I'm seeing with okhttp, but so far I can only reproduce it on Android emulators (default Android emulator and Genymotion)

If I make a bunch of requests to the same server (enough to saturate the connection pool) and then go offline (airplane mode) and then come back online, any subsequent requests I make to that server hang until one of the connections in the connection pool is removed after 5 minutes of inactivity.

I'm not sure if the emulator thing is a timing issue (emulators execute faster in my experience) or if it's a quirk with the networking configuration vs a real device. The fact that it happens on more than one type of emulator seems significant to me.

I have a sample app which reproduces it here: https://github.com/arstk8/TestEsriNetworkCall/

I know you're preferred way to reproduce is via MockWebserver test case, but I can't figure out a way to reproduce it that way.

@swankjesse
Copy link
Collaborator

Thanks for putting together the sample app. If you have any insight into how OkHttp could reliably detect this situation, that’d be handy.

@arstk8
Copy link
Author

arstk8 commented Apr 13, 2017

I don't, unfortunately. The only thing I can think of is some kind of reachability test, but that seems expensive.

@swankjesse swankjesse added this to the Icebox milestone Apr 15, 2017
@kenumir
Copy link

kenumir commented May 27, 2017

Basing on this example i make some test and:

  • there's no need make 40 requests, one is enough
  • when request is http/1.1 SocketTimeoutException occurs only on first request (next is ok)
  • when request is http/2 SocketTimeoutException occurs on every next request
  • when canceling request (http/2) and try make next request everything is getting work

This is my observation, I hope this helps

@ekstro
Copy link

ekstro commented May 29, 2017

Setting "keepAliveDuration" also has an impact on this.

@yschimke
Copy link
Collaborator

yschimke commented Oct 1, 2017

Had a quick look at this with OkHttp 3.9.0

A few comments:

  1. Still happening with 3.9.0
  2. The EventListener API makes it a lot easier to understand what okhttp is attempting e.g. when it reuses connections from before airplane mode etc.
  3. I can't get http/2 to work to that server - I don't think the server supports ALPN
  4. Can reproduce with other Http/2 servers, the connection is reestablished after a couple of requests [edit]

One option might be to support platform reachability services, specifically on Android, and test connections after reachability events before handing them out.

@yschimke yschimke added the enhancement Feature not a bug label Oct 3, 2017
@yschimke
Copy link
Collaborator

yschimke commented Oct 3, 2017

Tagging as enhancement - I think since this is so easily reproducible even on Android 8 with OkHttp 3.9.0, that detecting this cleanly and making it work will be relatively high impact. n.b. I couldn't get into the permanent http/2 failure state that @kenumir saw, so maybe that is fixed.

@tucomel
Copy link

tucomel commented Aug 12, 2018

not fixed to me, Im using okhttp3.10 with emulator on android api 27 and just after go back online (turn airplane mode off) at first request the the timeout error come out, but after first request everything works

@minobilis
Copy link

minobilis commented Nov 20, 2018

I had the same issue on emulator and some samsung devices.
Forcing okhttp client to use HTTP/1.1 helped me to avoid such exceptions.

new OkHttpClient.Builder()
.protocols(Arrays.asList(Protocol.HTTP_1_1))
.build();

It might be due to how HTTP/2 connections established and maintained at OS level on these devices.

@heltonandreazza
Copy link

heltonandreazza commented Aug 26, 2019

Hey guys,

I thought this was supposed to be fixed here #3261, but I ran @arstk8's sample (https://github.com/arstk8/TestEsriNetworkCall/) with okhttp 3.12.3 and sdk 26, the issue seems to persist.

@minobilis I've also tried your workaround by changing on MainActivity.java:34 new OkHttpClient.Builder().build() to new OkHttpClient.Builder().protocols(Arrays.asList(Protocol.HTTP_1_1)).build() it doesn't fix the problem either.

Apparently the issue presents itself only on Android emulators.

Is there any fix on sight that could be done?
Is there any difference between making HTTP requests on emulators and physical devices?

Thanks in advance!!

@yschimke
Copy link
Collaborator

yschimke commented Apr 11, 2020

Probably worth retesting after #5920 which is in 4.5, I'm hopeful that fixes a ton of connectivity issues with dodgy Android networking.

@yschimke yschimke added android Relates to usage specifically on Android needs info More information needed from reporter and removed enhancement Feature not a bug labels Apr 11, 2020
@lyind
Copy link

lyind commented Feb 15, 2021

I think I am still seeing this with OkHttp 4.9.1 on Android 9/10. It is critical for me.

This is a thread dump of the connection thread in question:

OkHttp bla.example.com ( 571 )
        com.android.org.conscrypt.NativeCrypto.SSL_read(Native Method)
        com.android.org.conscrypt.NativeSsl.read(NativeSsl.java:407)
        com.android.org.conscrypt.ConscryptFileDescriptorSocket$SSLInputStream.read(ConscryptFileDescriptorSocket.java:579)
        okio.InputStreamSource.read(JvmOkio.kt:90)  <-- Nothing to read here, socket probably gone, HTTP/2 keep-alive anyone?
        okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:129)
        okio.RealBufferedSource.request(RealBufferedSource.kt:206)
        okio.RealBufferedSource.require(RealBufferedSource.kt:199)
        okhttp3.internal.http2.Http2Reader.nextFrame(Http2Reader.kt:89)
        okhttp3.internal.http2.Http2Connection$ReaderRunnable.invoke(Http2Connection.kt:618)
        okhttp3.internal.http2.Http2Connection$ReaderRunnable.invoke(Http2Connection.kt:609)
        okhttp3.internal.concurrent.TaskQueue$execute$1.runOnce(TaskQueue.kt:98)
        okhttp3.internal.concurrent.TaskRunner.runTask(TaskRunner.kt:116)
        okhttp3.internal.concurrent.TaskRunner.access$runTask(TaskRunner.kt:42)
        okhttp3.internal.concurrent.TaskRunner$runnable$1.run(TaskRunner.kt:65)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
        java.lang.Thread.run(Thread.java:764)

Is that the same issue or a different one? Now trying to limit the protocol used to HTTP/1.1.

I have the client configured like this:

        this.client = builder
                .addInterceptor(new DefaultHeaderInterceptor(userAgent))
                .retryOnConnectionFailure(true)
                .pingInterval(8, TimeUnit.SECONDS)
                .connectTimeout(9, TimeUnit.SECONDS)
                .readTimeout(7, TimeUnit.SECONDS)
                .writeTimeout(7, TimeUnit.SECONDS)
                .connectionPool(new ConnectionPool(1, 25, TimeUnit.SECONDS))
                ...
                .build();

@lyind
Copy link

lyind commented Mar 17, 2021

Limiting Protocol to HTTP/1.1 did not help much. The exact same issues still occurs.

This bug is especially easy to notice when restricting the ConnectionPool to 1 idle connection (as then all HTTP requests will hang when the issue occurs).

Ideas:
At least for HTTP2 it should be possible to walk the ConnectionPool's IDLE connections and check if there was any Pong received recently (if timeout enabled) and forcefully close() or interrupt() the connection/thread. That should fix the issue. For HTTP/1.1 maybe one could use some "last response received" timestamp together with the connection timeout?

@korilin
Copy link

korilin commented Jun 26, 2023

When I use EventListener to collect network time statistics, I find some connection timeout longer than OkHttpClient setting.

callStart(302619159)->proxySelectStart(302619209)->withRequestStart(302619210)->proxySelectEnd(302619210)->dnsStart(302619210)->proxySelectStart(302619221)->withRequestStart(302619221)->proxySelectEnd(302619222)->dnsStart(302619222)->dnsEnd(302619234)->connectStart(302619237)->connectFailed(302639262)->connectStart(302639264)->connectFailed(302647219)->proxySelectStart(302647264)->withRequestStart(302647264)->proxySelectEnd(302647265)->dnsStart(302647265)->proxySelectStart(302647275)->withRequestStart(302647275)->proxySelectEnd(302647275)->dnsStart(302647275)->dnsEnd(302647284)->connectStart(302647285)->connectFailed(304721621)->canceled(304721622)->withRequestEnd(304721622)->callFailed(304721639)->withRequestEnd(304721639)

The problem fragment is connectStart(302647285)->connectFailed(304721621)->canceled(304721622). It took 2074336 ms. The data of this abnormal time is more than 3% of the reported data in our project.

OKHttpClient Setting:

  • connectTimeout = 15s
  • callTimeout = 60s

Other Info:

  • OkHttp version: 4.9.1
  • Time record method: SystemClock.elapsedRealtime()
  • We use http1.1

Is this the same problem mentioned in this issue?

@lyind
Copy link

lyind commented Jun 26, 2023

Is this the same problem mentioned in this issue?

It sounds at least somewhat related.

It's been some time, but IIRC the problem is that in certain situations the HTTP library really can't do much as it doesn't receive a timely response from the OS. Some active watcher thread/timer and more complicated logic would be required.

@yschimke
Copy link
Collaborator

The other approach which we haven't wanted to get into, is to subscribe to android network changes.

#7470

@ln-12
Copy link

ln-12 commented May 23, 2024

We still have this issue with version 4.12.0 while using it with connect. What I noticed is that is occurs only in streaming requests. However, non-streaming requests seem to be blocked until the streaming call times out. Also, when I toggle the network connection, only every second streaming request fails. When I do it again, it then works. Doing it another time, it fails again.

I tried to listen to the network state and call dispatcher.cancelAll() on my custom dispatcher when the connection is lost but this also did not work. Are there any other ideas for a workaround?

The exception I am getting after the call timeout is:

Unhandled exception
	okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
		at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.kt:148)
		at okhttp3.internal.http2.Http2ExchangeCodec.readResponseHeaders(Http2ExchangeCodec.kt:97)
		at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
		at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at com.connectrpc.okhttp.ConnectOkHttpClient$applyNetworkInterceptor$1.intercept(ConnectOkHttpClient.kt:61)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at com.example.network.ChainExceptionInterceptor.intercept(ChainExceptionInterceptor.kt:15)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.kt:154)
		at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
		at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
		at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
		at java.lang.Thread.run(Thread.java:1012)

@ln-12
Copy link

ln-12 commented May 23, 2024

I think that I found out what is actually happening on emulators (but not only on them)!

After trying out many things, I added a delay between reconnecting to the network and sending the first request. I ended up at ~2 seconds. Then I noticed that this is the time in which the emulator usually connects to the wifi (!) network after airplane mode. With the delay, it worked perfectly fine. Now let's have a look at what is happening without the delay (make sure cellular network and wifi are enabled, this is the default on emulators):

  • Turn airplane mode on
  • Wait a few seconds
  • Turn airplane mode off and look closely at the connection indicators in the status bar
  • First, the airplane icon disappears
  • Second, the 3G icon is shown
  • Third, the 3G icon is replaced by the wifi icon

I assume that the network request was (sometimes) fired exactly in the timeframe between 3G being connected and wifi being connected. After the call was fired to the 3G network connection it just timed out as the emulator killed that interface (?) when switching over to wifi. The response from the server was sent but never reached the client through the 3G connection as is might only be ready to receive data over wifi.

This never happened on my Pixel phone during testing as wifi is still turned on in airplane mode on these devices. Samsung for example behaves the same as the emulator and turns on cellular first before it switched to wifi to this issue is also reproducible on these devices.

@swankjesse @yschimke I hope this helps to understand the actual issue here. Is there something that you could change inside OkHttp in this case?

@sherbaev
Copy link

I am facing the same issue with Android 10 devices

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
android Relates to usage specifically on Android needs info More information needed from reporter
Projects
None yet
Development

No branches or pull requests