-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
java.net.SocketTimeoutException from HTTP/2 connection leaves dead okhttp clients in pool #3146
Comments
FYI, we found a workaround...set the connectionPool in the builder so it uses a new connection pool w/ a size of zero and also turn off HTTP/2 support by setting a new protocolList in the builder with only HTTP/1.1 support. |
You’re using 3.6.0? |
yep...3.6.0 unfortunately. Thought about rolling back to pre-http/2 support but that would mean 2.2 which is too far back because of all the okhttp3 dependencies :-( |
Oh that's terrible. We've had problems with similar failures before but I thought we'd fixed ’em all. If you can make a test case that'd be handy, otherwise I'll try to look soon. In the interim you can disable HTTP/2 with the protocols list in the OkHttpClient.Builder. |
Correct me if I'm wrong, but I think part of this is working as expected. HTTP/2 connections can carry N outstanding requests. If one of those requests times out and the HTTP/2 connection is closed, then the other N - 1 requests are also lost. I think the intent is that for HTTP/2 connections, a timeout does not necessarily mean the connection is bad. Is it surprising to 'bring the network down' and not receive any sort of socket exception reading or writing? |
N-1 requests being lost is fine if the connection is down. |
@swankjesse : I couldn't figure out how to write a test for this because making all the sockets disconnected was happening at at an OS level. Tried to write and Android Test Butler one (to flip the network switch on/off on an Android emulator) but the current version of that has issues and probably wouldn't work in this code base :-) |
So our attempts to write to the socket are failing silently? Might need to steal the automatic pings that we added for web sockets. |
Essentially...not that they're failing silently, but they're dead sockets and they're stuck in the pool. We traced through a bit of the code and saw some code that was pulling the a dead socket out of the pool each time it tried to use one which should have cleared things up after 5 dead sockets were pulled out but the network layer still appeared stuck unless we purged the pool w/ evictAll() or waited for the 5 min eviction timeout. Wasn't obvious what a proper fix was... |
Pretty sure this issue is another manifestation of this one: |
I'm sure it's not. We don't see any SSL Handshake exceptions. This bug is actually probably two bugs because we had to disable the connection pool and the HTTP/2 support. #3118 might be affected by the connection pool bug (it doesn't clear the broken idle connection objects in the pool). |
I've seen what you've described but then also the ssl exceptions. Same steps to reproduce as you outlined. |
any updates? I've same issue |
The workaround I described works in our QA testing so far :-) |
@kenyee setting new pool works, but I wonder when an update will arrive? |
Is this issue resolved? I ran into the same issue using 3.5.0. I am using OkHttp to send push to Apple http/2. Yesterday I had this issue resulting in almost 80k push messages not getting delivered.
After I got this error, none of my other requests succeeded. Code:
|
As socket timeout exception is an instance of IO exception, I am not sure if the following approach will work. I am calling evictAll() in the catch block of IOException.
Also how do we check if a connection is stale or not? With Apache HttpClient, there is a way to do it to set a flag for checking stale connections.
|
Any updates? I have the same issue too. :( |
Same issue here! |
We still experiencing the same issue :-( |
I think i'm seeing another manifestation of this on 3.5.0, when the server forcibly closes the connection. We try to establish both a h2 and http1.1 connection. The server responds with 200 to both:
Then at some point we try to read from the http2 connection, which fails in checkNotClosed and throws a StreamResetException
Then, since this is media, we do something that causes a seek to 0 in the media, which needs to reopen the request from the beginning. At this point, we see the same exception as is posted above:
this seems to be very similar to the other cases here, which seem to all be related to an ungraceful shutdown of the connection, and it remaining pooled. I've also confirmed that disabling the ConnectionPool "works around" this issue:
|
is thr any update on this issue? |
@yschimke We are NOT using ExoPlayer. We are using just OkHttp+Retrofit, like in the provided project. |
Ahhhh, was mixing this up with comment by GouravSna. Is there any chance you are interrupting threads doing OkHttp requests? That's still the only reproduction I have for this problem. Interrupting could be a) Thread.interrupt, or b) future.cancel(true). |
Nope, that's basically the whole logic related to Okhttp request: public void makeRequest() {
textView.setText("...");
Call<String> repos = service.helloWorld();
repos.enqueue(new Callback<String>() {
@Override
public void onResponse(Call<String> call, Response<String> response) {
Log.d("TEST", response.body());
textView.setText(response.body());
}
@Override
public void onFailure(Call<String> call, Throwable t) {
t.printStackTrace();
textView.setText(t.getMessage());
}
});
} |
@robertszuba I'll take a further look, since I was able to repro with ExoPlayer with the same symptom, I was focusing on that. It's likely these are two separate bugs in that case. Your repro seems quite simple, I'll try to reproduce with it on the weekend and get back to you. |
I believe this is a bug in the emulator. In particular I think the emulator is breaking the TCP socket in a way OkHttp can't detect. |
The bug can be related to a specific versions of emulator indeed (it works fine on most of them). The problem is that some of our customers report it for specific real devices as well (even though I was not able to reproduce it on my test devices). |
@robertszuba any specific devices you can name? |
Relates to square/okhttp#3146. This was from #71. There is a draft PR https://github.com/square/okhttp/pull/7185/files which documents OkHttp's ideal handling of cancellation including interrupts. But a few key points 1) This is a target state, and OkHttp does not currently handle interrupts correctly. In the past this has been identified, and the advice is to avoid interrupts on Http threads, see discussion on square/okhttp#1902. Also an attempt at a fix here square/okhttp#7023 which wasn't in a form to land. 2) Even with this fixed, it is likely to never be optimal, because of OkHttp sharing a socket connection for multiple inflight requests. From square/okhttp#7185 ``` Thread.interrupt() is Clumsy ---------------------------- `Thread.interrupt()` is Java's built-in mechanism to cancel an in-flight `Thread`, regardless of what work it's currently performing. We recommend against using `Thread.interrupt()` with OkHttp because it may disrupt shared resources including HTTP/2 connections and cache files. In particular, calling `Thread.interrupt()` may cause unrelated threads' call to fail with an `IOException`. ``` This PR leaves the Loader/DataSource thread parked on a countdown latch, while this may seem wasteful and an additional context switch. However in practice the response isn't returned until the Http2Connection and Http2Reader have a response from the server and these means effectively parking in a `wait()` statement here https://github.com/square/okhttp/blob/9e039e94123defbfd5f11dc64ae146c46b7230eb/okhttp/src/jvmMain/kotlin/okhttp3/internal/http2/Http2Stream.kt#L140 PiperOrigin-RevId: 446652468
Samsung Galaxy S21 Ultra with Android 12. But as mentioned, this comes from end users, I don't have S21 to test it myself and I wasn't able to reproduce it on other devices yet. |
Relates to square/okhttp#3146. This was from androidx/media#71. There is a draft PR https://github.com/square/okhttp/pull/7185/files which documents OkHttp's ideal handling of cancellation including interrupts. But a few key points 1) This is a target state, and OkHttp does not currently handle interrupts correctly. In the past this has been identified, and the advice is to avoid interrupts on Http threads, see discussion on square/okhttp#1902. Also an attempt at a fix here square/okhttp#7023 which wasn't in a form to land. 2) Even with this fixed, it is likely to never be optimal, because of OkHttp sharing a socket connection for multiple inflight requests. From square/okhttp#7185 ``` Thread.interrupt() is Clumsy ---------------------------- `Thread.interrupt()` is Java's built-in mechanism to cancel an in-flight `Thread`, regardless of what work it's currently performing. We recommend against using `Thread.interrupt()` with OkHttp because it may disrupt shared resources including HTTP/2 connections and cache files. In particular, calling `Thread.interrupt()` may cause unrelated threads' call to fail with an `IOException`. ``` This PR leaves the Loader/DataSource thread parked on a countdown latch, while this may seem wasteful and an additional context switch. However in practice the response isn't returned until the Http2Connection and Http2Reader have a response from the server and these means effectively parking in a `wait()` statement here https://github.com/square/okhttp/blob/9e039e94123defbfd5f11dc64ae146c46b7230eb/okhttp/src/jvmMain/kotlin/okhttp3/internal/http2/Http2Stream.kt#L140 PiperOrigin-RevId: 446652468
In the last month, since we had this issue crop up, we had 14 occurrences, across 5 OS versions, 6 manufacturers and 12 models. OS Versions: Models: I've just pushed and update to our users, changing the connection pool and protocols, as per one of the first posts.
I'm unable to provide any more info for the time being, we've mostly run into this issue when using our
I'll report back whether the aformentioned suggestion still produces the issue. This was all with OkHttp version 5.0.0-alpha.7 and previous alphas. |
@rolandsarosy-verycreatives any news if the ConnectionPool workaround fixed the problem for you? Also someone mentioned at https://stackoverflow.com/a/60949304/4514843 that setting a pingIntervall would be preferable to disabling the Connectionpool, any thoughts on this maybe @swankjesse? |
setting the ping interval did not resolve the issue for me on 4.10.0 will try @jpearl 's solution and will report back |
^^ above section didn't resolve the issue either :( |
I am facing the java.net.SocketTimeoutException issue on okhttp client calls. Timeout happens when connected on Mobile Data but WIFI connection works pretty fine. (I have verified the mobile data connection is strong & can stream YouTube videos) The error is noted in below Android Devices & OS versions
I have tested on different okhttp versions 4.7.2 , 4.10.0 and latest 5.0.0-alpha.10 The workaround(s) shared (see below), e.g setting the connection pool and protocols has not solved.
Anyone who has a lead on how to fix the timeouts on Mobile Data for select devices can share tips. hello @swankjesse your knowledge on this will also be appreciated. |
@saxal28 @rolandsarosy-verycreatives did you find a working fix for this that has lasted. if so, kindly share. |
@yschimke We are using okhttp 4.9.3 in a Java backend app - wrapped with FailSafe, which ensures a timeout (so some kind of thread interrupt/cancel I guess?) We get a lot of these timeouts and with call instrumentation we see the following
Our 99th percentile response time for request-body-end is around 150 milliseconds. But with these kinds of failures we ALWAYS see 0 millis. We have an overall 450 millisecond timeout set with FailSafe. Where we send the data we often see they received and sent the response well within the time limit. We're using http/2 connections. Does this sound the same as the issue you reproduced? |
It's not clear it's the same. Are you seeing any interrupt exceptions? The safe way to cancel a Call is Call.cancel(). If you are doing that it's not the same issue. |
@yschimke we do appear to see interrupted exceptions...
I think we possibly want the okhttp plugin for failsafe - https://failsafe.dev/okhttp/ - seems like it does a more graceful kill with We'll try that out and report back. |
Yeah, since it's doing cancels without interrupts I think that looks better. The problem I faced is that if using Call.execute() which is synchronous, the actual initial request processing including writing to the HTTP/2 socket is done on the calling thread, so an interrupt leaves the HTTP/2 connection in an inconsistent state. The fix in media3, androidx/media@80928e7 Was to always use enqueue, even to simulate an execute so the thread writing the underlying socket is never a user thread that can be interrupted. |
- [x] close down all cached connections upon socket time out (see)[square/okhttp#3146] - [x] units and impls
Closing as a dupe of #7841 This issue should be already fixed in Media3 by using call.enqueue and clean cancellation. |
Tried writing a unit test w/ TestButler on Android w/ no luck, so I'll write up the steps to reproduce this and include some sample code. This happens if you connect to an HTTP/2 server and your network goes down while the okhttp client is connected to it:
okhttp client should attempt to reopen the HTTP/2 connection instead of being stuck in this state
Code sample for Android (create a trivial view w/ a button and a textview):
The text was updated successfully, but these errors were encountered: