Client timeout and async close exceptions when setting max duration on pool #7107
Thanks for the detailed report, and thank you very much for the reproduction instructions, as those are invaluable. Indeed, there is a problem when the connection expiration feature is enabled on a multiplexed connection pool. I'm currently working on a fix and will keep you updated soon.
the pool must be asked to know if a released connection can be closed
I believe I nailed the problem and have a fix ready; hopefully it will get reviewed and merged soon: #7123. Please note that my PR is against 10.0.x because this bug impacts both 10.0.x and 11.0.x identically, and probably 9.4.x too.
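The idea quoted above ("the pool must be asked to know if a released connection can be closed") can be illustrated with a minimal, hypothetical sketch. All class and method names below are invented for illustration and are not Jetty's actual API; the point is only that, on a multiplexed connection, the close decision must be made at release time, when the pool knows no other stream is still active:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (names invented, not Jetty's API): a multiplexed
// pooled connection must not be closed on expiration while other requests
// still share it. Instead, release() asks whether it may be closed now.
class PooledConnection {
    private final AtomicInteger activeStreams = new AtomicInteger();
    private final long expiresAtMillis;
    private volatile boolean closed;

    PooledConnection(long maxDurationMillis) {
        this.expiresAtMillis = System.currentTimeMillis() + maxDurationMillis;
    }

    void acquire() {
        activeStreams.incrementAndGet();
    }

    /** Returns true if the connection was closed as part of this release. */
    boolean release() {
        int remaining = activeStreams.decrementAndGet();
        // Close only when expired AND no other multiplexed stream is active;
        // closing earlier fails in-flight requests asynchronously.
        if (remaining == 0 && System.currentTimeMillis() >= expiresAtMillis) {
            closed = true;
            return true;
        }
        return false;
    }

    boolean isClosed() {
        return closed;
    }
}
```

In this sketch, an already-expired connection with two active streams survives the first release and is closed only by the last one.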
Here are some logs from our pool blocked in a bad state; all our client requests generate these logs:
No sockets are opened to the destination (verified). @lorban, do you think this is another issue, not related to this one? Do you have any idea of what's happening? We still have not been able to reproduce the problem locally; it only happens in our production environment. Thanks for your time.
The logs you included are too brief to jump to any conclusion, but since you suspect your connection pool is stuck, I would advise you to dump the server to try to get an idea of what the pool's state actually is, and to post the output on this issue. Have a look here to get an idea of how to get such a dump. What we're looking for looks like the following:
Hopefully, the dump will tell us enough about the state of the pool to figure out how it ended up corrupted. As a secondary step, I would advise you, if possible, to try using the HTTP/1.1 protocol instead of HTTP/2, to figure out whether the problem is still reproducible or whether it's specific to HTTP/2.
Here is a full dump of the pools when the issue occurs:
The one in a bad state is:
Hope it can help to fix the issue.
Thank you for posting the pool dump; it really helped figure out what state the pool and its connections ended up in, so I could reproduce the problem locally. That is caused by a second bug, for which I opened a different issue: #7157. This issue's fix has already been merged. Thanks again for the report!
Thank you for fixing these issues quickly. We would like to try the fix. Thanks again for your help.
Jetty version(s)
11.0.6
Java version/vendor
(use: java -version)
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
OS type/version
Linux 5.4.0-89-generic #100~18.04.1-Ubuntu SMP Wed Sep 29 10:59:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Description
We have been using Jetty in production for a while, client and server side, with a custom client `RoundRobinConnectionPool` that supports a connection TTL, like this proposal (#1893). After upgrading, we tested the max duration implementation (#5799) and encountered several issues while carrying out production load tests. We detected many `AsynchronousCloseException`s after a connection was marked as removed due to expiration, and after a certain time all our client requests fail with a `TimeoutException`. I don't know if these are two separate issues or the same root cause, but this is a blocking issue for us: once all client requests fail, the pool seems to be in a bad state and is not able to recover.
How to reproduce?
I have reproduced the `AsynchronousCloseException` problem on the `11.0.x` branch by running the concurrent load test and completing `HttpClientLoadTest.LoadTransportScenario.provideClientTransport()` with this snippet:
The exception stacktrace:
For reproducibility reasons, the max duration is very low (500 ms) in this test, but in our production load tests the max duration parameter was set to 60 s and the problem was the same.
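The failure mode described in this report can be modeled with a small, self-contained sketch. All names below are invented for illustration and are not Jetty's API: the point is that an expiration sweep which closes a connection purely on age, without checking for active multiplexed streams, makes in-flight requests fail with `AsynchronousCloseException`:

```java
import java.nio.channels.AsynchronousCloseException;

// Hypothetical model of the reported race (names invented, not Jetty's
// API): a max-duration sweep closes the connection while a request is
// still in flight, and that request then observes the asynchronous close.
class ExpiringConnection {
    private final long expiresAtMillis;
    private int activeStreams;
    private boolean open = true;

    ExpiringConnection(long maxDurationMillis) {
        this.expiresAtMillis = System.currentTimeMillis() + maxDurationMillis;
    }

    void beginRequest() {
        activeStreams++;
    }

    // Buggy sweep: closes the connection as soon as it is past its max
    // duration, ignoring the fact that activeStreams > 0.
    void sweepExpired(long nowMillis) {
        if (nowMillis >= expiresAtMillis) {
            open = false;
        }
    }

    // An in-flight request that touches a closed connection fails the way
    // the stacktrace above shows.
    void send() throws AsynchronousCloseException {
        if (!open) {
            throw new AsynchronousCloseException();
        }
    }
}
```

With a very short max duration such as the 500 ms used in the test, the window between `beginRequest()` and `send()` easily straddles the expiration instant, which is why lowering the duration makes the race much easier to hit.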