-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graceful shutdown does not wait for resumed requests #5105
Comments
@sbordet could you provide some guidance to @thomasdraebing to how resolve this problem? This issue is breaking the rolling restart experience because of the breakages of incoming Git/HTTP requests. Thanks for your feedback. |
Wonder if a new GracefulShutdownHandler is in order. One that simply uses the information from HttpChannel.Listener to know if the request is complete or not. |
A GracefulShutdownHandler sounds like a good idea to me. I also agree that a graceful shutdown should wait requests in all states to complete. |
@sbordet is the fix for this to take into account the _asyncDispatches, as well as the requestStats? |
I tried to implement @joakime 's suggestion in #5156. So far, it does most of what I would expect, but it does not work in one case: If a request is being served, when the shutdown is triggered and then after the trigger of the shutdown is cancelled by the client, the server continues to wait for the request to finish until the timeout. Do you have any idea, how I could detect whether the request is cancelled by the client? Comments and feedback on the draft are very welcome, especially since I am a total newbie to jetty. |
I don't quiet understand the OPs analysis. When an async request is dispatched "resumed" the request stats are not incremented. However, when that async request completes, the request count is checked and if zero then the shutdown will complete. There can never be a current dispatch unless there is a current request, so checking "no current requests AND no current dispatches" is equivalent to "no current requests"... unless we have a bug somewhere. I'm against a separate We need to know more why this fails/succeeds for the OP before we can say more |
@lachlan-roberts as you've recently worked on graceful shutdown, can you tune into this issue and help out (or take over from?) @sbordet |
Do we have a reproducer? Ideally, a unit test that would fail on gerrit@HEAD? |
@thomasdraebing with regards to the issue of detecting client closes, this is not really possible with HTTP/1. When a request is suspended and not doing any IO activity, there is no way to know if the connection has been closed. In order to detect a closer, the container would have to try to read or write something. If it tried to read something then it would detect closures... but it might also read an actual character... which would have to be buffered... then read again... and suddenly we have a potential infinite buffer and a DOS attack vector. So we can't read from the connection whilst the request is suspended. The container also cannot write to the connection.... but the application could, regularly sending either a 102 processing response (if the client can handle it) or some white space (if the content type allows). Not elegant! To solve this problem at the container level, use HTTP/2 which will give you an AsyncListener.onError callback if the connection closes whilst a request is suspended. |
So if the problem is that your graceful shutdown is being held up by such closed connections, then perhaps we do need to provide a "mostly graceful" option that shutsdown the container once there are no dispatched requests. This would have to be optional behaviour, and we'd need the same behaviour in both the Stats handler and any Gracefulshutdown handler. Perhaps your PR could be modified to move that optional logic into a My criteria are:
|
@davido No we unfortunately don't have a test for Gerrit testing this. I did the testing manually, but I could push a change adding a test that reproduces the issue. I would have done this anyway, when we would have updated jetty or applied a different fix. For now, let me describe how I am able to reproduce the issue in the context of Gerrit:
The expected behaviour would be that Gerrit waits for the clone request to finish or the timeout to be reached before stopping Jetty, but actually the request is directly cancelled. A Basically what happened with the request was, it reached the StatisticsHandler and triggered the increase of the Note ,that this issue only happens for git requests, but not for REST API requests. Thus, it could also be an issue with how Jgit, the Git implementation Gerrit uses, handles HTTP requests, which isn't compatible anymore after #2721. |
@thomasdraebing if it is true that "the current request stat is 0 and the current dispatched stat is 1" then that is definitely a jetty bug. @lachlan-roberts can you check the code see if you can find a way for this to happen? I can't see it, other then perhaps in transitory states. |
@gregw Thanks for all your comments and suggestions! I agree that the code duplication should be reduced, e.g. by having the StatisticsHandler extending the GracefulShutdownHandler. I also agree, that both handlers should still provide a method to gracefully shutdown, if only to provide backward compatibility. If we however, have both handlers using the same condition to decide, when the shutdown succeeded, I don't see a lot of benefit of having two distinct handlers, except for having a little less data collected, if using the GracefulShutdownHandler. Not sure, whether this would be worth it. If adding an option to the StatisticsHandler to also check current dispatched requests would be a solution to the issue, I think I would prefer to just add this option to the StatisticsHandler and to not introduce the GracefulShutdownHandler at all. |
However, if you are right and this is a bug in Jetty somewhere else, a new handler or a change in the Statisticshandler might not be required. Let's wait for what @lachlan-roberts finds out. |
@lachlan-roberts @sbordet et al,
The only way I see to get "the current request stat is 0 and the current dispatched stat is 1" is for the async request to complete while it is dispatched? I don't think that can happen. |
Can you past here to stack traces, one that is trigerring line 150 and another that is triggering line 193 code paths? |
I think the issue is, that in the requests I observed |
@davido (line numbers differ a bit. The ones above are from
and
during
and
However, this does not catch the second call of |
I found a minimal fix for my issue. It appears that the async request can already start before the finally block in the StatisticsHandler is reached. In that case the StatisticsHandler does not see it as an async request and will handle it as a synchronous request and directly marks it as finished. By not only checking, if the request is suspended but also if an async request is started, this is fixed. |
@thomasdraebing I agree, I was looking into this earlier and I think the I am currently preparing a PR which will make this change and but also add an optional configuration to not wait for the async requests when doing a graceful shutdown, but instead shutdown when there are no actively dispatched requests. |
If the request is async dispatched, the check state.isSuspended() is not correct to determine if the request was async or not. The check state.isAsyncStarted() should be used instead. Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
…quests in StatisticsHandler Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
If the request is async dispatched, the check state.isSuspended() is not correct to determine if the request was async or not. The check state.isAsyncStarted() should be used instead. Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
…quests in StatisticsHandler Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
StatisticsHandler no longer gives 503 responses after shutdown. Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
Issue #5105 - StatisticsHandler Graceful Shutdown of Async Requests
This has been fixed by PR #5175 and will be available in a 9.4.32 release. |
Thanks for the discussion and the quick fix! |
That's great news, thanks everyone for helping fixing this issue. |
This release fixes the bug "Graceful shutdown does not wait for resumed requests" jetty/jetty.project#5105 Full release notes: https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.32.v20200930 Change-Id: Ie3d28263c6e803ced5b8ec059a8b7e9cca5d3b32
Jetty version
9.4.30.v20200611
Java version
OS type/version
OSX 10.15.6
Description
When a resumed request is running and the server is shut down gracefully, the
StatisticsHandler
does not consider this request, when deciding, whether to let the server shut down. The reason seems to be that the first timeStatisticsHandler.handle()
is called the_requestStats._current.value
is incremented and afterwards decremented, when the request is then resumed,_requestStats._current.value
is not incremented again and thus it is not considered in the shutdown, which since PR #2721 only checks for current requests. Reverting the change fixed the issue for me.For context, the issue occured for me running Gerrit (2.16 or later) and trying to shut down gracefully during a clone request (handled by jgit on server side).
Just reverting the PR mentioned above would again cause issues with graceful shutdowns in other cases. What would be a good way to solve this? Should the StatisticsHandler only let the server shut down if there are no current requests AND no current dispatches? Should a resumed request increment the current request stat, while intermediately decrementing the total request stat? Or would adding a different stat for current resumed request be the way to go?
Thanks and best regards,
Thomas
The text was updated successfully, but these errors were encountered: