Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kestrel should trigger ConnectionClosedRequested on HTTP/2 and HTT/3 GOAWAYs #42057

Closed
halter73 opened this issue Jun 6, 2022 · 9 comments · Fixed by #43431
Closed

Kestrel should trigger ConnectionClosedRequested on HTTP/2 and HTT/3 GOAWAYs #42057

halter73 opened this issue Jun 6, 2022 · 9 comments · Fixed by #43431
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-kestrel help wanted Up for grabs. We would accept a PR to help resolve this issue

Comments

@halter73
Copy link
Member

halter73 commented Jun 6, 2022

Is your feature request related to a problem? Please describe the problem.

Today, when a proxy like YARP tries to shut down when a backend is handling a long running connection like those belonging to SignalR or gRPC, it invariably runs into the now 30 second Host ShutdownTimeout. That's because the backend has no way to observe that the proxy (or really any client) is trying to gracefully close the connection at the app layer.

Describe the solution you'd like

Kestrel should trigger IConnectionLifetimeNotificationFeature.ConnectionClosedRequested when it receives an HTTP/2 or HTTP/3 GOAWAY frame requesting a graceful shutdown. This will allow frameworks like SignalR and gRPC to react to clients and proxies like YARP shutting down and trying to gracefully close the connection. We already trigger this during server shutdown and SignalR already hooks this feature. Unfortunately, SignalR still primarily uses WebSockets over HTTP/1.1 where this wouldn't help, but that might change very soon with #7801.

Additional context

We have at least one internal customer that had a sev1 outage because YARP did not shutdown as quickly as expected. Long running connections to the backend are a suspected culprit.

This could affect the design of #25069 ("SignalR connections are shutdown directly during graceful shutdowns") which is more server shutdown focused but should be designed to support any situation where we want to gracefully close a SignalR connection like when the proxy is shutting down.

For HTTP/1.1, there's no way to indicate you want to gracefully close an ongoing response. A FIN would immediately close the entire connection. As a quicker hackier fix that might work for all HTTP versions, you could also conceive of a privileged endpoint that only YARP can access similar to a healthcheck that tells the backend to restart when Kestrel shuts down. This would trigger ApplicationStopping on the backend and hopefully give more graceful shutdown behavior at the cost of restarting all the backends.

Another option for WebSocket requests specifically (so this wouldn't help gRPC) would be for YARP to initiate the WebSocket closing handshake when it shuts down rather than just forwarding the upgraded stream and hoping it stops before the timeout.

@ghost
Copy link

ghost commented Jun 6, 2022

Thanks for contacting us.

We're moving this issue to the .NET 7 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@adityamandaleeka
Copy link
Member

Triage: this shouldn't take too long to implement. Let's try to get it done in 7 if we can, but it's not release-blocking.

@adityamandaleeka adityamandaleeka added the help wanted Up for grabs. We would accept a PR to help resolve this issue label Jun 6, 2022
@ladeak
Copy link
Contributor

ladeak commented Aug 11, 2022

Questions

Summary

Let me summarize your description in a few words, to make sure my understanding is correct: long running h2 and h3 requests are being proxied through YARP to a 'backend service' doing gRPC or SignalR on ASP.NET Core. When YARP gets shut down, it sends a GOAWAY message to the backend service. On the backend service when Kestrel handles this message, (you suggest) it shall call cancel on IConnectionLifetimeNotificationFeature.ConnectionClosedRequested cancelation token, so that gRPC, SignlalR, etc. can close the connection.

Design Proposal

After a quick initial assessment of the code, I would use Http2Connection.cs Task ProcessGoAwayFrameAsync() method to trigger the cancellation. The cancellation will eventually call StopProcessingNextRequest method (again) due to a subscription in HttpConnection.ProcessRequestsAsync(), but this should be fine as long as the invocation of StopProcessingNextRequest with serverInitiated: false happens first. Otherwise, I see the server sends a GOAWAY

For Http3 - I expect a similar approach.

Finally, covering all with tests will be fun.

@halter73
Copy link
Member Author

@halter73 is the description still up to date? (I am asking because I see some hesitation in the linked ticket microsoft/reverse-proxy#1756)

I think the description is up to date. It's simply the right thing to do in my opinion. I should be relatively easy change to make and it shouldn't hurt perf at all. But not many people have complained about this so far, so it hasn't been a priority.

microsoft/reverse-proxy#1756 is a little different because it's about YARP handling the WebSocket closing handshake and YARP still treats WebSockets as an opaque upgrade stream. It would be a bigger effort to make a change in YARP to handle the WebSocket closing handshake than it would be to make the change to Kestrel suggested in this issue, and it might regress perf a little (but I doubt by much).

I still think it would be best for YARP to do this, I'm not sure if the reason YARP doesn't want to go this route is complexity, perf or something else. @Tratcher or @karelz might be able to provide insight there.

@halter73
Copy link
Member Author

I now remember the likely reason why the YARP folks aren't super excited to do my WebSocket suggestion:

Parsing and redoing the WebSocket framing will have some minor performance implications and injecting a close handshake into a proxied WebSocket could be behavior people don't want, so this should probably be made opt-in. Although, I do think the fast (sic) majority of customers would prefer this behavior.

There are no such downsides to this proposal though.

@ladeak
Copy link
Contributor

ladeak commented Aug 12, 2022

May I pick this up with the proposed design?

@halter73
Copy link
Member Author

I don't see why not! At this point any changes are not going to make it until .NET 8 though. Does this sound good to you @JamesNK?

@ladeak
Copy link
Contributor

ladeak commented Aug 15, 2022

A question related to HTTP3 and Quic:

GOAWAY frames are processed by the Http3ControlStream.

Approach 1

There I can access and trigger the IConnectionLifetimeNotificationFeature connection feature as:

_context.Connection.StopProcessingNextRequest(serverInitiated: false);
_context.ConnectionContext.Features.Get<IConnectionLifetimeNotificationFeature>()?.RequestClose();

but not sure if that is the right way of dealing with dependencies.

Approach 2 (alternative approach I could think of)

Creating a new feature for (control) streams which provides an ability to register callbacks. (Similar to IStreamClosedFeature)

When GOAWAY received all callbacks would be invoked. Http3Connection would register one such callback upon creating the streams. This callback would eventually invoke IConnectionLifetimeNotificationFeature's RequestClose().

Do you have any concerns with Approach 1?

@ghost
Copy link

ghost commented Sep 13, 2022

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

ladeak added a commit to ladeak/aspnetcore that referenced this issue Sep 16, 2022
… GoAway

On GoAway frames triggering `IConnectionLifetimeNotificationFeature`'s cancellation for gRPC and SignalR to initiate teardown of long running requests.

## Description

For Kestrel h2 and h3 triggering `IConnectionLifetimeNotificationFeature` feature's *RequestClose()* method on GoAway frames, so that frameworks like SignalR or gRPC may proactively start closing long running request.
h3 control stream (which is the only stream that handles GoAway) also calls *StopProcessingNextRequest()* method otherwise `HttpConnection` (subscribed to `IConnectionLifetimeNotificationFeature`) would send GoAway's from server. h2 already does the same.

Adding new tests: `Http3InMemory` now implements *RequestClose()* method and `Http2TestBase` mimics sharing the same feature collection on ConnectionFeatures and by the Connection's Features properties.

Fixes dotnet#42057
@ghost ghost locked as resolved and limited conversation to collaborators Oct 20, 2022
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-kestrel help wanted Up for grabs. We would accept a PR to help resolve this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants