SocketsHttpHandler.SendAsync reads request Content stream twice #53914
Comments
Tagging subscribers to this area: @dotnet/ncl
Can you try with a recent .NET 6 build? There are a bunch of situations that can cause us to try to read the content stream twice. If the content doesn't support this, it can throw and we should fail the request. There's not really anything else we can do at that point.
More info: Things that can cause the content stream to be read twice include stuff like redirects and authentication. We will handle specific status codes like 302 or 401 and issue another request and try to send the content again.

Expect: 100-continue can be used to try to avoid having to read the content twice in cases like this, because hopefully we will get the 302 or 401 before proceeding to send content; but obviously this only will work if the expect timeout doesn't expire.

Are you using redirects or authentication here?

We do have logic to retry requests on connection shutdown -- that's what HttpConnectionPool.SendWithRetryAsync is about. But this shouldn't cause the content stream to be read twice. Once we start sending any content, the request is no longer retryable.
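For context, a minimal sketch of the Expect: 100-continue flow described here, using standard HttpClient APIs (the URL and the stdin-backed stream are placeholders, not from this issue):

```csharp
using System;
using System.IO;
using System.Net.Http;

// Non-seekable stand-in for a body that can only be read once (e.g. a proxied request stream).
Stream sourceStream = Console.OpenStandardInput();

var client = new HttpClient(new SocketsHttpHandler());

var request = new HttpRequestMessage(HttpMethod.Post, "https://example.com/upload")
{
    Content = new StreamContent(sourceStream)
};

// Ask the server for an interim response (100, or an early 302/401) before the body is sent.
request.Headers.ExpectContinue = true;

var response = await client.SendAsync(request);
Console.WriteLine(response.StatusCode);
```

If the server answers with 302/401 before the body is transmitted, the handler never has to re-read the content; if the 100-continue timeout expires first, the body starts flowing anyway.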
Thanks Geoff.

1: Unfortunately I do not have a minimal repro of this atm and I am unable to verify on .NET 6 (we will not transition our production workloads before GA).

2: There are no redirects nor authentication at play here. Connection shutdown is likely the trigger -- during the incident where we had this happen, the destination service was overwhelmed and couldn't handle the load. This meant many of our requests were hitting the 1-second timeout for Expect: 100-continue. And in at least a few cases, this did cause the content stream to be read twice. We have unambiguous evidence of this based on a few occurrences of the error I mentioned in the issue above.
Ahhhh, this is HTTP2. I had assumed HTTP/1.1. Yes, with HTTP2, there are additional scenarios. When we receive a GOAWAY (server-initiated connection shutdown), the server tells us which streams will be processed and which won't; the ones that haven't are retryable, even if we already started sending the content. So in this case you will definitely see us try to read the content twice.

That said, this is expected behavior. If you can't support reading the content again, then you should throw and the request should be failed.
@Tratcher for your take re. YARP. If confirmed this is acceptable, we would at least want to make sure we are handling it correctly and emitting the right data to telemetry. E.g. perhaps this should count as a Transport error (?)
You can also consider increasing the Expect100ContinueTimeout so that you'll hit this less frequently.
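A sketch of that knob (the 5-second value below is only an example, not a recommendation made in this thread):

```csharp
using System;
using System.Net.Http;

var handler = new SocketsHttpHandler
{
    // Default is 1 second. If the backend doesn't send "100 Continue" within this
    // window, the handler gives up waiting and starts sending the request body.
    Expect100ContinueTimeout = TimeSpan.FromSeconds(5)
};
var client = new HttpClient(handler);
```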
This part's concerning, it means SerializeToStreamAsync is called concurrently on two threads. This could be mitigated by making the Started check in YARP thread safe. I'll file an issue for that (microsoft/reverse-proxy#1068).

As for the retry scenario in general, yes that's expected to fail in YARP and it should be reported accordingly. What type of failure is it currently reported as? It might depend on if it sent the whole request body before retrying, or if it was aborted part way through.
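For illustration, a minimal sketch of what a thread-safe "already started" guard in a custom HttpContent could look like (this is not the actual YARP code linked in the issue; the class name and error message are made up):

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class NonReplayableHttpContent : HttpContent
{
    private readonly Stream _source;
    private int _started; // 0 = never read, 1 = already consumed

    public NonReplayableHttpContent(Stream source) => _source = source;

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        // Interlocked makes the check safe even if two attempts ever raced into this method.
        if (Interlocked.Exchange(ref _started, 1) == 1)
        {
            throw new InvalidOperationException(
                "The request content has already been consumed and cannot be read again.");
        }

        await _source.CopyToAsync(stream);
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 0;
        return false; // length unknown; the body is streamed
    }
}
```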
Raising the 100-continue timeout is a valid mitigation and shouldn't affect responsive servers.
Looking at the code, we should be waiting for SerializeToStreamAsync to complete before we actually fail the request. So I don't see how we could end up with concurrent SerializeToStreamAsync calls here. (I'm looking at 6.0 code, but I don't think this has changed much since 3.1.)

Note that in cases like this, where we get a GOAWAY or similar and need to cancel sending the request body, we will do two things: (1) cancel the cancellation token passed to SerializeToStreamAsync, and (2) fail any further writes to the request body stream.
@geoffkizer there's no cancellation parameter here, can you clarify what you're referring to? This may be the issue at hand. We really are making no effort to stop the first SerializeToStreamAsync, because there doesn't seem to be any signal we should be reacting to.

```csharp
public abstract class HttpContent : IDisposable
{
    //...
    protected abstract Task SerializeToStreamAsync(Stream stream, TransportContext context);
}
```
The failure we are experiencing happens before we attempt to write to the request body stream. We were still waiting to read bytes from the other side before proxying to the destination.
A new cancellable SerializeToStreamAsync overload was added in 5.0. YARP hasn't implemented it yet since we're still supporting 3.1.
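Building on the NonReplayableHttpContent sketch above, the .NET 5+ overload would look roughly like this; honoring the token is what lets the handler stop a body copy promptly:

```csharp
// Available to override on .NET 5 and later, alongside the older two-parameter overload.
protected override async Task SerializeToStreamAsync(
    Stream stream, TransportContext? context, CancellationToken cancellationToken)
{
    if (Interlocked.Exchange(ref _started, 1) == 1)
    {
        throw new InvalidOperationException(
            "The request content has already been consumed and cannot be read again.");
    }

    // Observing the handler's token means the copy can be cancelled mid-stream.
    await _source.CopyToAsync(stream, cancellationToken);
}
```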
Wouldn't we throw it away in YARP anyway? Right now we have a comment in the code about it
@geoffkizer @MihaZupan is this a linked token? What are the possible triggers for it?
@MihaZupan did you get a chance to look at it? It is not clear why the untriaged label was removed.
From reading the code, it looks like it is possible that the request content will be read multiple times concurrently on HTTP 1:
This could cause the exception seen in dotnet/aspnetcore#33409 (comment). Judging from the stack trace seen by David, I suspect a similar race exists for HTTP/2.

@geoffkizer If we're saying that the content may be read multiple times, are we also saying that those reads may happen concurrently?
No, the read should never happen concurrently.
Your analysis seems correct to me. Seems like we need to possibly do a couple things here.

The primary one is, we should wait for the sendRequestContentTask to complete before we propagate the exception upward. That should ensure we don't read concurrently from the request content.

However, doing that may introduce some new issues. In order to ensure the sendRequestContentTask completes promptly (or indeed at all), we need to cancel it. Unlike HTTP2, we don't seem to have logic to do this for HTTP1.1; I think we should.
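One way to picture the ordering being proposed (purely illustrative; sendRequestContentTask is the name used in the comment above, but the helper itself is not the handler's real code):

```csharp
// Illustrative only: surface the request failure only after the in-flight body
// serialization has been cancelled and observed, so a retry can never overlap it.
static async Task FailRequestAsync(
    Task sendRequestContentTask, CancellationTokenSource requestBodyCts, Exception failure)
{
    requestBodyCts.Cancel();              // (1) ask the body copy to stop
    try
    {
        await sendRequestContentTask;     // (2) wait for it to actually finish
    }
    catch (Exception)
    {
        // Swallowed: the original failure is the one worth reporting.
    }

    throw failure;                        // only now fail (and possibly retry) the request
}
```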
It's certainly possible, but the HTTP2 code is pretty different so we should investigate and confirm.

One thing I noticed from the stack trace you linked is, the derived HttpContent class doesn't implement the SerializeToStreamAsync overload with a cancellation token. That means that even if we try to cancel sending the request body, it won't necessarily be cancelled promptly. I'm not sure whether this matters in terms of reproducing the issue -- I can't see how it would matter, but maybe I'm missing something.

As an aside, this all makes me wish we had common Expect: 100-Continue logic across different HTTP versions. We now have this implemented in three different places, unfortunately.
Triage: There is a workaround (lock in custom HttpContent), so not critical for 6.0. But we should fix it -- i.e. make sure we don't start the 2nd read before fully cancelling the 1st one.
Actually I think there's an even more basic issue here:
We shouldn't be both setting _canRetry = true and actually starting to send the body. We either need to prevent the body from sending when _canRetry is set to true, or if we can't (i.e. we lose the race), then don't set _canRetry to true.

Edit: And if we did this, we would prevent the concurrent read because we'd never retry if we started sending the content.
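To make that invariant concrete, a sketch of the mutual exclusion being suggested (illustrative only; the field and method names are stand-ins, not the real SocketsHttpHandler members):

```csharp
private int _requestBodyStarted; // 0 = not started, 1 = claimed
private volatile bool _canRetry;

// Called just before the first request-body bytes are written.
private bool TryStartSendingRequestBody() =>
    Interlocked.CompareExchange(ref _requestBodyStarted, 1, 0) == 0;

// Called when a GOAWAY indicates the server never processed this stream.
private void MarkRetryableIfBodyNotStarted()
{
    // Whichever side wins the race decides the outcome: either the body starts
    // (and the request is never marked retryable), or the request becomes
    // retryable (and the body is never sent on this connection).
    if (Interlocked.CompareExchange(ref _requestBodyStarted, 1, 0) == 0)
    {
        _canRetry = true;
    }
}
```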
We recently started seeing this issue on HTTP/2 even without Expect: 100-continue.

Example callstack of the recent occurrences is below (we had >400 hits of this in a few hours):
As I mentioned above -- there are various scenarios where we will try to send the content twice, and if the content doesn't support this, it will fail. It should never try to send the content concurrently, but as long as it's not, then trying to send the content twice is expected in the scenarios discussed above.
BTW I assume this is still .NET Core 3.1? I.e. no code has changed on the client here?
Correct, still .NET Core 3.1. Likely the only thing that changed was request timing patterns as well as the destination services we are routing to.
Hi @geoffkizer, @Tratcher, we are hitting the exact same issue. We are using .NET Core 3.1 and HTTP/2. May I ask if there is any recommended action to mitigate this issue? Or what kind of error is recommended to return to the user in this case?
Triage: This problem exists already in 6.0, so not a regression.
Triage: still limited customer impact, moving to future.
I think I figured out how concurrent calls to SerializeToStreamAsync can happen:

The gist is that we aren't waiting for the first SerializeToStreamAsync call to complete before retrying the request and reading the content again.
@karelz we seem to observe the same behavior using YARP 2.2 and .NET 8. You were mentioning that 1 customer has a workaround, what is this workaround and how can we add it in our system? Thanks!
@sberube I take it you're seeing the error about the request content being read a second time. That may happen during normal proxy operation if HttpClient decides to retry a request, something that YARP doesn't support as we don't buffer the request body.

This issue is tracking a bug in SocketsHttpHandler where those reads may happen concurrently. The workaround mentioned is mitigating just that aspect, and it's being used by YARP internally.
Description
Under very specific conditions, SocketsHttpHandler.SendAsync when using HTTP/2 attempts to copy the HttpRequestMessage.Content twice.
In Island Gateway (AKA what eventually became YARP), we have code similar to the following, and the invariant it checks on a best-effort basis is being violated:
https://github.com/microsoft/reverse-proxy/blob/961608d01adc9ab07d4cd3f5adf7d4d3dfff3f0f/src/ReverseProxy/Proxy/StreamCopyHttpContent.cs#L118-L125
The following criteria seem to be necessary to trigger this bug, though it may be broader than this:
- The request uses Expect: 100-continue
- The destination service does not send the 100 response within the first 1 second

It seems like the SocketsHttpHandler machinery may be deciding it is safe to retry because the destination service provably did not start processing the first attempt, but it seems to make the wrong assumption that it is okay to read the HttpRequestMessage.Content again on the next attempt. This is not a valid assumption, and leads to issues in applications such as YARP where the Content is a wrapper around Kestrel's incoming request body stream, which cannot be read twice.
Configuration
Other information
Filing this as a result of an offline discussion with @halter73 and @davidfowl. Thanks folks for helping investigate.
@karelz FYI.