
HttpConnectionPool violates HttpHeaders thread-safety for CONNECT tunnels #65379

Closed
MihaZupan opened this issue Feb 15, 2022 · 23 comments · Fixed by #73331
Labels: area-System.Net.Http, bug, tenet-reliability (Reliability/stability related issue: stress, load problems, etc.)

Comments

@MihaZupan (Member) commented Feb 15, 2022

In .NET 6, the creation of a connection has been decoupled from the initiating request. That is, a connection attempt may still be ongoing even after the initiating request was canceled or was served by a different connection that became available.

In the case of HttpConnectionKind.SslProxyTunnel (CONNECT tunnel), we copy the original request's User-Agent header to the tunnel request here. The problem is that we access the header too late.
At that point, the original request's header collection could be getting enumerated on a different thread (if the request is being served by a different connection). Or, the SendAsync call associated with the request may have already returned, seemingly handing ownership back to the user. The user may then enumerate/inspect the headers, which would again race with the read from EstablishProxyTunnelAsync.

This is a possible explanation for an issue a user hit in #61798 (comment) based on the stack trace.

cc: @Strepto
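
For illustration, a minimal sketch of the hazardous timing described above (hypothetical code, not taken from the linked report; the proxy address and user agent are made up). The point is that the pooled connection attempt, including the CONNECT tunnel, can outlive the SendAsync call:

var handler = new SocketsHttpHandler
{
    // Hypothetical proxy; the race only applies to CONNECT tunnels through a proxy.
    Proxy = new System.Net.WebProxy("http://proxy.example:8080")
};
using var client = new HttpClient(handler);

var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com");
request.Headers.UserAgent.ParseAdd("my-app/1.0");

using var cts = new CancellationTokenSource(millisecondsDelay: 1);
try
{
    await client.SendAsync(request, cts.Token);
}
catch (Exception)
{
    // Ignore cancellation/connection failures; the timing is what matters here.
}

// SendAsync has returned, but the pooled connection attempt (and its CONNECT tunnel)
// may still be running and reading request.Headers concurrently with this enumeration.
foreach (var header in request.Headers)
{
    Console.WriteLine($"{header.Key}: {string.Join(", ", header.Value)}");
}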

This specific issue can be worked around by manually forcing the parsing of the User-Agent header:

var socketsHttpHandler = new SocketsHttpHandler();
var customHandler = new CustomHandler(socketsHttpHandler);
var httpClient = new HttpClient(customHandler);

public sealed class CustomHandler : DelegatingHandler
{
    public CustomHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override HttpResponseMessage Send(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing

        return base.Send(request, cancellationToken);
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing

        return base.SendAsync(request, cancellationToken);
    }
}
@ghost commented Feb 15, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@geoffkizer (Contributor)

Presumably the fix here is to make sure we capture the User-Agent from the original request before we actually put the request in the queue (and make it available for other connections to grab), is that right?

@MihaZupan (Member, Author)

Yes, that's right.
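
For illustration only, a hedged sketch of that idea: snapshot the User-Agent into a plain string while the caller still owns the request, and read only the snapshot when building the CONNECT request later. The method and parameter names are hypothetical; this is not the actual HttpConnectionPool code:

// Hypothetical: capture the value before the request is queued for the pool.
static string? SnapshotUserAgent(HttpRequestMessage request) =>
    request.Headers.TryGetValues("User-Agent", out var values)
        ? string.Join(" ", values) // User-Agent values are space-separated on the wire
        : null;

// Hypothetical: later, when establishing the tunnel, use only the snapshot.
static void ApplyUserAgent(HttpRequestMessage tunnelRequest, string? userAgentSnapshot)
{
    if (userAgentSnapshot is not null)
    {
        tunnelRequest.Headers.TryAddWithoutValidation("User-Agent", userAgentSnapshot);
    }
}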

@geoffkizer (Contributor)

Somewhat related: we currently force the User-Agent to be parsed because we use TryGetValues there. We should avoid this by using NonValidated instead.

(I thought there was an issue on this but I couldn't find it...)
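
For reference, a minimal sketch of what reading the header through the non-validating view could look like (assuming the .NET 6+ HttpHeaders.NonValidated API; the helper name is made up):

using System.Net.Http.Headers;

static void PrintUserAgentRaw(HttpRequestMessage request)
{
    // NonValidated exposes the raw header values without forcing them through the
    // header parser, unlike Headers.TryGetValues.
    if (request.Headers.NonValidated.TryGetValues("User-Agent", out HeaderStringValues values))
    {
        foreach (string value in values)
        {
            Console.WriteLine(value);
        }
    }
}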

@karelz karelz added this to the 7.0.0 milestone Feb 17, 2022
@karelz karelz added tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed untriaged New issue has not been triaged by the area owner labels Feb 17, 2022
@karelz (Member) commented Feb 17, 2022

Triage: This impacts reliability when proxies are used, so we should address it. Hopefully it's not too complex.

@MihaZupan (Member, Author)

This has been mitigated by #68115 in 7.0 - you should only hit this issue if the request message's headers are modified after the request has been sent.

Moving to future.
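
For clarity, a hypothetical example of the remaining hazardous pattern (mutating a request's headers after handing it to SendAsync); the proxy address, URL, and header name here are made up:

using var client = new HttpClient(new SocketsHttpHandler
{
    Proxy = new System.Net.WebProxy("http://proxy.example:8080") // hypothetical proxy
});

var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com");
Task<HttpResponseMessage> sendTask = client.SendAsync(request);

// Hazardous: the pool's connection attempt may still read this header collection
// (e.g. when copying User-Agent onto the CONNECT tunnel request) while we mutate it here.
request.Headers.TryAddWithoutValidation("X-Custom", "value");

using HttpResponseMessage response = await sendTask;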

@MihaZupan MihaZupan modified the milestones: 7.0.0, Future Jul 14, 2022
(Several comments between @Edgaras91 and @MihaZupan were marked as off-topic and are hidden.)

@MihaZupan (Member, Author)

@Edgaras91 can you please open a new issue and share how you are making HTTP calls? Does this occur only when using a proxy? If you have a reproducible example, that would also help a ton in diagnosing the issue. IIS does not affect HttpClient's behavior.

Seeing this sort of exception means multiple threads are accessing the header collection concurrently. There are a few known cases (NewRelic, CONNECT tunnels ...), but it's possible something else is happening in your scenario.

@madelson (Contributor)

Will this fix be backported to .NET 6?

@karelz (Member) commented Aug 2, 2022

@madelson can you please give us some numbers on how often this happens to you? How much does it impact your service? Is there any business impact on your company or your customers?
We will need this info to justify a backport to 6.0.

@karelz (Member) commented Aug 2, 2022

Triage: Given that we might need to backport the fix to 6.0 (per the discussion above, pending impact details), we should fix it in 7.0 even though the impact there is much lower than on 6.0. That way the fix is ready for a potential backport to 6.0.x, and we avoid having to service 7.0 if we later decide to service 6.0.

@karelz karelz removed this from the Future milestone Aug 2, 2022
@karelz karelz added this to the 7.0.0 milestone Aug 2, 2022
@MihaZupan (Member, Author)

@madelson can you also please confirm whether the workaround mentioned at the top of this issue (accessing the User-Agent header before sending the request) fully resolves the issue for you? We expect it should.

@madelson (Contributor) commented Aug 3, 2022

@MihaZupan thanks for following up. We've deployed the workaround and are actively monitoring. The error was transient, so we want to wait a week or so before declaring victory.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 3, 2022
@madelson (Contributor) commented Aug 6, 2022

@MihaZupan I still have to do some more digging to confirm early next week, but so far it looks like the workaround may not have fixed the issue for us.

@MihaZupan (Member, Author)

Can you share how you implemented the workaround? We would expect it to be a 100% reliable fix for this specific issue.

If it's not, it's possible you are running into a different header issue (e.g. an outdated NewRelic agent) with similar symptoms.
Notably, you shouldn't see HttpConnectionPool.EstablishProxyTunnelAsync in the stack trace at all after applying the workaround.

@madelson (Contributor) commented Aug 8, 2022

@MihaZupan ok, I think we had a false alarm; the error came from an environment that hadn't received the fix yet. I'll continue to monitor.

Here's what our implementation of the workaround looks like:

    var transport = new HttpClientTransport(new UserAgentAccessingHandler(new HttpClientHandler
    {
        Proxy = webProxy,
        MaxConnectionsPerServer = this._maxConnectionsPerServer.Value
    }));
    return new BlobServiceClient(uri, options: new() { Transport = transport });

#if NET6_0 // this will remind us to revisit this if we upgrade our .NET version!
    /// <summary>
    /// Workaround for .NET 6 concurrency bug (see https://github.com/dotnet/runtime/issues/65379).
    /// </summary>
    private sealed class UserAgentAccessingHandler : DelegatingHandler
    {
        private const string UserAgentHeader = "User-Agent";

        public UserAgentAccessingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

        protected override HttpResponseMessage Send(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            _ = request.Headers.TryGetValues(UserAgentHeader, out _); // Force parsing

            return base.Send(request, cancellationToken);
        }

        protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            _ = request.Headers.TryGetValues(UserAgentHeader, out _); // Force parsing

            return base.SendAsync(request, cancellationToken);
        }
    }
#endif

I also just want to confirm that this is a flavor of the error that should be fixed here:

System.IndexOutOfRangeException: Index was outside the bounds of the array. 
at System.Net.Http.Headers.HttpHeaders.ReadStoreValues[T](Span`1 values, Object storeValue, HttpHeaderParser parser, Int32& currentIndex) 
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringOrStringArray(HeaderDescriptor descriptor, Object sourceValues, String& singleValue, String[]& multiValue) 
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringArray(HeaderDescriptor descriptor, HeaderStoreItemInfo info) at System.Net.Http.HttpConnectionPool.EstablishProxyTunnelAsync(Boolean async, HttpRequestHeaders headers, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request) 
at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken) 
at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) 
at System.Net.Http.HttpClient.g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) 
at Azure.Core.Pipeline.HttpClientTransport.ProcessAsync(HttpMessage message, Boolean async) 
at Azure.Core.Pipeline.HttpPipelineTransportPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline) 
at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.g__ProcessAsyncInner|4_0(HttpMessage message, ReadOnlyMemory`1 pipeline) 
at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async) 
at Azure.Storage.Blobs.BlobRestClient.DownloadAsync(String snapshot, String versionId, Nullable`1 timeout, String range, String leaseId, Nullable`1 rangeGetContentMD5, Nullable`1 rangeGetContentCRC64, String encryptionKey, String encryptionKeySha256, Nullable`1 encryptionAlgorithm, Nullable`1 ifModifiedSince, Nullable`1 ifUnmodifiedSince, String ifMatch, String ifNoneMatch, String ifTags, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.StartDownloadAsync(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, Int64 startOffset, Boolean async, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadStreamingInternal(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, IProgress`1 progressHandler, String operationName, Boolean async, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadInternal(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, Boolean async, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadAsync(HttpRange range, BlobRequestConditions conditions, Boolean rangeGetContentHash, CancellationToken cancellationToken) 
at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadAsync(CancellationToken cancellationToken) 
at Apt.Platform.DataImportWorker.Web.ImportExecution.Executors.AzureBlobPullExecutor.DownloadFileAsync(BlobContainerClient containerClient, String blobName, String localDirectory, Boolean overwriteExistingFiles, Boolean failOnFileExistFlag, Boolean deleteFilesFromRemoteContainer, CancellationToken cancellationToken)

@MihaZupan (Member, Author)

The workaround looks good.
And yes, that stack trace should be the same issue.

@madelson (Contributor) commented Aug 11, 2022

EDIT: thought we had a conclusion here but still looking.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 12, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 11, 2022