Microsoft.Extensions.Http.Polly: Deadlock when retrying a "503 - Service unavailable" #28384

Closed
marcpaulv opened this issue Aug 3, 2020 · 12 comments · Fixed by #36162
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-httpclientfactory Includes: HttpClientFactory (some bugs also in Extensions repo) investigate

Comments

@marcpaulv

Description:

When the Application Pool of an IIS-hosted website stops for whatever reason (in my case, I stopped it manually), IIS returns a 503 - Service Unavailable status code and adds a Connection: close response header.

When retrying such a request with the retry policies (the example below uses WaitAndRetryAsync), the first two retries return the same 503 status, but the third retry causes the application to stop responding (most probably a deadlock).

Steps to reproduce

  1. Copy the code below into a .NET Framework 4.7.2 console application
  2. Add the following NuGet packages:
  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.DependencyInjection">
      <Version>2.1.1</Version>
    </PackageReference>
    <PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions">
      <Version>2.1.1</Version>
    </PackageReference>
    <PackageReference Include="Microsoft.Extensions.Http">
      <Version>2.1.1</Version>
    </PackageReference>
    <PackageReference Include="Microsoft.Extensions.Http.Polly">
      <Version>2.1.1</Version>
    </PackageReference>
    <PackageReference Include="Polly">
      <Version>7.0.3</Version>
    </PackageReference>
  </ItemGroup>
  3. Replace URLplaceholder with a URL pointing to an IIS-hosted API on the network (not on localhost)
  4. After receiving a couple of 200 OK responses, manually stop the application pool of the IIS-hosted API
  5. After receiving a couple of 503 Service Unavailable responses, manually start the application pool of the IIS-hosted API

Expected behavior

The application should start receiving 200 OK responses.

Actual behavior

The application hangs after the first two retries of the first 503 Service Unavailable response.

Note that if the WaitAndRetryAsync policy addition is commented out, the application behaves as expected (without the retries of course).

Code to reproduce the problem

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http;
using Polly;
using Polly.Timeout;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp2
{
    class Program
    {
        async static Task Main(string[] args)
        {
            IServiceCollection services = new ServiceCollection();
            services.AddHttpClient("test")
              // comment out the policy below to get the correct behavior
               .AddTransientHttpErrorPolicy(builder => builder.WaitAndRetryAsync(new[]{
               TimeSpan.FromSeconds(1),
               TimeSpan.FromSeconds(2),
               TimeSpan.FromSeconds(4),
               }));
  
            // removing the logging handlers as a workaround for https://github.com/aspnet/Extensions/issues/563
            ServiceDescriptor loggingHandler = services.FirstOrDefault(e => e.ServiceType == typeof(IHttpMessageHandlerBuilderFilter));
            if (loggingHandler != null)
            {
                services.Remove(loggingHandler);
            }

            IHttpClientFactory factory = services.BuildServiceProvider().GetService<IHttpClientFactory>();

            HttpClient client = factory.CreateClient("test");

            while (true)
            {
                HttpResponseMessage response = null;
                try
                {
                    response = await client.GetAsync("URLplaceholder").ConfigureAwait(false);
                }
                catch (Exception e)
                {
                    // logging
                }

                Thread.Sleep(5000);
            }
        }
    }
}

Observations

  1. With only one retry for the request, the behavior is as expected.
  2. The problem does not seem to be in the Polly library, as the same request executed through Policy.ExecuteAsync (with the .AddTransientHttpErrorPolicy commented out) works as expected:
response = await Policy
    .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.ServiceUnavailable)
    .WaitAndRetryAsync(new[]
    {
        TimeSpan.FromSeconds(1),
        TimeSpan.FromSeconds(2),
        TimeSpan.FromSeconds(4),
    })
    .ExecuteAsync(() => client.GetAsync(url))
    .ConfigureAwait(false);
@marcpaulv marcpaulv changed the title Microsoft.Extensions.Http.Polly: App hang when retrying a "503 - Service unavailable" Microsoft.Extensions.Http.Polly: Deadlock when retrying a "503 - Service unavailable" Aug 3, 2020
@marcpaulv
Author

I've also tried using the latest package versions (3.1.6) and the problem persists.

@marcpaulv
Author

The problem also reproduces for 500 - Internal Server Error responses.

Can this be escalated somehow? A week has passed and I haven't received any meaningful feedback yet :)

@marcpaulv
Author

marcpaulv commented Aug 17, 2020

I managed to make progress in finding the cause of the issue: the connections to the remote server aren't being closed. Each retry opens a new connection, and the application hangs on the second retry because ServicePointManager.DefaultConnectionLimit defaults to 2 and HttpClientHandler uses that value for MaxConnectionsPerServer.

Increasing MaxConnectionsPerServer does not solve the problem; it only postpones the inevitable.
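For reference, a minimal illustration of the two limits mentioned above (the values are arbitrary and, as noted, raising them only delays the hang; requires using System.Net and System.Net.Http):

// Process-wide default applied to new ServicePoints on .NET Framework (2 out of the box).
ServicePointManager.DefaultConnectionLimit = 10;

// Per-handler limit; the repro further down sets this via ConfigurePrimaryHttpMessageHandler.
var handler = new HttpClientHandler { MaxConnectionsPerServer = 10 };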

So back to almost square 1.

Googling the problem further, I found a post suggesting aborting the underlying HttpWebRequest.
Surprisingly, calling HttpResponseMessage.Dispose() on the response received on each retry closes the underlying connection.

Oddly enough, the disposal is not needed in the second scenario (observation 2), where there is no additional DelegatingHandler executing the retry policy.

So the big unanswered question here is: why/how does the additional DelegatingHandler cause the connections to remain open, requiring an explicit HttpResponseMessage.Dispose()?

@ericsuhong

@marcpaulv I am personally curious about this issue; can you share the repro code you used for the MaxConnectionsPerServer example?

@marcpaulv
Author

marcpaulv commented Aug 27, 2020

@marcpaulv I am personally curious about this issue; can you share the repro code you used for the MaxConnectionsPerServer example?

private static string urlString = "http://123123123"; // replace this with your own
        
        async static Task Main(string[] args)
        {
            ServicePoint sp = ServicePointManager.FindServicePoint(new Uri(urlString));

            IServiceCollection services = new ServiceCollection();
            services.AddHttpClient("test")
               // comment out the policy below to get the correct behavior
               .AddTransientHttpErrorPolicy(builder => builder.WaitAndRetryAsync(new[]{
               TimeSpan.FromSeconds(1),
               TimeSpan.FromSeconds(2),
               TimeSpan.FromSeconds(4),
               }, onRetry: (response, timespan) =>
               {
                   Console.WriteLine($"Retry: {timespan}, HttpStatusCode={response?.Result.StatusCode},ServicePoint={sp.GetHashCode()} Connections={sp.CurrentConnections}");

                   // uncomment below for the workaround to the issue
                   //Console.WriteLine("Disposing HttpResponseMessage");
                   //response?.Result.Dispose();
               }))
                 .ConfigurePrimaryHttpMessageHandler(_ =>
                  {
                      Console.WriteLine($"Setting MaxConnectionsPerServer to 10");
                      var h = new HttpClientHandler
                      {
                          MaxConnectionsPerServer = 10,
                      };
                      return h;
                  });

            // removing the logging handlers as a workaround for https://github.com/aspnet/Extensions/issues/563
            ServiceDescriptor loggingHandler = services.FirstOrDefault(e => e.ServiceType == typeof(IHttpMessageHandlerBuilderFilter));
            if (loggingHandler != null)
            {
                services.Remove(loggingHandler);
            }

            IHttpClientFactory factory = services.BuildServiceProvider().GetService<IHttpClientFactory>();

            HttpClient client = factory.CreateClient("test");
            Console.WriteLine("Created client");
            while (true)
            {
                Console.WriteLine($"Sending request");
                HttpResponseMessage response = null;
                try
                {
                    response = await client.GetAsync(urlString).ConfigureAwait(false);
                    Console.WriteLine($"HttpStatusCode={response?.StatusCode}, Connections={sp.CurrentConnections}");
                }
                catch (Exception e)
                {
                    Console.WriteLine($"Error={e.Message}, Connections={sp.CurrentConnections}");
                }

                Thread.Sleep(1000);
            }
        }

@Dragonsangel

I would like to add myself to this issue, as we are experiencing the same problem but from an entirely different direction.
We use the WaitAndRetryAsync policy on some of our typed HttpClients and started noticing socket exhaustion on some of our servers.
It seems that when a 5xx error is returned, the retry mechanism opens a new connection for each retry. These connections persist and are never closed from our side.
Since we do not set a custom MaxConnectionsPerServer, we are not explicitly running into the deadlock, but the core of the problem remains the same.

I will try the HttpResponseMessage.Dispose() workaround to see if it eliminates the problem, but I would rather help resolve the underlying issue.

For this I created a self-contained test solution, which can be found here: https://github.com/Dragonsangel/PollyConnectionsRemainOpen
Reproduction instructions are in the README.md.

@Tratcher
Member

So the big unanswered question here is: why/how does the additional DelegatingHandler cause the connections to remain open, requiring an explicit HttpResponseMessage.Dispose()?

When you're working with HttpClient, it buffers all responses by default (HttpContent.LoadIntoBufferAsync), which frees up the underlying connection for the next request. However, when you're working inside the HttpClient pipeline in a DelegatingHandler, responses are still in a streaming state, and if you don't drain or dispose that stream you leak an active connection.
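To illustrate the point above, here is a minimal sketch of a DelegatingHandler that disposes the discarded response before retrying, so the streamed connection is returned to the pool. The handler name and single-retry logic are assumptions for illustration (not the fix that was eventually merged), and re-sending the same request message is only safe here because the repro uses content-less GET requests:

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class DiscardBeforeRetryHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        HttpResponseMessage response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);

        if ((int)response.StatusCode >= 500)
        {
            // Inside a DelegatingHandler the response content is still in a streaming
            // state; disposing the discarded response releases the underlying
            // connection instead of leaking it across retries.
            response.Dispose();
            response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        }

        return response;
    }
}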

@JunTaoLuo JunTaoLuo transferred this issue from dotnet/extensions Dec 3, 2020
@JunTaoLuo JunTaoLuo added investigate feature-httpclientfactory Includes: HttpClientFactory (some bugs also in Extensions repo) labels Dec 3, 2020
@JunTaoLuo JunTaoLuo added this to the Next sprint planning milestone Dec 3, 2020
@ghost

ghost commented Dec 3, 2020

Thanks for contacting us.
We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@Tratcher
Member

Tratcher commented Sep 3, 2021

Notes:

  • The HttpResponseMessage needs to be buffered, drained, or disposed if you're going to retry the request; otherwise you leak connections.
  • Polly is a generic policy library; it's not aware of HttpRequestMessage/HttpResponseMessage lifetimes or the need to dispose them, e.g. WaitAndRetryAsync can't automatically dispose the prior result. Polly.Extensions.Http is aware of HttpRequestMessage/HttpResponseMessage, but it doesn't participate in the retry itself, only in the decision about whether a retry is recommended.
  • onRetry => response?.Result.Dispose(); is a legitimate solution (a sketch is included after these notes).
  • Is there a way for PolicyHttpMessageHandler to see that a retry is happening and dispose the prior result? It's only passed Polly's Context, but that doesn't seem to contain the prior result.
    response = await policy.ExecuteAsync((c, ct) => SendCoreAsync(request, c, ct), context, cancellationToken).ConfigureAwait(false);

    @reisenberger any suggestions on how PolicyHttpMessageHandler can get access to the prior HttpResponseMessage result when someone uses WaitAndRetryAsync?
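A minimal sketch of the onRetry workaround mentioned in the notes above, applied to the policy registration from the repro (the client name and delays mirror the earlier repro code; this illustrates the interim workaround, not the fix merged in #36162):

services.AddHttpClient("test")
    .AddTransientHttpErrorPolicy(builder => builder.WaitAndRetryAsync(
        new[] { TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(4) },
        onRetry: (outcome, timespan) =>
        {
            // Dispose the response from the failed attempt so its connection is released
            // before the next attempt; Result can be null when the retry was triggered
            // by an exception rather than a status code.
            outcome?.Result?.Dispose();
        }));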

@martincostello
Member

(@)reisenberger any suggestions on how PolicyHttpMessageHandler can get access to the prior HttpResponseMessage result when someone uses WaitAndRetryAsync?

@Tratcher Nice find. Dylan's been away from Polly for a while (App-vNext/Polly#798) and isn't getting GitHub notifications. Feel free to ping me if anything Polly-related pops up in the future, and I'll see if I can help out.

@Tratcher
Member

Tratcher commented Sep 9, 2021

Thanks Martin, what do you think of the fix?

@martincostello
Member

I had a look and it seemed reasonable to me.

I'd maybe have been 60:40 on whether to use HttpRequestMessage.Properties or Context's own implementation of IDictionary to store the previous response, but since the HTTP pipeline is the one adding it, it's probably best to keep it on the message and keep it "HTTP" rather than use the Context and make it "Polly".

wtgodbe pushed a commit that referenced this issue Sep 9, 2021
Co-authored-by: Chris R <Tratcher@outlook.com>
@ghost ghost locked as resolved and limited conversation to collaborators Oct 9, 2021
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023