Synchronize access to HttpContext #1313

Closed
SergeyKanzhelev opened this issue Mar 29, 2017 · 35 comments

@SergeyKanzhelev
Contributor

The HttpContext class is not thread safe, so access to it from telemetry initializers running on different threads has to be synchronized. Today, enabling Application Insights on a highly asynchronous web site with many tasks running in parallel may cause failures or hangs.

000000aa`844d7fe0 00007ffc`97b8d704 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)+0x10c
000000aa`844d8050 00007ffc`39d0cb71 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].TryGetValue(System.__Canon, System.__Canon ByRef)+0x14
000000aa`844d8090 00007ffc`39d0cae6 Microsoft_AspNetCore_Http_Features!Microsoft.AspNetCore.Http.Features.FeatureCollection.get_Item(System.Type)+0x41
000000aa`844d80d0 00007ffc`39f1afd2 Microsoft_AspNetCore_Http_Features!Microsoft.AspNetCore.Http.Features.FeatureCollection.Get[[System.__Canon, mscorlib]]()+0x36
000000aa`844d8110 00007ffc`39f1ae38 Microsoft_AspNetCore_Http_Features!Microsoft.AspNetCore.Http.Features.FeatureReferences`1[[System.__Canon, mscorlib]].UpdateCached[[System.__Canon, mscorlib],[System.__Canon>, Int32, Boolean)+0x72
000000aa`844d8180 00007ffc`39f1a3f6 Microsoft_AspNetCore_Http!Microsoft.AspNetCore.Http.Features.RequestCookiesFeature.get_Cookies()+0xc8
@SergeyKanzhelev
Contributor Author

cc: @pakrym @davidfowl

@SergeyKanzhelev
Contributor Author

TL;DR: We have a few ideas for making Application Insights safer, but to really fix the issue we need guidance and/or an API in ASP.NET for locking access to the HttpContext object.

So the question is how to fix the issue without running into the problem of parallel access to HttpContext by Application Insights, other tracing/telemetry providers, and customer code. Basically we have three scenarios for Application Insights:

  1. Two telemetry items related to the same HttpContext being traced in parallel. Here we have two cases:
    1. Context data collected by built-in telemetry initializers
    2. Context data collected by a customer-written telemetry initializer
  2. A telemetry item being traced in parallel with customer code that accesses HttpContext
  3. A telemetry item being traced in parallel with another telemetry provider that accesses HttpContext

@Tratcher recommended locking on HttpContext itself and encouraging developers to do the same every time they access HttpContext.

@davidfowl points out that the problem is that Application Insights collects data at an unspecified time, without a clear indication of this work, so it's hard for an external developer to know that synchronization is required.

Application Insights needs to access HttpContext to get values from the request cookies, request headers, URL, etc., as well as the response status code. Application Insights is designed to run telemetry initializers synchronously when a telemetry item is tracked, so the ambient thread context can be associated with the telemetry item. Because items can be tracked from any thread at any time, those initializers can end up running in parallel with other work on the request.

However, most of the built-in telemetry initializers will first try to get data from the RequestTelemetry associated with the current request, and only after that will they attempt to read it from HttpContext. Since RequestTelemetry is thread safe, making sure we prepopulate it at the very beginning and very end of request execution will minimize the chance of parallel access to HttpContext. There is still a risk of parallel access at the beginning and the end, but the chance of it will be smaller.
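
To illustrate that fallback pattern, here is a minimal sketch of such an initializer (the names and the feature-based RequestTelemetry lookup are illustrative, not the actual implementation):

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Http;

// Sketch only: prefer the thread-safe RequestTelemetry and fall back to
// HttpContext when the value was not prepopulated.
public class ClientIpTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public ClientIpTelemetryInitializer(IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor;
    }

    public void Initialize(ITelemetry telemetry)
    {
        var context = _httpContextAccessor.HttpContext;
        if (context == null)
        {
            return;
        }

        // Assumed storage: the RequestTelemetry cached for the current request.
        var requestTelemetry = context.Features.Get<RequestTelemetry>();

        if (!string.IsNullOrEmpty(requestTelemetry?.Context.Location.Ip))
        {
            // Fast path: the value was prepopulated at the start of the request.
            telemetry.Context.Location.Ip = requestTelemetry.Context.Location.Ip;
        }
        else
        {
            // Fallback: this read of HttpContext can still race with other threads.
            telemetry.Context.Location.Ip = context.Connection.RemoteIpAddress?.ToString();
        }
    }
}
```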

However, this doesn't eliminate the problem of customer-written telemetry initializers. We need to provide guidance on how to write a telemetry initializer that needs access to HttpContext so that it is thread safe and does not conflict with the rest of the application code or with other tracing and telemetry providers.

@Tratcher I do not think we can guide people to lock on HttpContext. It is not good practice to lock on a public object, and it can lead to a deadlock. So we need some shared locking object for HttpContext. It doesn't make sense to keep it Application Insights specific; it should be part of ASP.NET. Do you have better suggestions to eliminate this problem completely?
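
As an illustration, a shared locking object could look something like this (purely hypothetical API; nothing like this exists in ASP.NET today):

```csharp
// Hypothetical: a per-request synchronization root exposed by the framework,
// so that Application Insights, other telemetry providers, and user code all
// lock on the same object instead of locking on HttpContext itself.
public interface IHttpContextSyncRoot
{
    object SyncRoot { get; }
}

// Illustrative usage inside a telemetry initializer:
// lock (httpContext.Features.Get<IHttpContextSyncRoot>().SyncRoot)
// {
//     var statusCode = httpContext.Response.StatusCode;
// }
```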

@pakrym
Contributor

pakrym commented Mar 29, 2017

The other issue is that retrieving RequestTelemetry from HttpContext is not thread safe, so even if we pre-cache values we need to synchronize FeatureCollection access.

@SergeyKanzhelev
Contributor Author

@pakrym - there will be code that reads it in the first place to initialize the RequestTelemetry, and code that populates response properties at the end of the request. Yes, it will minimize the risk, but it will not solve the problem.

Also, one may want to populate certain context properties on a limited set of items (like dependencies) without updating the RequestTelemetry.

@SergeyKanzhelev
Contributor Author

@pakrym this is a bummer! FeatureCollection is not thread safe either?

@pakrym
Contributor

pakrym commented Mar 29, 2017

@SergeyKanzhelev Yes, it's a Dictionary. Nothing in HttpContext is.

@SergeyKanzhelev
Contributor Author

Do you have a proposal for how we can enable our scenarios in a safe manner? Should we implement our own analog of IHttpContextAccessor that stores RequestTelemetry in an AsyncLocal?

Even implementing our own context propagation would leave the problem of parallel access to HttpContext. We cannot advise developers to lock on HttpContext or on anything Application Insights specific, so there should be guidance from the ASP.NET team.
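
For illustration, a minimal sketch of the AsyncLocal idea (the RequestTelemetryAccessor name is hypothetical):

```csharp
using System.Threading;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical accessor: flows the current RequestTelemetry with the async
// execution context instead of storing it in HttpContext.Features.
public class RequestTelemetryAccessor
{
    private static readonly AsyncLocal<RequestTelemetry> _current = new AsyncLocal<RequestTelemetry>();

    public RequestTelemetry Current
    {
        get => _current.Value;
        set => _current.Value = value;
    }
}
```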

@pakrym
Contributor

pakrym commented Mar 29, 2017

We can register a scoped service in DI and store RequestTelemetry there; DI is thread safe.
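
A minimal sketch of that approach (RequestTelemetryHolder is a hypothetical name, not an existing API):

```csharp
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical per-request holder resolved from the request's DI scope.
public class RequestTelemetryHolder
{
    public RequestTelemetry Telemetry { get; set; }
}

public static class RequestTelemetryServiceCollectionExtensions
{
    public static IServiceCollection AddRequestTelemetryHolder(this IServiceCollection services)
    {
        // One instance per request scope; telemetry initializers resolve this
        // instead of reading RequestTelemetry out of HttpContext.Features.
        return services.AddScoped<RequestTelemetryHolder>();
    }
}
```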

@SergeyKanzhelev
Contributor Author

This is what it was before 2.0, correct? Do you remember why we decided to switch to a feature?

Any comments on what guidance to provide to customers for HttpContext synchronization in case they want custom context associated with the telemetry?

@pakrym
Contributor

pakrym commented Mar 29, 2017

I think my idea was that we were already getting HttpContext, so we could just use it and avoid going through DI. Not sure about the exact guidance yet.

@davidfowl

I'm going to take a look at this.

@davidfowl

@SergeyKanzhelev based on @pakrym's suggestion, are you going to run telemetry initializers earlier in the pipeline?

@SergeyKanzhelev
Contributor Author

@davidfowl we definitely will, and it will help with the standard data collection. Do you think all tracing providers will need to do the same? Read context in the beginning and never try to re-read it again to be on the safe side?

What about reading the response code? Is there a "safe" place to read it?

Now consider someone who wants to add a telemetry initializer that appends a custom context property to all telemetry items processed during a request. Let's say this property is calculated from a database call at the beginning of every controller. The logical thing is to store this context property in HttpContext so it is associated with the current request, and it is also logical to write a telemetry initializer that reads it from HttpContext. This initializer cannot run too early - the value of the property hasn't been calculated yet. So we need to provide guidance on how to implement such a telemetry initializer so it is safe to run.
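
For concreteness, a sketch of the kind of customer initializer being described, with the race deliberately left unsolved (the "UserTier" key and HttpContext.Items storage are just examples):

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Http;

// Example customer-written initializer: copies a value that the controller
// computed earlier in the request onto every telemetry item.
public class UserTierTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public UserTierTelemetryInitializer(IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor;
    }

    public void Initialize(ITelemetry telemetry)
    {
        var context = _httpContextAccessor.HttpContext;

        // HttpContext.Items is a plain dictionary: this read can run in parallel
        // with controller code that writes to it, which is exactly the
        // synchronization problem discussed in this issue.
        if (context != null &&
            context.Items.TryGetValue("UserTier", out var tier) &&
            tier is string tierValue &&
            telemetry is ISupportProperties properties)
        {
            properties.Properties["UserTier"] = tierValue;
        }
    }
}
```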

Have you considered at least making the features collection thread safe? I'm not sure it would help 100%.

@davidfowl

Do you think all tracing providers will need to do the same? Read context in the beginning and never try to re-read it again to be on the safe side?

The beginning of the request might be too early in some cases. I wanted to suggest an API that takes a snapshot of the request at a particular time (sort of a Begin/End pattern).

[HttpGet]
public async Task DoSomethingAsync()
{
    using (telemetryClient.BeginOperation())
    {
        var httpClient = new HttpClient();
        var urls = new[] { "http://www.facebook.com", "http://www.twitter.com" };
        var tasks = new Task[urls.Length];
        var i = 0;
        foreach (var url in urls)
        {
            tasks[i++] = DoHttpRequest(url, httpClient, telemetryClient);
        }
        await Task.WhenAll(tasks);
    }
}

public async Task DoHttpRequest(string url, HttpClient httpClient, TelemetryClient telemetryClient)
{
    try
    {
        await httpClient.GetAsync(url);
    }
    catch (Exception ex)
    {
        telemetryClient.TrackException(ex);
    }
}

Something like the above, where BeginOperation takes a snapshot of the request and runs the initializers; at that point TelemetryClient.Track* doesn't need to touch the request but still captures other data. Would something like that be feasible?

Have you considered at least making the features collection thread safe? I'm not sure it would help 100%.

Yes, I spent last week looking at what it would take to make the feature collection thread safe. It doesn't solve the problem fully, but it is the only thing we could realistically do. It's also something that would have to be implemented in each server, so it can't be centralized. The reason it happens so often in this particular case is that WebListener uses a dictionary; Kestrel, on the other hand, implements its own feature collection using fields for known types and a list for unknown features.

Even after doing that, the features themselves may not be thread safe, so the guarantees are very weak. We can say that setting or getting a feature from the collection is safe (by hunting down our implementations and making them atomic), but after getting the feature there's no guarantee that using it is safe.
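
To make that last point concrete (illustration only, not code from either repository): even if Features.Get&lt;T&gt;() itself were synchronized, two threads using the returned feature can still race, because the cookie collection is parsed lazily on first access - that is the race in the stack trace at the top of this issue.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public static class CookieRaceExample
{
    // Both reads below go through RequestCookiesFeature, which lazily parses
    // the Cookie header into a dictionary on first access; running them in
    // parallel is unsafe even if the feature lookup itself were atomic.
    public static Task ReadCookiesInParallel(HttpContext context)
    {
        var first = Task.Run(() => context.Request.Cookies["session"]);
        var second = Task.Run(() => context.Request.Cookies["session"]);
        return Task.WhenAll(first, second);
    }
}
```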

@bragma

bragma commented May 23, 2017

This seems like a serious problem to me. I am using AI in production but have not switched to the 2.x package yet. Do you suggest waiting until this is solved?

@DamianEdwards
Member

DamianEdwards commented Jun 23, 2017

@dnduffy @SergeyKanzhelev @davidfowl @pakrym do we need to meet about this? If we need to make changes in ASP.NET Core to help address this, we're running out of time very fast for 2.0.

@DamianEdwards
Member

@glennc

@dnduffy
Member

dnduffy commented Jun 27, 2017

@DamianEdwards I tried to IM you but just missed you, I think. I'll read through this and refresh myself on the proposals in the thread.

@dnduffy
Member

dnduffy commented Jun 27, 2017

Having read everything over, I'm drawn to the discussion about why it was switched to HttpContext in the first place; that sounds like the decision to revisit with the least impact. @pakrym @SergeyKanzhelev @davidfowl If there was a working DI-based approach previously, why don't we simply revert to that approach?

@davidfowl It sounds to me like the feature collection ought to be thread safe, even if that isn't the be-all and end-all solution for AppInsights.

What do you think is the lowest-impact vs. the best long-term solution?

@andrasg

andrasg commented Sep 8, 2017

@SergeyKanzhelev, @dnduffy I think the reason DI was not preferred for storing RequestTelemetry is that the request-scoped DI container gets destroyed (when the request ends) before all telemetry information is processed. At least that's what I remember from my experiments with telemetry initializers.

@kstreith

What is the status of this? We have held off upgrading because of the poor performance associated with the lock.

@allevyMS

Also interested in the status of this; it's blocking us from using App Insights instrumentation. Locking the context is not an option for us.

@davidfowl

Why is that?

@allevyMS

I am afraid of performance issues at scale; with a high volume of requests, taking a lock around an unknown number of telemetry calls scares me.

@kstreith

I concur with this. We are very happy with performance pre-lock. Going from pre-lock to lock, we saw the requests per second served by a single box drop to 1/3 of the original. Put another way, the lock cost us two-thirds of the requests per second we could get out of a single box. We did this performance testing many months ago. If needed, I could try to find time this week to re-create the results.

@davidfowl

This lock is per request. I'd like to see that performance data. How many concurrent operations accessing the HttpContext in parallel did you have that it could cause that level of regression?

@kstreith

I re-ran our tests and saw about a 40% drop, vs. my original assertion of a 66% drop in requests per second. However, I also looked at latency reported by our load test tool (locust.io). We are getting around 290ms at the 50th percentile with 2.0.0 of the library. If we rev to 2.1.1 with the lock, we are getting around 2200ms at the 50th percentile, so around a 10x increase in latency. That holds up at the 95th percentile, with 630ms before and 9100ms after, so still at least a 10x increase in latency. The only change there is the upgrade of the Application Insights library to 2.1.1 (i.e. the lock). I am working on creating a File > New project in VS that re-creates the problem. I will post when I'm able to get a small reproduction case working that I can push publicly.

@davidfowl

So the lock is here https://github.com/Microsoft/ApplicationInsights-aspnetcore/blob/ea2f64f4bbf1e1415161627c019618d0fa18ba44/src/Microsoft.ApplicationInsights.AspNetCore/TelemetryInitializers/TelemetryInitializerBase.cs#L36.

@kstreith Maybe the lock is too big? Or maybe there are slow TelemetryInitializers in the pipeline? Are you making lots of outbound requests in parallel in your application?

@kstreith

Our app does use HttpClient to make 1-3 outbound requests per inbound request directed to us. We make those calls async, but we await each one before starting the next.

@kstreith

I ran ./wrk -t10 -c400 -d60s against our application. Here are the results when using v2.0.0 of the application insights asp.net core library:

Running 1m test @ https://redacted/
10 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 588.65ms 465.83ms 2.00s 82.06%
Req/Sec 27.55 19.52 168.00 78.75%
14888 requests in 1.00m, 23.58MB read
Socket errors: connect 0, read 3, write 0, timeout 1947
Requests/sec: 247.74
Transfer/sec: 401.85KB

If all I do to our application is rev the Application Insights library from 2.0.0 to 2.1.1, I get the following performance with the same wrk command:

Running 1m test @ https://redacted/
10 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.10s 455.00ms 2.00s 67.81%
Req/Sec 13.75 8.90 60.00 75.20%
6883 requests in 1.00m, 11.28MB read
Socket errors: connect 0, read 0, write 0, timeout 6386
Requests/sec: 114.53
Transfer/sec: 192.28KB

So, we can see the requests per second drop in half and the latency doubles. You can't see it here, but I use the Live Stream feature of Application Insights, and with 2.0.0 I'm using 125% CPU in the live stream. When using 2.1.1 of the library, I'm only using 52% of the CPU. So, just by upgrading the library from 2.0.0 to 2.1.1 I see a significant increase in latency, lower RPS, and lower CPU utilization. I also tried running wrk with only -t1 against the 2.1.1 library and I see the exact same results. I appreciate the responses @davidfowl - any thoughts?

@benaadams

benaadams commented Jul 25, 2018

What happens if you try 2.4.0-beta3 ? microsoft/ApplicationInsights-aspnetcore#690 (comment)

@kstreith

@benaadams With 2.4.0-beta3, performance is still bad, basically equivalent to 2.1.1

Running 1m test @ https://redacted
10 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.51s 428.32ms 2.00s 74.48%
Req/Sec 14.26 9.09 70.00 73.13%
7249 requests in 1.00m, 11.89MB read
Socket errors: connect 0, read 0, write 0, timeout 5819
Requests/sec: 120.61
Transfer/sec: 202.54KB

@cijothomas cijothomas transferred this issue from microsoft/ApplicationInsights-aspnetcore Nov 15, 2019
@github-actions

This issue is stale because it has been open 300 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions

This issue is stale because it has been open 300 days with no activity. Remove stale label or this will be closed in 7 days. Commenting will instruct the bot to automatically remove the label.
