Synchronize access to HttpContext #1313

Closed
SergeyKanzhelev opened this issue Mar 29, 2017 · 35 comments

@SergeyKanzhelev
Contributor

The HttpContext class is not thread safe, so access to it from telemetry initializers running on different threads has to be synchronized. Today, enabling Application Insights on a highly asynchronous web site with many tasks running in parallel may cause failures or hangs.

000000aa`844d7fe0 00007ffc`97b8d704 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)+0x10c
000000aa`844d8050 00007ffc`39d0cb71 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].TryGetValue(System.__Canon, System.__Canon ByRef)+0x14
000000aa`844d8090 00007ffc`39d0cae6 Microsoft_AspNetCore_Http_Features!Microsoft.AspNetCore.Http.Features.FeatureCollection.get_Item(System.Type)+0x41
000000aa`844d80d0 00007ffc`39f1afd2 Microsoft_AspNetCore_Http_Features!Microsoft.AspNetCore.Http.Features.FeatureCollection.Get[[System.__Canon, mscorlib]]()+0x36
000000aa`844d8110 00007ffc`39f1ae38 Microsoft_AspNetCore_Http_Features!Microsoft.AspNetCore.Http.Features.FeatureReferences`1[[System.__Canon, mscorlib]].UpdateCached[[System.__Canon, mscorlib],[System.__Canon>, Int32, Boolean)+0x72
000000aa`844d8180 00007ffc`39f1a3f6 Microsoft_AspNetCore_Http!Microsoft.AspNetCore.Http.Features.RequestCookiesFeature.get_Cookies()+0xc8
@SergeyKanzhelev
Contributor Author

cc: @pakrym @davidfowl

@SergeyKanzhelev
Contributor Author

TL;DR: We have a few ideas for making Application Insights safer, but to really fix the issue we need guidance and/or an API in ASP.NET for locking access to the HttpContext object.

So the question is how to fix the issue without running into the problem of parallel access to HttpContext by Application Insights, other tracing/telemetry providers, and customer code. Basically we have three scenarios for Application Insights:

  1. Two telemetry items related to the same HttpContext being traced in parallel. Here we have two cases:
    1. Context data collected by built-in telemetry initializers
    2. Context data collected by a customer-written telemetry initializer
  2. A telemetry item being traced in parallel with customer code that accesses HttpContext
  3. A telemetry item being traced in parallel with another telemetry provider that accesses HttpContext

@Tratcher recommended locking on HttpContext itself and encouraging developers to do the same every time they access HttpContext.

@davidfowl points out that the problem is that Application Insights collects data at an unspecified time, without a clear indication of this work, so it's hard for an external developer to know that synchronization is required.

Application Insights needs to access HttpContext to get values from the request cookies, request headers, URL, etc., as well as the response status code. Application Insights is designed to run telemetry initializers synchronously when a telemetry item is tracked, so the ambient thread context can be associated with the telemetry item. Because items can be tracked from any thread at any time, those initializers can end up running in parallel with other work on the request.

However, most of the built-in telemetry initializers will first try to get data from the RequestTelemetry associated with the current request, and only after that will they attempt to read it from HttpContext. Since RequestTelemetry is thread safe, making sure we prepopulate it at the very beginning and very end of request execution will minimize the chance of parallel access to HttpContext. There is still a risk of parallel access at the beginning and the end, but the chance of it will be smaller.
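
To illustrate that fallback pattern, here is a minimal sketch of such an initializer (the names and the feature-based RequestTelemetry lookup are illustrative, not the actual implementation):

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Http;

// Sketch only: prefer the thread-safe RequestTelemetry and fall back to
// HttpContext when the value was not prepopulated.
public class ClientIpTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public ClientIpTelemetryInitializer(IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor;
    }

    public void Initialize(ITelemetry telemetry)
    {
        var context = _httpContextAccessor.HttpContext;
        if (context == null)
        {
            return;
        }

        // Assumed storage: the RequestTelemetry cached for the current request.
        var requestTelemetry = context.Features.Get<RequestTelemetry>();

        if (!string.IsNullOrEmpty(requestTelemetry?.Context.Location.Ip))
        {
            // Fast path: the value was prepopulated at the start of the request.
            telemetry.Context.Location.Ip = requestTelemetry.Context.Location.Ip;
        }
        else
        {
            // Fallback: this read of HttpContext can still race with other threads.
            telemetry.Context.Location.Ip = context.Connection.RemoteIpAddress?.ToString();
        }
    }
}
```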

However, this doesn't eliminate the problem of customer-written telemetry initializers. We need to provide guidance on how to write a telemetry initializer that needs access to HttpContext so that it is thread safe and does not conflict with the rest of the application code or with other tracing and telemetry providers.

@Tratcher I do not think we can guide people to lock on HttpContext. It is not good practice to lock on a public object, and it can lead to a deadlock. So we need some shared locking object for HttpContext. It doesn't make sense to keep it Application Insights specific; it should be part of ASP.NET. Do you have better suggestions to eliminate this problem completely?
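
As an illustration, a shared locking object could look something like this (purely hypothetical API; nothing like this exists in ASP.NET today):

```csharp
// Hypothetical: a per-request synchronization root exposed by the framework,
// so that Application Insights, other telemetry providers, and user code all
// lock on the same object instead of locking on HttpContext itself.
public interface IHttpContextSyncRoot
{
    object SyncRoot { get; }
}

// Illustrative usage inside a telemetry initializer:
// lock (httpContext.Features.Get<IHttpContextSyncRoot>().SyncRoot)
// {
//     var statusCode = httpContext.Response.StatusCode;
// }
```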

@pakrym
Contributor

pakrym commented Mar 29, 2017

The other issue is that retrieving RequestTelemetry from HttpContext is not thread safe, so even if we pre-cache values we need to synchronize FeatureCollection access.

@SergeyKanzhelev
Contributor Author

@pakrym - there will be code that reads it in the first place to initialize the RequestTelemetry, and code that populates response properties at the end of the request. Yes, it will minimize the risk, but it will not solve the problem.

Also, one may want to populate certain context properties on a limited set of items (like dependencies) without updating the RequestTelemetry.

@SergeyKanzhelev
Contributor Author

@pakrym this is a bummer! FeatureCollection is not thread safe either?

@pakrym
Contributor

pakrym commented Mar 29, 2017

@SergeyKanzhelev Yes, it's a Dictionary. Nothing in HttpContext is.

@SergeyKanzhelev
Contributor Author

Do you have a proposal for how we can enable our scenarios in a safe manner? Should we implement our own analog of IHttpContextAccessor that stores RequestTelemetry in an AsyncLocal?

Even implementing our own context propagation would leave the problem of parallel access to HttpContext. We cannot advise developers to lock on HttpContext or on anything Application Insights specific, so there should be guidance from the ASP.NET team.
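
For illustration, a minimal sketch of the AsyncLocal idea (the RequestTelemetryAccessor name is hypothetical):

```csharp
using System.Threading;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical accessor: flows the current RequestTelemetry with the async
// execution context instead of storing it in HttpContext.Features.
public class RequestTelemetryAccessor
{
    private static readonly AsyncLocal<RequestTelemetry> _current = new AsyncLocal<RequestTelemetry>();

    public RequestTelemetry Current
    {
        get => _current.Value;
        set => _current.Value = value;
    }
}
```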

@pakrym
Contributor

pakrym commented Mar 29, 2017

We can register a scoped service in DI and store RequestTelemetry there; DI is thread safe.
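
A minimal sketch of that approach (RequestTelemetryHolder is a hypothetical name, not an existing API):

```csharp
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical per-request holder resolved from the request's DI scope.
public class RequestTelemetryHolder
{
    public RequestTelemetry Telemetry { get; set; }
}

public static class RequestTelemetryServiceCollectionExtensions
{
    public static IServiceCollection AddRequestTelemetryHolder(this IServiceCollection services)
    {
        // One instance per request scope; telemetry initializers resolve this
        // instead of reading RequestTelemetry out of HttpContext.Features.
        return services.AddScoped<RequestTelemetryHolder>();
    }
}
```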

@SergeyKanzhelev
Contributor Author

This is what it was before 2.0, correct? Do you remember why we decided to switch to a feature?

Any comments on what guidance to provide to customers for HttpContext synchronization in case they want custom context associated with the telemetry?

@pakrym
Contributor

pakrym commented Mar 29, 2017

I think my idea was that we were already getting HttpContext, so we could just use it and avoid going through DI. Not sure about the exact guidance yet.

@davidfowl

I'm going to take a look at this.

@davidfowl

@SergeyKanzhelev based on @pakrym's suggestion, are you going to run telemetry initializers earlier in the pipeline?

@SergeyKanzhelev
Contributor Author

@davidfowl we definitely will, and it will help with the standard data collection. Do you think all tracing providers will need to do the same? Read context in the beginning and never try to re-read it again to be on the safe side?

What about reading the response code? Is there a "safe" place to read it?

Now consider someone who wants to add a telemetry initializer that appends a custom context property to all telemetry items processed during a request. Let's say this property is calculated from a database call at the beginning of every controller. The logical thing is to store this context property in HttpContext so it is associated with the current request, and it is also logical to write a telemetry initializer that reads it from HttpContext. This initializer cannot run too early - the value of the property hasn't been calculated yet. So we need to provide guidance on how to implement such a telemetry initializer so it is safe to run.
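
For concreteness, a sketch of the kind of customer initializer being described, with the race deliberately left unsolved (the "UserTier" key and HttpContext.Items storage are just examples):

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Http;

// Example customer-written initializer: copies a value that the controller
// computed earlier in the request onto every telemetry item.
public class UserTierTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public UserTierTelemetryInitializer(IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor;
    }

    public void Initialize(ITelemetry telemetry)
    {
        var context = _httpContextAccessor.HttpContext;

        // HttpContext.Items is a plain dictionary: this read can run in parallel
        // with controller code that writes to it, which is exactly the
        // synchronization problem discussed in this issue.
        if (context != null &&
            context.Items.TryGetValue("UserTier", out var tier) &&
            tier is string tierValue &&
            telemetry is ISupportProperties properties)
        {
            properties.Properties["UserTier"] = tierValue;
        }
    }
}
```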

Have you considered at least making the features collection thread safe? I'm not sure it would help 100%.

@davidfowl

Do you think all tracing providers will need to do the same? Read context in the beginning and never try to re-read it again to be on the safe side?

The beginning of the request might be too early in some cases. I wanted to suggest an API that takes a snapshot of the request at a particular time (sort of a Begin/End pattern).

[HttpGet]
public async Task DoSomethingAsync()
{
    using (telemetryClient.BeginOperation())
    {
        var httpClient = new HttpClient();
        var urls = new[] { "http://www.facebook.com", "http://www.twitter.com" };
        var tasks = new Task[urls.Length];
        var i = 0;
        foreach (var url in urls)
        {
            tasks[i++] = DoHttpRequest(url, httpClient, telemetryClient);
        }
        await Task.WhenAll(tasks);
    }
}

public async Task DoHttpRequest(string url, HttpClient httpClient, TelemetryClient telemetryClient)
{
    try
    {
        await httpClient.GetAsync(url);
    }
    catch (Exception ex)
    {
        telemetryClient.TrackException(ex);
    }
}

Something like the above, where BeginOperation takes a snapshot of the request and runs the initializers; at that point TelemetryClient.Track* doesn't need to touch the request but still captures other data. Would something like that be feasible?

Have you considered at least making the features collection thread safe? I'm not sure it would help 100%.

Yes, I spent last week looking at what it would take to make the feature collection thread safe. It doesn't solve the problem fully, but it is the only thing we could realistically do. It's also something that would have to be implemented in each server, so it can't be centralized. The reason it happens so often in this particular case is that WebListener uses a dictionary; Kestrel, on the other hand, implements its own feature collection using fields for known types and a list for unknown features.

Even after doing that, the features themselves may not be thread safe, so the guarantees are very weak. We can say that setting or getting a feature from the collection is safe (by hunting down our implementations and making them atomic), but after getting the feature there's no guarantee that using it is safe.
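
To make that last point concrete (illustration only, not code from either repository): even if Features.Get&lt;T&gt;() itself were synchronized, two threads using the returned feature can still race, because the cookie collection is parsed lazily on first access - that is the race in the stack trace at the top of this issue.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public static class CookieRaceExample
{
    // Both reads below go through RequestCookiesFeature, which lazily parses
    // the Cookie header into a dictionary on first access; running them in
    // parallel is unsafe even if the feature lookup itself were atomic.
    public static Task ReadCookiesInParallel(HttpContext context)
    {
        var first = Task.Run(() => context.Request.Cookies["session"]);
        var second = Task.Run(() => context.Request.Cookies["session"]);
        return Task.WhenAll(first, second);
    }
}
```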

@bragma

bragma commented May 23, 2017

This seems like a serious problem to me. I am using AI in production but have not switched to the 2.x package yet. Do you suggest waiting until this is solved?

@DamianEdwards
Member

DamianEdwards commented Jun 23, 2017

@dnduffy @SergeyKanzhelev @davidfowl @pakrym do we need to meet about this? If we need to make changes in ASP.NET Core to help address this, we're running out of time very fast for 2.0.

@DamianEdwards
Member

@glennc

@dnduffy
Member

dnduffy commented Jun 27, 2017

@DamianEdwards I tried to IM you but just missed you, I think. I'll read through this and refresh myself on the proposals in the thread.

@dnduffy
Member

dnduffy commented Jun 27, 2017

Having read everything over, I'm drawn to the discussion about why it was switched to HttpContext in the first place; that sounds like the decision to revisit with the least impact. @pakrym @SergeyKanzhelev @davidfowl If there was a working DI-based approach previously, why don't we simply revert to that approach?

@davidfowl It sounds to me like the feature collection ought to be thread safe, even if that isn't the be-all and end-all solution for AppInsights.

What do you think is the lowest-impact vs. the best long-term solution?

@andrasg

andrasg commented Sep 8, 2017

@SergeyKanzhelev, @dnduffy I think the reason DI was not preferred for storing RequestTelemetry is that the request-scoped DI container gets destroyed (when the request ends) before all telemetry information is processed. At least that's what I remember from my experiments with telemetry initializers.

@kstreith

What is the status of this? We have held off upgrading because of the poor performance associated with the lock.

@allevyMS

Also interested in the status of this; it's blocking us from using App Insights instrumentation. Locking the context is not an option for us.

@davidfowl

Why is that?

@allevyMS

I am afraid of performance issues at scale; with a high volume of requests, taking a lock around an unknown number of telemetry calls scares me.

@kstreith

I concur with this. We are very happy with performance pre-lock. Going from pre-lock to lock, we saw the requests per second served by a single box drop to 1/3 of the original. Put another way, the lock cost us two-thirds of the requests per second we could get out of a single box. We did this performance testing many months ago. If needed, I could try to find time this week to re-create the results.

@davidfowl

This lock is per request. I'd like to see that performance data. How many concurrent operations accessing the HttpContext in parallel did you have that it could cause that level of regression?

@kstreith

I re-ran our tests and saw about a 40% drop, vs. my original assertion of a 66% drop in requests per second. However, I also looked at latency reported by our load test tool (locust.io). We are getting around 290ms at the 50th percentile with 2.0.0 of the library. If we rev to 2.1.1 with the lock, we are getting around 2200ms at the 50th percentile, so around a 10x increase in latency. That holds up at the 95th percentile, with 630ms before and 9100ms after, so still at least a 10x increase in latency. The only change there is the upgrade of the Application Insights library to 2.1.1 (i.e. the lock). I am working on creating a File > New project in VS that re-creates the problem. I will post when I'm able to get a small reproduction case working that I can push publicly.

@davidfowl

So the lock is here https://github.com/Microsoft/ApplicationInsights-aspnetcore/blob/ea2f64f4bbf1e1415161627c019618d0fa18ba44/src/Microsoft.ApplicationInsights.AspNetCore/TelemetryInitializers/TelemetryInitializerBase.cs#L36.

@kstreith Maybe the lock is too big? Or maybe there are slow TelemetryInitializers in the pipeline? Are you making lots of outbound requests in parallel in your application?

@kstreith

Our app does use HttpClient to make 1-3 outbound requests per inbound request directed to us. We make those calls async, but we await each one before starting the next.

@kstreith

I ran ./wrk -t10 -c400 -d60s against our application. Here are the results when using v2.0.0 of the application insights asp.net core library:

Running 1m test @ https://redacted/
10 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 588.65ms 465.83ms 2.00s 82.06%
Req/Sec 27.55 19.52 168.00 78.75%
14888 requests in 1.00m, 23.58MB read
Socket errors: connect 0, read 3, write 0, timeout 1947
Requests/sec: 247.74
Transfer/sec: 401.85KB

If all I do to our application is rev the Application Insights library from 2.0.0 to 2.1.1, I get the following performance with the same wrk command:

Running 1m test @ https://redacted/
10 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.10s 455.00ms 2.00s 67.81%
Req/Sec 13.75 8.90 60.00 75.20%
6883 requests in 1.00m, 11.28MB read
Socket errors: connect 0, read 0, write 0, timeout 6386
Requests/sec: 114.53
Transfer/sec: 192.28KB

So, we can see the requests per second drop in half and the latency doubles. You can't see it here, but I use the Live Stream feature of Application Insights, and with 2.0.0 I'm using 125% CPU in the live stream. When using 2.1.1 of the library, I'm only using 52% of the CPU. So, just by upgrading the library from 2.0.0 to 2.1.1 I see a significant increase in latency, lower RPS, and lower CPU utilization. I also tried running wrk with only -t1 against the 2.1.1 library and I see the exact same results. I appreciate the responses @davidfowl - any thoughts?

@benaadams

benaadams commented Jul 25, 2018

What happens if you try 2.4.0-beta3 ? microsoft/ApplicationInsights-aspnetcore#690 (comment)

@kstreith

@benaadams With 2.4.0-beta3, performance is still bad, basically equivalent to 2.1.1

Running 1m test @ https://redacted
10 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.51s 428.32ms 2.00s 74.48%
Req/Sec 14.26 9.09 70.00 73.13%
7249 requests in 1.00m, 11.89MB read
Socket errors: connect 0, read 0, write 0, timeout 5819
Requests/sec: 120.61
Transfer/sec: 202.54KB

@cijothomas cijothomas transferred this issue from microsoft/ApplicationInsights-aspnetcore Nov 15, 2019
@github-actions

This issue is stale because it has been open 300 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions

This issue is stale because it has been open 300 days with no activity. Remove stale label or this will be closed in 7 days. Commenting will instruct the bot to automatically remove the label.
