Synchronize access to HttpContext #1313
cc: @pakrym @davidfowl
TL;DR: We have a few ideas for how to make Application Insights safer, but to really fix the issue we need guidance and/or an API in ASP.NET for locking access to the HttpContext object. So the question is how to fix the issue without running into the problem of parallel access to the HttpContext.
@Tratcher recommended locking on the HttpContext. @davidfowl points out that the problem is that Application Insights collects data at an unspecified time without any clear indication of this work, so it's hard for an external developer to know that synchronization is required. Application Insights needs to access the HttpContext. However, most of the built-in telemetry initializers will try to get data from … However, we haven't eliminated the problem of customer-written telemetry initializers. We need to provide guidance on how to write a telemetry initializer that needs access to the HttpContext. @Tratcher, I do not think that we can guide people to lock on the HttpContext.
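For reference, a minimal sketch of the kind of customer-written telemetry initializer being discussed, using the lock-on-HttpContext convention mentioned above. The class name, the property being read, and the choice of lock object are illustrative assumptions, not an agreed-upon guideline:

using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Http;

// Hypothetical customer initializer that needs per-request data from HttpContext.
public class UserIdTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public UserIdTelemetryInitializer(IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor;
    }

    public void Initialize(ITelemetry telemetry)
    {
        var context = _httpContextAccessor.HttpContext;
        if (context == null)
        {
            return; // no ambient request (e.g. background work)
        }

        // Initializers can run while the request is still executing on another thread,
        // so reads of HttpContext have to be synchronized somehow; locking on the
        // context object itself is one possible convention, not an official guideline.
        lock (context)
        {
            telemetry.Context.User.Id = context.User?.Identity?.Name;
        }
    }
}

The open question above is precisely whether asking every such initializer to take a lock is reasonable guidance.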
Storing values to …
The other issue is that retrieving …
@pakrym - there will be code that reads it in the first place to initialize the … Also, one may want to populate certain context properties on a limited set of items (like dependencies) without updating the …
@pakrym this is a bummer!
@SergeyKanzhelev Yes, it's a …
Do you have a proposal for how we can enable our scenarios in a safe manner? Should we implement our own analog of …? Even implementing our own context propagation will keep the problem of parallel access to the HttpContext.
We can register a scoped service in DI and store the request telemetry there.
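A rough sketch of that idea, under the assumption that the per-request telemetry would live in a scoped holder rather than in HttpContext.Features; the holder type, the middleware, and the registration shown here are illustrative, not the SDK's actual implementation:

using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical scoped holder for per-request telemetry state.
public class RequestTelemetryHolder
{
    public RequestTelemetry Telemetry { get; set; }
}

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // One holder instance per request scope.
        services.AddScoped<RequestTelemetryHolder>();
    }

    public void Configure(IApplicationBuilder app)
    {
        app.Use(async (context, next) =>
        {
            // Populate the holder once, at the start of the request; consumers resolve
            // the holder from DI instead of reaching into HttpContext later.
            var holder = context.RequestServices.GetRequiredService<RequestTelemetryHolder>();
            holder.Telemetry = new RequestTelemetry { Name = context.Request.Path };
            await next();
        });
    }
}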
This is what it was before 2.0, correct? Do you remember why we decided to switch to a feature? Any comments on what guidance to provide to customers in terms of …?
I think my idea was that we were already getting …
I'm going to take a look at this.
@SergeyKanzhelev based on @pakrym's suggestion, are you going to run telemetry initializers earlier in the pipeline?
@davidfowl we definitely will. And it will help with the standard data collection. Do you think all tracing providers will need to do the same? Read the context at the beginning and never try to re-read it again, to be on the safe side? What about reading the response code? Is there a "safe" place to read it? Now consider someone who wants to add a telemetry initializer that appends a custom context property to all telemetry items processed by a request. Let's say this property is calculated based on a database call at the beginning of every controller. The logical thing is to store this context property in the HttpContext. Have you considered at least making the features collection thread safe? I'm not sure it will help 100%.
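To make that scenario concrete, here is a sketch of the kind of customer code being described; the controller, the "AccountTier" property, and the initializer are all hypothetical. The controller computes a value once and stashes it in HttpContext.Items, and a telemetry initializer copies it onto every telemetry item for that request, which is exactly the pattern that makes the initializer read HttpContext while other request work may still be running:

using System.Threading.Tasks;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public class OrdersController : Controller
{
    [HttpGet]
    public async Task<IActionResult> Get()
    {
        // Calculated from a database call at the beginning of the controller (stubbed here).
        var accountTier = await LookupAccountTierAsync();
        HttpContext.Items["AccountTier"] = accountTier;

        // ... rest of the request, possibly with tasks running in parallel ...
        return Ok();
    }

    private Task<string> LookupAccountTierAsync() => Task.FromResult("premium");
}

public class AccountTierTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _accessor;

    public AccountTierTelemetryInitializer(IHttpContextAccessor accessor)
    {
        _accessor = accessor;
    }

    public void Initialize(ITelemetry telemetry)
    {
        // Runs for every telemetry item (requests, dependencies, traces), so it can
        // touch HttpContext.Items concurrently with the request's own work.
        var context = _accessor.HttpContext;
        if (context != null &&
            context.Items.TryGetValue("AccountTier", out var tier) &&
            telemetry is ISupportProperties props)
        {
            props.Properties["AccountTier"] = (string)tier;
        }
    }
}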
Beginning of the request might be too early in some cases. I wanted to suggest an API that takes a snapshot of the request at a particular time (a sort of Begin/End pattern).

[HttpGet]
public async Task DoSomethingAsync()
{
    using (telemetryClient.BeginOperation())
    {
        var httpClient = new HttpClient();
        var urls = new[] { "http://www.facebook.com", "www.twitter.com" };
        var tasks = new Task[urls.Length];
        var i = 0;
        foreach (var url in urls)
        {
            tasks[i++] = DoRequest(url, httpClient, telemetryClient);
        }
        await Task.WhenAll(tasks);
    }
}

public async Task DoRequest(string url, HttpClient httpClient, TelemetryClient telemetryClient)
{
    try
    {
        await httpClient.GetAsync(url);
    }
    catch (Exception ex)
    {
        telemetryClient.TrackException(ex);
    }
}

Something like the above, where the BeginOperation call takes the snapshot of the request state at that point.
Yes, I spent last week looking at what it would take to make the feature collection thread safe. It doesn't solve the problem fully, but it is the only thing we could realistically do. It's also something that would have to be implemented in each server, so it can't be centralized. The reason it happens so often in this particular case is because WebListener uses a dictionary; Kestrel, on the other hand, implements its own feature collection by using fields for known types and a list for unknown features. Even after doing that, features themselves may not be thread safe, so the guarantees are very weak. We can say that setting or getting a feature from the collection is safe (by hunting down our implementations and making them atomic), but after getting the feature there's no guarantee it is safe.
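For illustration, a sketch of what that weak guarantee amounts to, assuming a wrapper type (not an actual ASP.NET Core class) that makes the collection's get/set atomic; note that it does nothing for the thread safety of the feature instances it hands back:

using System;
using System.Collections;
using System.Collections.Generic;
using Microsoft.AspNetCore.Http.Features;

// Hypothetical wrapper: reads and writes of the collection itself are atomic,
// but the returned feature objects are whatever the server put there and are
// not made thread safe by this.
public class SynchronizedFeatureCollection : IFeatureCollection
{
    private readonly IFeatureCollection _inner;
    private readonly object _sync = new object();

    public SynchronizedFeatureCollection(IFeatureCollection inner)
    {
        _inner = inner;
    }

    public bool IsReadOnly => _inner.IsReadOnly;

    public int Revision { get { lock (_sync) return _inner.Revision; } }

    public object this[Type key]
    {
        get { lock (_sync) return _inner[key]; }
        set { lock (_sync) _inner[key] = value; }
    }

    public TFeature Get<TFeature>() { lock (_sync) return _inner.Get<TFeature>(); }

    public void Set<TFeature>(TFeature instance) { lock (_sync) _inner.Set<TFeature>(instance); }

    public IEnumerator<KeyValuePair<Type, object>> GetEnumerator()
    {
        // Snapshot under the lock so enumeration does not race with writers.
        lock (_sync)
        {
            return new List<KeyValuePair<Type, object>>(_inner).GetEnumerator();
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}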
This seems like a serious problem to me. I am using AI in production but have not switched to the 2.x package yet. Do you suggest waiting until this is solved?
@dnduffy @SergeyKanzhelev @davidfowl @pakrym do we need to meet about this? If we need to make changes in ASP.NET Core to help address this, we're running out of time very fast for 2.0.
@DamianEdwards I tried to IM you but just missed you, I think. I'll read through this and refresh myself on the proposals in the thread.
Having read everything over, I'm drawn to the discussion about why it was switched to HttpContext in the first place; that sounds like the decision to revisit with the least impact. @pakrym @SergeyKanzhelev @davidfowl If there was a working DI-based approach previously, why don't we simply revert to that approach? @davidfowl It sounds to me like the feature collection ought to be thread safe even if that isn't the be-all and end-all solution for AppInsights. What do you guys think is the least-impact vs. best long-term solution?
@SergeyKanzhelev, @dnduffy I think the reason why DI was not preferred for storing …
What is the status of this? We have held off upgrading because of the poor performance associated with the lock.
Also interested in the status of this; it's blocking us from using App Insights instrumentation. Locking the context is not an option for us.
Why is that?
I am afraid of performance issues at scale; with a high volume of requests, using a lock on an unknown number of telemetry calls scares me.
I concur with this. We are very happy with performance pre-lock. Going from pre-lock to lock, we saw requests per second served by a single box drop to 1/3 of the original requests per second we could serve on a box. Put another way, the lock cost us 2/3 of the requests per second we could get out of a single box. We did this performance testing many months ago. If needed, I could try to find time this week to re-create the results.
This lock is per request. I'd like to see that performance data. How many concurrent operations did you have accessing the HttpContext in parallel that it could cause that level of regression?
I re-ran our tests, seeing about a 40% drop vs. my original assertion of a 66% drop in requests per second. However, I also looked at latency reported by our load test tool (locust.io). We are getting around 290ms at the 50th percentile with 2.0.0 of the library. If we rev to 2.1.1 with the lock, we are getting around 2200ms at the 50th percentile, so around a 10x increase in latency. That holds up at the 95th percentile, with 630ms before and 9100ms after, so still at least a 10x increase in latency. The only change there is the upgrade of the Application Insights library to 2.1.1 (i.e., the lock). I am working on creating a new project in VS that re-creates the problem. I will post when I'm able to get a small reproduction case working that I can push publicly.
@kstreith Maybe the lock is too big? Or maybe there are slow TelemetryInitializers in the pipeline? Are you making lots of outbound requests in parallel in your application?
Our app does use HttpClient to make 1-3 outbound requests per inbound request directed to us. We make those calls async, but we await each one before starting the next.
I ran ./wrk -t10 -c400 -d60s against our application. Here are the results when using v2.0.0 of the Application Insights ASP.NET Core library: Running 1m test @ https://redacted/

If all I do to our application is rev the Application Insights library from 2.0.0 to 2.1.1, I now get the following performance with the same wrk command: Running 1m test @ https://redacted/

So we can see the requests per second drop in half and the latency doubles. You can't see it here, but I use the Live Stream feature of Application Insights, and with 2.0.0 I'm using 125% CPU in the live stream. When using 2.1.1 of the library, I'm only using 52% of the CPU. So, just upgrading the library from 2.0.0 to 2.1.1, I see a significant increase in latency, lower RPS, and lower CPU utilization. I also tried running wrk with only -t1 with the 2.1.1 library and I see the exact same results. I appreciate the responses @davidfowl, any thoughts?
What happens if you try 2.4.0-beta3? microsoft/ApplicationInsights-aspnetcore#690 (comment)
@benaadams With 2.4.0-beta3, performance is still bad, basically equivalent to 2.1.1. Running 1m test @ https://redacted
This issue is stale because it has been open 300 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
HttpContext class is not thread safe, so accessing it from telemetry initializers on different threads has to be synchronized. Today, enabling Application Insights on a highly asynchronous web site with many tasks running in parallel may cause failures or hangs.