CPU usage spikes with new Sockets transport #2694
Comments
Pinging @pakrym because of the high exclusive sample counts from DefaultPipeReader.ReadAsync and DefaultPipeReader.Advance inside ProcessRequests. My guess is that the client is sending request data slowly enough that Kestrel repeatedly reparses small segments of the request start line and headers, and that, for some reason, the libuv transport does a better job of batching data, which reduces calls to the socket receive callback and therefore to the HTTP parser. @cwe1ss You could verify this guess by collecting a tracing profile instead of a sampling-based one and comparing the invocation counts of DefaultPipeReader.ReadAsync with the libuv and Sockets transports under a similar load, where the Sockets transport performs worse. I don't know whether PerfView can collect method invocation counts, but this is easy with dotTrace. Alternatively, if you could give us sample server and client apps that reproduce these Sockets transport CPU usage spikes, that would be even better.
@halter73 Thank you very much for your response! I did find a few exceptions in our logs regarding slow reading of the request body. However, there have only been ~15 exceptions in total in the last 7 days, far fewer than the number of CPU spikes we've had (every few minutes per app). So it's hard to say whether those timeouts were a potential cause or just a symptom of high CPU load during these spikes.
Also, I haven't gotten any of these exceptions since I switched back to Libuv. Everything has run quite smoothly since the switch. Note that I currently also have another issue with Application Insights that results in slowly increasing CPU usage / a memory leak, hence the small rise in response duration at the end. (Using the latest and greatest can be pretty tough sometimes...) However, I've disabled Application Insights for one of my apps and I've had the same CPU spikes with
I guess it will be really hard to reproduce this in a sample app. 😞 I do have a dotTrace license, so I could do some profiling with it; however, since I'm using Service Fabric, I can't easily launch my apps with dotTrace on the server. I quickly tried remote profiling, but it only showed full .NET Framework applications. Unfortunately, my experience with dotTrace is very limited. Do you know of an easy way to do this with Service Fabric / .NET Core? Your help is much appreciated.
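For reference, in ASP.NET Core 2.1 Kestrel uses the new Sockets transport by default, and switching back to Libuv is a change in the host builder. The following is a minimal sketch, assuming a 2.1 project that references the Microsoft.AspNetCore.App metapackage (which includes the Libuv transport); the Startup class here is a hypothetical placeholder standing in for the app's real one.

```csharp
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;

public class Program
{
    public static void Main(string[] args)
    {
        CreateWebHostBuilder(args).Build().Run();
    }

    public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            // CreateDefaultBuilder configures Kestrel with the Sockets transport in 2.1;
            // UseLibuv() registers the libuv transport in its place.
            .UseLibuv()
            .UseStartup<Startup>();
}

// Hypothetical placeholder Startup: replace with the application's own.
public class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        app.Run(context => context.Response.WriteAsync("Hello"));
    }
}
```

Replacing UseLibuv() with UseSockets(), or removing the call, goes back to the Sockets transport.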
Having the same issue on Ubuntu 16.04. A local Windows test passed without any problems. After many concurrent requests, we see 1 or 2 of 8 cores fully utilized, even after the requests have already been processed. The CPU utilization only ends once we see this in the logs: Connection id "0HLF6G2D3H2RB" sending FIN. All the tracing data relates to reading from Pipes and AsyncTaskBuilder. At 100 rps we see:
@pakrym were you able to reproduce it?
Also having this problem in Azure App Service. Is the workaround for now to
Calling
I have attempted to bring in the .NET Core 2.1.2
For AppInsights, are you using version 2.4.0? microsoft/ApplicationInsights-aspnetcore#690 (comment)
We have a fix for this in our next patch release (2.1.4, September).
I am having the same issue with the Sockets transport on Debian 9 with SDK 2.1.4 and .NET Core 2.1.2.
There was a bug in System.IO.Pipelines that caused ReadAsync calls not to "block" in some cases resulting in a tight loop and CPU spike until more data was written by the client. |
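To illustrate what "not blocking" means here: a PipeReader consumer reports progress with AdvanceTo(consumed, examined), and once everything has been examined, the next ReadAsync is supposed to wait until the writer adds more data. The sketch below is a hypothetical line reader (ReadLinesAsync is an illustrative name, not Kestrel's actual parser) fed by a simulated slow client. It assumes a .NET 5 or later console project (C# 9 top-level statements) that references the System.IO.Pipelines package.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

var pipe = new Pipe();
Task<List<string>> reading = ReadLinesAsync(pipe.Reader);

// Simulate a slow client that sends a request line and a header in small fragments.
foreach (string chunk in new[] { "GET / HT", "TP/1.1\nHost: exa", "mple\n" })
{
    await pipe.Writer.WriteAsync(Encoding.ASCII.GetBytes(chunk));
    await Task.Delay(50);
}
pipe.Writer.Complete();

List<string> lines = await reading;
foreach (string line in lines)
{
    Console.WriteLine(line);
}

// Hypothetical helper: collects newline-terminated lines from the reader.
static async Task<List<string>> ReadLinesAsync(PipeReader reader)
{
    var parsed = new List<string>();
    while (true)
    {
        ReadResult result = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;

        // Consume every complete line that is currently buffered.
        SequencePosition? newline;
        while ((newline = buffer.PositionOf((byte)'\n')) != null)
        {
            parsed.Add(Encoding.ASCII.GetString(buffer.Slice(0, newline.Value).ToArray()));
            buffer = buffer.Slice(buffer.GetPosition(1, newline.Value));
        }

        // consumed = start of the unfinished line; examined = end of all buffered bytes.
        // Since every byte was examined, the next ReadAsync should wait for new data
        // (or completion) instead of returning the same bytes again. The bug fixed in
        // System.IO.Pipelines 4.5.1 broke that expectation in some cases.
        reader.AdvanceTo(buffer.Start, buffer.End);

        if (result.IsCompleted)
        {
            break;
        }
    }
    reader.Complete();
    return parsed;
}
```

If ReadAsync returned the unchanged buffer immediately instead of waiting, this loop would spin on the partial "GET / HT" fragment until the next chunk arrived, which matches the tight loop and CPU spike described above.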
Thanks for looking into this @pakrym! Is this fixed with the most recent updates (ASP.NET 2.1.4, System.IO.Pipelines 4.5.1) or will this be included in a future release? |
@cwe1ss yes, the fix is in 2.1.4/4.5.1. |
Fixed in 2.1.4. |
Whew! Stuck on this for two weeks, thanks for the update! Problems went away after upgrading, see https://stackoverflow.com/questions/52561063 |
This may sound dumb, but fixed in version 2.1.4 of what, exactly? The .NET Core runtime? The SDK? Some other library?
@MatthewLymer The fix is in the ASP.NET Core runtime 2.1.4. You can get it via installing the latest 2.1 SDK or install the .NET Core and ASP.NET Core runtimes directly. |
I upgraded the runtime on my Ubuntu 16.04 boxes to .NET Core runtime 2.1.5. Libuv seems to work fine on both 2.1.2 and 2.1.5, but Sockets seems to be bad on both (I don't have test results for 2.1.2, but they're just as bad). Is there something else I need to do to get the performance issue resolved?
EDIT: "internal" and "external" refer to different applications, though both use the same version of .NET Core.
@MatthewLymer Could you also install the ASP.NET Core runtime there? Min version 2.1.4, but since 2.1.5 is also out, just install that. In general, you want both runtimes to be on the latest patch. Sorry, I misguided you above; I've amended the post with this.
@muratg I'll give that a try. I'm at a bit of a loss as to what the different runtimes are for; my
@MatthewLymer It's all about layering. The ASP.NET Core runtime depends on the .NET Core runtime, and it carries ASP.NET Core binaries optimized for its target platform. If you install the SDK, it brings in all the runtimes. But some folks run .NET workloads without ASP.NET Core, and they may prefer not to bring it in at all.
@MatthewLymer If you decide you want to install just the runtime instead of the full SDK, you can find all the install links/instructions for ASP.NET Core here. And here are the Ubuntu 16.04 specific install instructions. If you do this
Also, if you run
@halter73 I suspect you're right about the self-contained app (I'll check tomorrow). It seems very confusing to me to have two ways of achieving the same thing (self-contained vs. runtime). Does this exist simply for backwards compatibility's sake, and is the runtime the way to go in the future? When I built my server images, I just ended up installing the regular .NET Core runtime because my application just worked. There wasn't any documentation indicating that installing a different runtime with the same version number would fix performance-crippling bugs, so I went with the minimal installation necessary to get my application running. In my csproj, when I import via NuGet the
@MatthewLymer Whether or not your app is self-contained usually depends on what parameters you pass into If you omit the
@halter73 I am definitely omitting the If I run this on a server with the ASP.NET Core runtime, are these assemblies not used (with newer, better ones used instead)? If so, is it because these newer binaries are implicitly loaded by the runtime when starting an application, or is there something else that ensures the non-package ones are used? If I were to stay with the regular .NET Core runtime, would it be possible to update my project to include the fixed version of
@MatthewLymer Check this out: https://docs.microsoft.com/en-us/dotnet/core/deploying/ for more details. Thanks. |
Thanks for the info, and sorry for derailing the issue.
@MatthewLymer we're all waiting for your new AWS graphs with aspnetcore-runtime and the Sockets transport.
Related commit, FYI.
I'm having a weird issue with the new "Sockets" transport and I hope you might have some ideas!
I'm running the latest .NET Core 2.1.1 / ASP.NET Core 2.1.1. I have a single-node Service Fabric cluster for development purposes on Azure (Windows, 2 cores, 16GB RAM), and it runs ~60 .NET Core processes (self-contained, win10-x64). About half of them are ASP.NET Core applications; the rest are regular console applications with background services. The applications communicate via the Service Fabric HTTP reverse proxy on localhost (using HttpClient).
As it's a dev/demo system, there's almost no load on it. However, when using the new "Sockets" transport, I get high CPU usage (50% to 100%) on the ASP.NET Core apps for quite a few seconds every few minutes, which makes the system very unstable.
As you can see in the overall CPU usage of the machine, the system became much more stable after I switched back to Libuv. I afterwards tried Sockets() with one app, and CPU usage became unstable again.
I've been running PerfView, and it shows that the high CPU usage happens around these calls:
Here are some screenshots. Calls by name:
Call tree:
Process CPU usage with libuv:
As I've said there's almost no load on the app/system.
Process CPU usage with Sockets transport:
Some spikes last only 1-2 seconds; others last a lot longer.
Any ideas on a possible cause?
PS: I don't know much about PerfView so I hope that what I've done is not misleading.