This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

CPU usage spikes with new Sockets transport #2694

Closed
cwe1ss opened this issue Jun 29, 2018 · 33 comments

Comments

@cwe1ss

cwe1ss commented Jun 29, 2018

I'm having a weird issue with the new "Sockets" transport and I hope you might have some ideas!

I'm running the latest .NET Core 2.1.1 / ASP.NET Core 2.1.1. I have a single-node Service Fabric cluster for development purposes on Azure (Windows, 2 cores, 16GB RAM) and it runs ~60 .NET Core processes (self-contained, win10-x64). About half of them are ASP.NET Core applications; the rest are regular console applications with background services. The applications communicate via the Service Fabric HTTP reverse proxy on localhost (using HttpClient).

As it's a dev/demo system, there's almost no load on it. However, when using the new "Sockets" transport, I get high CPU usage (50% - 100%) on the ASP.NET Core apps for quite a few seconds every few minutes, which results in a very unstable system.

As you can see in the overall CPU usage of the machine, the system became much more stable after I switched back to Libuv. I afterwards tried Sockets() with one app and CPU usage became unstable again.
[Image: cpu-overall]

I've been running PerfView and it shows that the high CPU usage happens around these calls:

module Microsoft.AspNetCore.Server.Kestrel.Core <<Microsoft.AspNetCore.Server.Kestrel.Core!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__186`1[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context].MoveNext()>>
module system.io.pipelines <<system.io.pipelines!System.IO.Pipelines.Pipe+DefaultPipeReader.ReadAsync(value class System.Threading.CancellationToken)>>
module system.io.pipelines <<system.io.pipelines!System.IO.Pipelines.Pipe+DefaultPipeReader.AdvanceTo(value class System.SequencePosition,value class System.SequencePosition)>>

Here are some screenshots. Calls by name:
[Image: callsbyname]

Call tree:
[Image: calltree-1]
[Image: calltree-2]
[Image: calltree-3]

Process CPU usage with libuv:
As I've said there's almost no load on the app/system.
[Image: cpu-usage-libuv]

Process CPU usage with Sockets transport:
Some spikes just take 1-2 seconds, others take a lot longer.
[Image: cpu-usage-sockets]

Any ideas on a possible cause?

PS: I don't know much about PerfView so I hope that what I've done is not misleading.

@halter73
Member

Pinging @pakrym because of the high exclusive sample counts from DefaultPipeReader.ReadAsync and DefaultPipeReader.AdvanceTo inside ProcessRequests.

My guess is that the client is sending request data slowly enough that Kestrel tries to repeatedly reparse small segments of the request start line and headers, and for some reason, the libuv transport does a better job of batching data in order to reduce calls to the socket receive callback and thereby also the http parser.

@cwe1ss You could verify this guess by collecting a tracing profile instead of a sampling-based one and comparing the invocation count of DefaultPipeReader.ReadAsync with the libuv vs. the socket transport, under a similar load where the socket transport performs worse. I don't know if PerfView can collect method invocation counts, but this is easy with dotTrace.

Alternatively, if you could give us sample server and client apps that repro these Socket transport CPU usage spikes, that would be even better.

@cwe1ss
Author

cwe1ss commented Jul 2, 2018

@halter73 thank you very much for your response!

I did find a few exceptions in our logs regarding slow reading of the request body. However, there have only been ~ 15 exceptions in total in the last 7 days, so much less than the number of CPU spikes we've had (every few minutes per app). So it's hard to say if those timeouts were a potential cause or just a sign of high CPU load during these spikes.

Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException
Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.

Also, I did not get any of these exceptions since I switched back to Libuv. Everything runs quite smoothly since the switch:
[Image: response-duration]

Note that I currently also have another issue with Application Insights that results in slowly increasing CPU usage / a memory leak - hence the small rise in response duration at the end. (Using the latest and greatest can be pretty tough sometimes...) However, I've disabled Application Insights for one of my apps and I've had the same CPU spikes with Sockets, so I don't think they're related.

I guess it will be really hard to reproduce this in a sample app. 😞 I do have a dotTrace license so I could do some profiling with it, however, since I'm using Service Fabric I can't easily launch my apps with dotTrace on the server. I quickly tried remote profiling but it only showed full .NET applications. My experience with dotTrace is very limited unfortunately. Do you know of an easy way to do this with Service Fabric / .NET Core?

Your help is much appreciated.

@ZOXEXIVO

ZOXEXIVO commented Jul 10, 2018

Having the same issue on Ubuntu 16.04. A local Windows test passed without any problems.

After many concurrent requests we see 1 or 2 of 8 cores fully utilized, even though the requests have already been processed. The CPU usage only drops once we see this in the logs:

Connection id "0HLF6G2D3H2RB" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLF6G2D3H2RB" stopped.

All the tracing data is related to reading from Pipes and AsyncTaskBuilder.

At 100 rps we see:

[Image]

@jamiegs

jamiegs commented Jul 20, 2018

I've also run into this issue.
For our application, the requests should all be fairly quick because they're all within a single AWS region, not over the internet.

The application only handles a couple different request types, and only handles probably around 5 requests per second.

It's running in a Debian 9 container in ECS.

This CPU graph is averaged across 3 containers.
[Image: Container CPU Graph] It goes above 100% CPU because the container is allowed to burst over the CPU reservation we have assigned to it.

I tried adjusting the MinRequestBodyDataRate to 1 byte per second, with a 10 second grace period and it didn't have any noticeable change. Switching to libuv does appear to fix the problem.

Here's a flame graph sampled over 45 seconds.
[Image: Flame Graph]

@lenew

lenew commented Jul 21, 2018

Also have this problem. CPU usage is stable after switching back to Transport.Libuv.
[Image]
This image shows the CPU usage (%) with the 2.1.1 default transport and with libuv.

@pakrym self-assigned this Jul 23, 2018
@pakrym added the investigate (Investigation item) label Jul 23, 2018
@ZOXEXIVO

ZOXEXIVO commented Aug 1, 2018

@pakrym were you able to reproduce it?

@kspearrin

Also having this problem in Azure App Services. Is the workaround for now to call .UseLibuv() from the program's Main?

@halter73
Member

halter73 commented Aug 6, 2018

Calling .UseLibuv() is a workaround for this issue.

@nforysinski

I have attempted to bring in the .UseLibuv() workaround to fix problems we might be able to attribute to this. Upon bringing the nuget package in and applying the workaround, deploying to an azure web app, and letting it go we see almost full 100% cpu utilization on the app service plan. The same utilization is not seen running it locally.

.NET Core 2.1.2
all latest Microsoft.AspNetCore packages
all latest Microsoft.ApplicationInsights packages

@benaadams
Contributor

For AppInsights, are you using version 2.4.0? microsoft/ApplicationInsights-aspnetcore#690 (comment)

@muratg
Contributor

muratg commented Aug 24, 2018

We have a fix for this in our next patch release (2.1.4, September.)

@muratg removed the investigate (Investigation item) label Aug 24, 2018
@ZOXEXIVO

ZOXEXIVO commented Aug 25, 2018

@muratg, @pakrym is there any commit or text description of the problem?

@angelMachin

I am having this same issue with the Sockets transport on Debian 9 with SDK 2.1.4 and .NET Core 2.1.2.
Fixed by switching to libuv.

@pakrym
Contributor

pakrym commented Sep 11, 2018

There was a bug in System.IO.Pipelines that caused ReadAsync calls not to "block" in some cases, resulting in a tight loop and a CPU spike until more data was written by the client.

@cwe1ss
Author

cwe1ss commented Sep 12, 2018

Thanks for looking into this @pakrym! Is this fixed with the most recent updates (ASP.NET 2.1.4, System.IO.Pipelines 4.5.1) or will this be included in a future release?

@pakrym
Contributor

pakrym commented Sep 12, 2018

@cwe1ss yes, the fix is in 2.1.4/4.5.1.

@muratg
Contributor

muratg commented Sep 12, 2018

Fixed in 2.1.4.

@calebickler

Whew! Stuck on this for two weeks, thanks for the update! Problems went away after upgrading, see https://stackoverflow.com/questions/52561063

@MatthewLymer

This may sound dumb, but fixed in version 2.1.4 of what, exactly? The .net core runtime? sdk? some other library?

@muratg
Contributor

muratg commented Oct 9, 2018

@MatthewLymer The fix is in the ASP.NET Core runtime 2.1.4. You can get it by installing the latest 2.1 SDK or by installing the .NET Core and ASP.NET Core runtimes directly.

@MatthewLymer

MatthewLymer commented Oct 15, 2018

I upgraded the runtime on my Ubuntu 16.04 boxes to dotnet core runtime 2.1.5.

Libuv seems to work fine on both 2.1.2 and 2.1.5, but Sockets seems to be bad on both (I don't have test results for 2.1.2, but they're just as bad).

Is there something else I need to do to have the performance issue resolved?

dotnet --info

Host (useful for support):
  Version: 2.1.5
  Commit:  290303f510

.NET Core SDKs installed:
  No SDKs were found.

.NET Core runtimes installed:
  Microsoft.NETCore.App 2.1.5 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

[Image]

EDIT: internal and external refer to different applications, though both using same version of .net core

@muratg
Contributor

muratg commented Oct 15, 2018

@MatthewLymer Could you also install ASP.NET Core runtime there as well? Min version 2.1.4, but since 2.1.5 is also out, just install that. You want both runtimes to be the latest patch in general.

Sorry, I misguided you above. I've amended the post with this.

@MatthewLymer

MatthewLymer commented Oct 15, 2018

@muratg I'll give that a try. I am at a bit of a loss to understand what the different runtimes are for, since my (aspnetcore) app seems to behave properly with the regular dotnet core runtime. Is there any documentation on what the aspnetcore runtime does specifically, and why Kestrel would presumably behave differently between the two?

@muratg
Contributor

muratg commented Oct 15, 2018

@MatthewLymer It's all about layering. ASP.NET Core runtime depends on .NET Core runtime, and it carries ASP.NET optimized binaries for its target platform.

If you install the SDK, it brings in all the runtimes. But some folks run .NET workloads without ASP.NET and they may prefer not to bring in ASP.NET at all.

@halter73
Member

halter73 commented Oct 15, 2018

@MatthewLymer If you decide you want to install just the runtime instead of the full SDK, you can find all the install links/instructions for ASP.NET Core here. And here are the Ubuntu 16.04 specific install instructions.

If you do this, the dotnet --info output will include the 2.1.5 Microsoft.AspNetCore.App runtime:

dotnet --info
A compatible SDK version for global.json version: [2.2.100-preview2-009404] from [/home/shalter/aspnet/KestrelHttpServer/global.json] was not found

Host (useful for support):
  Version: 2.1.5
  Commit:  290303f510

.NET Core SDKs installed:
  No SDKs were found.

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.5 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.5 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.5 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download
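
The check described above can be scripted. The following is a minimal sketch, not an official tool: the function name `aspnetcore_runtime_versions` is made up for illustration, and the parsing assumes the `Name Version [path]` line layout seen in the `dotnet --info` listings in this thread.

```shell
# Read `dotnet --info` output on stdin and print the version of every
# Microsoft.AspNetCore.App shared runtime that is listed.
# Assumption: runtime lines look like "  Name Version [path]".
aspnetcore_runtime_versions() {
  awk '$1 == "Microsoft.AspNetCore.App" { print $2 }'
}

# Intended usage (requires the dotnet host on PATH):
#   dotnet --info | aspnetcore_runtime_versions
```

Empty output would suggest that only the base .NET Core runtime is installed, as in the earlier `dotnet --info` listing in this thread.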

Also, if you run lsof -p <PID of your "dotnet exec" process> you should see that Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll is loaded from the /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.1.5/ folder. I would double-check this is the case so you know you're using a version of the dll that contains the fix for this issue. Currently, I suspect you're loading the dll from a self-contained app.

Here's a guide with more details
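
The lsof check above can be narrowed to just the transport assemblies. This is a rough sketch under assumptions: `kestrel_transport_paths` is a hypothetical helper name, and it assumes that lsof prints the file path as a space-free token on each line.

```shell
# Filter lsof-style output on stdin down to the distinct paths of loaded
# Kestrel transport assemblies (any file named *Kestrel.Transport.<Name>.dll).
# Assumption: the paths contain no spaces.
kestrel_transport_paths() {
  grep -o '/[^ ]*Kestrel\.Transport\.[A-Za-z]*\.dll' | sort -u
}

# Intended usage, with PID set to the "dotnet exec" process id:
#   lsof -p "$PID" | kestrel_transport_paths
```

A path under /usr/share/dotnet/shared/Microsoft.AspNetCore.App/ would indicate the system-installed runtime; a path inside the application's own directory would indicate a copy shipped with the app.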

@MatthewLymer

MatthewLymer commented Oct 15, 2018

@halter73 I suspect you're right about the self-contained app (I'll check tomorrow). It seems very confusing to me to have two ways of achieving the same thing (self-contained vs runtime). Does this exist simply for backwards compatibility's sake, and is the runtime the way to go in the future?

When I build my server images I just ended up installing the regular dotnet core runtime as my application just worked. There wasn't any documentation that indicated that installing a different runtime with the same version number would fix performance crippling bugs, so I went with the minimal installation necessary to get my application running.

In my csproj, when I import Microsoft.AspNetCore.App via NuGet, does this not dictate that I will be executing the self-contained binary? Or is there some mechanism behind the scenes that says it'll actually use another assembly altogether?

@halter73
Member

@MatthewLymer Whether or not your app is self-contained usually depends on what parameters you pass into dotnet publish. If you use the -r flag (e.g. dotnet publish -c release -r ubuntu.16.04-x64), you'll see that your app, along with the entire runtime, is published to ./bin/release/netcoreapp2.1/ubuntu.16.04-x64/publish. For the default web template, the size of this directory totals about 96MB! The nice thing is that if you make sure ./bin/release/netcoreapp2.1/ubuntu.16.04-x64/publish/myapp has the executable bit set, you can just deploy it on any Ubuntu 16.04 x64 machine without needing to install the .NET runtime or the ASP.NET runtime.

If you omit the -r flag (e.g. dotnet publish -c release), you'll see that just your app is published to ./bin/release/netcoreapp2.1/publish/. For the default web template, the size of this directory totals about 250KB. You can use this directory to run your app on any machine with a compatible runtime installed, no matter what OS or bitness. Also, if you update your server with the latest patch, every app running on the server that depends on the system-installed runtime gets the update without redeploying.
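
To tell the two publish layouts apart on disk, a small check like the following may help. It is only a sketch: `publish_kind` is a hypothetical helper, and it relies on the assumption that a self-contained publish places the core library System.Private.CoreLib.dll next to the app while a framework-dependent publish does not.

```shell
# Classify a `dotnet publish` output directory.
# Assumption: only a self-contained publish carries its own copy of
# System.Private.CoreLib.dll; a framework-dependent publish relies on the
# machine-wide shared runtime instead.
publish_kind() {
  if [ -f "$1/System.Private.CoreLib.dll" ]; then
    echo "self-contained"
  else
    echo "framework-dependent"
  fi
}

# Intended usage with the paths from the two publish commands above:
#   publish_kind ./bin/release/netcoreapp2.1/ubuntu.16.04-x64/publish
#   publish_kind ./bin/release/netcoreapp2.1/publish
```

Only the framework-dependent layout picks up fixes from a patched system-wide runtime without redeploying, which is why the distinction matters for this issue.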

@MatthewLymer

@halter73 I am definitely omitting the -r flag and I do get ~50 Microsoft.AspNetCore.* assemblies there, totalling ~4.5 MB.

If I run this on a server with the AspNetCore runtime, are these assemblies not used (with newer, better ones loaded instead)? If so, is it because these newer binaries are implicitly loaded by the runtime when starting an application, or is there something else that ensures the non-packaged ones are used?

If I were to stay with the regular dotnet core runtime, would it be possible to update my project to include the fixed version of System.IO.Pipelines and get all this goodness in a more obvious (to me) manner?

@muratg
Contributor

muratg commented Oct 16, 2018

@MatthewLymer Check this out: https://docs.microsoft.com/en-us/dotnet/core/deploying/ for more details. Thanks.

@MatthewLymer

Thanks for the info, and sorry for derailing the issue.

@ZOXEXIVO

@MatthewLymer we're all waiting for your new AWS graphs with aspnetcore-runtime and the Sockets transport.

@MatthewLymer

Woop woop! Migrating to aspnetcore 2.1.5 made things better than before!

[Image]

@ZOXEXIVO

ZOXEXIVO commented Oct 30, 2018

related commit, fyi
dotnet/corefx@995dea0
@pakrym, thanks
