
[Windows] Regression in a few TechEmpower benchmarks #1668

Closed
adamsitnik opened this issue May 7, 2021 · 29 comments

@adamsitnik
Contributor

I wanted to provide some nice benchmark results for @maryamariyan, who is currently writing the blog post about the Microsoft.Extensions* improvements we made for .NET 6.

To my surprise, the CachingPlatform-intel-win has regressed from 290k+ to 270k RPS.

Plaintext-intel-win has also regressed from 540k to 500k:

[screenshot]

I've taken a very quick look at the profiles and noticed something that I've not seen before: a LOT of time spent in very expensive event logging:

[screenshot]

Repro:

git clone https://github.com/aspnet/Benchmarks.git repro
crank --config .\repro\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-perf-win --application.collect true

@sebastienros @davidfowl: do we know about these regressions and what caused them? Is the expensive logging always enabled, or only when profiling? How can I disable it?

cc @jeffhandley

@davidfowl
Member

I'm pretty sure it's only enabled when you turn profiling on.

@benaadams
Contributor

What's the stack before EventSource? Which method is doing the emitting?

@adamsitnik
Contributor Author

@benaadams
[screenshot]

@benaadams
Contributor

Looks like a runtime issue? FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject is gated on loggingEnabled && FrameworkEventSource.Log.IsEnabled() (ThreadPoolWorkQueue.cs#L514-L515),

which is set at ThreadPoolWorkQueue.cs#L440-L443:

loggingEnabled = FrameworkEventSource.Log.IsEnabled(EventLevel.Verbose, FrameworkEventSource.Keywords.ThreadPool | FrameworkEventSource.Keywords.ThreadTransfer);
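
In other words, the per-enqueue event only fires when a tracing session has enabled those verbose keywords. Here is a minimal, self-contained sketch of that gating pattern; FrameworkEventSource itself is internal to the runtime, so the event source, keyword bits, and method names below are illustrative stand-ins rather than the actual runtime types:

using System.Diagnostics.Tracing;

// Illustrative stand-in for the (internal) FrameworkEventSource; the name and
// keyword values here are made up for the sketch.
[EventSource(Name = "Sketch-ThreadPoolEvents")]
internal sealed class SketchEventSource : EventSource
{
    public static readonly SketchEventSource Log = new SketchEventSource();

    public static class Keywords
    {
        public const EventKeywords ThreadPool = (EventKeywords)0x2;
        public const EventKeywords ThreadTransfer = (EventKeywords)0x10;
    }

    [Event(1, Level = EventLevel.Verbose, Keywords = Keywords.ThreadPool)]
    public void ThreadPoolEnqueueWorkObject(string workItemType) => WriteEvent(1, workItemType);
}

internal static class SketchWorkQueue
{
    // Cached flag, the way ThreadPoolWorkQueue caches loggingEnabled: true only when a
    // session asked for Verbose level with the ThreadPool/ThreadTransfer keywords.
    private static readonly bool s_loggingEnabled = SketchEventSource.Log.IsEnabled(
        EventLevel.Verbose,
        SketchEventSource.Keywords.ThreadPool | SketchEventSource.Keywords.ThreadTransfer);

    public static void Enqueue(object workItem)
    {
        // The per-enqueue event is only written when both the cached flag and the live
        // IsEnabled() check pass, so with default collection it should cost nothing.
        if (s_loggingEnabled && SketchEventSource.Log.IsEnabled())
            SketchEventSource.Log.ThreadPoolEnqueueWorkObject(workItem.GetType().Name);

        // ... actual queueing of the work item would happen here ...
    }
}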

I did have an issue with VS triggering machine-wide verbose listening on FrameworkEventSource (dotnet/runtime#49592), but I assume that's not the issue on Citrine?

@sebastienros
Member

@adamsitnik the drop in March that you are pointing at is due to the Intel microcode updates for Spectre that were pushed by Windows Update.

@benaadams
Contributor


Still probably shouldn't be logging ThreadPool events at EventLevel.Verbose :)

@danmoseley

How much micro benchmark coverage do we have for Extensions - especially logging?

@benaadams
Contributor

> How much micro benchmark coverage do we have for Extensions - especially logging?

From the traces this is specifically EventSource, which it looks like people are running benchmarks for in dotnet/runtime#52092, though I don't know if those are in the perf repo (it probably shouldn't be logging anything for these events unless that was switched on in PerfView specifically).

@adamsitnik
Contributor Author

@sebastienros how can I disable the events to get a clean trace file?

@adamsitnik
Contributor Author

> How much micro benchmark coverage do we have for Extensions

I would say that it's OK; we have dedicated micro benchmarks for Caching, DI, Http, Logging and Primitives.

> especially logging?

35 benchmarks: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/Microsoft.Extensions.Logging

But here the problematic logging is ETW logging, not Microsoft.Extensions.Logging.

@sebastienros
Member

@adamsitnik if this can be set via PerfView arguments, here is the option to change:

--[JOB].collectArguments <arguments>                                     

Native traces arguments, default is "BufferSizeMB=1024;CircularMB=4096;TplEvents=None", other suggested values: "BufferSizeMB=1024;CircularMB=4096;ThreadTime", "BufferSizeMB=1024;CircularMB=4096;GcOnly"
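
For example, combining the repro invocation from earlier in this thread with one of the suggested values (a sketch of the invocation, not a verified run):

crank --config .\repro\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-perf-win --application.collect true --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;GcOnly"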

@benaadams
Contributor

So the defaults shouldn't be triggering EventLevel.Verbose; could FrameworkEventSource.Keywords.ThreadPool | FrameworkEventSource.Keywords.ThreadTransfer be getting accidentally triggered?

@davidfowl
Member

cc @brianrob

@brianrob

Yeah, this looks like it's due to the fact that profiling is enabled. You can confirm that something else isn't causing this by explicitly disabling these events: with PerfView, try /ClrEvents:None /TPLEvents:None. If it still shows up, then something else is enabling them.

@benaadams, there is an incoming change to address the issue that you saw with VS; it will limit the collection to just devenv.exe.

@benaadams
Contributor

> Yeah, this looks like it's due to the fact that profiling is enabled.

Should it be recording every queued ThreadPool item by default (i.e. at verbose level)? That will be millions of events per second.

@brianrob

No, it should not be. I think that this environment uses a cached version of PerfView that used to do this, but the latest version should not.

@adamsitnik
Contributor Author

> With PerfView, try /ClrEvents:None /TPLEvents:None

I have tried that:

crank --config .\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-citrine-win --application.collect true --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents=None"

But for some reason, it ends up with more than 90% of the stacks being broken:

[screenshot]

I've tried to extend the rundown time, but it did not help:

 --application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents=None;MinRundownTime=120"

All I get are native call stacks.

> I think that this environment uses a cached version of PerfView that used to do this, but the latest version should not.

@sebastienros could you please update PerfView to latest version?

@sebastienros
Member

Updated from 2.0.66 to 2.0.68.

@brianrob

@adamsitnik, sorry, that's because the command I gave you shut off unwind info publishing. Try /ClrEvents:JitSymbols. Also, it looks like @sebastienros has updated to the latest version of PerfView, so it might be worth just trying the default collection args.

@adamsitnik
Contributor Author

> Updated from 2.0.66 to 2.0.68

After the update, I get a warning about 90%+ of the stacks being BROKEN. After resolving the symbols, I can see that EventSource.WriteEvent is gone from the trace (I don't know whether it's really gone or just hidden by the broken stacks), but there is a different problem related to ETW.

More than 17% of total CPU time is spent in System.private.corelib!System.Int32 Interop+Advapi32::EventWriteTransfer_PInvoke(System.Int64, System.Diagnostics.Tracing.EventDescriptor&, System.Guid*, System.Guid*, System.Int32, System.Diagnostics.Tracing.EventProvider+EventData*), which spends most of its time in ntoskrnl!ExRaiseDatatypeMisalignment.

@brianrob is this expected?

[screenshot]

The versions:

| .NET Core SDK Version | 6.0.100-preview.5.21260.9       |
| ASP.NET Core Version  | 6.0.0-preview.5.21261.1+743828a |
| .NET Runtime Version  | 6.0.0-preview.5.21260.8+05b646c |

@adamsitnik
Contributor Author

With TplEvents=None;ClrEvents:JitSymbols I still get a complaint about 90%+ of the stacks being BROKEN, but ETW is gone from the trace.

--application.collectArguments "BufferSizeMB=1024;CircularMB=4096;TplEvents=None;ClrEvents:JitSymbols"

@adamsitnik
Contributor Author

@sebastienros could you please increase the timeout? I wanted to check whether it repros on Citrine, but I am getting the following error:

[03:06:17.163] Downloading trace file application.05-11-15-03-53.etl.zip ...
[03:07:57.239] The trace was not captured on the server: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.IO.Compression.DeflateStream.CopyToStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   at System.IO.Compression.DeflateStream.CopyToStream.CopyFromSourceToDestinationAsync()
   at System.Net.Http.DecompressionHandler.DecompressedContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.Crank.Controller.WebUtils.DownloadFileWithProgressAsync(HttpClient httpClient, String uri, String serverJobUri, String destinationFileName) in /_/src/Microsoft.Crank.Controller/WebUtils.cs:line 52
   at Microsoft.Crank.Controller.JobConnection.DownloadTraceAsync() in /_/src/Microsoft.Crank.Controller/JobConnection.cs:line 665

@sebastienros
Member

asp-perf-win didn't have the PerfView update; now it does.

@sebastienros
Member

@adamsitnik this is my log; can you share yours so we can compare the timelines?

[09:18:38.405] Stopping job 'load' ...
[09:18:39.523] Deleting job 'load' ...
[09:18:39.570] Stopping job 'application' ...
[09:18:40.721] Server is collecting trace file, this can take up to 1 minute
..........................................................................................................................
[09:20:47.191] Downloading trace file application.05-11-09-18-40.etl.zip ...
445,290 KB / 445,290 KB (100%)
[09:22:25.443] Deleting job 'application' ...
[09:22:25.481] Stopping job 'db' ...
[09:22:27.703] Deleting job 'db' ...

@sebastienros
Member

Tip: use --[JOB].collectStartup false to exclude the cache-population phase that happens before the app accepts requests. It takes around 10s, and I assume it will slim down the trace quite a bit.
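
For instance, tacked onto the repro invocation from earlier in the thread (an unverified sketch of the command line):

crank --config .\repro\scenarios\platform.benchmarks.yml --scenario caching --profile aspnet-perf-win --application.collect true --application.collectStartup false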

@adamsitnik
Contributor Author

I've updated crank and run the benchmarks again. I am still seeing the same issue:

[screenshot]

@benaadams
Contributor

Tracing is throwing exceptions for some reason? https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-exraisedatatypemisalignment

> The ExRaiseDatatypeMisalignment routine can be used with structured exception handling to throw a driver-determined exception for a misaligned data type that occurs when a driver processes I/O requests.

@adamsitnik
Contributor Author

I've narrowed down the regression to dotnet/runtime#50778 and opened an issue in the runtime repo: dotnet/runtime#52640

I am not closing this issue yet as it would be great to clarify why ExRaiseDatatypeMisalignment happens to take so much time on Windows (cc @brianrob)

@DamianEdwards
Member

Closing this issue due to age. Feel free to reopen if this is still a priority.

DamianEdwards closed this as not planned on Mar 17, 2023.