Surprising performance regression on Unix VMs #59145
Comments
Tagging subscribers to this area: @eiriktsarpalis

Issue Details

For the following very simple benchmark:

```cs
[Benchmark]
public ConcurrentBag<int> ConcurrentBag() => new ConcurrentBag<int>();
```

we can observe a huge perf drop on Linux VMs (bare metal machines are not affected). In the results table, all AMD EPYC 7452 machines are Azure VMs; everything else is bare metal.

Initially I thought it might be caused by the fact that I was using VMs with a single physical core (Standard_D2a_v4 Azure VMs with AMD EPYC 7452: 1 CPU, 2 logical cores, 1 physical core), but for RHEL I used a VM with 2 physical (4 logical) cores (Standard_D4as_v4) and the regression can be observed there as well. The regression does not affect Windows Server (also running on a Standard_D2a_v4 Azure VM).

Repro

Create a Standard_D2a_v4 Azure VM with CentOS|RHEL|SLES|OpenSUSE -> SSH -> install git and python3, then:

```
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net5.0 net6.0 --filter 'System.Collections.CtorDefaultSize<Int32>.ConcurrentBag'
```

@jkotas @stephentoub @danmoseley @kouvel In my opinion we need to investigate this and understand the reason why it has regressed before we ship .NET 6.0, as it might be a symptom of a bigger problem related to Unix VMs.
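For anyone who wants to run the method outside the dotnet/performance harness, a minimal standalone BenchmarkDotNet project could look roughly like the sketch below. The class and `Program` wrapper are illustrative assumptions; only the `[Benchmark]` method comes from the report.

```cs
// Minimal sketch of a standalone harness (assumed layout; the real benchmark lives in
// dotnet/performance as System.Collections.CtorDefaultSize<Int32>.ConcurrentBag).
using System.Collections.Concurrent;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class CtorDefaultSize
{
    // Each invocation allocates and discards a ConcurrentBag, which is exactly the
    // allocation path that regressed on the Unix VMs described above.
    [Benchmark]
    public ConcurrentBag<int> ConcurrentBag() => new ConcurrentBag<int>();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<CtorDefaultSize>();
}
```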
I've just run it on a CentOS VM and confirmed that it was not a one-time thing:

```
BenchmarkDotNet=v0.13.1.1603-nightly, OS=centos 8
AMD EPYC 7452, 1 CPU, 2 logical cores and 1 physical core
.NET SDK=6.0.100-rc.1.21417.19
  [Host]     : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
  Job-PVNFCA : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT

-------------------- Histogram --------------------
[284.808 ns ; 327.384 ns) | @@@
[327.384 ns ; 366.386 ns) | @@@@@@@@@
[366.386 ns ; 409.337 ns) | @@@@@@
[409.337 ns ; 424.316 ns) |
[424.316 ns ; 467.938 ns) | @@
---------------------------------------------------
```

```
BenchmarkDotNet=v0.13.1.1603-nightly, OS=centos 8
AMD EPYC 7452, 1 CPU, 2 logical cores and 1 physical core
.NET SDK=6.0.100-rc.1.21417.19
  [Host]     : .NET 6.0.0 (6.0.21.41701), X64 RyuJIT
  Job-EUZAQF : .NET 6.0.0 (6.0.21.41701), X64 RyuJIT

-------------------- Histogram --------------------
[5.727 us ; 6.274 us) | @@@@
[6.274 us ; 6.800 us) | @@@@@@@@@@
[6.800 us ; 7.074 us) | @
[7.074 us ; 7.599 us) | @@@@
[7.599 us ; 7.958 us) | @
---------------------------------------------------
```
ThreadLocal depends on the finalizer for cleanup. My guess is that the finalizer thread is not able to clean up the ThreadLocals fast enough in this configuration due to #56956 or some other subtle change; the ThreadLocal instances accumulate, and that makes the microbenchmark much slower.
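To see why this microbenchmark leans so heavily on the finalizer thread, here is a small repro-style sketch. It assumes, as the comment above implies, that each `ConcurrentBag<T>` owns a `ThreadLocal` internally whose ID is only released when that `ThreadLocal` is disposed or finalized; the demo itself is an illustration, not taken from the benchmark code.

```cs
// Sketch: allocate and drop bags as fast as possible, the way the microbenchmark does.
// If the finalizer thread cannot release the underlying ThreadLocal IDs at the same
// rate, unreclaimed instances accumulate between GCs.
using System;
using System.Collections.Concurrent;

class FinalizerPressureDemo
{
    static void Main()
    {
        for (int i = 0; i < 1_000_000; i++)
        {
            _ = new ConcurrentBag<int>(); // dropped immediately; cleanup is left to finalization
        }

        // Draining the backlog explicitly; the benchmark never does this, so the
        // finalizer thread has to keep pace on its own.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("done");
    }
}
```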
Tagging subscribers to this area: @mangod9
I believe that is what's happening. When the finalizer/dispose returns an ID, the next ID to try may be reset to that ID, which may be a low ID. After that ID is reused, the next constructor has to do more work inside the lock due to the linear lookup for a free ID from a low starting point. The lock would be held for a bit longer and finalization would slow down. Eventually, a balance may be struck where the number of IDs in use and IDs queued for freeing is relatively much higher, and the perf would be generally slower.

The test seems to be very sensitive to timing and VM configuration. The underlying issue was preexisting; #56956 added a tiny amount of code inside the locks, and along with other variables the timings may have changed enough to create some kind of feedback loop.

Eliminating the linear search seems to fix the issue, and also seems to make the results more stable. On my local VM, configured in a similar way on an Intel processor, there was about a 50% regression before, and the fix seems to significantly improve the perf. I'll put up a fix for 7.0. It doesn't seem severe enough to port to 6.0, as the degenerate situation seems unlikely to linger or show up in a significant way in real-world cases.

[Benchmark results accompanied this comment: AMD EPYC 7452 1-core 2-thread Debian VM and Intel i7-8700 1-core 2-thread Ubuntu VM, .NET 5.0 vs. .NET 6.0 after the fix.]
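For illustration, the pre-fix allocation scheme described in this comment behaves roughly like the sketch below (a simplification, not the actual `ThreadLocal<T>` source): a list of in-use flags, a linear scan for a free slot, and a "next ID to try" hint that is pulled back down whenever a low ID is freed.

```cs
// Sketch of a linear-search ID allocator with a reset-on-free hint.
using System.Collections.Generic;

class LinearIdAllocatorSketch
{
    private readonly object _lock = new object();
    private readonly List<bool> _inUse = new List<bool>();
    private int _nextIdToTry; // where the linear scan starts

    public int Allocate()
    {
        lock (_lock)
        {
            // If the hint was just reset to a low, freed ID, this scan walks a long
            // run of in-use slots while holding the lock, which in turn delays Free()
            // on the finalizer thread -- the feedback loop described above.
            for (int id = _nextIdToTry; id < _inUse.Count; id++)
            {
                if (!_inUse[id])
                {
                    _inUse[id] = true;
                    _nextIdToTry = id + 1;
                    return id;
                }
            }

            _inUse.Add(true);
            _nextIdToTry = _inUse.Count;
            return _inUse.Count - 1;
        }
    }

    public void Free(int id)
    {
        lock (_lock)
        {
            _inUse[id] = false;
            if (id < _nextIdToTry)
            {
                _nextIdToTry = id; // the next Allocate() restarts its scan from here
            }
        }
    }
}
```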
…se (#59300)

- Replaced the linear search for a free ID with a pair of collections that operate in O(1) time for insertion and removal
- See #59145 (comment) for more information
- Fixes #59145
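One way to get constant-time allocation and release in the spirit of that commit message is sketched below; it is an assumption for illustration, and the actual pair of collections chosen in #59300 may differ.

```cs
// Sketch of an O(1) free-ID pool: a stack of freed IDs for reuse plus a counter for
// brand-new IDs, so neither Allocate() nor Free() ever scans while holding the lock.
using System.Collections.Generic;

class ConstantTimeIdAllocatorSketch
{
    private readonly object _lock = new object();
    private readonly Stack<int> _freeIds = new Stack<int>();
    private int _nextNewId;

    public int Allocate()
    {
        lock (_lock)
        {
            return _freeIds.Count > 0 ? _freeIds.Pop() : _nextNewId++;
        }
    }

    public void Free(int id)
    {
        lock (_lock)
        {
            _freeIds.Push(id); // O(1): the finalizer thread spends minimal time in the lock
        }
    }
}
```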