Surprising performance regression on Unix VMs #59145
Comments
Tagging subscribers to this area: @eiriktsarpalis

Issue Details

For the following very simple benchmark:

```cs
[Benchmark]
public ConcurrentBag<int> ConcurrentBag() => new ConcurrentBag<int>();
```

we can observe a huge perf drop on Linux VMs (bare metal machines are not affected). In the results table, all AMD EPYC 7452 machines are Azure VMs; everything else is bare metal.

Initially I thought it might be caused by the fact that I was using VMs with a single physical core (Standard_D2a_v4 Azure VMs with AMD EPYC 7452: 1 CPU, 2 logical cores, 1 physical core), but for RHEL I used a VM with 2 physical (4 logical) cores (Standard_D4as_v4) and the regression can be observed there as well. The regression does not affect Windows Server (also running on a Standard_D2a_v4 Azure VM).

Repro

Create a Standard_D2a_v4 Azure VM with CentOS|RHEL|SLES|OpenSUSE -> SSH -> install git and python3, then:

```
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net5.0 net6.0 --filter 'System.Collections.CtorDefaultSize<Int32>.ConcurrentBag'
```

@jkotas @stephentoub @danmoseley @kouvel In my opinion we need to investigate this and understand the reason why it has regressed before we ship .NET 6.0, as it might be a symptom of a bigger problem related to Unix VMs.
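For anyone who wants to run the method outside the dotnet/performance harness, a minimal standalone BenchmarkDotNet project could look roughly like the sketch below. The class and `Program` wrapper are illustrative assumptions; only the `[Benchmark]` method comes from the report.

```cs
// Minimal sketch of a standalone harness (assumed layout; the real benchmark lives in
// dotnet/performance as System.Collections.CtorDefaultSize<Int32>.ConcurrentBag).
using System.Collections.Concurrent;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class CtorDefaultSize
{
    // Each invocation allocates and discards a ConcurrentBag, which is exactly the
    // allocation path that regressed on the Unix VMs described above.
    [Benchmark]
    public ConcurrentBag<int> ConcurrentBag() => new ConcurrentBag<int>();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<CtorDefaultSize>();
}
```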
I've just run it on a CentOS VM and confirmed that it was not a one-time thing:

```
BenchmarkDotNet=v0.13.1.1603-nightly, OS=centos 8
AMD EPYC 7452, 1 CPU, 2 logical cores and 1 physical core
.NET SDK=6.0.100-rc.1.21417.19
  [Host]     : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
  Job-PVNFCA : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT

-------------------- Histogram --------------------
[284.808 ns ; 327.384 ns) | @@@
[327.384 ns ; 366.386 ns) | @@@@@@@@@
[366.386 ns ; 409.337 ns) | @@@@@@
[409.337 ns ; 424.316 ns) |
[424.316 ns ; 467.938 ns) | @@
---------------------------------------------------
```

```
BenchmarkDotNet=v0.13.1.1603-nightly, OS=centos 8
AMD EPYC 7452, 1 CPU, 2 logical cores and 1 physical core
.NET SDK=6.0.100-rc.1.21417.19
  [Host]     : .NET 6.0.0 (6.0.21.41701), X64 RyuJIT
  Job-EUZAQF : .NET 6.0.0 (6.0.21.41701), X64 RyuJIT

-------------------- Histogram --------------------
[5.727 us ; 6.274 us) | @@@@
[6.274 us ; 6.800 us) | @@@@@@@@@@
[6.800 us ; 7.074 us) | @
[7.074 us ; 7.599 us) | @@@@
[7.599 us ; 7.958 us) | @
---------------------------------------------------
```
ThreadLocal depends on the finalizer for cleanup. My guess is that the finalizer thread is not able to clean up the ThreadLocals fast enough in this configuration due to #56956 or some other subtle change; the ThreadLocal instances accumulate, and that makes the microbenchmark much slower.
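To see why this microbenchmark leans so heavily on the finalizer thread, here is a small repro-style sketch. It assumes, as the comment above implies, that each `ConcurrentBag<T>` owns a `ThreadLocal` internally whose ID is only released when that `ThreadLocal` is disposed or finalized; the demo itself is an illustration, not taken from the benchmark code.

```cs
// Sketch: allocate and drop bags as fast as possible, the way the microbenchmark does.
// If the finalizer thread cannot release the underlying ThreadLocal IDs at the same
// rate, unreclaimed instances accumulate between GCs.
using System;
using System.Collections.Concurrent;

class FinalizerPressureDemo
{
    static void Main()
    {
        for (int i = 0; i < 1_000_000; i++)
        {
            _ = new ConcurrentBag<int>(); // dropped immediately; cleanup is left to finalization
        }

        // Draining the backlog explicitly; the benchmark never does this, so the
        // finalizer thread has to keep pace on its own.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("done");
    }
}
```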
Tagging subscribers to this area: @mangod9
I believe that is what's happening. When the finalizer/dispose returns an ID, the next ID to try may be reset to that ID, which may be a low ID. After that ID is reused, the next constructor has to do more work inside the lock due to the linear lookup for a free ID from a low starting point. The lock would be held for a bit longer and finalization would slow down. Eventually, a balance may be struck where the number of IDs in use and IDs queued for freeing is relatively much higher, and the perf would be generally slower.

The test seems to be very sensitive to timing and VM configuration. The underlying issue was preexisting; #56956 added a tiny amount of code inside the locks, and along with other variables the timings may have changed enough to create some kind of feedback loop.

Eliminating the linear search seems to fix the issue, and also seems to make the results more stable. On my local VM, configured in a similar way on an Intel processor, there was about a 50% regression before, and the fix seems to significantly improve the perf. I'll put up a fix for 7.0. It doesn't seem severe enough to port to 6.0, as the degenerate situation seems unlikely to linger or show up in a significant way in real-world cases.

[Benchmark results accompanied this comment: AMD EPYC 7452 1-core 2-thread Debian VM and Intel i7-8700 1-core 2-thread Ubuntu VM, .NET 5.0 vs. .NET 6.0 after the fix.]
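For illustration, the pre-fix allocation scheme described in this comment behaves roughly like the sketch below (a simplification, not the actual `ThreadLocal<T>` source): a list of in-use flags, a linear scan for a free slot, and a "next ID to try" hint that is pulled back down whenever a low ID is freed.

```cs
// Sketch of a linear-search ID allocator with a reset-on-free hint.
using System.Collections.Generic;

class LinearIdAllocatorSketch
{
    private readonly object _lock = new object();
    private readonly List<bool> _inUse = new List<bool>();
    private int _nextIdToTry; // where the linear scan starts

    public int Allocate()
    {
        lock (_lock)
        {
            // If the hint was just reset to a low, freed ID, this scan walks a long
            // run of in-use slots while holding the lock, which in turn delays Free()
            // on the finalizer thread -- the feedback loop described above.
            for (int id = _nextIdToTry; id < _inUse.Count; id++)
            {
                if (!_inUse[id])
                {
                    _inUse[id] = true;
                    _nextIdToTry = id + 1;
                    return id;
                }
            }

            _inUse.Add(true);
            _nextIdToTry = _inUse.Count;
            return _inUse.Count - 1;
        }
    }

    public void Free(int id)
    {
        lock (_lock)
        {
            _inUse[id] = false;
            if (id < _nextIdToTry)
            {
                _nextIdToTry = id; // the next Allocate() restarts its scan from here
            }
        }
    }
}
```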
…se (#59300)

- Replaced the linear search for a free ID with a pair of collections that operate in O(1) time for insertion and removal
- See #59145 (comment) for more information
- Fixes #59145
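One way to get constant-time allocation and release in the spirit of that commit message is sketched below; it is an assumption for illustration, and the actual pair of collections chosen in #59300 may differ.

```cs
// Sketch of an O(1) free-ID pool: a stack of freed IDs for reuse plus a counter for
// brand-new IDs, so neither Allocate() nor Free() ever scans while holding the lock.
using System.Collections.Generic;

class ConstantTimeIdAllocatorSketch
{
    private readonly object _lock = new object();
    private readonly Stack<int> _freeIds = new Stack<int>();
    private int _nextNewId;

    public int Allocate()
    {
        lock (_lock)
        {
            return _freeIds.Count > 0 ? _freeIds.Pop() : _nextNewId++;
        }
    }

    public void Free(int id)
    {
        lock (_lock)
        {
            _freeIds.Push(id); // O(1): the finalizer thread spends minimal time in the lock
        }
    }
}
```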