
[Testing] Disable spin waiting when cpu quota limit is enabled #113184

Draft · eduardo-vp wants to merge 13 commits into main from disable-spinwait-cpu-quota-limit
Conversation

@eduardo-vp (Member) commented Mar 6, 2025

Fixes #112869

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

1 similar comment

@eduardo-vp force-pushed the disable-spinwait-cpu-quota-limit branch 2 times, most recently from 439d1c7 to c6838a1 on March 6, 2025 03:19
```csharp
// Diff snippet the review comments below are attached to
// (truncated in the review view):
private static bool DetermineShouldSpinWait()
{
    return !Environment.IsSingleProcessor &&
           (!Environment.IsCpuQuotaLimited ||
```
Member commented:

Disabling spin waiting completely when CPU quota is set sounds too harsh to me. Are we going to do some measurements before we introduce this as the default?

Member Author replied:

Yes, I believe the plan is to do some measurements and then decide whether or not to merge the PR.

@kouvel (Member) commented Mar 13, 2025:

Updated the description to tag the relevant issue, which has more info. An issue with CPU quota limits is that it's unpredictable whether spin-waiting will leave less CPU time for more useful work later; this is more of a problem when the limit is more easily reached. Based on what I saw before, in CPU quota-limited environments where throughput is maxed out, spin-waiting reduces throughput, while at lower throughput levels (where the CPU quota limit is not as easily reached), spin-waiting can use some extra CPU to improve throughput.

An alternative may be to disable or reduce spin-waiting depending on component. Maybe the config var could be named something like ReduceSpinWaitingWhenCpuQuotaIsLimited, which defaults to true, to leave room for reducing spin-waiting rather than disabling it altogether.

Another consideration is that in CPU quota-limited environments, commonly used performance metrics favor CPU efficiency, such as RPS at 50% CPU usage, or perhaps more generally RPS per vCPU used. With that kind of metric, typically anything more than a very small amount of spin-waiting reduces efficiency. Many customers running in these types of environments disable spin-waiting in the thread pool, since it tends to show up at the top of CPU profiles.
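To make the config-var idea above concrete, here is a minimal sketch of how such a switch could be read. The switch name `ReduceSpinWaitingWhenCpuQuotaIsLimited` is only the suggestion from this discussion, and the method and namespace prefix are hypothetical; `AppContext.TryGetSwitch` is the standard .NET mechanism for this kind of opt-in/opt-out flag.

```csharp
using System;

// Hypothetical sketch, not a shipped API: reads the proposed switch,
// defaulting to true (reduce spin-waiting) when it is not set.
internal static class SpinWaitConfig
{
    internal static bool DetermineShouldReduceSpinWaiting()
    {
        if (AppContext.TryGetSwitch(
                "System.Threading.ReduceSpinWaitingWhenCpuQuotaIsLimited",
                out bool isEnabled))
        {
            return isEnabled; // explicit user choice wins
        }

        return true; // proposed default from the comment above
    }
}
```

A default of `true` with room to later mean "reduce" rather than "disable" matches the naming rationale given above.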

Member replied:

Right, our spin waiting is tuned to maximize RPS on an overloaded machine, which is not a very realistic scenario.

I am curious what you are going to find out with your measurements. What do you plan to measure?

My empirical observation is that our synchronization primitives tend to spin wait a lot more compared to the non-.NET synchronization primitives (e.g. the locks that come with the OS). I would expect that some spin waiting is still beneficial at 50% CPU usage, but the sweet spot is likely a lot less than our current default (like somewhere between 2x - 10x less spinning).

It may be beneficial to reduce the spin-waiting by default, regardless of whether or not we run in a CPU-limited container. We have switched the GC to DATAS by default for similar reasons.

Member replied:

> Right, our spin waiting is tuned to maximize RPS on an overloaded machine, which is not a very realistic scenario.

In the thread pool, yes to some degree, but not really in locks. The spin-waiting duration in locks was based on a couple of things: context-switch overhead, such that a successful spin-wait would typically use less CPU than a context switch, and the latency for other threads waiting for the lock to acquire it. Issues arise when spin-waits fail, because their CPU usage is wasted. Shorter spin-waits are inherently more likely to fail, so there's some balance to be struck. For cases where spin-waits almost always fail (locks that are held for long enough), the Lock type has a mechanism to almost disable spin-waiting. We don't have a good adaptive scheme yet for lock spin-waiting in those in-between cases.
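The "mechanism to almost disable spin-waiting" described above can be illustrated with a simplified sketch. This is not the actual `System.Threading.Lock` implementation; the class name, budget values, and back-off constants below are all illustrative assumptions, chosen only to show the shape of an adaptive scheme where failed spin-waits shrink the spin budget toward near-zero.

```csharp
using System;
using System.Threading;

// Illustrative only: a toy adaptive spin budget, not .NET's real Lock.
internal sealed class AdaptiveSpinner
{
    private int _spinLimit = 64; // arbitrary starting budget

    // tryAcquire is the caller's non-blocking acquisition attempt.
    public bool TrySpinAcquire(Func<bool> tryAcquire)
    {
        for (int i = 0; i < _spinLimit; i++)
        {
            if (tryAcquire())
            {
                // Successful spin: allow slightly more spinning next time.
                _spinLimit = Math.Min(_spinLimit + 1, 128);
                return true;
            }
            Thread.SpinWait(1 << Math.Min(i, 6)); // progressive back-off
        }

        // A failed spin-wait wasted its CPU; shrink the budget so that
        // long-held locks converge toward almost no spinning at all.
        _spinLimit = Math.Max(_spinLimit / 2, 1);
        return false; // caller falls back to a full (blocking) wait
    }
}
```

The asymmetry (grow by 1, halve on failure) is one way to make repeated failures dominate quickly, which matches the stated goal of nearly disabling spinning for locks that are held too long for spinning to pay off.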

> My empirical observation is that our synchronization primitives tend to spin wait a lot more compared to the non-.NET synchronization primitives (e.g. the locks that come with the OS). I would expect that some spin waiting is still beneficial at 50% CPU usage, but the sweet spot is likely a lot less than our current default (like somewhere between 2x - 10x less spinning).
>
> It may be beneficial to reduce the spin-waiting by default, regardless of whether we run in a CPU limited container or not.

I think it's always beneficial to have a basic lock primitive that does not spin-wait at all. It can be used for many purposes, including cases where performance is not a concern, or as a building block for other synchronization primitives when combined with a condition variable, where any spin-waiting may be more appropriate in the higher-level primitive. So it's conceivable that the OS lock primitives would not spin-wait by default.

I think it would have been reasonable to not spin-wait in .NET locks by default, with an option to spin-wait. .NET locks have spin-waited historically (perhaps because the old implementation would run into lock convoys if not for spin-waiting, though some spin-waiting was kept around afterwards as well), so we have to consider the tradeoffs. Decreasing spin-waiting in locks by a large factor would make those spin-waits much less likely to succeed, and other threads waiting for the lock would much more likely have to do a full wait anyway, which together significantly reduces the value of the spin-waits. The number of cases that would benefit from very short spin-waits would dwindle. That's not to say a very short spin-wait wouldn't be beneficial; there are always tradeoffs. But aside from some types of microbenchmarks and some perf-critical cases, I feel it would be difficult to show that it would be useful in general, and better left to an API option while disabling spin-waiting by default.

The thread pool's spin-waiting is rather different, and there are some issues there. An adaptive spin-waiting scheme seems somewhat feasible there, though it also would not work in CPU quota-limited environments due to the unpredictability, or may at least require additional configuration, and is being looked into.

> What do you plan to measure?

For this PR, Json, Fortunes, Orchard, with max and fixed lower throughputs using an appropriate client, with various CPU quota limits, and checking CPU usage / efficiency.

Member replied:

> I think it would have been reasonable to not spin-wait in .NET locks by default, with an option to spin-wait.

I do not think it would be reasonable. It is impossible for users to manually set these defaults correctly at scale.

As far as I know, basic locks in all current OSes and programming languages out there have some spin waiting built in by default, but it is generally less spin waiting than our defaults.

@kouvel (Member) commented Mar 13, 2025:

> It is impossible for users to manually set these defaults correctly at scale.

It's also impossible for any one hard-coded lock spin-waiting scheme (in terms of spin-wait duration) to work similarly at different scales. The tradeoffs would depend on how much contention there is, how frequently locks are reacquired by the same thread, how long locks are held, etc.

@kouvel (Member) commented Mar 13, 2025:

> As far as I know, basic locks in all current OSes or programming languages out there have some spin waiting built-in by default, but it is generally less spin waiting than our defaults.

A very-short-duration spin-wait may be beneficial in a scenario where the lock is held for a very short duration, is very lightly contended (say, two threads), and maybe there is not much reacquiring of the lock by the same thread. It can be reasonable to optimize for that and deoptimize other cases; as always, there will be tradeoffs.
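The very-short-spin-then-wait pattern being discussed can be sketched as follows. This is a generic illustration, not code from this PR; the spin counts are arbitrary placeholders, and `SemaphoreSlim` stands in for whatever wait primitive a real lock would use.

```csharp
using System.Threading;

internal static class ShortSpinExample
{
    // Sketch of "brief spin, then full wait": a handful of cheap
    // non-blocking attempts before paying for a context switch.
    internal static void AcquireWithShortSpin(SemaphoreSlim gate)
    {
        const int ShortSpinCount = 8; // arbitrary, "very short" per above

        for (int i = 0; i < ShortSpinCount; i++)
        {
            if (gate.Wait(0)) // non-blocking attempt
            {
                return; // spin succeeded; cheaper than a context switch
            }
            Thread.SpinWait(16); // brief pause between attempts
        }

        gate.Wait(); // fall back to a full (blocking) wait
    }
}
```

In the lightly contended, short-hold scenario described above, the non-blocking attempts usually succeed within a few iterations; in the heavily contended or long-hold cases, almost all of the spinning is wasted and the blocking wait happens anyway, which is the tradeoff the comment is pointing at.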

@jkotas (Member) commented Mar 12, 2025:

The GC is a major source of spin waiting. Are you going to apply this policy in the GC as well?

@kouvel (Member) commented Mar 13, 2025:

> The GC is a major source of spin waiting. Are you going to apply this policy in the GC as well?

We were going to start with the thread pool and locks, as the thread pool in particular shows up often. It could be expanded later to other components as appropriate.

Linked issue: Disable spin-waiting in some cases when CPU quota is limited by default