
[Testing] Disable spin waiting when cpu quota limit is enabled #113184

Draft · eduardo-vp wants to merge 13 commits into main from disable-spinwait-cpu-quota-limit
Conversation

@eduardo-vp (Member) commented Mar 6, 2025

Fixes #112869

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

1 similar comment

@eduardo-vp force-pushed the disable-spinwait-cpu-quota-limit branch 2 times, most recently from 439d1c7 to c6838a1 on March 6, 2025 03:19
```csharp
// Diff snippet the review comments below are attached to
// (truncated in the review view):
private static bool DetermineShouldSpinWait()
{
    return !Environment.IsSingleProcessor &&
           (!Environment.IsCpuQuotaLimited ||
```
Member commented:

Disabling spin waiting completely when CPU quota is set sounds too harsh to me. Are we going to do some measurements before we introduce this as the default?

Member Author replied:

Yes, I believe the plan is to do some measurements and then decide whether or not to merge the PR.

@kouvel (Member) commented Mar 13, 2025:

Updated the description to tag the relevant issue, which has more info. An issue with CPU quota limits is that it's unpredictable whether spin-waiting will leave less CPU time for more useful work later; this is more of a problem when the limit is more easily reached. Based on what I saw before, in CPU quota-limited environments where throughput is maxed out, spin-waiting reduces throughput, while at lower throughput levels (where the CPU quota limit is not as easily reached), spin-waiting can use some extra CPU to improve throughput.

An alternative may be to disable or reduce spin-waiting depending on component. Maybe the config var could be named something like ReduceSpinWaitingWhenCpuQuotaIsLimited, which defaults to true, to leave room for reducing spin-waiting rather than disabling it altogether.

Another consideration is that in CPU quota-limited environments, commonly used performance metrics favor CPU efficiency, such as RPS at 50% CPU usage, or perhaps more generally RPS per vCPU used. With that kind of metric, typically anything more than a very small amount of spin-waiting reduces efficiency. Many customers running in these types of environments disable spin-waiting in the thread pool, since it tends to show up at the top of CPU profiles.
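To make the config-var idea above concrete, here is a minimal sketch of how such a switch could be read. The switch name `ReduceSpinWaitingWhenCpuQuotaIsLimited` is only the suggestion from this discussion, and the method and namespace prefix are hypothetical; `AppContext.TryGetSwitch` is the standard .NET mechanism for this kind of opt-in/opt-out flag.

```csharp
using System;

// Hypothetical sketch, not a shipped API: reads the proposed switch,
// defaulting to true (reduce spin-waiting) when it is not set.
internal static class SpinWaitConfig
{
    internal static bool DetermineShouldReduceSpinWaiting()
    {
        if (AppContext.TryGetSwitch(
                "System.Threading.ReduceSpinWaitingWhenCpuQuotaIsLimited",
                out bool isEnabled))
        {
            return isEnabled; // explicit user choice wins
        }

        return true; // proposed default from the comment above
    }
}
```

A default of `true` with room to later mean "reduce" rather than "disable" matches the naming rationale given above.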

Member replied:

Right, our spin waiting is tuned to maximize RPS on an overloaded machine, which is not a very realistic scenario.

I am curious what you are going to find out with your measurements. What do you plan to measure?

My empirical observation is that our synchronization primitives tend to spin wait a lot more compared to the non-.NET synchronization primitives (e.g. the locks that come with the OS). I would expect that some spin waiting is still beneficial at 50% CPU usage, but the sweet spot is likely a lot less than our current default (like somewhere between 2x - 10x less spinning).

It may be beneficial to reduce the spin-waiting by default, regardless of whether or not we run in a CPU-limited container. We have switched the GC to DATAS by default for similar reasons.

Member replied:

> Right, our spin waiting is tuned to maximize RPS on an overloaded machine, which is not a very realistic scenario.

In the thread pool, yes to some degree, but not really in locks. The spin-waiting duration in locks was based on a couple of things: context-switch overhead, such that a successful spin-wait would typically use less CPU than a context switch, and the latency for other threads waiting for the lock to acquire it. Issues arise when spin-waits fail, because their CPU usage is wasted. Shorter spin-waits are inherently more likely to fail, so there's some balance to be struck. For cases where spin-waits almost always fail (locks that are held for long enough), the Lock type has a mechanism to almost disable spin-waiting. We don't have a good adaptive scheme yet for lock spin-waiting in those in-between cases.
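The "mechanism to almost disable spin-waiting" described above can be illustrated with a simplified sketch. This is not the actual `System.Threading.Lock` implementation; the class name, budget values, and back-off constants below are all illustrative assumptions, chosen only to show the shape of an adaptive scheme where failed spin-waits shrink the spin budget toward near-zero.

```csharp
using System;
using System.Threading;

// Illustrative only: a toy adaptive spin budget, not .NET's real Lock.
internal sealed class AdaptiveSpinner
{
    private int _spinLimit = 64; // arbitrary starting budget

    // tryAcquire is the caller's non-blocking acquisition attempt.
    public bool TrySpinAcquire(Func<bool> tryAcquire)
    {
        for (int i = 0; i < _spinLimit; i++)
        {
            if (tryAcquire())
            {
                // Successful spin: allow slightly more spinning next time.
                _spinLimit = Math.Min(_spinLimit + 1, 128);
                return true;
            }
            Thread.SpinWait(1 << Math.Min(i, 6)); // progressive back-off
        }

        // A failed spin-wait wasted its CPU; shrink the budget so that
        // long-held locks converge toward almost no spinning at all.
        _spinLimit = Math.Max(_spinLimit / 2, 1);
        return false; // caller falls back to a full (blocking) wait
    }
}
```

The asymmetry (grow by 1, halve on failure) is one way to make repeated failures dominate quickly, which matches the stated goal of nearly disabling spinning for locks that are held too long for spinning to pay off.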

> My empirical observation is that our synchronization primitives tend to spin wait a lot more compared to the non-.NET synchronization primitives (e.g. the locks that come with the OS). I would expect that some spin waiting is still beneficial at 50% CPU usage, but the sweet spot is likely a lot less than our current default (like somewhere between 2x - 10x less spinning).
>
> It may be beneficial to reduce the spin-waiting by default, regardless of whether we run in a CPU limited container or not.

I think it's always beneficial to have a basic lock primitive that does not spin-wait at all. It can be used for many purposes, including cases where performance is not a concern, or as a building block for other synchronization primitives when combined with a condition variable, where any spin-waiting may be more appropriate in the higher-level primitive. So it's conceivable that the OS lock primitives would not spin-wait by default.

I think it would have been reasonable to not spin-wait in .NET locks by default, with an option to spin-wait. .NET locks have spin-waited historically (perhaps because the old implementation would run into lock convoys if not for spin-waiting, though some spin-waiting was kept around afterwards as well), so we have to consider the tradeoffs. Decreasing spin-waiting in locks by a large factor would make those spin-waits much less likely to succeed, and other threads waiting for the lock would much more likely have to do a full wait anyway, which together significantly reduces the value of the spin-waits. The number of cases that would benefit from very short spin-waits would dwindle. That's not to say a very short spin-wait wouldn't be beneficial; there are always tradeoffs. But aside from some types of microbenchmarks and some perf-critical cases, I feel it would be difficult to show that it would be useful in general, and better left to an API option while disabling spin-waiting by default.

The thread pool's spin-waiting is rather different, and there are some issues there. An adaptive spin-waiting scheme seems somewhat feasible there, though it also would not work in CPU quota-limited environments due to the unpredictability, or may at least require additional configuration, and is being looked into.

> What do you plan to measure?

For this PR, Json, Fortunes, Orchard, with max and fixed lower throughputs using an appropriate client, with various CPU quota limits, and checking CPU usage / efficiency.

Member replied:

> I think it would have been reasonable to not spin-wait in .NET locks by default, with an option to spin-wait.

I do not think it would be reasonable. It is impossible for users to manually set these defaults correctly at scale.

As far as I know, basic locks in all current OSes and programming languages out there have some spin waiting built in by default, but it is generally less spin waiting than our defaults.

@kouvel (Member) commented Mar 13, 2025:

> It is impossible for users to manually set these defaults correctly at scale.

It's also impossible for any one hard-coded lock spin-waiting scheme (in terms of spin-wait duration) to work similarly at different scales. The tradeoffs would depend on how much contention there is, how frequently locks are reacquired by the same thread, how long locks are held, etc.

@kouvel (Member) commented Mar 13, 2025:

> As far as I know, basic locks in all current OSes or programming languages out there have some spin waiting built-in by default, but it is generally less spin waiting than our defaults.

A very-short-duration spin-wait may be beneficial in a scenario where the lock is held for a very short duration, is very lightly contended (say, two threads), and maybe there is not much reacquiring of the lock by the same thread. It can be reasonable to optimize for that and deoptimize other cases; as always, there will be tradeoffs.
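The very-short-spin-then-wait pattern being discussed can be sketched as follows. This is a generic illustration, not code from this PR; the spin counts are arbitrary placeholders, and `SemaphoreSlim` stands in for whatever wait primitive a real lock would use.

```csharp
using System.Threading;

internal static class ShortSpinExample
{
    // Sketch of "brief spin, then full wait": a handful of cheap
    // non-blocking attempts before paying for a context switch.
    internal static void AcquireWithShortSpin(SemaphoreSlim gate)
    {
        const int ShortSpinCount = 8; // arbitrary, "very short" per above

        for (int i = 0; i < ShortSpinCount; i++)
        {
            if (gate.Wait(0)) // non-blocking attempt
            {
                return; // spin succeeded; cheaper than a context switch
            }
            Thread.SpinWait(16); // brief pause between attempts
        }

        gate.Wait(); // fall back to a full (blocking) wait
    }
}
```

In the lightly contended, short-hold scenario described above, the non-blocking attempts usually succeed within a few iterations; in the heavily contended or long-hold cases, almost all of the spinning is wasted and the blocking wait happens anyway, which is the tradeoff the comment is pointing at.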

@jkotas (Member) commented Mar 12, 2025:

The GC is a major source of spin waiting. Are you going to apply this policy in the GC as well?

@kouvel (Member) commented Mar 13, 2025:

> The GC is a major source of spin waiting. Are you going to apply this policy in the GC as well?

We were going to start with the thread pool and locks, as the thread pool in particular shows up often. It could be expanded later to other components as appropriate.

Linked issue: Disable spin-waiting in some cases when CPU quota is limited by default