[Testing] Disable spin waiting when cpu quota limit is enabled #113184
Conversation
Force-pushed from 439d1c7 to c6838a1
Force-pushed from c6838a1 to 998d435
Force-pushed from 5eea6fa to e245c92
Force-pushed from d1d7d89 to d5aee9a
src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs (outdated review thread, resolved)
```csharp
private static bool DetermineShouldSpinWait()
{
    return !Environment.IsSingleProcessor &&
        (!Environment.IsCpuQuotaLimited ||
```
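The excerpt above is truncated in this thread. As a hedged sketch only (not the PR's actual change), the overall shape is a process-wide flag computed once and consulted before spin-waiting; since the CPU quota check shown in the diff is internal to the PR, the sketch below substitutes a hypothetical IsCpuQuotaLimited() stub:

```csharp
using System;
using System.Threading;

// Hedged sketch, not the PR's implementation: compute a process-wide
// "should spin" flag once, then consult it before spin-waiting.
// IsCpuQuotaLimited() is a hypothetical stand-in for the internal check
// shown in the diff excerpt above.
internal static class SpinWaitPolicy
{
    private static readonly bool s_shouldSpinWait =
        Environment.ProcessorCount > 1 && !IsCpuQuotaLimited();

    // Hypothetical: the real runtime would derive this from cgroup / job
    // object CPU quota information; hard-coded here so the sketch compiles.
    private static bool IsCpuQuotaLimited() => false;

    // Spin briefly waiting for a condition only when spinning is enabled;
    // once spinning would start yielding (or if it is disabled), fall back
    // to sleeping.
    public static void WaitUntil(Func<bool> condition)
    {
        if (s_shouldSpinWait)
        {
            var spinner = new SpinWait();
            while (!condition() && !spinner.NextSpinCycleWillYield)
            {
                spinner.SpinOnce();
            }
        }

        while (!condition())
        {
            Thread.Sleep(1); // real code would block on an event instead
        }
    }
}
```

Computing the flag once keeps the check off the spin-wait hot path.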
Disabling spin waiting completely when CPU quota is set sounds too harsh to me. Are we going to do some measurements before we introduce this as the default?
Yes, I believe the plan is to do some measurements and then decide whether to merge the PR.
Updated the description to tag the relevant issue, which has more info. An issue with CPU quota limits is that it's unpredictable whether spin-waiting will leave less CPU time for more useful work later; it's more of an issue the more easily the limit is reached. Based on what I saw before, in CPU quota-limited environments where throughput is maxed out, spin-waiting reduces throughput, while at lower throughput levels (where the CPU quota limit is not reached as easily), spin-waiting can use some extra CPU to improve throughput.
An alternative may be to disable or reduce spin-waiting depending on the component. Maybe the config var could be named something like ReduceSpinWaitingWhenCpuQuotaIsLimited, which defaults to true, to leave room for reducing spin-waiting rather than disabling it altogether.
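As a hedged sketch only, assuming the switch name proposed above (it does not currently exist) and a hypothetical "System.Threading" prefix, such a knob could be read through the standard AppContext switch mechanism with a default of true:

```csharp
using System;

// Hedged sketch: reading the proposed ReduceSpinWaitingWhenCpuQuotaIsLimited
// switch. The switch name and prefix are assumptions from the discussion,
// not an existing runtime config; AppContext.TryGetSwitch itself is a real API.
internal static class SpinWaitConfig
{
    internal static bool ReduceSpinWaitingWhenCpuQuotaIsLimited { get; } =
        AppContext.TryGetSwitch(
            "System.Threading.ReduceSpinWaitingWhenCpuQuotaIsLimited",
            out bool isEnabled)
            ? isEnabled
            : true; // proposed default: reduce spin-waiting when quota-limited
}
```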
Another consideration was that in CPU quota-limited environments, a commonly used performance metric favors CPU efficiency, such as RPS at 50% CPU usage, or perhaps more generally RPS per vCPU used. With that kind of metric, typically anything more than very small amounts of spin-waiting would reduce efficiency. Many customers running in these types of environments disable spin-waiting in the thread pool, since that tends to show up at the top of profiles.
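For a hypothetical illustration of that metric (numbers made up): a service handling 10,000 RPS while consuming the equivalent of 4 vCPUs of CPU time scores 2,500 RPS per vCPU; if spin-waiting burns an extra 0.5 vCPU without adding any requests, the same workload drops to roughly 2,200 RPS per vCPU, so the spinning reads directly as an efficiency loss even though throughput is unchanged.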
Right, our spin waiting is tuned to maximize RPS on an overloaded machine, which is not a very realistic scenario.
I am curious what you are going to find out with your measurements. What do you plan to measure?
My empirical observation is that our synchronization primitives tend to spin wait a lot more compared to the non-.NET synchronization primitives (e.g. the locks that come with the OS). I would expect that some spin waiting is still beneficial at 50% CPU usage, but the sweet spot is likely a lot less than our current default (like somewhere between 2x - 10x less spinning).
It may be beneficial to reduce the spin-waiting by default, regardless of whether we run in a CPU-limited container or not. We have switched the GC to DATAS by default for similar reasons.
Right, our spin waiting is tuned to maximize RPS on an overloaded machine, which is not a very realistic scenario.
In the thread pool, yes to some degree, but not really in locks. The spin-waiting duration in locks was based on a couple of things: context-switch overhead, such that a successful spin-wait would typically use less CPU than a context switch, and the latency for other threads waiting on the lock to acquire it. Issues arise when spin-waits fail, because their CPU usage is wasted. Shorter spin-waits are inherently more likely to fail, so there's some balance to be struck. For cases where spin-waits almost always fail (locks that are held for long enough), the Lock type has a mechanism to almost disable spin-waiting. We don't have a good adaptive scheme yet for lock spin-waiting for the in-between cases.
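As a hedged illustration only (this is not the Lock type's actual mechanism), the general idea of backing spinning off when it keeps failing can look something like the following, where a per-lock spin budget shrinks toward nearly zero after repeated failed spin phases and recovers slowly on successes:

```csharp
using System;
using System.Threading;

// Hedged sketch, not the runtime's Lock implementation: shrink a per-lock
// spin budget when spin phases keep failing, so spinning ends up almost
// disabled for locks that are held too long for spinning to ever help.
internal sealed class AdaptiveSpinBudget
{
    private const int MaxSpins = 64;
    private const int MinSpins = 1;
    private int _spins = MaxSpins;

    // Spin until tryAcquire succeeds or the current budget is exhausted.
    // Returns true if the lock was acquired during the spin phase.
    public bool TrySpin(Func<bool> tryAcquire)
    {
        int budget = Volatile.Read(ref _spins);
        var spinner = new SpinWait();
        for (int i = 0; i < budget; i++)
        {
            if (tryAcquire())
            {
                // Success: allow slightly more spinning next time.
                if (budget < MaxSpins)
                {
                    Interlocked.Increment(ref _spins);
                }
                return true;
            }
            spinner.SpinOnce();
        }

        // Failure: halve the budget so future attempts spin much less,
        // approaching "almost disabled" for long-held locks.
        Volatile.Write(ref _spins, Math.Max(MinSpins, budget / 2));
        return false;
    }
}
```

The caller would fall back to a full blocking wait whenever TrySpin returns false.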
My empirical observation is that our synchronization primitives tend to spin wait a lot more compared to the non-.NET synchronization primitives (e.g. the locks that come with the OS). I would expect that some spin waiting is still beneficial at 50% CPU usage, but the sweet spot is likely a lot less than our current default (like somewhere between 2x - 10x less spinning).
It may be beneficial to reduce the spin-waiting by default, regardless of whether we run in a CPU-limited container or not.
I think it's always beneficial to have a basic lock primitive that does not spin-wait at all. It can be used for many purposes, including ones where performance is not a concern, or as a building block for other synchronization primitives when combined with a condition variable, where any spin-waiting may be more appropriate in the higher-level primitive. So it's conceivable that the OS lock primitives would not spin-wait by default.
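A hedged sketch of what such a basic, non-spinning lock could look like (an illustration only, not the runtime's Lock type): a single interlocked fast path and then an immediate blocking wait, with no spin phase at all.

```csharp
using System;
using System.Threading;

// Hedged sketch, not the runtime's Lock type: a lock with no spin-waiting.
// Uncontended acquires take one interlocked operation; contended acquires
// block immediately. (SemaphoreSlim.Wait may itself spin briefly; a raw
// kernel event would avoid even that, at the cost of a longer sketch.)
public sealed class NonSpinningLock : IDisposable
{
    private int _state;                 // 0 = free, 1 = held
    private int _waiterCount;
    private readonly SemaphoreSlim _waiters = new SemaphoreSlim(0);

    public void Enter()
    {
        if (Interlocked.CompareExchange(ref _state, 1, 0) == 0)
            return;                     // uncontended fast path

        Interlocked.Increment(ref _waiterCount);
        while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
        {
            _waiters.Wait();            // block right away, no spinning
        }
        Interlocked.Decrement(ref _waiterCount);
    }

    public void Exit()
    {
        Volatile.Write(ref _state, 0);
        if (Volatile.Read(ref _waiterCount) > 0)
            _waiters.Release();         // wake one waiter
    }

    public void Dispose() => _waiters.Dispose();
}
```

A higher-level primitive built on top of this (for example, one paired with a condition variable) would then decide for itself whether and how much to spin.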
I think it would have been reasonable to not spin-wait in .NET locks by default, with an option to spin-wait. .NET locks have been spin-waiting historically (perhaps because the old implementation would run into lock convoys if not for spin-waiting, though some spin-waiting was kept around afterwards as well), so we have to consider the tradeoffs. Decreasing spin-waiting in locks by a large factor would make those spin-waits much less likely to succeed, and other threads waiting for the lock would much more likely have to do a full wait anyway, which together significantly reduces the value of the spin-waits. The number of cases that would benefit from very short spin-waits would dwindle. That's not to say a very short spin-wait wouldn't be beneficial (there are always tradeoffs), but aside from some types of microbenchmarks and some perf-critical cases, I feel it would be difficult to show that it would be useful in general; it may be better left to an API option while disabling spin-waiting by default.
The thread pool's spin-waiting is rather different, and there are some issues there. An adaptive spin-waiting scheme seems somewhat feasible there, though it also would not work in CPU quota-limited environments due to the unpredictability, or may at least require additional configuration, and is being looked into.
What do you plan to measure?
For this PR: Json, Fortunes, and Orchard, at max throughput and at fixed lower throughputs using an appropriate client, with various CPU quota limits, checking CPU usage / efficiency.
I think it would have been reasonable to not spin-wait in .NET locks by default, with an option to spin-wait.
I do not think it would be reasonable. It is impossible for users to manually set these defaults correctly at scale.
As far as I know, basic locks in all current OSes or programming languages out there have some spin waiting built-in by default, but it is generally less spin waiting than our defaults.
It is impossible for users to manually set these defaults correctly at scale.
It's also impossible for any one hard-coded lock spin-waiting scheme (in terms of spin-wait duration) to work similarly at different scales. The tradeoffs would depend on how much contention there is, how frequently locks are reacquired by the same thread, how long locks are held, etc.
As far as I know, basic locks in all current OSes or programming languages out there have some spin waiting built-in by default, but it is generally less spin waiting than our defaults.
A very-short-duration spin-wait may be beneficial in a scenario where the lock is held for a very short duration, is very lightly contended (say, two threads), and maybe there is not much reacquiring of the lock by the same thread. It can be reasonable to optimize for that and deoptimize other cases; as always, there will be tradeoffs.
GC is a major source of spin waiting. Are you going to apply this policy in the GC as well?
We were going to start with the thread pool and locks, as the thread pool in particular shows up often. It could be expanded later to other components as appropriate.
Fixes #112869