Improve the rate of thread injection for blocking due to sync-over-async #53471
Conversation
Tagging subscribers to this area: @mangod9. Issue details: Fixes #52558
What do you think about doing this in `Monitor.Wait` as well?
In my opinion, a monitor is too basic a synchronization primitive to assume that waiting on one always deserves compensation. For instance, it's often neither beneficial nor preferable to add threads to compensate for threads blocked waiting to acquire a lock.
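For context, here is a minimal sketch (not taken from the PR; the URL, counts, and class name are placeholders) of the sync-over-async pattern that the cooperative-blocking heuristic in this change compensates for:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class SyncOverAsyncExample
{
    private static readonly HttpClient s_client = new HttpClient();

    static void Main()
    {
        // Queue many work items that each block a thread pool thread on async work.
        // The async completions also need thread pool threads to run, so the pool must
        // inject additional threads to unblock the waiters -- the scenario this change
        // responds to more quickly.
        var tasks = new Task[64];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                // Sync-over-async: a blocking wait on an async operation.
                string body = s_client.GetStringAsync("https://example.com").GetAwaiter().GetResult();
                Console.WriteLine(body.Length);
            });
        }
        Task.WaitAll(tasks);
    }
}
```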
- Depends on dotnet/runtime#53471 - In the above change `ThreadCounts._data` was changed from `ulong` to `uint`. Its usage in SOS was reading 8 bytes and only using the lower 4 bytes. Updated to read 4 bytes instead. No functional change, just updated to match.
- Depends on dotnet/runtime#53471 - The change above added a new reason for thread adjustment (`CooperativeBlocking`), added here too
```csharp
(uint)AppContextConfigHelper.GetInt32Config(
    "System.Threading.ThreadPool.Blocking.MaxDelayUntilFallbackMs",
    250,
    false);
```
There's a lot of policy in here, and a lot of knobs to go with it. Do you have a sense for how all of this is going to behave in the real world, and if/how someone would use these knobs effectively? How did you arrive at this specific set, and at the defaults employed?
Some of the criteria used:
- Have a good replacement for setting `MinThreads` as a workaround
  - This would now be to set `ThreadsToAddWithoutDelay` to the equivalent and set `MaxThreadsToAddBeforeFallback` to something higher to give some buffer for spikes that may need more threads
  - `MaxThreadsToAddBeforeFallback` could also be set to a large value to effectively unlimit the heuristic
- Use progressive delays to avoid creating too many threads too quickly
  - Without that, it would be conceivable that `MaxThreadsToAddBeforeFallback` threads would be created in short order to respond to even a short sync-over-async IO, before the IO even completes (if there are that many work items that would block on the async work)
  - The delay also helps to establish a high watermark of how many threads were necessary last time to unblock, so that when there's a limit to how many work items would block, it would quickly release existing waiting threads to let other work be done in the meantime
  - The larger the number of threads that get unblocked all at once, the higher the latency of their processing would be after unblocking. There's probably not a good solution to this.
    - Ideally it would not require as many threads for an async IO completion to unblock waiting threads, sort of like on Windows where a separate pool of threads handles IO completions; this needs some experimentation
  - It's not always clear that adding more threads would help, more so in starvation-type cases
- No one set of defaults will work well for all cases; use conservative defaults to start with
  - The current defaults are much more aggressive than before
  - The `MaxThreadsToAddBeforeFallback_ProcCountFactor` of `10` came from a prior discussion where we felt that adding 10x the proc count relatively quickly may not be too bad
  - The defaults can easily be made more aggressive, but it would be difficult to make them more conservative later, since apps that work well with the current defaults might then stop working well without configuration
  - The really bad cases where many hundreds or even thousands of threads need to be created will likely need per-app configuration, based on the expected workload and how bad it can get, in order to work around the issue
- Make things sufficiently configurable
  - It would have been nice to make configurable the delay threshold for detecting starvation and the delay used to add threads during continuous starvation. For sync-over-async, the delay and the rate of progression in delays can now be adjusted.
  - Similarly to the hill-climbing config values, these config values don't have to be used, but it can be helpful to have the freedom to configure them
  - For most users running into bad blocking due to sync-over-async, I expect I would suggest configuring `ThreadsToAddWithoutDelay` and `MaxThreadsToAddBeforeFallback`, and perhaps `MaxDelayUntilFallbackMs` to control the thread injection latency for spikes (see the sketch after this list)
- I intend to use the same config values (maybe with a couple of others) for improving the starvation heuristic similarly in the future
  - Starvation is a bit different and may need a few quirks, but hopefully something similar can be used
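For illustration, a workaround of the kind suggested in the last configuration bullet might look like the following `runtimeconfig.json` sketch. Only `System.Threading.ThreadPool.Blocking.MaxDelayUntilFallbackMs` appears verbatim in the review snippet above; the other property name is composed from the knob names in this list and is an assumption, and as the follow-up comment below notes, some knobs were later renamed or removed, so treat the names and values as illustrative only.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Threading.ThreadPool.Blocking.ThreadsToAddWithoutDelay": 64,
      "System.Threading.ThreadPool.Blocking.MaxDelayUntilFallbackMs": 250
    }
  }
}
```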
Decided to remove `MaxThreadsToAddBeforeFallback` and renamed `MaxDelayBeforeFallbackMs` to `MaxDelayMs` in the latest commit. The max-threads limit before falling back to starvation seems unnecessary; it would be unlimited for starvation anyway. Now the only time it would fall back to starvation is in low-memory situations.
…nd fix throughput numbers sent in events
Rebased to fix conflicts.
I believe I have addressed the feedback shared so far. Any other feedback?
LGTM, assuming that some simple scenarios which were exhibiting deadlocks are now more responsive, and there don't seem to be any other regressions?
Also would be good to create a doc issue for the new configs.
Thanks! Yes, some simple scenarios involving sync-over-async are much more responsive by default, and the behavior can be configured sufficiently well to work around high levels of blocking if necessary. I haven't seen any regressions in what I tested above. Filed dotnet/docs#24566 for updating docs.
Looking forward to this!
- `_minThreads` and `_maxThreads` were being modified inside their own lock and used inside the hill climbing lock, so it made sense to merge the two locks
- Moved `NumThreadsGoal` from `ThreadCounts` into its own field to simplify some code. The goal is an estimated target and doesn't need to be in perfect sync with the other values in `ThreadCounts`. The goal was already only modified inside a lock.
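A rough sketch of the design choice in those two bullets (simplified, made-up names; not the actual PortableThreadPool code): the hot counters stay packed in a single value so related updates remain atomic, while the thread-count goal lives in its own field that is only read and written under the merged thread-adjustment lock.

```csharp
// Illustrative sketch only; field and type names here are hypothetical.
internal sealed class ThreadPoolCountsSketch
{
    // Hot counters packed into one 32-bit value so related updates stay atomic
    // (e.g. "threads processing work" and "existing threads" packed together).
    private uint _counts;

    // Serializes min/max-thread changes and hill-climbing adjustments (the merged lock).
    private readonly object _threadAdjustmentLock = new object();

    // The goal is an estimated target: it only changes under the lock, so it does not
    // need to be updated atomically together with _counts.
    private short _numThreadsGoal;

    public short NumThreadsGoal
    {
        get { lock (_threadAdjustmentLock) { return _numThreadsGoal; } }
        set { lock (_threadAdjustmentLock) { _numThreadsGoal = value; } }
    }
}
```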