Improve the rate of thread injection for blocking due to sync-over-async #53471
Conversation
Tagging subscribers to this area: @mangod9. Issue details: Fixes #52558
What do you think about doing this in `Monitor.Wait` as well?
In my opinion, a monitor is too basic a synchronization primitive to assume that waiting on one always deserves compensation. For instance, it's often neither beneficial nor preferable to add threads to compensate for threads blocked waiting to acquire a lock.
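For context, here is a minimal sketch (not taken from the PR; the URL, counts, and class name are placeholders) of the sync-over-async pattern that the cooperative-blocking heuristic in this change compensates for:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class SyncOverAsyncExample
{
    private static readonly HttpClient s_client = new HttpClient();

    static void Main()
    {
        // Queue many work items that each block a thread pool thread on async work.
        // The async completions also need thread pool threads to run, so the pool must
        // inject additional threads to unblock the waiters -- the scenario this change
        // responds to more quickly.
        var tasks = new Task[64];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                // Sync-over-async: a blocking wait on an async operation.
                string body = s_client.GetStringAsync("https://example.com").GetAwaiter().GetResult();
                Console.WriteLine(body.Length);
            });
        }
        Task.WaitAll(tasks);
    }
}
```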
- Depends on dotnet/runtime#53471 - In the above change `ThreadCounts._data` was changed from `ulong` to `uint`. Its usage in SOS was reading 8 bytes and only using the lower 4 bytes. Updated to read 4 bytes instead. No functional change, just updated to match.
- Depends on dotnet/runtime#53471 - The change above added a new reason for thread adjustment (`CooperativeBlocking`), added here too
```csharp
(uint)AppContextConfigHelper.GetInt32Config(
    "System.Threading.ThreadPool.Blocking.MaxDelayUntilFallbackMs",
    250,
    false);
```
There's a lot of policy in here, and a lot of knobs to go with it. Do you have a sense for how all of this is going to behave in the real world, and if/how someone would use these knobs effectively? How did you arrive at this specific set, and at the defaults employed?
Some of the criteria used:
- Have a good replacement for setting `MinThreads` as a workaround
  - This would now be to set `ThreadsToAddWithoutDelay` to the equivalent and set `MaxThreadsToAddBeforeFallback` to something higher to give some buffer for spikes that may need more threads
  - `MaxThreadsToAddBeforeFallback` could also be set to a large value to effectively unlimit the heuristic
- Use progressive delays to avoid creating too many threads too quickly
  - Without that, it would be conceivable that `MaxThreadsToAddBeforeFallback` threads would be created in short order to respond to even a short sync-over-async IO, before the IO even completes (if there are that many work items that would block on the async work)
  - The delay also helps to establish a high watermark of how many threads were necessary last time to unblock, so that when there's a limit to how many work items would block, it would quickly release existing waiting threads to let other work be done in the meantime
  - The larger the number of threads that get unblocked all at once, the higher the latency of their processing would be after unblocking. There's probably not a good solution to this.
    - Ideally it would not require as many threads for an async IO completion to unblock waiting threads, sort of like on Windows where a separate pool of threads handles IO completions; this needs some experimentation
  - It's not always clear that adding more threads would help, more so in starvation-type cases
- No one set of defaults will work well for all cases; use conservative defaults to start with
  - The current defaults are much more aggressive than before
  - The `MaxThreadsToAddBeforeFallback_ProcCountFactor` of `10` came from a prior discussion where we felt that adding 10x the proc count relatively quickly may not be too bad
  - The defaults can easily be made more aggressive, but it would be difficult to make them more conservative later, since apps that work well with the current defaults might then stop working well without configuration
  - The really bad cases where many hundreds or even thousands of threads need to be created will likely need per-app configuration, based on the expected workload and how bad it can get, in order to work around the issue
- Make things sufficiently configurable
  - It would have been nice to make configurable the delay threshold for detecting starvation and the delay used to add threads during continuous starvation. For sync-over-async, the delay and the rate of progression in delays can now be adjusted.
  - Similarly to the hill-climbing config values, these config values don't have to be used, but it can be helpful to have the freedom to configure them
  - For most users running into bad blocking due to sync-over-async, I expect I would suggest configuring `ThreadsToAddWithoutDelay` and `MaxThreadsToAddBeforeFallback`, and perhaps `MaxDelayUntilFallbackMs` to control the thread injection latency for spikes (see the sketch after this list)
- I intend to use the same config values (maybe with a couple of others) for improving the starvation heuristic similarly in the future
  - Starvation is a bit different and may need a few quirks, but hopefully something similar can be used
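For illustration, a workaround of the kind suggested in the last configuration bullet might look like the following `runtimeconfig.json` sketch. Only `System.Threading.ThreadPool.Blocking.MaxDelayUntilFallbackMs` appears verbatim in the review snippet above; the other property name is composed from the knob names in this list and is an assumption, and as the follow-up comment below notes, some knobs were later renamed or removed, so treat the names and values as illustrative only.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Threading.ThreadPool.Blocking.ThreadsToAddWithoutDelay": 64,
      "System.Threading.ThreadPool.Blocking.MaxDelayUntilFallbackMs": 250
    }
  }
}
```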
Decided to remove `MaxThreadsToAddBeforeFallback` and renamed `MaxDelayBeforeFallbackMs` to `MaxDelayMs` in the latest commit. The max-threads limit before falling back to starvation seems unnecessary; it would be unlimited for starvation anyway. Now the only time it would fall back to starvation is in low-memory situations.
…nd fix throughput numbers sent in events
Rebased to fix conflicts.
I believe I have addressed the feedback shared so far. Any other feedback?
LGTM, assuming that some simple scenarios which were exhibiting deadlocks are now more responsive, and there don't seem to be any other regressions?
Also would be good to create a doc issue for the new configs.
Thanks! Yes, some simple scenarios involving sync-over-async are much more responsive by default, and the behavior can be configured sufficiently well to work around high levels of blocking if necessary. I haven't seen any regressions in what I tested above. Filed dotnet/docs#24566 for updating docs.
Looking forward to this!
- `_minThreads` and `_maxThreads` were being modified inside their own lock and used inside the hill climbing lock, so it made sense to merge the two locks
- Moved `NumThreadsGoal` from `ThreadCounts` into its own field to simplify some code. The goal is an estimated target and doesn't need to be in perfect sync with the other values in `ThreadCounts`. The goal was already only modified inside a lock.
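A rough sketch of the design choice in those two bullets (simplified, made-up names; not the actual PortableThreadPool code): the hot counters stay packed in a single value so related updates remain atomic, while the thread-count goal lives in its own field that is only read and written under the merged thread-adjustment lock.

```csharp
// Illustrative sketch only; field and type names here are hypothetical.
internal sealed class ThreadPoolCountsSketch
{
    // Hot counters packed into one 32-bit value so related updates stay atomic
    // (e.g. "threads processing work" and "existing threads" packed together).
    private uint _counts;

    // Serializes min/max-thread changes and hill-climbing adjustments (the merged lock).
    private readonly object _threadAdjustmentLock = new object();

    // The goal is an estimated target: it only changes under the lock, so it does not
    // need to be updated atomically together with _counts.
    private short _numThreadsGoal;

    public short NumThreadsGoal
    {
        get { lock (_threadAdjustmentLock) { return _numThreadsGoal; } }
        set { lock (_threadAdjustmentLock) { _numThreadsGoal = value; } }
    }
}
```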