Add CancellationTokenSource pooling #876

MihaZupan · 2021-03-25T16:36:07Z

Eliminates all cancellation-related allocation overhead that is under YARP's control.
That is: the linked CTS and its internals - CTS Registrations, CTS CallbackNode and TimerQueueTimer.

We reuse CTSs, but also avoid resetting the underlying timer (CancelAfter) within time buckets to save some TimerQueueTimer allocations and registrations with the TimerQueue.

Open questions:

- This doesn't really have to be a configurable instance pool - we could save a few instructions by having it be static and hardcoding values.
- The majority of pooled CTSs will eventually end up in Gen 2, which means any cancellation will result in throwing away Gen 2 objects. Could this be a concern?
- What are reasonable defaults? Right now it's using a 2 second precision, which means a request might be canceled after 98 seconds instead of 100. This seems very conservative, but increasing the window also wouldn't give us much.
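
As a rough illustration of the 2 second precision mentioned above, here is a minimal sketch of the arithmetic. It assumes a pooled CTS keeps the timer it was armed with as long as it is reused within one precision window. `TimeBucketSketch`, `CanKeepTimer` and `EffectiveTimeout` are hypothetical names for this example, not the PR's actual `TimeoutCtsPool` code.

```csharp
using System;

// Hypothetical helper, for illustration only; not the PR's TimeoutCtsPool.
public static class TimeBucketSketch
{
    // Assumption: a pooled CTS may keep its already-armed timer if it was
    // armed less than one precision window ago (2 seconds in the PR).
    public static bool CanKeepTimer(TimeSpan sinceArmed, TimeSpan precision) =>
        sinceArmed < precision;

    // A request that reuses an armed timer only gets the time that is left
    // on it, so its effective timeout is shorter by the timer's age.
    public static TimeSpan EffectiveTimeout(TimeSpan timeout, TimeSpan sinceArmed) =>
        timeout - sinceArmed;
}
```

Under these assumptions, a request that picks up a CTS armed 1.5 seconds earlier with a 100 second timeout is canceled after 98.5 seconds, which stays within the 98 to 100 second range described above.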

For the HTTP-HTTP 100B scenario, this gives us:

- ~1% throughput increase
- ~5% fewer allocated bytes (4 objects/request)

On 5.0 the difference is bigger since CTS machinery in 6.0 was already improved.
I will look into exact numbers for 5.0 allocations and post profiles here.

MihaZupan · 2021-03-25T17:25:23Z

cc: @davidfowl @stephentoub

stephentoub · 2021-03-26T15:43:25Z

I was under the impression we were only going to do this on .NET 6 once dotnet/runtime#48492 is available. Not the case?

MihaZupan · 2021-03-26T19:11:00Z

> once dotnet/runtime#48492 is available

For Kestrel, yes.

Yarp doesn't hand the CancellationTokens to user-code, so we can be sure there aren't any leftover registrations on the CTS.

Tratcher · 2021-03-30T04:54:48Z

> Yarp doesn't hand the CancellationTokens to user-code, so we can be sure there aren't any leftover registrations on the CTS.

What about user provided DelegatingHandlers?

Tratcher · 2021-03-30T16:03:18Z

How about you walk us through this in today's design meeting?

MihaZupan · 2021-03-30T16:19:09Z

Sure thing

src/ReverseProxy/Utilities/TimeoutCtsPool.cs

Tratcher · 2021-03-30T22:28:55Z

Meeting notes: Please benchmark the simple version that pools CTS's but not timers. See what proportion of the gains we still get.

MihaZupan · 2021-04-08T23:42:13Z

Perf difference between the approaches is within the margin of error, not justifying the complexity of the original.

Allocations on 5.0.201 and 6.0.100-preview.3.21153.9:

For completeness, I tested it on 6.0.100-preview.4.21208.1 (which includes dotnet/aspnetcore#31466).
We are down to ~0 cancellation-related allocations/request! 🎉

src/ReverseProxy/Utilities/PooledCTS.cs

davidfowl · 2021-04-09T00:22:11Z

src/ReverseProxy/Utilities/PooledCTS.cs

+            _registration.Dispose();
+            _registration = default;
+
+            // TODO: Replace CancelAfter(Timeout.Infinite) & IsCancellationRequested with TryReset in 6.0+


There's a race here, right @stephentoub?

stephentoub · Apr 9, 2021

Yes. There's a small risk that the timer has already fired and queued a work item to run the callback that will transition the CTS, but that the work item hasn't yet executed: if that happens, you could end up reusing that CTS for another operation and have that second operation quickly canceled when the timer's queued work item fires. TryReset handles that race condition by returning false if the work item was queued even if it hasn't yet run.

That means you have three options:

1. Only employ this pooling when building for .NET 6.
2. Accept that the race condition might happen, which means you might sporadically cancel an operation that didn't actually time out because a previous operation on the same CTS did.
3. Only pool the CTS, and instead of using CancelAfter, use a Timer directly. Then, before checking IsCancellationRequested, use Timer.Dispose(WaitHandle) or DisposeAsync to ensure all work associated with the timer has quiesced, and only then make a decision based on IsCancellationRequested.
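
For reference, the first option could look roughly like the sketch below. It relies on `CancellationTokenSource.TryReset()`, which is available from .NET 6. `CtsPoolSketch`, `Rent` and `Return` are hypothetical names for this example and are not the PR's `PooledCTS` type; linked-token registration is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical pool for illustration; not YARP's actual PooledCTS.
public static class CtsPoolSketch
{
    private static readonly ConcurrentQueue<CancellationTokenSource> s_pool = new();

    public static CancellationTokenSource Rent(TimeSpan timeout)
    {
        if (!s_pool.TryDequeue(out var cts))
        {
            cts = new CancellationTokenSource();
        }

        // Arm (or re-arm) the timeout for this operation.
        cts.CancelAfter(timeout);
        return cts;
    }

    // Returns true if the instance went back to the pool.
    public static bool Return(CancellationTokenSource cts)
    {
        // TryReset returns false once cancellation has been requested,
        // so a CTS whose timeout already triggered is never reused.
        if (cts.TryReset())
        {
            s_pool.Enqueue(cts);
            return true;
        }

        cts.Dispose();
        return false;
    }
}
```

A CTS that was never canceled is handed out again on the next `Rent`, while a canceled one is disposed instead of pooled.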

MihaZupan · Apr 9, 2021

Using a Timer would force us to allocate 3 objects/request :/

What if we increased the race condition window to some arbitrarily huge time, say 10 seconds?
We can take a timestamp when calling CancelAfter and check it before returning the CTS to the pool. Something like MihaZupan@377bf07

davidfowl · Apr 9, 2021

Could use a single timer.

stephentoub · Apr 9, 2021

> What if we increased the race condition window to insert arbitrarily huge time here - say 10 seconds?

It's still a race condition. I can almost guarantee at some point something will be erroneously canceled (just look at how many times we hit 30 or 60 second timeouts in networking tests in CI for things that should be very fast). We would need to be ok with that.

Personally, I prefer to just see this optimization done for .NET 6. It's one of the key goals of yarp: find places we can make the platform better, do so, and take advantage of it.

Tratcher · Apr 9, 2021

> Personally, I prefer to just see this optimization done for .NET 6. It's one of the key goals of yarp: find places we can make the platform better, do so, and take advantage of it.

That's a good solution here. Perf is good, and this taught us how to get it, but we can't compromise reliability on older versions.

I'm fine with dropping this for < 6.

MihaZupan · Apr 9, 2021

> It's still a race condition. I can almost guarantee at some point something will be erroneously canceled

For argument's sake, would that matter? The conditions under which such a race would occur would mean that threads are taking many seconds before getting scheduled. Under such loads/resource exhaustion, everything else would be falling apart too - other timers wouldn't be getting scheduled, other cancellations/timeouts wouldn't fire or everything would be timing out, threadpool is on fire ... At that point, would it matter if a request is "cancelled by mistake"?

Tratcher · 2021-04-13T23:44:24Z

src/ReverseProxy/Utilities/PooledCTS.cs

+
+            cts._registration = linkedToken.UnsafeRegister(_linkedTokenCancelDelegate, cts);
+
+            cts._safeToReuseBeforeTimestamp = Stopwatch.GetTimestamp() + (long)((timeout.Ticks - SafeToReuseTicks) * _stopwatchTicksPerTimeSpanTick);


Do you get weird results if the timeout is less than SafeToReuseTicks (10s)?

10s is an arbitrary margin of safety. We should wait for the deterministic 6.0 API.

MihaZupan · Apr 14, 2021

If the timeout is shorter than 10s, we would never reuse the token.
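
To make the short-timeout case concrete, here is a minimal sketch of the timestamp arithmetic from the diff above. The conversion factor and the comparison on the return path are assumptions based on the snippet. `ReuseWindowSketch`, `ComputeSafeToReuseBefore` and `IsSafeToReuse` are hypothetical names, not the PR's code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper mirroring the expression in the diff; illustration only.
public static class ReuseWindowSketch
{
    // The 10 second safety margin discussed in this thread, in TimeSpan ticks.
    public static readonly long SafeToReuseTicks = TimeSpan.FromSeconds(10).Ticks;

    // Assumed conversion factor from TimeSpan ticks (100 ns) to Stopwatch ticks.
    private static readonly double s_stopwatchTicksPerTimeSpanTick =
        Stopwatch.Frequency / (double)TimeSpan.TicksPerSecond;

    // Latest Stopwatch timestamp at which the CTS may still be returned to the pool.
    public static long ComputeSafeToReuseBefore(long startTimestamp, TimeSpan timeout) =>
        startTimestamp + (long)((timeout.Ticks - SafeToReuseTicks) * s_stopwatchTicksPerTimeSpanTick);

    // Assumed return-path check: reuse only while the cutoff has not passed.
    public static bool IsSafeToReuse(long nowTimestamp, long safeToReuseBefore) =>
        nowTimestamp < safeToReuseBefore;
}
```

With a 100 second timeout the cutoff lies 90 seconds after the start, so a request that finishes quickly can return its CTS. With a 5 second timeout the cutoff is already in the past when the request starts, so the CTS is never reused.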

10s is arbitrary, but still huge - it means your app is already on fire.
As an extreme, would you be okay with a 90s safety on a 100s timeout - meaning we would only reuse the CTS if the request finished within 10 seconds?

Tratcher · Apr 16, 2021

It's hard to agree to a non-deterministic solution when we know a deterministic one is possible in 6.0. For a 1% improvement I'm content to wait for 6.0.

MihaZupan · 2021-04-23T16:21:50Z

Closing in favour of MihaZupan@6d9d972 for when we retarget for 6.0.

MihaZupan added this to the YARP 1.0.0-preview11 milestone Mar 25, 2021

MihaZupan self-assigned this Mar 25, 2021

MihaZupan requested review from alnikola and Tratcher as code owners March 25, 2021 16:36

Kahbazi reviewed Mar 30, 2021

MihaZupan added 2 commits April 8, 2021 22:13

Add TimeoutCtsPool

3e32341

Simplify!

c8cee13

MihaZupan force-pushed the timeout-cts-pool branch from c96f33e to c8cee13 Compare April 8, 2021 23:07

Tratcher approved these changes Apr 8, 2021

MihaZupan mentioned this pull request Apr 9, 2021

Pool CancellationTokenSources in 6.0+ #902

Closed

davidfowl reviewed Apr 9, 2021

Add a Timestamp check before returning a CTS to pool

54966e6

Tratcher reviewed Apr 13, 2021

MihaZupan closed this Apr 23, 2021

MihaZupan mentioned this pull request Oct 19, 2021

Pool CancellationTokenSources #1297

Merged
