Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CancellationTokenSource pooling #876

Closed
wants to merge 3 commits into from

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Mar 25, 2021

Saves all cancellation-related allocation overhead in YARP's control.
That is: the linked CTS and its internals - CTS Registrations, CTS CallbackNode and TimerQueueTimer.

We reuse CTSs, but also avoid resetting the underlying timer (CancelAfter) within time buckets to save some TimerQueueTimer allocations and registations with the TimerQueue.

Open questions:

  • This doesn't really have to be a configurable instance pool - we could save a few instructions by having it be static and hardcoding values
  • Majority of pooled CTSs will eventually end up on Gen 2, which means any cancellation will result in throwing away objects on Gen 2. Could this be a concern?
  • What are reasonable defaults? Right now it's using a 2 second precision, which means a request might be canceled after 98 seconds instead of 100. This seems very conservative, but increasing the window also wouldn't give us much.

For the HTTP-HTTP 100B scenario, this gives us:

  • ~1% throughput increase
  • ~5% fewer allocated bytes (4 objects/request)

On 5.0 the difference is bigger since CTS machinery in 6.0 was already improved.
I will look into exact numbers for 5.0 allocations and post profiles here.

@MihaZupan MihaZupan added this to the YARP 1.0.0-preview11 milestone Mar 25, 2021
@MihaZupan MihaZupan self-assigned this Mar 25, 2021
@MihaZupan
Copy link
Member Author

cc: @davidfowl @stephentoub

@stephentoub
Copy link
Member

I was under the impression we were only going to do this on .NET 6 once dotnet/runtime#48492 is available. Not the case?

@MihaZupan
Copy link
Member Author

once dotnet/runtime#48492 is available

For Kestrel, yes.

Yarp doesn't hand the CancellationTokens to user-code, so we can be sure there aren't any leftover registrations on the CTS.

@Tratcher
Copy link
Member

Yarp doesn't hand the CancellationTokens to user-code, so we can be sure there aren't any leftover registrations on the CTS.

What about user provided DelegatingHandlers?

@Tratcher
Copy link
Member

How about you walk us through this in today's design meeting?

@MihaZupan
Copy link
Member Author

Sure thing

@Tratcher
Copy link
Member

Meeting notes: Please benchmark the simple version that pools CTS's but not timers. See what proportion of the gains we still get.

@MihaZupan
Copy link
Member Author

MihaZupan commented Apr 8, 2021

Perf difference between the approaches is within the margin of error, not justifying the complexity of the original.

Allocations on 5.0.201 and 6.0.100-preview.3.21153.9:

5before
5after

6before
6after

For completeness, I tested it on 6.0.100-preview.4.21208.1 (which includes dotnet/aspnetcore#31466).
We are down to ~0 cancellation related allocations/request! 🎉

src/ReverseProxy/Utilities/PooledCTS.cs Outdated Show resolved Hide resolved
_registration.Dispose();
_registration = default;

// TODO: Replace CancelAfter(Timeout.Infinite) & IsCancellationRequested with TryReset in 6.0+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a race here right @stephentoub ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There's a small risk that the timer has already fired and queued a work item to run the callback that will transition the CTS, but that the work item hasn't yet executed: if that happens, you could end up reusing that CTS for another operation and have that second operation quickly canceled when the timer's queued work item fires. TryReset handles that race condition by returning false if the work item was queued even if it hasn't yet run.

That means you have three options:

  1. Only employ this pooling when building for .NET 6.
  2. Accept that race condition might happen, which means you might sporadically cancel operations that didn't actually time out but rather a previous one did.
  3. Only pool the CTS, and instead of using CancelAfter, use a Timer directly, then before calling IsCancellationRequested use Timer.Dispose(WaitHandle) or DisposeAsync to ensure all work associated with the timer has quiesced, and only then making a decision about IsCancellationRequested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a Timer would force us to allocate 3 objects/request :/

What if we increased the race condition window to insert arbitrarily huge time here - say 10 seconds?
We can take a timestamp when calling CancelAfter and before returning to the pool. Something like MihaZupan@377bf07

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a single timer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we increased the race condition window to insert arbitrarily huge time here - say 10 seconds?

It's still a race condition. I can almost guarantee at some point something will be erroneously canceled (just look at how many times we hit 30 or 60 second timeouts in networking tests in CI for things that should be very fast). We would need to be ok with that.

Personally, I prefer to just see this optimization done for .NET 6. It's one of the key goals of yarp: find places we can make the platform better, do so, and take advantage of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer to just see this optimization done for .NET 6. It's one of the key goals of yarp: find places we can make the platform better, do so, and take advantage of it.

That's a good solution here. Perf is good, and this taught us how to get it, but we can't compromise reliability on older versions.

Copy link
Member Author

@MihaZupan MihaZupan Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with dropping this for < 6.

It's still a race condition. I can almost guarantee at some point something will be erroneously canceled

For argument's sake, would that matter? The conditions under which such a race would occur would mean that threads are taking many seconds before getting scheduled. Under such loads/resource exhaustion, everything else would be falling apart too - other timers wouldn't be getting scheduled, other cancellations/timeouts wouldn't fire or everything would be timing out, threadpool is on fire ... At that point, would it matter if a request is "cancelled by mistake"?


cts._registration = linkedToken.UnsafeRegister(_linkedTokenCancelDelegate, cts);

cts._safeToReuseBeforeTimestamp = Stopwatch.GetTimestamp() + (long)((timeout.Ticks - SafeToReuseTicks) * _stopwatchTicksPerTimeSpanTick);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you get weird results if the timeout is less than SafeToReuseTicks (10s)?

10s is a arbitrary margin of safety. We should wait for the deterministic 6.0 API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the timeout is shorter than 10s, we would never reuse the token.

10s is arbitrary, but still huge - it means your app is already on fire.
As an extreme, would you be okay with a 90s safety on a 100s timeout - meaning we would only reuse the CTS if the request finished within 10 seconds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to agree to a non-deterministic solution when we know a deterministic one is possible in 6.0. For a 1% improvement I'm content to wait for 6.0.

@MihaZupan
Copy link
Member Author

Closing in favour of MihaZupan@6d9d972 for when we retarget for 6.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants