-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove partitioning from CancellationTokenSource #48251
Conversation
When CancellationTokenSource was original created, the expectation was that a majority use case would be lots of threads in parallel registering and unregistering handlers. This led to a design where CTS internally partitions its registrations to minimize contention between threads contending on its internal data structures. While that certainly comes up in practice, a much more common case is just one thread registering and unregistering at a time as a CancellationToken unique to a particular operation (e.g. a linked token source) is passed down through it, with various levels of the chain registering and unregistering from that non-concurrently-used token source. And having such partitioning results in non-trivial allocation overheads, in particular for a short-lived CTS with which only one or a few registrations are employed in its lifetime. This change removes that partitioning scheme; all scenarios end up with less memory allocation, and non-concurrent scenarios end up measurably faster... scenarios where there is contention do take a measurable hit, but given that's the rare case, it's believed to be the right trade-off (when in doubt, it's also the simpler implementation). As long as I was refactoring a bunch of code, I fixed up a few other things along the way: - Avoided allocating while holding the instance's spin lock - Made WaitForCallbackAsync into a polling async method rather than an async-over-sync method - Changed the state values to be 0-based to avoid needing to initialize _state to something other than 0 in the common case - Used existing throw helpers in a few more cases - Renamed a few methods, and made a few others to be local functions
70cfdc5
to
b81681f
Compare
cc: @kouvel, @adamsitnik, @carlossanlop, @jozkee |
private const int NotifyingCompleteState = 3; | ||
private const int NotCanceledState = 0; // default value of _state | ||
private const int NotifyingState = 1; | ||
private const int NotifyingCompleteState = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible this will break any debugger logic? Even if not, I could see this kind of change causing some confusion if someone manually debugging looked up the wrong version of the CTS source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of any debugging code paying attention to CancellationTokenSource's private _state field and interpreting its value. In the rare case where we're aware of that (e.g. a few places in Task), we like to annotate it with a comment, and there's no such comment here.
/// Separated out into a separate instance to keep CancellationTokenSource smaller for the case where one is created but nothing is registered with it. | ||
/// This happens not infrequently, in particular when one is created for an operation that ends up completing synchronously / quickly. | ||
/// </remarks> | ||
internal sealed class Registrations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a perf benefit or something to sealing an internal class with no virtual methods? I see CallbackPartition was also sealed. I'm just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there aren't any virtuals or interface implementations, not much. I'm just in the habit of sealing everything until reason dictates otherwise :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinst is a bit faster for sealed classes
Yup, for sealed types an is
can then be achieved similar to GetType() == typeof(Target).
For reference, in YARP, this change alone saves 4 allocations (456 bytes) per request |
When CancellationTokenSource was originally created, the expectation was that a majority use case would be lots of threads in parallel registering and unregistering handlers. This led to a design where CTS internally partitions its registrations to minimize contention between threads contending on its internal data structures. While that certainly comes up in practice, a much more common case is just one thread registering and unregistering at a time as a CancellationToken unique to a particular operation (e.g. a linked token source) is passed down through it, with various levels of the chain registering and unregistering from that non-concurrently-used token source. And having such partitioning results in non-trivial allocation overheads, in particular for a short-lived CTS with which only one or a few registrations are employed in its lifetime. This change removes that partitioning scheme; all scenarios end up with less memory allocation, and non-concurrent scenarios end up measurably faster... scenarios where there is contention do take a measurable hit, but given that's the rare case, it's believed to be the right trade-off (when in doubt, it's also the simpler implementation).
As long as I was refactoring a bunch of code, I fixed up a few other things along the way: