Remove partitioning from CancellationTokenSource #48251

stephentoub · 2021-02-12T23:02:41Z

When CancellationTokenSource was originally created, the expectation was that a majority use case would be lots of threads registering and unregistering handlers in parallel. This led to a design where CTS internally partitions its registrations to minimize contention between threads on its internal data structures. While that certainly comes up in practice, a much more common case is just one thread registering and unregistering at a time, as a CancellationToken unique to a particular operation (e.g. one from a linked token source) is passed down through it, with various levels of the chain registering and unregistering from that non-concurrently-used token source. Such partitioning also results in non-trivial allocation overhead, in particular for a short-lived CTS that sees only one or a few registrations in its lifetime. This change removes that partitioning scheme. All scenarios end up with less memory allocation, and non-concurrent scenarios end up measurably faster. Scenarios where there is contention do take a measurable hit, but given that's the rare case, it's believed to be the right trade-off (when in doubt, it's also the simpler implementation).
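For illustration, the "common case" described above might look roughly like the following. This is only a sketch: `LinkedTokenExample`, `RunOperation`, `OuterLayer`, and `InnerLayer` are hypothetical names, not code from this PR. A short-lived linked source is created per operation, and each layer registers and unregisters on the same thread, one registration at a time.

```csharp
using System;
using System.Threading;

public static class LinkedTokenExample
{
    // Hypothetical innermost layer: holds a registration only while its work runs.
    public static int InnerLayer(CancellationToken token)
    {
        int callbacksRun = 0;
        using (token.Register(() => callbacksRun++))
        {
            // ... do the work; this sketch never requests cancellation ...
        }
        return callbacksRun;
    }

    // Hypothetical outer layer: registers, calls down the chain, then unregisters.
    public static int OuterLayer(CancellationToken token)
    {
        using (token.Register(() => { }))
        {
            return InnerLayer(token);
        }
    }

    // One operation: a linked source is created, its token flows down, and it is disposed.
    public static int RunOperation(CancellationToken callerToken)
    {
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken))
        {
            return OuterLayer(cts.Token);
        }
    }
}
```

No two threads ever touch the source at the same time here, so partitioning the registrations buys nothing and only adds allocation.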

As long as I was refactoring a bunch of code, I fixed up a few other things along the way:

- Avoided allocating while holding the instance's spin lock
- Made WaitForCallbackAsync into a polling async method rather than an async-over-sync method
- Changed the state values to be 0-based to avoid needing to initialize _state to something other than 0 in the common case
- Used existing throw helpers in a few more cases
- Renamed a few methods, and made a few others to be local functions
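To make the "polling async rather than async-over-sync" item concrete, here is a minimal sketch of the two shapes. `CallbackTracker` and its members are hypothetical and are not the actual `WaitForCallbackAsync` implementation. The async-over-sync version occupies a thread-pool thread while it waits; the polling version re-checks the condition and yields between checks.

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class CallbackTracker
{
    private long _executingCallbackId; // 0 means "no callback is running"

    public void BeginCallback(long id) => Volatile.Write(ref _executingCallbackId, id);
    public void EndCallback() => Volatile.Write(ref _executingCallbackId, 0);

    // Async-over-sync: a synchronous spin loop wrapped in a task, blocking a pool thread.
    public Task WaitAsyncOverSync(long id) =>
        Task.Run(() =>
        {
            var spinner = new SpinWait();
            while (Volatile.Read(ref _executingCallbackId) == id)
                spinner.SpinOnce();
        });

    // Polling async: check the condition, yield, and check again; no thread is held.
    public async Task WaitPollingAsync(long id)
    {
        while (Volatile.Read(ref _executingCallbackId) == id)
            await Task.Yield();
    }
}
```

If the callback in question is not running, `WaitPollingAsync` completes synchronously without ever yielding.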

| Method | Toolchain | Mean | Ratio | Allocated |
|---|---|---:|---:|---:|
| CreateTokenDispose | master | 8.251 ns | 1.00 | 64 B |
| CreateTokenDispose | pr | 7.141 ns | 0.87 | 48 B |
| CreateRegisterDispose | master | 82.698 ns | 1.00 | 352 B |
| CreateRegisterDispose | pr | 62.206 ns | 0.75 | 192 B |
| CreateLinkedTokenDispose | master | 47.884 ns | 1.00 | 80 B |
| CreateLinkedTokenDispose | pr | 43.120 ns | 0.90 | 64 B |
| CreateManyRegisterDispose | master | 39,908,863.187 ns | 1.00 | 359 B |
| CreateManyRegisterDispose | pr | 35,791,001.099 ns | 0.90 | 199 B |
| CreateManyRegisterMultipleDispose | master | 226,725,400.000 ns | 1.00 | 704 B |
| CreateManyRegisterMultipleDispose | pr | 198,718,823.810 ns | 0.88 | 544 B |
| CreateRegisterParallelDispose | master | 133,322,908.333 ns | 1.00 | 6,070 B |
| CreateRegisterParallelDispose | pr | 289,767,730.645 ns | 2.18 | 2,608 B |

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Threading;
using System.Threading.Tasks;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private CancellationTokenSource _source = new CancellationTokenSource();

    [Benchmark]
    public CancellationToken CreateTokenDispose()
    {
        using (var cts = new CancellationTokenSource())
            return cts.Token;
    }

    [Benchmark]
    public void CreateRegisterDispose()
    {
        using (var cts = new CancellationTokenSource())
            cts.Token.Register(s => { }, null).Dispose();
    }

    [Benchmark]
    public CancellationToken CreateLinkedTokenDispose()
    {
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(_source.Token))
            return cts.Token;
    }

    [Benchmark]
    public void CreateManyRegisterDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken ct = cts.Token;
            for (int i = 0; i < 1_000_000; i++)
            {
                ct.Register(s => { }, null).Dispose();
            }
        }
    }

    [Benchmark]
    public void CreateManyRegisterMultipleDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken ct = cts.Token;
            for (int i = 0; i < 1_000_000; i++)
            {
                var ctr1 = ct.Register(s => { }, null);
                var ctr2 = ct.Register(s => { }, null);
                var ctr3 = ct.Register(s => { }, null);
                var ctr4 = ct.Register(s => { }, null);
                var ctr5 = ct.Register(s => { }, null);
                ctr5.Dispose();
                ctr4.Dispose();
                ctr3.Dispose();
                ctr2.Dispose();
                ctr1.Dispose();
            }
        }
    }

    [Benchmark]
    public void CreateRegisterParallelDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            async Task RegisterDisposeAsync()
            {
                for (int i = 0; i < 100_000; i++)
                {
                    var ctr = cts.Token.Register(s => { }, null);
                    await Task.Yield();
                    ctr.Dispose();
                }
            }

            Task.WaitAll(
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync()
                );
        }
    }
}


stephentoub · 2021-02-16T02:56:12Z

cc: @kouvel, @adamsitnik, @carlossanlop, @jozkee

halter73 · 2021-02-18T18:54:16Z

src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs

-        private const int NotifyingCompleteState = 3;
+        private const int NotCanceledState = 0; // default value of _state
+        private const int NotifyingState = 1;
+        private const int NotifyingCompleteState = 2;


Is it possible this will break any debugger logic? Even if not, I could see this kind of change causing some confusion if someone manually debugging looked up the wrong version of the CTS source code.

stephentoub · 2021-02-18

I'm not aware of any debugging code paying attention to CancellationTokenSource's private _state field and interpreting its value. In the rare case where we're aware of that (e.g. a few places in Task), we like to annotate it with a comment, and there's no such comment here.
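As a rough illustration of why 0-based state values help, consider the sketch below. The constant names and `_state` come from the diff above; `MiniSource` and everything else are assumptions made for the example. An `int` field defaults to 0, so a freshly constructed instance is already in `NotCanceledState` without any initializer or constructor write.

```csharp
using System.Threading;

public sealed class MiniSource
{
    private const int NotCanceledState = 0; // default value of _state
    private const int NotifyingState = 1;
    private const int NotifyingCompleteState = 2;

    private int _state; // no initializer: a new instance starts at NotCanceledState

    public bool IsCancellationRequested => Volatile.Read(ref _state) != NotCanceledState;
    public bool IsCancellationCompleted => Volatile.Read(ref _state) == NotifyingCompleteState;

    // Returns true only for the caller that performs the NotCanceled -> Notifying transition.
    public bool Cancel()
    {
        if (Interlocked.CompareExchange(ref _state, NotifyingState, NotCanceledState) != NotCanceledState)
            return false; // already canceled, or cancellation is in progress

        // ... registered callbacks would be invoked here ...

        Volatile.Write(ref _state, NotifyingCompleteState);
        return true;
    }
}
```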

halter73 · 2021-02-18T19:10:16Z

src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs

+        /// Separated out into a separate instance to keep CancellationTokenSource smaller for the case where one is created but nothing is registered with it.
+        /// This happens not infrequently, in particular when one is created for an operation that ends up completing synchronously / quickly.
+        /// </remarks>
+        internal sealed class Registrations
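The idea in that remark can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `LazySource` and the shape of the nested `Registrations` type here are hypothetical. The registration storage lives in a separate object that is only allocated on the first registration, so a source that is created and never registered with stays small.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class LazySource
{
    private Registrations _registrations; // stays null until the first Register call

    public bool HasRegistrations => Volatile.Read(ref _registrations) != null;
    public int Count => Volatile.Read(ref _registrations)?.Count ?? 0;

    public void Register(Action callback)
    {
        Registrations registrations = Volatile.Read(ref _registrations);
        if (registrations == null)
        {
            // Allocate outside any lock; if another thread wins the race, use its instance.
            var created = new Registrations();
            registrations = Interlocked.CompareExchange(ref _registrations, created, null) ?? created;
        }
        registrations.Add(callback);
    }

    // Hypothetical stand-in for the real type: a locked list of callbacks.
    private sealed class Registrations
    {
        private readonly object _lock = new object();
        private readonly List<Action> _callbacks = new List<Action>();

        public void Add(Action callback) { lock (_lock) _callbacks.Add(callback); }
        public int Count { get { lock (_lock) return _callbacks.Count; } }
    }
}
```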


Is there a perf benefit or something to sealing an internal class with no virtual methods? I see CallbackPartition was also sealed. I'm just curious.

stephentoub · 2021-02-18

If there aren't any virtuals or interface implementations, not much. I'm just in the habit of sealing everything until reason dictates otherwise :-)

omariom · 2021-02-19

isinst is a bit faster for sealed classes

stephentoub · 2021-02-19

> isinst is a bit faster for sealed classes

Yup, for sealed types an `is` check can then be implemented much like `GetType() == typeof(Target)`.
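A small sketch of the equivalence being described, using hypothetical types (`OpenNode`, `DerivedNode`, `SealedNode`, `TypeChecks`). For a sealed class, no subclass can exist, so an `is` test and an exact type comparison always give the same answer. For an unsealed class they can differ. How the JIT actually compiles either check is not shown here.

```csharp
using System;

public class OpenNode { }
public class DerivedNode : OpenNode { }
public sealed class SealedNode { }

public static class TypeChecks
{
    // Unsealed target: 'is' must also accept subclasses, so it is not an exact-type test.
    public static bool IsOpenNode(object o) => o is OpenNode;
    public static bool IsExactlyOpenNode(object o) => o != null && o.GetType() == typeof(OpenNode);

    // Sealed target: these two checks are equivalent for every input.
    public static bool IsSealedNode(object o) => o is SealedNode;
    public static bool IsExactlySealedNode(object o) => o != null && o.GetType() == typeof(SealedNode);
}
```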

MihaZupan · 2021-03-04T12:57:56Z

For reference, in YARP, this change alone saves 4 allocations (456 bytes) per request

stephentoub added area-System.Threading tenet-performance Performance related issue labels Feb 12, 2021

stephentoub added this to the 6.0.0 milestone Feb 12, 2021

stephentoub force-pushed the tokenpartitions branch from 70cfdc5 to b81681f Compare February 12, 2021 23:05

runfoapp bot mentioned this pull request Feb 15, 2021

slicebuffers_success variant tests failing sporadically #47734

Closed

halter73 reviewed Feb 18, 2021


halter73 approved these changes Feb 18, 2021


stephentoub merged commit 481628f into dotnet:master Feb 18, 2021

stephentoub deleted the tokenpartitions branch February 18, 2021 22:00

ghost locked as resolved and limited conversation to collaborators Apr 3, 2021

