Remove partitioning from CancellationTokenSource #48251

Merged (1 commit) on Feb 18, 2021

@stephentoub commented Feb 12, 2021

When CancellationTokenSource was originally created, the expectation was that a majority use case would be lots of threads in parallel registering and unregistering handlers. This led to a design where CTS internally partitions its registrations to minimize contention between threads contending on its internal data structures. While that certainly comes up in practice, a much more common case is just one thread registering and unregistering at a time as a CancellationToken unique to a particular operation (e.g. a linked token source) is passed down through it, with various levels of the chain registering and unregistering from that non-concurrently-used token source. And having such partitioning results in non-trivial allocation overheads, in particular for a short-lived CTS with which only one or a few registrations are employed in its lifetime. This change removes that partitioning scheme; all scenarios end up with less memory allocation, and non-concurrent scenarios end up measurably faster... scenarios where there is contention do take a measurable hit, but given that's the rare case, it's believed to be the right trade-off (when in doubt, it's also the simpler implementation).
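To make the "common case" described above concrete, here is a minimal sketch of a linked token source that is unique to one operation, with each layer of the call chain registering and unregistering in turn on a single thread. The names `LinkedTokenSketch`, `RunOperation`, and `Layer` are hypothetical and exist only for illustration; only `CancellationTokenSource.CreateLinkedTokenSource`, `CancellationToken.Register`, and disposing the returned registration are real APIs.

```csharp
using System.Threading;

public static class LinkedTokenSketch
{
    // Hypothetical operation: creates a linked source that only this call chain uses.
    public static int RunOperation(CancellationToken callerToken)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        int registrations = 0;
        Layer(cts.Token, 3, ref registrations);
        return registrations;
    }

    // Each layer registers for the duration of its own work and unregisters on the way out,
    // so there is never more than one thread touching the source's registrations.
    private static void Layer(CancellationToken token, int depth, ref int registrations)
    {
        if (depth == 0)
        {
            return;
        }

        using (token.Register(() => { }))
        {
            registrations++;
            Layer(token, depth - 1, ref registrations);
        }
    }
}
```

Calling `LinkedTokenSketch.RunOperation(CancellationToken.None)` performs three nested register/unregister pairs against one short-lived source, which is the shape of workload the description says partitioning penalized.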

As long as I was refactoring a bunch of code, I fixed up a few other things along the way:

  • Avoided allocating while holding the instance's spin lock
  • Made WaitForCallbackAsync into a polling async method rather than an async-over-sync method
  • Changed the state values to be 0-based to avoid needing to initialize _state to something other than 0 in the common case
  • Used existing throw helpers in a few more cases
  • Renamed a few methods, and made a few others to be local functions
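As a rough illustration of the "polling async method" item in the list above, the sketch below waits for a callback to finish by re-checking a field and yielding, instead of blocking a thread inside a task. This is not the runtime's code: `CallbackTracker`, `Begin`, `End`, and the `_executingCallbackId` field are assumptions made up for this example, and the actual `WaitForCallbackAsync` in the PR may differ in signature and details.

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class CallbackTracker
{
    // Hypothetical field: id of the callback currently executing, or 0 if none.
    private long _executingCallbackId;

    public void Begin(long id) => Volatile.Write(ref _executingCallbackId, id);

    public void End() => Volatile.Write(ref _executingCallbackId, 0);

    // Polls until the given callback is no longer executing. If it is not
    // executing at the time of the call, the loop body never runs and the
    // returned task is already completed.
    public async Task WaitForCallbackAsync(long id)
    {
        while (Volatile.Read(ref _executingCallbackId) == id)
        {
            await Task.Yield();
        }
    }
}
```

The trade-off in a sketch like this is that no thread sits blocked while waiting, at the cost of repeatedly rescheduling the continuation until the callback completes.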

| Method | Toolchain | Mean | Ratio | Allocated |
|---|---|---:|---:|---:|
| CreateTokenDispose | master | 8.251 ns | 1.00 | 64 B |
| CreateTokenDispose | pr | 7.141 ns | 0.87 | 48 B |
| CreateRegisterDispose | master | 82.698 ns | 1.00 | 352 B |
| CreateRegisterDispose | pr | 62.206 ns | 0.75 | 192 B |
| CreateLinkedTokenDispose | master | 47.884 ns | 1.00 | 80 B |
| CreateLinkedTokenDispose | pr | 43.120 ns | 0.90 | 64 B |
| CreateManyRegisterDispose | master | 39,908,863.187 ns | 1.00 | 359 B |
| CreateManyRegisterDispose | pr | 35,791,001.099 ns | 0.90 | 199 B |
| CreateManyRegisterMultipleDispose | master | 226,725,400.000 ns | 1.00 | 704 B |
| CreateManyRegisterMultipleDispose | pr | 198,718,823.810 ns | 0.88 | 544 B |
| CreateRegisterParallelDispose | master | 133,322,908.333 ns | 1.00 | 6,070 B |
| CreateRegisterParallelDispose | pr | 289,767,730.645 ns | 2.18 | 2,608 B |

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Threading;
using System.Threading.Tasks;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private CancellationTokenSource _source = new CancellationTokenSource();

    [Benchmark]
    public CancellationToken CreateTokenDispose()
    {
        using (var cts = new CancellationTokenSource())
            return cts.Token;
    }

    [Benchmark]
    public void CreateRegisterDispose()
    {
        using (var cts = new CancellationTokenSource())
            cts.Token.Register(s => { }, null).Dispose();
    }

    [Benchmark]
    public CancellationToken CreateLinkedTokenDispose()
    {
        using (var cts = CancellationTokenSource.CreateLinkedTokenSource(_source.Token))
            return cts.Token;
    }

    [Benchmark]
    public void CreateManyRegisterDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken ct = cts.Token;
            for (int i = 0; i < 1_000_000; i++)
            {
                ct.Register(s => { }, null).Dispose();
            }
        }
    }

    [Benchmark]
    public void CreateManyRegisterMultipleDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken ct = cts.Token;
            for (int i = 0; i < 1_000_000; i++)
            {
                var ctr1 = ct.Register(s => { }, null);
                var ctr2 = ct.Register(s => { }, null);
                var ctr3 = ct.Register(s => { }, null);
                var ctr4 = ct.Register(s => { }, null);
                var ctr5 = ct.Register(s => { }, null);
                ctr5.Dispose();
                ctr4.Dispose();
                ctr3.Dispose();
                ctr2.Dispose();
                ctr1.Dispose();
            }
        }
    }

    [Benchmark]
    public void CreateRegisterParallelDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            async Task RegisterDisposeAsync()
            {
                for (int i = 0; i < 100_000; i++)
                {
                    var ctr = cts.Token.Register(s => { }, null);
                    await Task.Yield();
                    ctr.Dispose();
                }
            }

            Task.WaitAll(
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync(),
                RegisterDisposeAsync()
                );
        }
    }
}

@stephentoub added this to the 6.0.0 milestone Feb 12, 2021

@stephentoub
Member Author

cc: @kouvel, @adamsitnik, @carlossanlop, @jozkee

- private const int NotifyingCompleteState = 3;
+ private const int NotCanceledState = 0; // default value of _state
+ private const int NotifyingState = 1;
+ private const int NotifyingCompleteState = 2;
Member

Is it possible this will break any debugger logic? Even if not, I could see this kind of change causing some confusion if someone manually debugging looked up the wrong version of the CTS source code.

Member Author

I'm not aware of any debugging code paying attention to CancellationTokenSource's private _state field and interpreting its value. In the rare case where we're aware of that (e.g. a few places in Task), we like to annotate it with a comment, and there's no such comment here.
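For readers wondering why the 0-based renumbering matters, the sketch below shows the general idea: an `int` field defaults to 0, so if the "not canceled" state is 0 the field needs no initializer. The constant names mirror the diff above, but `StateSketch`, `TryCancel`, and `IsCancellationRequested` here are a simplified, hypothetical stand-in and not the actual CancellationTokenSource implementation.

```csharp
using System.Threading;

public sealed class StateSketch
{
    private const int NotCanceledState = 0; // same as the default value of an int field
    private const int NotifyingState = 1;
    private const int NotifyingCompleteState = 2;

    // No initializer: a freshly constructed instance is already in NotCanceledState.
    private int _state;

    public bool IsCancellationRequested => Volatile.Read(ref _state) != NotCanceledState;

    // Hypothetical transition: only the first caller moves the state forward.
    public bool TryCancel()
    {
        if (Interlocked.CompareExchange(ref _state, NotifyingState, NotCanceledState) != NotCanceledState)
        {
            return false;
        }

        // Registered callbacks would be invoked here in a real implementation.
        Volatile.Write(ref _state, NotifyingCompleteState);
        return true;
    }
}
```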

/// Separated out into a separate instance to keep CancellationTokenSource smaller for the case where one is created but nothing is registered with it.
/// This happens not infrequently, in particular when one is created for an operation that ends up completing synchronously / quickly.
/// </remarks>
internal sealed class Registrations
Member

Is there a perf benefit or something to sealing an internal class with no virtual methods? I see CallbackPartition was also sealed. I'm just curious.

Member Author

@stephentoub Feb 18, 2021

If there aren't any virtuals or interface implementations, not much. I'm just in the habit of sealing everything until reason dictates otherwise :-)

Member Author

> isinst is a bit faster for sealed classes

Yup, for a sealed type an `is` check can be implemented much like a GetType() == typeof(Target) comparison.
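A small sketch of why that equivalence holds only for sealed types: nothing can derive from a sealed class, so "is this type" and "is exactly this type" give the same answer, whereas for an unsealed class a derived instance passes the `is` check but not the exact-type comparison. All type and method names below are hypothetical, and the sketch demonstrates the semantic equivalence only; it makes no claim about what code the JIT actually emits.

```csharp
using System;

public sealed class SealedTarget { }

public class OpenTarget { }

public class DerivedTarget : OpenTarget { }

public static class SealedCheckSketch
{
    // For a sealed type these two checks always agree.
    public static bool IsSealedTarget(object o) => o is SealedTarget;

    public static bool IsExactlySealedTarget(object o) => o?.GetType() == typeof(SealedTarget);

    // For an unsealed type they can differ: a DerivedTarget "is" an OpenTarget,
    // but its runtime type is not exactly OpenTarget.
    public static bool IsOpenTarget(object o) => o is OpenTarget;

    public static bool IsExactlyOpenTarget(object o) => o?.GetType() == typeof(OpenTarget);
}
```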

@stephentoub merged commit 481628f into dotnet:master Feb 18, 2021
@stephentoub deleted the tokenpartitions branch February 18, 2021 22:00
@MihaZupan
Member

For reference, in YARP, this change alone saves 4 allocations (456 bytes) per request

@ghost locked as resolved and limited conversation to collaborators Apr 3, 2021