Enable pooling for async ValueTask/ValueTask<T> methods

Today `async ValueTask/ValueTask<T>` methods use builders that special-case the synchronously completing case (to just return a `default(ValueTask)` or `new ValueTask<T>(result)`) but defer to the equivalent of `async Task/Task<T>` for when the operation completes asynchronously. This, however, doesn't take advantage of the fact that value tasks can wrap arbitrary `IValueTaskSource/IValueTaskSource<T>` implementations.

This commit gives `AsyncValueTaskMethodBuilder` and `AsyncValueTaskMethodBuilder<T>` the ability to use pooled `IValueTaskSource/IValueTaskSource<T>` instances, such that calls to an `async ValueTask/ValueTask<T>` method incur 0 allocations (amortized) as long as there's a cached object available.

Currently the pooling is behind a feature flag, requiring opt-in via the `DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS` environment variable (setting it to "true" or "1"). This is done for a few reasons:

- There's a breaking change here, in that while we say/document that `ValueTask/ValueTask<T>`s are more limited in what they can be used for, nothing in the implementation actually stops a `ValueTask` that was wrapping a `Task` from being used as permissively as `Task`, e.g. if you were to `await` such a `ValueTask` multiple times, it would happen to work, even though we say "never do that". This change means that anyone who was relying on such undocumented behaviors will now be broken. I think this is a reasonable thing to do in a major release, but I also want feedback and a lot of runway on it.
- Policy around pooling. Pooling is always a difficult thing to tune. Right now I've chosen a policy that limits the number of pooled objects per state machine type to an arbitrary multiple of the processor count, and that tries to strike a balance between contention and garbage by using a spin lock: if there's any contention while trying to get or return a pooled object, the cache is ignored. We will need to think hard about what policy to use here. It's also possible it could be tuned per state machine, e.g. by having an attribute that's looked up via reflection when creating the cache for a state machine, but that'll add a lot of first-access overhead to any `async ValueTask/ValueTask<T>` method. For now, it's tunable via the `DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKSLIMIT` environment variable, which may be set to the maximum number of pooled objects desired per state machine.
- Performance validation. Preliminary numbers show that this accomplishes its goal, having on-par throughput with the current implementation but with significantly less allocation. That needs to be validated at scale and across a variety of workloads.
- Diagnostics. There are several diagnostics-related abilities available for `async Task/Task<T>` methods that are not included here for `async ValueTask/ValueTask<T>` methods when pooling is enabled. We need to evaluate these (e.g. tracing) and determine which pieces need to be enabled and which we're fine omitting.

Before shipping .NET 5, we could choose to flip the polarity of the switch (making it opt-out rather than opt-in), remove the fallback altogether and just have it be always on, or revert this change, all based on experimentation and data we receive between now and then.
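
To make the breaking change concrete, here is a minimal sketch of consumption patterns that should stay valid whether or not pooling is enabled. `ValueTaskUsage` and `ComputeAsync` are hypothetical names used only for illustration and are not part of this commit. The sketch assumes the documented guidance: await a `ValueTask<T>` at most once, and call `AsTask()` first if the result has to be observed more than once.

```csharp
using System;
using System.Threading.Tasks;

internal static class ValueTaskUsage
{
    // Hypothetical async method; Task.Yield forces the asynchronous completion path.
    private static async ValueTask<int> ComputeAsync()
    {
        await Task.Yield();
        return 42;
    }

    // Safe: the ValueTask<int> is awaited exactly once.
    public static async Task<int> ConsumeOnceAsync()
    {
        return await ComputeAsync();
    }

    // Safe: convert to a Task once, then await the Task as often as needed.
    // The original ValueTask<int> is never touched again after AsTask().
    public static async Task<int> ConsumeTwiceAsync()
    {
        Task<int> task = ComputeAsync().AsTask();
        int first = await task;
        int second = await task;
        return first + second;
    }
}
```

Storing the `ValueTask<int>` in a local and awaiting that local twice is the pattern that happened to work when the value task wrapped a `Task`, and it is the one this change can break.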
dotnet · Oct 23, 2019 · c85cc72
1 parent 96832c8
Showing 7 changed files with 654 additions and 139 deletions.
diff --git a/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs b/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs
@@ -62,7 +62,7 @@ public static void Start<TStateMachine>(ref TStateMachine stateMachine) where TS
             }
         }
 
-        public static void SetStateMachine(IAsyncStateMachine stateMachine, Task task)
+        public static void SetStateMachine(IAsyncStateMachine stateMachine, Task? task)
         {
             if (stateMachine == null)
             {

diff --git a/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncTaskCache.cs b/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncTaskCache.cs
@@ -12,16 +12,36 @@ namespace System.Runtime.CompilerServices
     internal static class AsyncTaskCache
     {
         /// <summary>A cached Task{Boolean}.Result == true.</summary>
-        internal static readonly Task<bool> s_trueTask = CreateCacheableTask(true);
+        internal static readonly Task<bool> s_trueTask = CreateCacheableTask(result: true);
         /// <summary>A cached Task{Boolean}.Result == false.</summary>
-        internal static readonly Task<bool> s_falseTask = CreateCacheableTask(false);
+        internal static readonly Task<bool> s_falseTask = CreateCacheableTask(result: false);
         /// <summary>The cache of Task{Int32}.</summary>
         internal static readonly Task<int>[] s_int32Tasks = CreateInt32Tasks();
         /// <summary>The minimum value, inclusive, for which we want a cached task.</summary>
         internal const int InclusiveInt32Min = -1;
         /// <summary>The maximum value, exclusive, for which we want a cached task.</summary>
         internal const int ExclusiveInt32Max = 9;
 
+        /// <summary>true if we should use reusable boxes for async completions of ValueTask methods; false if we should use tasks.</summary>
+        /// <remarks>
+        /// We rely on tiered compilation turning this into a const and doing dead code elimination to make checks on this efficient.
+        /// It's also required for safety that this value never changes once observed, as Unsafe.As casts are employed based on its value.
+        /// </remarks>
+        internal static readonly bool s_valueTaskPoolingEnabled = GetPoolAsyncValueTasksSwitch();
+        /// <summary>Maximum number of boxes that are allowed to be cached per state machine type.</summary>
+        internal static readonly int s_valueTaskPoolingCacheSize = GetPoolAsyncValueTasksLimitValue();
+
+        private static bool GetPoolAsyncValueTasksSwitch()
+        {
+            string? value = Environment.GetEnvironmentVariable("DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS");
+            return value != null && (bool.IsTrueStringIgnoreCase(value) || value.Equals("1"));
+        }
+
+        private static int GetPoolAsyncValueTasksLimitValue() =>
+            int.TryParse(Environment.GetEnvironmentVariable("DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKSLIMIT"), out int result) && result > 0 ?
+                result :
+                Environment.ProcessorCount * 4; // arbitrary default value
+
         /// <summary>Creates a non-disposable task.</summary>
         /// <typeparam name="TResult">Specifies the result type.</typeparam>
         /// <param name="result">The result for the task.</param>

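The policy described in the commit message (a per-type cap, with the cache skipped under contention) can be sketched roughly as follows. This is an illustration under assumptions, not the code added by this commit: `BoundedBoxPool<T>` and its members are hypothetical names, and the real implementation pools state machine boxes rather than arbitrary objects.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical illustration of a bounded pool that ignores the cache on contention.
internal sealed class BoundedBoxPool<T> where T : class, new()
{
    private readonly T?[] _items;
    private int _count;
    private SpinLock _lock = new SpinLock(enableThreadOwnerTracking: false); // mutable struct: must not be readonly

    public BoundedBoxPool(int limit) => _items = new T?[limit];

    public T Rent()
    {
        bool lockTaken = false;
        try
        {
            _lock.TryEnter(ref lockTaken); // single attempt, no waiting
            if (lockTaken && _count > 0)
            {
                T item = _items[--_count]!;
                _items[_count] = null;
                return item;
            }
        }
        finally
        {
            if (lockTaken) _lock.Exit(useMemoryBarrier: false);
        }
        return new T(); // empty pool or contention: just allocate
    }

    public void Return(T item)
    {
        bool lockTaken = false;
        try
        {
            _lock.TryEnter(ref lockTaken);
            if (lockTaken && _count < _items.Length)
            {
                _items[_count++] = item;
            }
            // Otherwise the pool is full or contended: drop the item for the GC.
        }
        finally
        {
            if (lockTaken) _lock.Exit(useMemoryBarrier: false);
        }
    }
}
```

A caller could size such a pool the way the default above does, e.g. `new BoundedBoxPool<object>(Environment.ProcessorCount * 4)`. The trade-off is that a contended `Rent` or `Return` costs one allocation or one dropped object instead of time spent spinning.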
diff --git a/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncTaskMethodBuilder.cs b/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncTaskMethodBuilder.cs
@@ -21,15 +21,12 @@ public struct AsyncTaskMethodBuilder
         /// <summary>A cached VoidTaskResult task used for builders that complete synchronously.</summary>
         private static readonly Task<VoidTaskResult> s_cachedCompleted = AsyncTaskMethodBuilder<VoidTaskResult>.s_defaultResultTask;
 
-        /// <summary>The generic builder object to which this non-generic instance delegates.</summary>
-        private AsyncTaskMethodBuilder<VoidTaskResult> m_builder; // mutable struct: must not be readonly. Debugger depends on the exact name of this field.
+        /// <summary>The lazily-initialized built task.</summary>
+        private Task<VoidTaskResult>? m_task; // Debugger depends on the exact name of this field.
 
         /// <summary>Initializes a new <see cref="AsyncTaskMethodBuilder"/>.</summary>
         /// <returns>The initialized <see cref="AsyncTaskMethodBuilder"/>.</returns>
-        public static AsyncTaskMethodBuilder Create() =>
-            // m_builder should be initialized to AsyncTaskMethodBuilder<VoidTaskResult>.Create(), but on coreclr
-            // that Create() is a nop, so we can just return the default here.
-            default;
+        public static AsyncTaskMethodBuilder Create() => default;
 
         /// <summary>Initiates the builder's execution with the associated state machine.</summary>
         /// <typeparam name="TStateMachine">Specifies the type of the state machine.</typeparam>
@@ -44,7 +41,7 @@ public void Start<TStateMachine>(ref TStateMachine stateMachine) where TStateMac
         /// <exception cref="System.ArgumentNullException">The <paramref name="stateMachine"/> argument was null (Nothing in Visual Basic).</exception>
         /// <exception cref="System.InvalidOperationException">The builder is incorrectly initialized.</exception>
         public void SetStateMachine(IAsyncStateMachine stateMachine) =>
-            m_builder.SetStateMachine(stateMachine);
+            AsyncMethodBuilderCore.SetStateMachine(stateMachine, task: null);
 
         /// <summary>
         /// Schedules the specified state machine to be pushed forward when the specified awaiter completes.
@@ -57,7 +54,7 @@ public void AwaitOnCompleted<TAwaiter, TStateMachine>(
             ref TAwaiter awaiter, ref TStateMachine stateMachine)
             where TAwaiter : INotifyCompletion
             where TStateMachine : IAsyncStateMachine =>
-            m_builder.AwaitOnCompleted(ref awaiter, ref stateMachine);
+            AsyncTaskMethodBuilder<VoidTaskResult>.AwaitOnCompleted(ref awaiter, ref stateMachine, ref m_task);
 
         /// <summary>
         /// Schedules the specified state machine to be pushed forward when the specified awaiter completes.
@@ -66,19 +63,32 @@ public void AwaitOnCompleted<TAwaiter, TStateMachine>(
         /// <typeparam name="TStateMachine">Specifies the type of the state machine.</typeparam>
         /// <param name="awaiter">The awaiter.</param>
         /// <param name="stateMachine">The state machine.</param>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
         public void AwaitUnsafeOnCompleted<TAwaiter, TStateMachine>(
             ref TAwaiter awaiter, ref TStateMachine stateMachine)
             where TAwaiter : ICriticalNotifyCompletion
             where TStateMachine : IAsyncStateMachine =>
-            m_builder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine);
+            AsyncTaskMethodBuilder<VoidTaskResult>.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine, ref m_task);
 
         /// <summary>Gets the <see cref="System.Threading.Tasks.Task"/> for this builder.</summary>
         /// <returns>The <see cref="System.Threading.Tasks.Task"/> representing the builder's asynchronous operation.</returns>
         /// <exception cref="System.InvalidOperationException">The builder is not initialized.</exception>
         public Task Task
         {
             [MethodImpl(MethodImplOptions.AggressiveInlining)]
-            get => m_builder.Task;
+            get => m_task ?? InitializeTaskAsPromise();
+        }
+
+        /// <summary>
+        /// Initializes the task, which must not yet be initialized.  Used only when the Task is being forced into
+        /// existence when no state machine is needed, e.g. when the builder is being synchronously completed with
+        /// an exception, when the builder is being used out of the context of an async method, etc.
+        /// </summary>
+        [MethodImpl(MethodImplOptions.NoInlining)]
+        private Task<VoidTaskResult> InitializeTaskAsPromise()
+        {
+            Debug.Assert(m_task == null);
+            return m_task = new Task<VoidTaskResult>();
         }
 
         /// <summary>
@@ -87,7 +97,20 @@ public Task Task
         /// </summary>
         /// <exception cref="System.InvalidOperationException">The builder is not initialized.</exception>
         /// <exception cref="System.InvalidOperationException">The task has already completed.</exception>
-        public void SetResult() => m_builder.SetResult(s_cachedCompleted); // Using s_cachedCompleted is faster than using s_defaultResultTask.
+        public void SetResult()
+        {
+            // Get the currently stored task, which will be non-null if get_Task has already been accessed.
+            // If there isn't one, store the supplied completed task.
+            if (m_task is null)
+            {
+                m_task = s_cachedCompleted;
+            }
+            else
+            {
+                // Otherwise, complete the task that's there.
+                AsyncTaskMethodBuilder<VoidTaskResult>.SetExistingTaskResult(m_task, default!);
+            }
+        }
 
         /// <summary>
         /// Completes the <see cref="System.Threading.Tasks.Task"/> in the
@@ -97,7 +120,8 @@ public Task Task
         /// <exception cref="System.ArgumentNullException">The <paramref name="exception"/> argument is null (Nothing in Visual Basic).</exception>
         /// <exception cref="System.InvalidOperationException">The builder is not initialized.</exception>
         /// <exception cref="System.InvalidOperationException">The task has already completed.</exception>
-        public void SetException(Exception exception) => m_builder.SetException(exception);
+        public void SetException(Exception exception) =>
+            AsyncTaskMethodBuilder<VoidTaskResult>.SetException(exception, ref m_task);
 
         /// <summary>
         /// Called by the debugger to request notification when the first wait operation
@@ -106,7 +130,8 @@ public Task Task
         /// <param name="enabled">
         /// true to enable notification; false to disable a previously set notification.
         /// </param>
-        internal void SetNotificationForWaitCompletion(bool enabled) => m_builder.SetNotificationForWaitCompletion(enabled);
+        internal void SetNotificationForWaitCompletion(bool enabled) =>
+            AsyncTaskMethodBuilder<VoidTaskResult>.SetNotificationForWaitCompletion(enabled, ref m_task);
 
         /// <summary>
         /// Gets an object that may be used to uniquely identify this builder to the debugger.
@@ -116,6 +141,7 @@ public Task Task
         /// It must only be used by the debugger and tracing purposes, and only in a single-threaded manner
         /// when no other threads are in the middle of accessing this property or this.Task.
         /// </remarks>
-        internal object ObjectIdForDebugger => m_builder.ObjectIdForDebugger;
+        internal object ObjectIdForDebugger =>
+            m_task ??= AsyncTaskMethodBuilder<VoidTaskResult>.CreateWeaklyTypedStateMachineBox();
     }
 }