-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: opportunities for improving async state machine codegen #6916
Comments
The degenerate switch case will commonly happen when "awaits" are in a "try" region. We dispatch the execution according to the current state to the appropriate resume point. However, we cannot dispatch directly to resume points that are inside "try", so we dispatch those to a label right before the "try" and then sub-dispatch inside the try. As a result you may have contiguous segments of the outer jump table filled with the same label. - That would be the label before the try. This pattern does not seem likely to happen in code written by hand, but could be frequent enough in synthetic switches. I guess we could do better when lowering switches. |
Related to dotnet/coreclr#7914 Where the offending pattern was observed in the IL generated for the Kestrel web server and was recommended as an easy improvement. Becasue of cascaded dispatching into "try" regions , it is not uncommon to see async state machine to contain degenerate switches like IL_0008: switch ( IL_003b, IL_003b, IL_003b, IL_003b, IL_003b, IL_003b, IL_003b, IL_0729) Note numerous cases all leading to the same target. We can trivially emit such switches as just a range-check (that switch would need to do anyways), without any actual "switching". Since all cases lead to the same label anyways, there is no need to switch once we know the value is in range of the switch. Fixes:#14878
Related to dotnet/coreclr#7914 Where the offending pattern was observed in the IL generated for the Kestrel web server and was recommended as an easy improvement. Becasue of cascaded dispatching into "try" regions , it is not uncommon to see async state machine to contain degenerate switches like IL_0008: switch ( IL_003b, IL_003b, IL_003b, IL_003b, IL_003b, IL_003b, IL_003b, IL_0729) Note numerous cases all leading to the same target. We can trivially emit such switches as just a range-check (that switch would need to do anyways), without any actual "switching". Since all cases lead to the same label anyways, there is no need to switch once we know the value is in range of the switch. Fixes:#14878
Seeing same from TechEmpower aspnetcore Platform benchmark public async ValueTask OnReadCompletedAsync()
{
await Writer.FlushAsync();
} G_M39073_IG17:
488BCE mov rcx, rsi
480FBF55E0 movsx rdx, word ptr [rbp-20H]
49BB9010BF3BFA7F0000 mov r11, 0x7FFA3BBF1090
48B89010BF3BFA7F0000 mov rax, 0x7FFA3BBF1090
3909 cmp dword ptr [rcx], ecx
FF10 call qword ptr [rax]IValueTaskSource`1:GetResult(short):struct:this
8BF8 mov edi, eax
G_M39073_IG18:
40887DB8 mov byte ptr [rbp-48H], dil
G_M39073_IG19:
488B5510 mov rdx, bword ptr [rbp+10H] ; rdx load
C74208FEFFFFFF mov dword ptr [rdx+8], -2
488B5510 mov rdx, bword ptr [rbp+10H] ; rdx unchanged?
3912 cmp dword ptr [rdx], edx
488B5510 mov rdx, bword ptr [rbp+10H] ; rdx unchanged?
488D7210 lea rsi, bword ptr [rdx+16]
807E0100 cmp byte ptr [rsi+1], 0
7430 je SHORT G_M39073_IG21
3936 cmp dword ptr [rsi], esi
488D4E08 lea rcx, bword ptr [rsi+8]
48BA882626F79D010000 mov rdx, 0x19DF7262688
488B12 mov rdx, gword ptr [rdx]
48833900 cmp gword ptr [rcx], 0
7507 jne SHORT G_M39073_IG20
E8A7A87C5F call CORINFO_HELP_CHECKED_ASSIGN_REF
EB13 jmp SHORT G_M39073_IG22
G_M39073_IG20:
C6458000 mov byte ptr [rbp-80H], 0
480FBE5580 movsx rdx, byte ptr [rbp-80H]
E807B8A14D call AsyncTaskMethodBuilder`1:SetExistingTaskResult(struct):this
EB03 jmp SHORT G_M39073_IG22 |
See also dotnet/coreclr#21973, where we might be able to achieve write-through like codegen by CSEing reads of DNER locals. |
I hit a similar situation too, with async methods that return
Benchmark codeusing BenchmarkDotNet.Attributes;
using System.Runtime.ExceptionServices;
public class ShallowRecursionBenchmark
{
public const uint N = 20;
[Benchmark]
public int Plain()
{
return Impl(N);
static int Impl(uint n)
{
if (n == 0)
{
return 1;
}
return Impl(n - 1) + Impl(n - 1);
}
}
[Benchmark]
public int PlainWithTryCatch()
{
return Impl(N);
static int Impl(uint n)
{
try
{
if (n == 0)
{
return 1;
}
return Impl(n - 1) + Impl(n - 1);
}
catch (Exception e)
{
ExceptionDispatchInfo.Throw(e);
return 0;
}
}
}
[Benchmark]
public int ValueTask()
{
return Impl(N).GetAwaiter().GetResult();
static async ValueTask<int> Impl(uint n)
{
if (n == 0)
{
return 1;
}
return await Impl(n - 1) + await Impl(n - 1);
}
}
} |
There are some idiomatic patterns in the async state machines created by the C# compiler that cause the jit to generate sub-optimal code.
A motivating example is the method
<RequestProcessingAsync>d__2[Context][Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context]:MoveNext():this
from Kestrel (source code).MoveNext
method is a value class method that holds a pointer to the associatedFrame
class. Frame field references inMoveNext
thus must go through one extra level of indirection. The JIT also spills thethis
pointer inMoveNext
, leading to code sequences like the following:The jit should be able to CSE some of these redundant loads. See #6913 for a small repro case. The C# compiler could help the jit by caching
<>4__this
in a local, and perhaps by inserting an early null check.Most of the zeroing comes from structs with GC references. Possible mitigation routes: more aggressive copy (and reverse-copy) prop of structs, tracking struct field lifetimes, or stack packing.
ALWAYS_INLINE
case pulls in a call to a method with an aggressive inlining attribute. We might want to reconsider such cases and force the profitability heuristics to kick in instead. For example:Here there are only 3 targets and there's no interleaving.
More detailed notes and various jit artifacts can be found on this gist.
category:cq
theme:optimization
skill-level:expert
cost:extra-large
The text was updated successfully, but these errors were encountered: