JIT: Fix placement of `GT_START_NOGC` for tailcalls in face of bulk copy with write barrier calls #105551

jakobbotsch · 2024-07-26T10:27:41Z

When the JIT generates code for a tailcall, it must write the arguments into the incoming parameter area. Since the GC-ness of the tailcall's arguments may not match the GC-ness of the parameters, we have to disable GC before we start writing them. This is done by finding the earliest `GT_PUTARG_STK` node and placing the start of the NOGC region right before it.
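A minimal sketch of that placement rule. It assumes a hypothetical model in which a tailcall's LIR is just a vector of node kinds in execution order; `Kind` and `insertStartNoGCBeforeFirstPutArg` are illustrative names, not the JIT's actual GenTree/LIR API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model of a tailcall's LIR range:
// node kinds stored in execution order (not the JIT's real data structures).
enum class Kind { Other, PutArgStk, StartNoGC };

// Inserts a StartNoGC marker immediately before the first PutArgStk node in
// execution order and returns the index at which the marker was placed.
std::size_t insertStartNoGCBeforeFirstPutArg(std::vector<Kind>& lir) {
    auto firstPutArg = std::find(lir.begin(), lir.end(), Kind::PutArgStk);
    // A tailcall that stores stack arguments is expected to have a PutArgStk.
    assert(firstPutArg != lir.end());
    std::size_t index = static_cast<std::size_t>(firstPutArg - lir.begin());
    lir.insert(firstPutArg, Kind::StartNoGC);
    return index;
}
```

In this model, everything before the returned index still runs with GC enabled, and everything from the marker through the tail jump is treated as non-interruptible.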

In addition, there is logic to handle potential overlap between the arguments and the parameters. For example, if the call has an operand that uses one of the parameters, we must take care not to overwrite that parameter with a tailcall argument before it is used. To do so, we may sometimes need to introduce copies from the parameter locals to locals on the stack frame.
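To illustrate when such copies are needed, here is a sketch under simplifying assumptions that are not taken from the JIT: argument `i` is stored into incoming slot `i`, which is also where parameter `i` lives; stores happen in argument order; and each argument reads its source value right before its own store. Under that model, a hypothetical `Foo(a, b)` that tailcalls `Bar(b, a)` must copy `a` to a temp, because storing `b` into slot 0 clobbers `a` before the second argument reads it. `TailCallArg` and `paramsNeedingTempCopies` are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical model: outgoing argument i is stored into incoming slot i,
// which is also the home of incoming parameter i.
struct TailCallArg {
    std::optional<int> readsParam; // parameter slot this argument reads, if any
};

// Returns the parameter slots that must be copied to a frame temp first,
// because they would be read after an earlier store has overwritten them.
std::set<int> paramsNeedingTempCopies(const std::vector<TailCallArg>& args) {
    std::set<int> needCopy;
    for (int i = 0; i < static_cast<int>(args.size()); ++i) {
        if (!args[i].readsParam) {
            continue;
        }
        int p = *args[i].readsParam;
        assert(p >= 0);
        // Slot p has already been overwritten by argument p when p < i.
        if (p < i) {
            needCopy.insert(p);
        }
    }
    return needCopy;
}
```

This model is deliberately conservative and ignores partial overlaps between differently sized slots, which a real implementation has to account for.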

This used to work fine. However, with #101761 we started transforming block copies into managed calls in certain scenarios. It was possible for the JIT to introduce a copy to a local and for this transformation to then kick in on that copy. As a result, we could end up with the managed helper call after the start of the NOGC region. In checked builds this would hit an assert during GC scan; in release builds it would result in corrupted data.
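The failure mode can be phrased as a simple invariant: no call may execute between the start of the NOGC region and the tail jump. A sketch of a checker for that invariant, over a hypothetical execution-ordered list of node kinds (this is not the JIT's real data structure, nor its actual checked-build assert):

```cpp
#include <cassert>
#include <vector>

// Hypothetical node kinds in a simplified, execution-ordered LIR range.
enum class Kind { Other, HelperCall, StartNoGC, PutArgStk, TailJump };

// Returns true if a helper call executes after StartNoGC but before the tail
// jump. Per the PR description, such a call led to an assert during GC scan
// in checked builds and to corrupted data in release builds.
bool hasCallInsideNoGCRegion(const std::vector<Kind>& lir) {
    bool inNoGC = false;
    for (Kind k : lir) {
        switch (k) {
            case Kind::StartNoGC:
                assert(!inNoGC && "no-GC regions are not expected to nest");
                inNoGC = true;
                break;
            case Kind::TailJump:
                inNoGC = false;
                break;
            case Kind::HelperCall:
                if (inNoGC) {
                    return true;
                }
                break;
            default:
                break;
        }
    }
    return false;
}
```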

The fix is to make sure we insert `GT_START_NOGC` after all the temporary copies we may introduce as part of the tailcall logic.
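In a hypothetical vector-of-kinds model, the fixed ordering can be sketched as: introduce the temp copies first, and only then insert `StartNoGC` immediately before the first `PutArgStk`, so any copy (and any helper call it may later be lowered into) executes with GC still enabled. The function names are illustrative; the real change lives in the JIT's tailcall lowering and may be structured differently:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node kinds in a simplified, execution-ordered LIR range.
enum class Kind { Other, ParamCopy, PutArgStk, StartNoGC };

// Inserts `count` parameter-to-temp copies right before the first PutArgStk.
static void insertCopiesBeforeFirstPutArg(std::vector<Kind>& lir, std::size_t count) {
    auto firstPutArg = std::find(lir.begin(), lir.end(), Kind::PutArgStk);
    assert(firstPutArg != lir.end());
    lir.insert(firstPutArg, count, Kind::ParamCopy);
}

// Fixed ordering: add all temp copies first, then start the no-GC region
// right before the first PutArgStk (re-searched, since insert invalidates
// iterators), so that every copy stays outside the region.
void lowerTailCallArgs(std::vector<Kind>& lir, std::size_t copiesNeeded) {
    insertCopiesBeforeFirstPutArg(lir, copiesNeeded);
    auto firstPutArg = std::find(lir.begin(), lir.end(), Kind::PutArgStk);
    lir.insert(firstPutArg, Kind::StartNoGC);
}

// True if no ParamCopy executes after StartNoGC.
bool copiesPrecedeNoGC(const std::vector<Kind>& lir) {
    auto start = std::find(lir.begin(), lir.end(), Kind::StartNoGC);
    return std::find(start, lir.end(), Kind::ParamCopy) == lir.end();
}
```

With the opposite ordering (marker first, copies inserted before the `PutArgStk` afterwards), the copies land between `StartNoGC` and the first store. That is the shape of the "before" side of the codegen diff, where the `CORINFO_HELP_BULK_WRITEBARRIER` call sits inside the `nogc` group.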

There was an additional assumption that the first `PUTARG_STK` operand was the earliest one in execution order. That is not guaranteed, so this change also stops relying on it by introducing a new `LIR::FirstNode` and using it to determine the earliest `PUTARG_STK` node.
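The real signature of `LIR::FirstNode` is not shown in this PR, so the following only sketches the idea under assumed types: given two nodes from the same LIR range, walk forward from one and check whether the other is reached. Folding that over all `PUTARG_STK` nodes yields the earliest one in execution order, regardless of argument order. `Node`, `firstNode` and `earliestOf` are hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical LIR node; `next` links nodes in execution order.
struct Node {
    int id;
    Node* next = nullptr;
};

// Returns whichever of `a` and `b` executes first. Both are assumed to be in
// the same LIR range: if walking forward from `a` reaches `b`, `a` is first.
Node* firstNode(Node* a, Node* b) {
    assert(a != nullptr && b != nullptr);
    for (Node* cur = a; cur != nullptr; cur = cur->next) {
        if (cur == b) {
            return a;
        }
    }
    return b;
}

// Finds the earliest node in execution order among nodes that are listed in
// some other order (for example, argument order).
Node* earliestOf(const std::vector<Node*>& nodes) {
    assert(!nodes.empty());
    Node* earliest = nodes[0];
    for (Node* n : nodes) {
        earliest = firstNode(earliest, n);
    }
    return earliest;
}
```

This naive forward walk costs O(n) per comparison; an actual helper may well use a cheaper strategy.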

Fix #102370
Fix #104123
Fix #105441

I will backport this to preview 7. For preview 6, setting `DOTNET_TailCallOpt=0` and `DOTNET_ReadyToRun=0` can be used as a workaround.

Codegen diff in a test case:

```diff
@@ -1,99 +1,99 @@
 ; Assembly listing for method Program:Foo(System.ValueTuple`2[System.Action`1[Program+LargeStruct],Program+LargeStruct]) (Tier1)
 ; Emitting BLENDED_CODE for X64 with AVX - Unix
 ; Tier1 code
 ; optimized code
 ; rbp based frame
 ; fully interruptible
 ; Final local variable assignments
 ;
 ;  V00 arg0         [V00,T01] (  2,  2   )  struct (104) [rbp+0x10]  do-not-enreg[SF] single-def <System.ValueTuple`2[System.Action`1[Program+LargeStruct],Program+LargeStruct]>
 ;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;  V02 rat0         [V02,T00] (  3,  6   )  struct (104) [rbp-0x68]  do-not-enreg[SF] must-init "Fast tail call lowering is creating a new local variable" <System.ValueTuple`2[System.Action`1[Program+LargeStruct],Program+LargeStruct]>
 ;
 ; Lcl frame size = 112
 
 G_M54833_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
        sub      rsp, 112
        lea      rbp, [rsp+0x70]
        xor      eax, eax
        mov      qword ptr [rbp-0x68], rax
        vxorps   xmm8, xmm8, xmm8
        vmovdqu  ymmword ptr [rbp-0x60], ymm8
        vmovdqu  ymmword ptr [rbp-0x40], ymm8
        vmovdqu  ymmword ptr [rbp-0x20], ymm8
                                                 ;; size=36 bbWeight=1 PerfScore 9.33
 G_M54833_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-       nop
-                                                ;; size=1 bbWeight=1 PerfScore 0.25
-G_M54833_IG03:        ; bbWeight=1, nogc, extend
        lea      rdi, bword ptr [rbp-0x68]
        ; byrRegs +[rdi]
        lea      rsi, [rbp+0x10]
        mov      edx, 104
        call     [CORINFO_HELP_BULK_WRITEBARRIER]
        ; byrRegs -[rdi]
        ; gcr arg pop 0
+       nop
+                                                ;; size=20 bbWeight=1 PerfScore 4.50
+G_M54833_IG03:        ; bbWeight=1, nogc, extend
        lea      rdi, [rbp+0x10]
        lea      rsi, [rbp+0x18]
        mov      rcx, gword ptr [rsi]
        ; gcrRegs +[rcx]
        mov      gword ptr [rbp+0x10], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x18], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x20], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x28], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x30], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x38], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x40], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x48], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x50], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x58], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x60], rcx
        add      rsi, 8
        add      rdi, 8
        mov      rcx, gword ptr [rsi]
        mov      gword ptr [rbp+0x68], rcx
        mov      rdi, gword ptr [rbp-0x68]
        ; gcrRegs +[rdi]
        mov      rdi, gword ptr [rdi+0x08]
        mov      rax, gword ptr [rbp-0x68]
        ; gcrRegs +[rax]
-                                                ;; size=211 bbWeight=1 PerfScore 50.75
+                                                ;; size=192 bbWeight=1 PerfScore 46.50
 G_M54833_IG04:        ; bbWeight=1, epilog, nogc, extend
        add      rsp, 112
        pop      rbp
        tail.jmp [rax+0x18]System.Action`1[Program+LargeStruct]:Invoke(Program+LargeStruct):this
                                                 ;; size=9 bbWeight=1 PerfScore 2.75
 
 ; Total bytes of code 257, prolog size 36, PerfScore 63.08, instruction count 68, allocated bytes for code 257 (MethodHash=bbe929ce) for method Program:Foo(System.ValueTuple`2[System.Action`1[Program+LargeStruct],Program+LargeStruct]) (Tier1)
 ; ============================================================
```

jakobbotsch · 2024-07-26T14:50:37Z

cc @dotnet/jit-contrib PTAL @EgorBo

superpmi-diffs/replay are failing because the windows-arm64 collection failed (the build step timed out right before finishing). I kicked off a new run.

Diffs

AndyAyersMS

LGTM.

Any idea why the TailCallOpt config is set up how it is? Seems like we could use an enable range there to allow bisecting in cases like we've seen recently.

hoyosjs · 2024-07-26T17:27:30Z

/backport to release/9.0-preview7

github-actions · 2024-07-26T17:27:42Z

Started backporting to release/9.0-preview7: https://github.com/dotnet/runtime/actions/runs/10115236657

jakobbotsch · 2024-07-26T19:35:52Z

> LGTM.
>
> Any idea why the TailCallOpt config is set up how it is? Seems like we could use an enable range there to allow bisecting in cases like we've seen recently.

No idea. I also don't know why we have both DOTNET_FastTailCalls and DOTNET_TailCallOpt.

For bisections related to optimizations I usually find that DOTNET_JitOnlyOptimizeRange will do the job, although sometimes it's nice to have something more fine-grained.
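As a sketch of what a range-based knob enables (this is not the actual format accepted by `DOTNET_JitOnlyOptimizeRange` or by any `TailCallOpt` setting), assume a hypothetical `reproduces(lo, hi)` that reruns the repro with the optimization enabled only for methods whose hash, like the `MethodHash` value the JIT prints in its disassembly, falls in `[lo, hi]`. With a single culprit method, binary search isolates it in about log2(n) runs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical: returns true if the failure reproduces when the optimization
// is enabled only for methods whose hash lies in [lo, hi] (inclusive).
using Repro = std::function<bool(std::uint32_t, std::uint32_t)>;

// Narrows the failing set down to one method hash, assuming exactly one
// method is responsible. `hashes` must be sorted ascending and non-empty.
std::uint32_t bisectMethodHash(const std::vector<std::uint32_t>& hashes, const Repro& reproduces) {
    assert(!hashes.empty());
    std::size_t lo = 0;
    std::size_t hi = hashes.size() - 1;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        // Keep whichever half still reproduces the failure.
        if (reproduces(hashes[lo], hashes[mid])) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    return hashes[lo];
}
```

The single-culprit assumption matters: if the failure needs two methods optimized together, halving can lose it, which is one reason a coarser whole-optimization range is sometimes easier to work with.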

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jul 26, 2024

dotnet-policy-service bot assigned jakobbotsch Jul 26, 2024

Nit

4aa09b2

jakobbotsch mentioned this pull request Jul 26, 2024

JIT: Unnecessarily large copies to resolve parameter/argument conflicts for tailcalls #105552

Open

jakobbotsch marked this pull request as ready for review July 26, 2024 14:49

This was referenced Jul 26, 2024

Roslyn analyzer throws error AD0001 NullReferenceException dotnet/dnceng#3305

Open

.NET 9 Preview 3 occassionally fails to build in arm64 with Object reference not set to an instance of an object errors dotnet/roslyn-analyzers#7349

Closed

AndyAyersMS approved these changes Jul 26, 2024

EgorBo approved these changes Jul 26, 2024

github-actions bot mentioned this pull request Jul 26, 2024

[release/9.0-preview7] JIT: Fix placement of GT_START_NOGC for tailcalls in face of bulk copy with write barrier calls #105572

Merged

AndyAyersMS merged commit 99c9f5b into dotnet:main Jul 26, 2024
101 of 108 checks passed

JulieLeeMSFT added this to the 9.0.0 milestone Jul 26, 2024

jakobbotsch deleted the fix-102370 branch July 26, 2024 19:34

github-actions bot locked and limited conversation to collaborators Aug 27, 2024