-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regression with tail.callvirt transformation to helper call #45527
Conversation
4296e10
to
f1bf88f
Compare
f1bf88f
to
a4c3252
Compare
a4c3252
to
c3e4398
Compare
@@ -0,0 +1,10 @@ | |||
<Project Sdk="Microsoft.NET.Sdk.IL"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we cannot represent this tail call pattern in C#, would it make sense to use an fsproj instead of (or in addition to) ilproj? JIT test infra already support F# (common props, dependencies etc. are set), e.g.: https://github.com/dotnet/runtime/blob/fae35941e16310d815460475810f069578e6e774/src/tests/JIT/Directed/tailcall/mutual_recursion.fsproj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add the F# test, but I am not sure how much value the test would bring. We added many tests written in IL in the past that were designed to specifically expose some of the tail. call
issues found during JIT stress testing.
The original F# program in #45250 was failing during execution of some internal F# function (invokeFast2
) and it would be harder to debug that program and reason about the issue in comparison with the constructed minimal test repro.
@erozenfeld suggested to extend https://github.com/dotnet/runtime/blob/master/src/tests/JIT/Directed/tailcall/more_tailcalls.cs test suite and include the failing pattern in there, so I would say this should be our approach in order to increase test coverage for such tail call scenarios.
7d0d8c5
to
efbb1ae
Compare
37313cd
to
6388f91
Compare
// Auto-generated message 69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others. |
6388f91
to
09ab998
Compare
…ed to compute the target function pointer in morph.cpp
09ab998
to
cfd1a2a
Compare
The three failures in jitstress pipelines are #45326 |
@dotnet/jit-contrib This is ready for review, please take a look. @JulieLeeMSFT We should consider back-porting this to 5.0 (see dotnet/fsharp#10454) |
Does this impact all architecture? 12/11 is code complete date for 5.0.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I ran jit-diff with pmi on framework libraries on win-x64. There is a little point of comparing the changes in a "usual" way since fast tail calls would be used in most of the places instead of helper-based tail calls:
The saved 2 bytes of code size come from spilling @@ -981831,33 +981832,33 @@ G_M35118_IG01:
G_M35118_IG02:
00000F mov rdx, gword ptr [rcx+8]
000013 add rdx, 32
-000017 mov r8, gword ptr [rdx]
-00001A mov rsi, gword ptr [r8+8]
-00001E vmovupd ymm0, ymmword ptr[rcx+16]
-000023 vmovupd ymmword ptr[rsp+20H], ymm0
-000029 mov rcx, gword ptr [rdx]
-00002C mov rcx, gword ptr [rcx+8]
-000030 mov rdx, 0xD1FFAB1E
-00003A mov r8, 0xD1FFAB1E
-000044 call CORINFO_HELP_VIRTUAL_FUNC_PTR
-000049 mov r8, rax
-00004C lea rdx, bword ptr [rsp+20H]
-000051 mov rcx, rsi
-000054 call ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,System.Numerics.Vector`1[Single],long)
-000059 lea rcx, bword ptr [rsp+58H]
-00005E lea r8, bword ptr [rsp+48H]
-000063 mov rdx, 0xD1FFAB1E
-00006D call System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
-000072 mov rax, gword ptr [rsp+48H]
- ;; bbWeight=1 PerfScore 23.00
+000017 mov rdx, gword ptr [rdx]
+00001A mov rdx, gword ptr [rdx+8]
+00001E mov rsi, rdx
+000021 vmovupd ymm0, ymmword ptr[rcx+16]
+000026 vmovupd ymmword ptr[rsp+20H], ymm0
+00002C mov rcx, rdx
+00002F mov rdx, 0xD1FFAB1E
+000039 mov r8, 0xD1FFAB1E
+000043 call CORINFO_HELP_VIRTUAL_FUNC_PTR
+000048 mov r8, rax
+00004B lea rdx, bword ptr [rsp+20H]
+000050 mov rcx, rsi
+000053 call ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,System.Numerics.Vector`1[Single],long)
+000058 lea rcx, bword ptr [rsp+58H]
+00005D lea r8, bword ptr [rsp+48H]
+000062 mov rdx, 0xD1FFAB1E
+00006C call System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
+000071 mov rax, gword ptr [rsp+48H]
+ ;; bbWeight=1 PerfScore 19.50
G_M35118_IG03:
-000077 vzeroupper
-00007A add rsp, 80
-00007E pop rsi
-00007F ret
+000076 vzeroupper
+000079 add rsp, 80
+00007D pop rsi
+00007E ret
;; bbWeight=1 PerfScore 2.75
-; Total bytes of code 128, prolog size 15, PerfScore 42.25, instruction count 29 (MethodHash=e22976d1) for method action@647-12[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this
+; Total bytes of code 127, prolog size 15, PerfScore 38.65, instruction count 29 (MethodHash=e22976d1) for method action@647-12[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this To have more reasonable results I set
An example of the code changes with disabled fast tail calls @@ -24515,41 +24507,35 @@ G_M51115_IG08:
0000C2 mov rax, qword ptr [rdi]
0000C5 mov rax, qword ptr [rax+72]
0000C9 call gword ptr [rax+32]Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon][System.__Canon,System.__Canon]:Invoke(System.__Canon):System.__Canon:this
-0000CC mov r14, rax
-0000CF mov rcx, rdi
-0000D2 mov rdx, rbx
-0000D5 mov rax, qword ptr [rdi]
-0000D8 mov rax, qword ptr [rax+72]
-0000DC call gword ptr [rax+32]Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon][System.__Canon,System.__Canon]:Invoke(System.__Canon):System.__Canon:this
-0000DF mov rdi, rax
-0000E2 mov rcx, rsi
-0000E5 mov rdx, 0xD1FFAB1E
-0000EF call CORINFO_HELP_RUNTIMEHANDLE_METHOD
-0000F4 mov rdx, rax
-0000F7 mov rcx, rdi
-0000FA mov r8, 0xD1FFAB1E
-000104 call CORINFO_HELP_VIRTUAL_FUNC_PTR
-000109 mov r8, rax
-00010C mov rcx, r14
-00010F mov rdx, rbp
-000112 call ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,long,long)
-000117 lea rcx, bword ptr [rsp+68H]
-00011C lea r8, bword ptr [rsp+28H]
-000121 mov rdx, 0xD1FFAB1E
-00012B call System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
-000130 mov rax, gword ptr [rsp+28H]
- ;; bbWeight=0.50 PerfScore 11.88
+0000CC mov rdi, rax
+0000CF mov rcx, rsi
+0000D2 mov rdx, 0xD1FFAB1E
+0000DC call CORINFO_HELP_RUNTIMEHANDLE_METHOD
+0000E1 mov rdx, rax
+0000E4 mov rcx, rdi
+0000E7 mov r8, 0xD1FFAB1E
+0000F1 call CORINFO_HELP_VIRTUAL_FUNC_PTR
+0000F6 mov r8, rax
+0000F9 mov rdx, rbp
+0000FC mov rcx, rdi
+0000FF call ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,long,long)
+000104 lea rcx, bword ptr [rsp+68H]
+000109 lea r8, bword ptr [rsp+28H]
+00010E mov rdx, 0xD1FFAB1E
+000118 call System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
+00011D mov rax, gword ptr [rsp+28H]
+ ;; bbWeight=0.50 PerfScore 8.00
G_M51115_IG09:
-000135 add rsp, 64
-000139 pop rbx
-00013A pop rbp
-00013B pop rsi
-00013C pop rdi
-00013D pop r14
-00013F ret
+000122 add rsp, 64
+000126 pop rbx
+000127 pop rbp
+000128 pop rsi
+000129 pop rdi
+00012A pop r14
+00012C ret
;; bbWeight=0.50 PerfScore 1.88
-; Total bytes of code 320, prolog size 27, PerfScore 70.31, instruction count 87 (MethodHash=52ff3854) for method Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon
+; Total bytes of code 301, prolog size 27, PerfScore 64.54, instruction count 81 (MethodHash=52ff3854) for method Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon The second call to I attached the files with collected diffs. |
@JulieLeeMSFT Please find the answers below:
Yes, although it has more chances to fail on arm32 since the platform does not support fast tail calls. In other words, there is no alternative (and usually more preferred) way of transforming tail prefixed calls on that platform. It is also true on x86 - but we use a different tail call helper mechanism on that platform (except a case when I am in the middle of collecting the code diffs for win-arm32. If the diffs will turn out to be the same as what I saw on win-x64 with disabled fast tail calls then this would prove that
Moderate, due to the difficulty of triggering the failing pattern in the testing. Even with the added tests we might've missed some other edge cases.
Just to confirm my understanding of what that means - this is the date when the changes must go in to |
I don't see this change as particularly risky, and it is fixing a serious codegen bug that is a 5.0 regression. Seems like an obvious candidate for porting to 5.0. |
Just to be pedantic: this doesn't affect x86 since we use a different helper call mechanism on that architecture. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I thought this way as well until my test has failed on x86 and I discovered that runtime/src/coreclr/jit/morph.cpp Lines 18980 to 18992 in 8e29c00
the JIT switches to portable helpers on x86 when |
I finished collecting the diffs on win-arm. All the code changes occur in
The PR does fix a regression with calling a function twice diff --git "a/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\base\\530.dasm" "b/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\diff\\530.dasm"
index ca9882b2331..5ecbf52374b 100644
--- "a/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\base\\530.dasm"
+++ "b/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\diff\\530.dasm"
@@ -5,24 +5,22 @@
; partially interruptible
; Final local variable assignments
;
-; V00 TypeCtx [V00,T02] ( 6, 5 ) int -> r5
-; V01 arg0 [V01,T01] ( 7, 5 ) ref -> r4 class-hnd
-; V02 arg1 [V02,T00] ( 14, 8 ) struct (16) [sp+0x40] do-not-enreg[SFA] multireg-arg double-align
+; V00 TypeCtx [V00,T01] ( 6, 5 ) int -> r4
+; V01 arg0 [V01,T02] ( 5, 4 ) ref -> r5 class-hnd
+; V02 arg1 [V02,T00] ( 10, 6 ) struct (16) [sp+0x40] do-not-enreg[SFA] multireg-arg double-align
; V03 arg2 [V03 ] ( 4, 2 ) long -> [sp+0x50]
; V04 loc0 [V04,T03] ( 4, 3 ) ref -> r6 class-hnd
;* V05 loc1 [V05 ] ( 0, 0 ) ref -> zero-ref class-hnd
; V06 OutArgs [V06 ] ( 1, 1 ) lclBlk (20) [sp+0x00] "OutgoingArgSpace"
; V07 tmp1 [V07 ] ( 2, 1 ) ref -> [sp+0x20] do-not-enreg[X] must-init addr-exposed "Return value for tail call dispatcher"
; V08 ReturnAddress[V08 ] ( 2, 1 ) int -> [sp+0x3C] do-not-enreg[X] addr-exposed "Return address"
-; V09 tmp3 [V09,T06] ( 2, 2 ) int -> r0 "argument with side effect"
+; V09 tmp3 [V09,T05] ( 2, 2 ) int -> r0 "argument with side effect"
; V10 tmp4 [V10 ] ( 2, 1 ) ref -> [sp+0x1C] do-not-enreg[X] must-init addr-exposed "Return value for tail call dispatcher"
-; V11 tmp5 [V11,T04] ( 2, 2 ) ref -> r0 "argument with side effect"
-; V12 tmp6 [V12,T05] ( 2, 2 ) ref -> r6 "argument with side effect"
-; V13 tmp7 [V13,T07] ( 2, 2 ) int -> [sp+0x18] "argument with side effect"
-; V14 cse0 [V14,T08] ( 2, 1 ) ref -> r0 "CSE - moderate"
-; V15 cse1 [V15,T09] ( 2, 1 ) ref -> r0 "CSE - moderate"
-; V16 rat0 [V16,T10] ( 2, 1 ) int -> [sp+0x50] do-not-enreg[] V03.lo(offs=0x00) "field V03.lo (fldOffset=0x0)"
-; V17 rat1 [V17,T11] ( 2, 1 ) int -> [sp+0x54] do-not-enreg[] V03.hi(offs=0x04) "field V03.hi (fldOffset=0x4)"
+; V11 tmp5 [V11,T04] ( 3, 3 ) ref -> r5 "tail call thisptr"
+; V12 tmp6 [V12,T06] ( 2, 2 ) int -> r0 "argument with side effect"
+; V13 cse0 [V13,T07] ( 2, 1 ) ref -> r0 "CSE - moderate"
+; V14 rat0 [V14,T08] ( 2, 1 ) int -> [sp+0x50] do-not-enreg[] V03.lo(offs=0x00) "field V03.lo (fldOffset=0x0)"
+; V15 rat1 [V15,T09] ( 2, 1 ) int -> [sp+0x54] do-not-enreg[] V03.hi(offs=0x04) "field V03.hi (fldOffset=0x4)"
;
; Lcl frame size = 44
@@ -35,12 +33,12 @@ G_M14368_IG01:
str r2, [sp+0x20] // [V07 tmp1]
str r2, [sp+0x1c] // [V10 tmp4]
str r0, [r11-0x14]
- mov r5, r0
- mov r4, r1
+ mov r4, r0
+ mov r5, r1
;; bbWeight=1 PerfScore 10.00
G_M14368_IG02:
- mov r0, r5
- mov r1, r4
+ mov r0, r4
+ mov r1, r5
movw r3, 0xd1ff
movt r3, 0xd1ff
blx r3 // CORINFO_HELP_ISINSTANCEOFCLASS
@@ -50,7 +48,7 @@ G_M14368_IG02:
;; bbWeight=1 PerfScore 8.00
G_M14368_IG03:
mov r0, r6
- mov r1, r5
+ mov r1, r4
movw r2, 0xd1ff
movt r2, 0xd1ff
movw r3, 0xd1ff
@@ -94,35 +92,23 @@ G_M14368_IG05:
ldr r1, [sp+0x4c] // [V02 arg1+0x0c]
str r0, [sp] // [V06 OutArgs]
str r1, [sp+0x04] // [V06 OutArgs+0x04]
- mov r0, r4
- ldr r1, [r4]
- ldr r1, [r1+44]
- ldr r1, [r1+16]
- blx r1 // hackishModuleName:hackishMethodName(System.Numerics.Vector`1[Single]):System.__Canon:this
- mov r6, r0
- ldr r2, [sp+0x40] // [V02 arg1]
- ldr r3, [sp+0x44] // [V02 arg1+0x04]
- ldr r0, [sp+0x48] // [V02 arg1+0x08]
- ldr r1, [sp+0x4c] // [V02 arg1+0x0c]
- str r0, [sp] // [V06 OutArgs]
- str r1, [sp+0x04] // [V06 OutArgs+0x04]
- mov r0, r4
- ldr r1, [r4]
+ mov r0, r5
+ ldr r1, [r5]
ldr r1, [r1+44]
ldr r1, [r1+16]
blx r1 // hackishModuleName:hackishMethodName(System.Numerics.Vector`1[Single]):System.__Canon:this
- mov r1, r5
+ mov r5, r0
+ mov r0, r5
+ mov r1, r4
movw r2, 0xd1ff
movt r2, 0xd1ff
movw r3, 0xd1ff
movt r3, 0xd1ff
blx r3 // CORINFO_HELP_VIRTUAL_FUNC_PTR
- str r0, [sp+0x18] // [V13 tmp7]
- mov r0, r6
- ldr r2, [sp+0x18] // [V13 tmp7]
- str r2, [sp] // [V06 OutArgs]
- ldr r2, [sp+0x50] // [V16 rat0]
- ldr r3, [sp+0x54] // [V17 rat1]
+ str r0, [sp] // [V06 OutArgs]
+ ldr r2, [sp+0x50] // [V14 rat0]
+ ldr r3, [sp+0x54] // [V15 rat1]
+ mov r0, r5
movw r1, 0xd1ff
movt r1, 0xd1ff
ldr r1, [r1]
@@ -135,7 +121,7 @@ G_M14368_IG05:
movt r3, 0xd1ff
blx r3 // hackishModuleName:hackishMethodName()
ldr r0, [sp+0x1c] // [V10 tmp4]
- ;; bbWeight=0.50 PerfScore 23.50
+ ;; bbWeight=0.50 PerfScore 17.50
G_M14368_IG06:
add sp, 44
pop {r4,r5,r6,r11,lr}
@@ -143,7 +129,7 @@ G_M14368_IG06:
bx lr
;; bbWeight=0.50 PerfScore 2.00
-; Total bytes of code 262, prolog size 22, PerfScore 87.20, instruction count 104 (MethodHash=8524c7df) for method Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon
+; Total bytes of code 238, prolog size 22, PerfScore 78.80, instruction count 92 (MethodHash=8524c7df) for method Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon
; ============================================================
Unwind Info:
@@ -155,7 +141,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 131 (0x00083) Actual length = 262 (0x000106)
+ Function Length : 119 (0x00077) Actual length = 238 (0x0000ee)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) |
I spent last couple hours trying to create a test where a virtual tail call would require null check. |
That’s what I understand. |
The changes fixes the following issue:
tail. callvirt #<generic virtual method>
to helper-based tail callthis
argument of the call is computed during another call with side effectIL_STUB_StoreTailCallArgs
. Since the call is virtual in order to compute such address usingCORINFO_HELP_VIRTUAL_FUNC_PTR
helper call the JIT would need to passthis
value to the helperthis
and use the cloned tree to compute the target function pointer addressThis is reproduces using the following IL code:
The IL snippet corresponds to C# code below with small modification of adding
tail.
prefix:Originally the issue was only reproduced on arm. However, this is due to the fact that on other platforms the JIT would use fast tail calls for such cases. If the environment has
set COMPlus_FastTailCalls=0
than the issue would reproduce everywhere where portable tail call helpers are used. In order to disallow using fast tail calls and always use helpers-based tail calls all the test cases I am adding containlocalloc
in the caller method.Before:
In
fgMorphTailCallViaHelpers
a tree that corresponds to the above-described problem:would be transformed into a call to dispatcher and an IL stub:
During that transformation a tree
[000005]
was cloned andCORINFO_HELP_VIRTUAL_FUNC_PTR
andIL_STUB_StoreTailCallArgs
After:
The cloning is replaced with spilling of the tree to a local before calling
StoreTailCallArgs
In addition to the regression test I updated more_tailcalls.cs/more_tailcalls.ik test suite and included the failing pattern.
Before:
After:
I also validate that the F# program that was misbehaving originally in #45250 executes correctly now: