-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Preference locals away from PUTARG_REG killed registers #76671
JIT: Preference locals away from PUTARG_REG killed registers #76671
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsLet's see how expensive this is going to be...
|
cc @dotnet/jit-contrib PTAL @kunalspathak The diffs look good, win-x64 sample:
arm64 looks good too but the diff is overstated due to a lot of test diffs. I will look at some of the regressions tomorrow, and also the MinOpts TP impact on linux-x64. Not completely sure where that is coming from, AFAIK the "expensive" parts here are only running in FullOpts. From looking at the diffs page I also realize I need to add something along the lines of @AndyAyersMS's suggestion with splitting improvements/regressions -- currently it is very hard to tell the magnitude of the regressions compared to the improvements. It is even worse after #76238 because the analysis summary no longer shows the number of improving/regressing contexts and it is not surfaced anywhere else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned offline, please update the comments around preference
. Also added some comments. Overall nice work.
/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs |
Azure Pipelines successfully started running 2 pipeline(s). |
I will look at |
asm.aspnet regression10 ( 2.99% of base) : 21136.dasm - AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,<DoSend>d__28]:MoveNext(System.Threading.Thread):this The diff looks like: diff --git "a/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\base\\21136.dasm" "b/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\diff\\21136.dasm"
index c16c4cc..98ab8f5 100644
--- "a/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\base\\21136.dasm"
+++ "b/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\diff\\21136.dasm"
@@ -44,23 +44,29 @@ G_M19108_IG02: ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byr
;; size=24 bbWeight=1 PerfScore 5.50
G_M19108_IG03: ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byref, isz
; gcrRegs -[rdx]
- mov rcx, gword ptr [rsi+48H]
- ; gcrRegs +[rcx]
- test rcx, rcx
+ mov rdx, gword ptr [rsi+48H]
+ ; gcrRegs +[rdx]
+ mov gword ptr [rsp+20H], rdx
+ ; GC ptr vars +{V03}
+ test rdx, rdx
je SHORT G_M19108_IG09
- ;; size=9 bbWeight=1.03 PerfScore 3.33
-G_M19108_IG04: ; gcrefRegs=000000C2 {rcx rsi rdi}, byrefRegs=00000000 {}, byref, isz
+ ;; size=14 bbWeight=1.03 PerfScore 4.36
+G_M19108_IG04: ; gcVars=0000000000000004 {V03}, gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, gcvars, byref, isz
+ ; gcrRegs -[rdx]
test rdi, rdi
jne SHORT G_M19108_IG10
- mov rdx, 0xD1FFAB1E ; const ptr
- mov rdx, gword ptr [rdx]
+ mov rcx, 0xD1FFAB1E ; const ptr
+ mov rdx, gword ptr [rcx]
; gcrRegs +[rdx]
+ mov rcx, gword ptr [rsp+20H]
+ ; gcrRegs +[rcx]
mov r8, rsi
; gcrRegs +[r8]
+ ; GC ptr vars -{V03}
call [hackishMethodName]
; gcrRegs -[rcx rdx rdi r8]
; gcr arg pop 0
- ;; size=27 bbWeight=1.03 PerfScore 6.92
+ ;; size=32 bbWeight=1.03 PerfScore 7.94
G_M19108_IG05: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz
test dword ptr [rsi+34H], 0xD1FFAB1E
jne SHORT G_M19108_IG11
@@ -98,10 +104,13 @@ G_M19108_IG08: ; gcVars=0000000000000000 {}, gcrefRegs=000000C0 {rsi rdi}
mov rcx, gword ptr [rsi+48H]
; gcrRegs +[rcx]
test rcx, rcx
+ mov gword ptr [rsp+20H], rcx
+ ; GC ptr vars +{V03}
jne SHORT G_M19108_IG04
- ;; size=48 bbWeight=0 PerfScore 0.00
+ ;; size=53 bbWeight=0 PerfScore 0.00
G_M19108_IG09: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz
; gcrRegs -[rcx rdi]
+ ; GC ptr vars -{V03}
lea rcx, bword ptr [rsi+50H]
; byrRegs +[rcx]
call [hackishMethodName]
@@ -109,25 +118,25 @@ G_M19108_IG09: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref,
; gcr arg pop 0
jmp SHORT G_M19108_IG05
;; size=12 bbWeight=0 PerfScore 0.00
-G_M19108_IG10: ; gcrefRegs=000000C2 {rcx rsi rdi}, byrefRegs=00000000 {}, byref, isz
- ; gcrRegs +[rcx rdi]
- mov gword ptr [rsp+20H], rcx
- ; GC ptr vars +{V03}
+G_M19108_IG10: ; gcVars=0000000000000004 {V03}, gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, gcvars, byref, isz
+ ; gcrRegs +[rdi]
+ ; GC ptr vars +{V02 V03}
mov r8, 0xD1FFAB1E ; const ptr
mov r8, gword ptr [r8]
; gcrRegs +[r8]
mov rcx, rdi
+ ; gcrRegs +[rcx]
mov rdx, gword ptr [rsp+20H]
; gcrRegs +[rdx]
mov r9, rsi
; gcrRegs +[r9]
- ; GC ptr vars -{V03}
+ ; GC ptr vars -{V02 V03}
call [hackishMethodName]
; gcrRegs -[rcx rdx rdi r8-r9]
; gcr arg pop 0
test dword ptr [rsi+34H], 0xD1FFAB1E
je SHORT G_M19108_IG06
- ;; size=44 bbWeight=0 PerfScore 0.00
+ ;; size=39 bbWeight=0 PerfScore 0.00
G_M19108_IG11: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz
cmp byte ptr [(reloc)], 0 ; data for (null)hackishFieldName
je SHORT G_M19108_IG12
@@ -184,7 +193,7 @@ G_M19108_IG14: ; , epilog, nogc, extend
; gcr arg pop 0
;; size=14 bbWeight=0 PerfScore 0.00 With the change we remove DefList: { N197.t140. CNS_INT }
N199 ( 5, 12) [000023] #---G------ ▌ IND ref REG NA $85
<RefPosition #127 @199 RefTypeUse <Ivl:27> BB07 regmask=[allInt] minReg=1 last wt=0.00>
Interval 28: ref RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #128 @200 RefTypeDef <Ivl:28> IND BB07 regmask=[allInt] minReg=1 wt=0.00>
DefList: { N199.t23. IND }
N201 (???,???) [000215] ----G------ ▌ PUTARG_REG ref REG r8
<RefPosition #129 @201 RefTypeFixedReg <Reg:r8 > BB07 regmask=[r8] minReg=1 wt=0.00>
<RefPosition #130 @201 RefTypeUse <Ivl:28> BB07 regmask=[r8] minReg=1 last fixed wt=0.00>
Interval 29: ref RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #131 @202 RefTypeFixedReg <Reg:r8 > BB07 regmask=[r8] minReg=1 wt=0.00>
<RefPosition #132 @202 RefTypeDef <Ivl:29> PUTARG_REG BB07 regmask=[r8] minReg=1 fixed wt=0.00>
DefList: { N201.t215. PUTARG_REG }
N203 ( 1, 1) [000021] ----------- ▌ LCL_VAR ref V01 arg1 u:1 NA (last use) REG NA $81
DefList: { N201.t215. PUTARG_REG }
N205 (???,???) [000216] ----------- ▌ PUTARG_REG ref REG rcx
+Last use of V01 between a PUTARG and CALL. Removing occupied arg regs from preferences: [r8]
<RefPosition #133 @205 RefTypeFixedReg <Reg:rcx> BB07 regmask=[rcx] minReg=1 wt=0.00>
<RefPosition #134 @205 RefTypeUse <Ivl:1 V01> LCL_VAR BB07 regmask=[rcx] minReg=1 last fixed wt=302.51>
Interval 30: ref RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #135 @206 RefTypeFixedReg <Reg:rcx> BB07 regmask=[rcx] minReg=1 wt=0.00>
<RefPosition #136 @206 RefTypeDef <Ivl:30> PUTARG_REG BB07 regmask=[rcx] minReg=1 fixed wt=0.00>
DefList: { N201.t215. PUTARG_REG; N205.t216. PUTARG_REG }
N207 ( 1, 1) [000022] ----------- ▌ LCL_VAR ref V03 loc1 u:1 NA (last use) REG NA <l:$340, c:$83>
DefList: { N201.t215. PUTARG_REG; N205.t216. PUTARG_REG }
N209 (???,???) [000217] ----------- ▌ PUTARG_REG ref REG rdx
+Last use of V03 between a PUTARG and CALL. Removing occupied arg regs from preferences: [rcx r8] This means we end up with @@ -20,22 +20,22 @@ def: {V03}
in: {V00 V01 V02}
out: {V00 V01 V02 V03}
Var=Reg beg of BB03: V00=rsi V01=rdi V02=rbx
-Var=Reg end of BB03: V00=rsi V01=rdi V03=rcx V02=rbx
+Var=Reg end of BB03: V00=rsi V01=rdi V02=rbx
BB05
use: {V01}
def: {}
in: {V00 V01 V02 V03}
out: {V00 V01 V02 V03}
-Var=Reg beg of BB05: V00=rsi V01=rdi V03=rcx V02=rbx
-Var=Reg end of BB05: V00=rsi V01=rdi V03=rcx V02=rbx
+Var=Reg beg of BB05: V00=rsi V01=rdi V02=rbx
+Var=Reg end of BB05: V00=rsi V01=rdi V02=rbx
BB06
use: {V00 V03}
def: {}
in: {V00 V02 V03}
out: {V00 V02}
-Var=Reg beg of BB06: V00=rsi V03=rcx V02=rbx
+Var=Reg beg of BB06: V00=rsi V02=rbx
Var=Reg end of BB06: V00=rsi V02=rbx
BB08
@@ -128,4 +128,4 @@ Var=Reg end of BB14: none
RESOLVING EDGES
- BB07 top (BB05->BB07): move V03 from rcx to STK (Split)
+ BB02 bottom: move V03 from rcx to STK (SharedCritical) |
benchmarks.run regression23 ( 7.40% of base) : 901.dasm - System.Array:Clear(System.Array,int,int) Diff: @@ -14,9 +14,9 @@
; V04 loc1 [V04,T07] ( 4, 3.50) int -> r14
; V05 loc2 [V05,T02] ( 7, 6 ) long -> r15
; V06 loc3 [V06,T04] ( 4, 4 ) int -> rdx
-; V07 loc4 [V07,T09] ( 3, 3 ) long -> r8
-; V08 loc5 [V08,T10] ( 4, 2.50) byref -> rcx single-def
-; V09 loc6 [V09,T05] ( 6, 3.50) long -> [rsp+20H] spill-single-def
+; V07 loc4 [V07,T09] ( 3, 3 ) long -> rcx
+; V08 loc5 [V08,T10] ( 4, 2.50) byref -> [rsp+28H] single-def
+; V09 loc6 [V09,T05] ( 6, 3.50) long -> [rsp+30H] spill-single-def
; V10 loc7 [V10,T11] ( 3, 1.50) int -> rdx
; V11 OutArgs [V11 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V12 tmp1 [V12 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
@@ -24,7 +24,7 @@
;* V14 tmp3 [V14 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
; V15 cse0 [V15,T08] ( 6, 3 ) ref -> rdx "CSE - aggressive"
;
-; Lcl frame size = 40
+; Lcl frame size = 56
G_M44216_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
push r15
@@ -33,7 +33,7 @@ G_M44216_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
push rsi
push rbp
push rbx
- sub rsp, 40
+ sub rsp, 56
mov rsi, rcx
; gcrRegs +[rsi]
mov ebx, edx
@@ -95,25 +95,29 @@ G_M44216_IG07: ; gcrefRegs=00000040 {rsi}, byrefRegs=00000020 {rbp}, byre
lea ecx, [rdx+rdi]
cmp ecx, dword ptr [rsi+08H]
ja G_M44216_IG16
- movzx r8, word ptr [r15]
- mov ecx, edx
- imul rcx, r8
- add rcx, rbp
- ; byrRegs +[rcx]
- mov edx, edi
- imul r8, rdx
- mov rdx, r8
- mov qword ptr [rsp+20H], rdx
+ movzx rcx, word ptr [r15]
+ mov edx, edx
+ imul rdx, rcx
+ add rdx, rbp
+ ; byrRegs +[rdx]
+ mov bword ptr [rsp+28H], rdx
+ ; GC ptr vars +{V08}
+ mov eax, edi
+ imul rcx, rax
+ mov qword ptr [rsp+30H], rcx
test dword ptr [r15], 0xD1FFAB1E
je SHORT G_M44216_IG10
- ;; size=78 bbWeight=1 PerfScore 20.75
-G_M44216_IG08: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref
+ ;; size=80 bbWeight=1 PerfScore 21.50
+G_M44216_IG08: ; gcVars=0000000000000400 {V08}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
; gcrRegs -[rsi]
- ; byrRegs -[rbp]
+ ; byrRegs -[rdx rbp]
+ mov rdx, rcx
shr rdx, 3
- ;; size=4 bbWeight=0.50 PerfScore 0.25
+ mov rcx, bword ptr [rsp+28H]
+ ; byrRegs +[rcx]
+ ;; size=12 bbWeight=0.50 PerfScore 0.88
G_M44216_IG09: ; , epilog, nogc, extend
- add rsp, 40
+ add rsp, 56
pop rbx
pop rbp
pop rsi
@@ -123,22 +127,26 @@ G_M44216_IG09: ; , epilog, nogc, extend
tail.jmp [hackishMethodName]
; gcr arg pop 0
;; size=18 bbWeight=0.50 PerfScore 2.62
-G_M44216_IG10: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, gcvars, byref, isz
- test rdx, rdx
+G_M44216_IG10: ; gcVars=0000000000000400 {V08}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, isz
+ ; byrRegs -[rcx]
+ test rcx, rcx
je SHORT G_M44216_IG11
- cmp rdx, 768
+ cmp rcx, 768
ja SHORT G_M44216_IG13
xor edx, edx
- mov r8, qword ptr [rsp+20H]
+ mov rcx, bword ptr [rsp+28H]
+ ; byrRegs +[rcx]
+ mov r8, qword ptr [rsp+30H]
+ ; GC ptr vars -{V08}
call CORINFO_HELP_MEMSET
; byrRegs -[rcx]
; gcr arg pop 0
- ;; size=26 bbWeight=0.50 PerfScore 2.38
+ ;; size=31 bbWeight=0.50 PerfScore 2.88
G_M44216_IG11: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
nop
;; size=1 bbWeight=0.50 PerfScore 0.12
G_M44216_IG12: ; , epilog, nogc, extend
- add rsp, 40
+ add rsp, 56
pop rbx
pop rbp
pop rsi
@@ -147,12 +155,17 @@ G_M44216_IG12: ; , epilog, nogc, extend
pop r15
ret
;; size=13 bbWeight=0.50 PerfScore 2.12
-G_M44216_IG13: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, gcvars, byref
+G_M44216_IG13: ; gcVars=0000000000000400 {V08}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
+ ; GC ptr vars +{V08 V10}
+ mov rdx, bword ptr [rsp+28H]
+ ; byrRegs +[rdx]
+ mov rcx, rdx
; byrRegs +[rcx]
- mov rdx, qword ptr [rsp+20H]
- ;; size=5 bbWeight=0.50 PerfScore 0.50
+ mov rdx, qword ptr [rsp+30H]
+ ; byrRegs -[rdx]
+ ;; size=13 bbWeight=0.50 PerfScore 1.12
G_M44216_IG14: ; , epilog, nogc, extend
- add rsp, 40
+ add rsp, 56
pop rbx
pop rbp
pop rsi
@@ -164,6 +177,7 @@ G_M44216_IG14: ; , epilog, nogc, extend
;; size=18 bbWeight=0.50 PerfScore 2.62
G_M44216_IG15: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
; byrRegs -[rcx]
+ ; GC ptr vars -{V08 V10}
mov ecx, 2
call [System.ThrowHelper:ThrowArgumentNullException(int)]
; gcr arg pop 0
@@ -175,7 +189,7 @@ G_M44216_IG16: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
int3
;; size=7 bbWeight=0 PerfScore 0.00 Thankfully not inside any inner loop or anything. We notice the following conflicts: DefList: { }
N297 ( 1, 1) [000063] ----------- ▌ LCL_VAR long V09 loc6 u:1 NA (last use) REG NA <l:$350, c:$351>
DefList: { }
N299 ( 1, 1) [000065] -c-----N--- ▌ CNS_INT long 3 REG NA $184
Contained
DefList: { }
N301 ( 22, 5) [000066] ----------- ▌ RSZ long REG NA
<RefPosition #135 @301 RefTypeUse <Ivl:9 V09> LCL_VAR BB15 regmask=[allInt] minReg=1 last wt=350.00>
Interval 37: long RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #136 @302 RefTypeDef <Ivl:37> RSZ BB15 regmask=[allInt] minReg=1 wt=200.00>
Assigning related <I37> to <V09/L9>
DefList: { N301.t66. RSZ }
N303 (???,???) [000297] ----------- ▌ PUTARG_REG long REG rdx
<RefPosition #137 @303 RefTypeFixedReg <Reg:rdx> BB15 regmask=[rdx] minReg=1 wt=50.00>
<RefPosition #138 @303 RefTypeUse <Ivl:37> BB15 regmask=[rdx] minReg=1 last fixed wt=50.00>
Interval 38: long RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #139 @304 RefTypeFixedReg <Reg:rdx> BB15 regmask=[rdx] minReg=1 wt=50.00>
<RefPosition #140 @304 RefTypeDef <Ivl:38> PUTARG_REG BB15 regmask=[rdx] minReg=1 fixed wt=200.00>
DefList: { N303.t297. PUTARG_REG }
N305 ( 3, 2) [000062] ----------- ▌ LCL_VAR byref V08 loc5 u:1 NA (last use) REG NA <l:$1c9, c:$1ca>
DefList: { N303.t297. PUTARG_REG }
N307 (???,???) [000298] ----------- ▌ PUTARG_REG byref REG rcx
+Last use of V08 between PUTARG and CALL. Removing occupied arg regs from preferences: [rdx]
...
DefList: { }
N355 ( 3, 2) [000058] ----------- ▌ LCL_VAR byref V08 loc5 u:1 NA (last use) REG NA <l:$1c9, c:$1ca>
DefList: { }
N357 (???,???) [000301] ----------- ▌ PUTARG_REG byref REG rcx
<RefPosition #182 @357 RefTypeFixedReg <Reg:rcx> BB19 regmask=[rcx] minReg=1 wt=50.00>
<RefPosition #183 @357 RefTypeUse <Ivl:8 V08> LCL_VAR BB19 regmask=[rcx] minReg=1 last fixed wt=250.00>
Interval 41: byref RefPositions {} physReg:NA Preferences=[allInt]
<RefPosition #184 @358 RefTypeFixedReg <Reg:rcx> BB19 regmask=[rcx] minReg=1 wt=50.00>
<RefPosition #185 @358 RefTypeDef <Ivl:41> PUTARG_REG BB19 regmask=[rcx] minReg=1 fixed wt=200.00>
DefList: { N357.t301. PUTARG_REG }
N359 ( 1, 1) [000201] ----------- ▌ LCL_VAR long V09 loc6 u:1 NA (last use) REG NA <l:$350, c:$351>
DefList: { N357.t301. PUTARG_REG }
N361 (???,???) [000302] ----------- ▌ PUTARG_REG long REG rdx
+Last use of V09 between PUTARG and CALL. Removing occupied arg regs from preferences: [rcx]
Again this leads to some different allocation in earlier basic blocks that leads to a new resolution: diff --git "a/.\\base.txt" "b/.\\diff.txt"
index 625a2ec..ecb5830 100644
--- "a/.\\base.txt"
+++ "b/.\\diff.txt"
@@ -100,14 +100,14 @@ def: {V07 V08 V09}
in: {V02 V03 V05 V06}
out: {V08 V09}
Var=Reg beg of BB14: V02=rdi V05=r15 V06=rdx V03=rbp
-Var=Reg end of BB14: V09=rdx V08=rcx
+Var=Reg end of BB14: V09=rcx
BB15
use: {V08 V09}
def: {}
in: {V08 V09}
out: {}
-Var=Reg beg of BB15: V09=rdx V08=rcx
+Var=Reg beg of BB15: V09=rcx
Var=Reg end of BB15: none
BB16
@@ -115,23 +115,23 @@ use: {V09}
def: {}
in: {V08 V09}
out: {V08 V09}
-Var=Reg beg of BB16: V09=rdx V08=rcx
-Var=Reg end of BB16: V09=rdx V08=rcx
+Var=Reg beg of BB16: V09=rcx
+Var=Reg end of BB16: V09=rcx
BB17
use: {V09}
def: {}
in: {V08 V09}
out: {V08 V09}
-Var=Reg beg of BB17: V09=rdx V08=rcx
-Var=Reg end of BB17: V08=rcx
+Var=Reg beg of BB17: V09=rcx
+Var=Reg end of BB17: none
BB18
use: {V08 V09}
def: {}
in: {V08 V09}
out: {}
-Var=Reg beg of BB18: V08=rcx
+Var=Reg beg of BB18: none
Var=Reg end of BB18: none
BB20
@@ -147,7 +147,7 @@ use: {V08 V09}
def: {}
in: {V08 V09}
out: {}
-Var=Reg beg of BB19: V08=rcx
+Var=Reg beg of BB19: V08=rdx
Var=Reg end of BB19: none
BB02
@@ -168,4 +168,5 @@ Var=Reg end of BB13: none
RESOLVING EDGES
-
+ BB19 top (BB17->BB19): move V08 from STK to rdx (Split)
+N001 ( 3, 2) [000310] ----------z t310 = LCL_VAR byref V08 loc5 rdx REG rdx |
Can you paste the IR around the block that you highlighted for asp regression? |
Oddly superpmi failed for windows-arm64 because it didn't find download the mch file properly? |
Seems jitstressregs timed out for windows-arm64. Should we rerun it? |
Yes, the collection run failed for win-arm64 yesterday.
Retriggered that job. |
The IR for the block with the call looks like: ------------ BB07 [04D..05A) -> BB13 (cond), preds={BB05} succs={BB09,BB13}
N195 (???,???) [000187] ----------- IL_OFFSET void INLRT @ 0x04D[E-] REG NA
N197 ( 3, 10) [000140] H---------- t140 = CNS_INT(h) long 0x1eeefc071d8 const ptr Fseq[hackishFieldName] REG r8 $104
┌──▌ t140 long
N199 ( 5, 12) [000023] #---G------ t23 = ▌ IND ref REG r8 $85
┌──▌ t23 ref
N201 (???,???) [000215] ----G------ t215 = ▌ PUTARG_REG ref REG r8
N203 ( 1, 1) [000021] ----------- t21 = LCL_VAR ref V01 arg1 u:1 rdi (last use) REG rdi $81
┌──▌ t21 ref
N205 (???,???) [000216] ----------- t216 = ▌ PUTARG_REG ref REG rcx
N207 ( 1, 1) [000022] ----------z t22 = LCL_VAR ref V03 loc1 u:1 rdx (last use) REG rdx <l:$340, c:$83>
┌──▌ t22 ref
N209 (???,???) [000217] ----------- t217 = ▌ PUTARG_REG ref REG rdx
N211 ( 1, 1) [000024] ----------- t24 = LCL_VAR ref V00 this u:1 rsi REG rsi $80
┌──▌ t24 ref
N213 (???,???) [000218] ----------- t218 = ▌ PUTARG_REG ref REG r9
N215 ( 3, 10) [000219] Hc--------- t219 = CNS_INT(h) long 0x7ff9cad63d98 ftn REG NA
┌──▌ t219 long
N217 ( 5, 12) [000220] -c--------- t220 = ▌ IND long REG NA
┌──▌ t215 ref arg2 in r8
├──▌ t216 ref arg0 in rcx
├──▌ t217 ref arg1 in rdx
├──▌ t218 ref arg3 in r9
├──▌ t220 long control expr
N219 ( 22, 24) [000025] --CXG------ ▌ CALL void hackishModuleName.hackishMethodName REG NA $VN.Void
N221 (???,???) [000188] ----------- IL_OFFSET void INLRT @ 0x05A[E-] REG NA
N223 ( 1, 1) [000167] ----------- t167 = LCL_VAR ref V00 this u:1 rsi REG rsi $80
┌──▌ t167 ref
N225 ( 2, 2) [000166] -c--------- t166 = ▌ LEA(b+52) byref REG NA
┌──▌ t166 byref
N227 ( 4, 4) [000165] Vc--GO----- t165 = ▌ IND int REG NA $287
N229 ( 1, 4) [000169] -c--------- t169 = CNS_INT int 0x1600000 REG NA $48
┌──▌ t165 int
├──▌ t169 int
N231 ( 8, 11) [000163] J---GO-N--- ▌ TEST_EQ void REG NA
N233 ( 10, 13) [000162] ----GO----- ▌ JTRUE void REG NA $344 after LSRA |
Another timeout in the win-arm64 jitstressregs. Looks like CI is just too strapped for resources to run win-arm64 pri1 jobs. In any case I don't think we need to hold the PR up on it, I think the linux-arm64 coverage should be sufficient since the JITs are practically the same. I have a small nit change that I will push, but do you have other any other feedback @kunalspathak? I think the regressions are few enough that we can see if we hit anything in perf triage and go from there. |
One thing I want to do before merging is also look into the linux-arm64 MinOpts TP diffs, which I haven't gotten around to yet. |
I used @SingleAccretion's more detailed PIN tool to figure out where the instructions are spent in the linux-x64 MinOpts contexts, and it looks like majority of the new instructions are spent inside After some investigation it looks like MSVC has flipped some branches and is generating slightly different code for the function. So I'm going to trust that MSVC knows better there (besides, with native PGO/clang the function is presumably going to be compiled differently anyway). For the curious, this is the diff in pseudocode for the compilations produced for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Added some minor comment suggestions that you can do it in follow-up PR.
Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
The failure looks like #74328 |
If we see uses of locals between
PUTARG_REG
and the correspondingCALL
node, then preference those intervals to not be allocated into the already placed register. Doing so will otherwise force a spill.To do this effectively we only need to look at dying locals. If the local is not dying between the
PUTARG_REG
andCALL
then it either does not have a use after thePUTARG_REG
, or it has a use after theCALL
and will be callee-saved register preferenced anyway.