Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Preference locals away from PUTARG_REG killed registers #76671

Merged
merged 8 commits into from
Oct 10, 2022

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Oct 5, 2022

If we see uses of locals between PUTARG_REG and the corresponding CALL node, then preference those intervals to not be allocated into the already placed register. Doing so will otherwise force a spill.

To do this effectively we only need to look at dying locals. If the local is not dying between the PUTARG_REG and CALL then it either does not have a use after the PUTARG_REG, or it has a use after the CALL and will be callee-saved register preferenced anyway.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 5, 2022
@ghost ghost assigned jakobbotsch Oct 5, 2022
@ghost
Copy link

ghost commented Oct 5, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Let's see how expensive this is going to be...

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch jakobbotsch marked this pull request as ready for review October 5, 2022 23:57
@jakobbotsch
Copy link
Member Author

jakobbotsch commented Oct 5, 2022

cc @dotnet/jit-contrib PTAL @kunalspathak

The diffs look good, win-x64 sample:

Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 12,329,670 -3,562
benchmarks.run.windows.x64.checked.mch 9,934,650 -2,787
coreclr_tests.run.windows.x64.checked.mch 86,156,629 -9,445
libraries.crossgen2.windows.x64.checked.mch 34,536,864 -22,775
libraries.pmi.windows.x64.checked.mch 51,835,982 -96,068
libraries_tests.pmi.windows.x64.checked.mch 110,972,059 -35,169

arm64 looks good too but the diff is overstated due to a lot of test diffs.

I will look at some of the regressions tomorrow, and also the MinOpts TP impact on linux-x64. Not completely sure where that is coming from, AFAIK the "expensive" parts here are only running in FullOpts.

From looking at the diffs page I also realize I need to add something along the lines of @AndyAyersMS's suggestion with splitting improvements/regressions -- currently it is very hard to tell the magnitude of the regressions compared to the improvements. It is even worse after #76238 because the analysis summary no longer shows the number of improving/regressing contexts and it is not surfaced anywhere else.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned offline, please update the comments around preference. Also added some comments. Overall nice work.

src/coreclr/jit/lsra.h Outdated Show resolved Hide resolved
src/coreclr/jit/lsrabuild.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lsrabuild.cpp Show resolved Hide resolved
src/coreclr/jit/lsrabuild.cpp Show resolved Hide resolved
src/coreclr/jit/lsrabuild.cpp Outdated Show resolved Hide resolved
@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member Author

I will look at PUTARG_SPLIT in a follow-up, it is (practically) arm32 only and when I tried enabling similar logic for it I saw a lot of arm32 regressions that I need to dedicate some time on understanding.

@jakobbotsch
Copy link
Member Author

Failures look like #67870 and #76280

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Oct 7, 2022

asm.aspnet regression
10 ( 2.99% of base) : 21136.dasm - AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,<DoSend>d__28]:MoveNext(System.Threading.Thread):this

The diff looks like:

diff --git "a/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\base\\21136.dasm" "b/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\diff\\21136.dasm"
index c16c4cc..98ab8f5 100644
--- "a/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\base\\21136.dasm"
+++ "b/C:\\dev\\dotnet\\spmi\\asm.aspnet.run.windows.x64.checked\\diff\\21136.dasm"
@@ -44,23 +44,29 @@ G_M19108_IG02:        ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byr
 						;; size=24 bbWeight=1    PerfScore 5.50
 G_M19108_IG03:        ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byref, isz
        ; gcrRegs -[rdx]
-       mov      rcx, gword ptr [rsi+48H]
-       ; gcrRegs +[rcx]
-       test     rcx, rcx
+       mov      rdx, gword ptr [rsi+48H]
+       ; gcrRegs +[rdx]
+       mov      gword ptr [rsp+20H], rdx
+       ; GC ptr vars +{V03}
+       test     rdx, rdx
        je       SHORT G_M19108_IG09
-						;; size=9 bbWeight=1.03 PerfScore 3.33
-G_M19108_IG04:        ; gcrefRegs=000000C2 {rcx rsi rdi}, byrefRegs=00000000 {}, byref, isz
+						;; size=14 bbWeight=1.03 PerfScore 4.36
+G_M19108_IG04:        ; gcVars=0000000000000004 {V03}, gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, gcvars, byref, isz
+       ; gcrRegs -[rdx]
        test     rdi, rdi
        jne      SHORT G_M19108_IG10
-       mov      rdx, 0xD1FFAB1E      ; const ptr
-       mov      rdx, gword ptr [rdx]
+       mov      rcx, 0xD1FFAB1E      ; const ptr
+       mov      rdx, gword ptr [rcx]
        ; gcrRegs +[rdx]
+       mov      rcx, gword ptr [rsp+20H]
+       ; gcrRegs +[rcx]
        mov      r8, rsi
        ; gcrRegs +[r8]
+       ; GC ptr vars -{V03}
        call     [hackishMethodName]
        ; gcrRegs -[rcx rdx rdi r8]
        ; gcr arg pop 0
-						;; size=27 bbWeight=1.03 PerfScore 6.92
+						;; size=32 bbWeight=1.03 PerfScore 7.94
 G_M19108_IG05:        ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz
        test     dword ptr [rsi+34H], 0xD1FFAB1E
        jne      SHORT G_M19108_IG11
@@ -98,10 +104,13 @@ G_M19108_IG08:        ; gcVars=0000000000000000 {}, gcrefRegs=000000C0 {rsi rdi}
        mov      rcx, gword ptr [rsi+48H]
        ; gcrRegs +[rcx]
        test     rcx, rcx
+       mov      gword ptr [rsp+20H], rcx
+       ; GC ptr vars +{V03}
        jne      SHORT G_M19108_IG04
-						;; size=48 bbWeight=0    PerfScore 0.00
+						;; size=53 bbWeight=0    PerfScore 0.00
 G_M19108_IG09:        ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz
        ; gcrRegs -[rcx rdi]
+       ; GC ptr vars -{V03}
        lea      rcx, bword ptr [rsi+50H]
        ; byrRegs +[rcx]
        call     [hackishMethodName]
@@ -109,25 +118,25 @@ G_M19108_IG09:        ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref,
        ; gcr arg pop 0
        jmp      SHORT G_M19108_IG05
 						;; size=12 bbWeight=0    PerfScore 0.00
-G_M19108_IG10:        ; gcrefRegs=000000C2 {rcx rsi rdi}, byrefRegs=00000000 {}, byref, isz
-       ; gcrRegs +[rcx rdi]
-       mov      gword ptr [rsp+20H], rcx
-       ; GC ptr vars +{V03}
+G_M19108_IG10:        ; gcVars=0000000000000004 {V03}, gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, gcvars, byref, isz
+       ; gcrRegs +[rdi]
+       ; GC ptr vars +{V02 V03}
        mov      r8, 0xD1FFAB1E      ; const ptr
        mov      r8, gword ptr [r8]
        ; gcrRegs +[r8]
        mov      rcx, rdi
+       ; gcrRegs +[rcx]
        mov      rdx, gword ptr [rsp+20H]
        ; gcrRegs +[rdx]
        mov      r9, rsi
        ; gcrRegs +[r9]
-       ; GC ptr vars -{V03}
+       ; GC ptr vars -{V02 V03}
        call     [hackishMethodName]
        ; gcrRegs -[rcx rdx rdi r8-r9]
        ; gcr arg pop 0
        test     dword ptr [rsi+34H], 0xD1FFAB1E
        je       SHORT G_M19108_IG06
-						;; size=44 bbWeight=0    PerfScore 0.00
+						;; size=39 bbWeight=0    PerfScore 0.00
 G_M19108_IG11:        ; gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, isz
        cmp      byte  ptr [(reloc)], 0      ; data for (null)hackishFieldName
        je       SHORT G_M19108_IG12
@@ -184,7 +193,7 @@ G_M19108_IG14:        ; , epilog, nogc, extend
        ; gcr arg pop 0
 						;; size=14 bbWeight=0    PerfScore 0.00

With the change we remove rcx from V03's preferences because it is going to conflict at the call:

 DefList: { N197.t140. CNS_INT }
 N199 (  5, 12) [000023] #---G------                         ▌  IND       ref    REG NA $85
 <RefPosition #127 @199 RefTypeUse <Ivl:27> BB07 regmask=[allInt] minReg=1 last wt=0.00>
 Interval 28: ref RefPositions {} physReg:NA Preferences=[allInt]
 <RefPosition #128 @200 RefTypeDef <Ivl:28> IND BB07 regmask=[allInt] minReg=1 wt=0.00>

 DefList: { N199.t23. IND }
 N201 (???,???) [000215] ----G------                         ▌  PUTARG_REG ref    REG r8
 <RefPosition #129 @201 RefTypeFixedReg <Reg:r8 > BB07 regmask=[r8] minReg=1 wt=0.00>
 <RefPosition #130 @201 RefTypeUse <Ivl:28> BB07 regmask=[r8] minReg=1 last fixed wt=0.00>
 Interval 29: ref RefPositions {} physReg:NA Preferences=[allInt]
 <RefPosition #131 @202 RefTypeFixedReg <Reg:r8 > BB07 regmask=[r8] minReg=1 wt=0.00>
 <RefPosition #132 @202 RefTypeDef <Ivl:29> PUTARG_REG BB07 regmask=[r8] minReg=1 fixed wt=0.00>
 
 DefList: { N201.t215. PUTARG_REG }
 N203 (  1,  1) [000021] -----------                         ▌  LCL_VAR   ref    V01 arg1         u:1 NA (last use) REG NA $81
 
 DefList: { N201.t215. PUTARG_REG }
 N205 (???,???) [000216] -----------                         ▌  PUTARG_REG ref    REG rcx
+Last use of V01 between a PUTARG and CALL. Removing occupied arg regs from preferences: [r8]
 <RefPosition #133 @205 RefTypeFixedReg <Reg:rcx> BB07 regmask=[rcx] minReg=1 wt=0.00>
 <RefPosition #134 @205 RefTypeUse <Ivl:1 V01> LCL_VAR BB07 regmask=[rcx] minReg=1 last fixed wt=302.51>
 Interval 30: ref RefPositions {} physReg:NA Preferences=[allInt]
 <RefPosition #135 @206 RefTypeFixedReg <Reg:rcx> BB07 regmask=[rcx] minReg=1 wt=0.00>
 <RefPosition #136 @206 RefTypeDef <Ivl:30> PUTARG_REG BB07 regmask=[rcx] minReg=1 fixed wt=0.00>
 
 DefList: { N201.t215. PUTARG_REG; N205.t216. PUTARG_REG }
 N207 (  1,  1) [000022] -----------                         ▌  LCL_VAR   ref    V03 loc1         u:1 NA (last use) REG NA <l:$340, c:$83>
 
 DefList: { N201.t215. PUTARG_REG; N205.t216. PUTARG_REG }
 N209 (???,???) [000217] -----------                         ▌  PUTARG_REG ref    REG rdx
+Last use of V03 between a PUTARG and CALL. Removing occupied arg regs from preferences: [rcx r8]

This means we end up with V03 in rdx instead of rcx in a previous BB, which leads to worse resolution:

@@ -20,22 +20,22 @@ def: {V03}
  in: {V00 V01 V02}
 out: {V00 V01 V02 V03}
 Var=Reg beg of BB03: V00=rsi V01=rdi V02=rbx 
-Var=Reg end of BB03: V00=rsi V01=rdi V03=rcx V02=rbx 
+Var=Reg end of BB03: V00=rsi V01=rdi V02=rbx 
 
 BB05
 use: {V01}
 def: {}
  in: {V00 V01 V02 V03}
 out: {V00 V01 V02 V03}
-Var=Reg beg of BB05: V00=rsi V01=rdi V03=rcx V02=rbx 
-Var=Reg end of BB05: V00=rsi V01=rdi V03=rcx V02=rbx 
+Var=Reg beg of BB05: V00=rsi V01=rdi V02=rbx 
+Var=Reg end of BB05: V00=rsi V01=rdi V02=rbx 
 
 BB06
 use: {V00 V03}
 def: {}
  in: {V00 V02 V03}
 out: {V00 V02}
-Var=Reg beg of BB06: V00=rsi V03=rcx V02=rbx 
+Var=Reg beg of BB06: V00=rsi V02=rbx 
 Var=Reg end of BB06: V00=rsi V02=rbx 
 
 BB08
@@ -128,4 +128,4 @@ Var=Reg end of BB14: none
 
 
 RESOLVING EDGES
-   BB07 top (BB05->BB07): move V03 from rcx to STK (Split)
+   BB02 bottom: move V03 from rcx to STK (SharedCritical)

@jakobbotsch
Copy link
Member Author

benchmarks.run regression
23 ( 7.40% of base) : 901.dasm - System.Array:Clear(System.Array,int,int)

Diff:

@@ -14,9 +14,9 @@
 ;  V04 loc1         [V04,T07] (  4,  3.50)     int  ->  r14        
 ;  V05 loc2         [V05,T02] (  7,  6   )    long  ->  r15        
 ;  V06 loc3         [V06,T04] (  4,  4   )     int  ->  rdx        
-;  V07 loc4         [V07,T09] (  3,  3   )    long  ->   r8        
-;  V08 loc5         [V08,T10] (  4,  2.50)   byref  ->  rcx         single-def
-;  V09 loc6         [V09,T05] (  6,  3.50)    long  ->  [rsp+20H]   spill-single-def
+;  V07 loc4         [V07,T09] (  3,  3   )    long  ->  rcx        
+;  V08 loc5         [V08,T10] (  4,  2.50)   byref  ->  [rsp+28H]   single-def
+;  V09 loc6         [V09,T05] (  6,  3.50)    long  ->  [rsp+30H]   spill-single-def
 ;  V10 loc7         [V10,T11] (  3,  1.50)     int  ->  rdx        
 ;  V11 OutArgs      [V11    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
 ;* V12 tmp1         [V12    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
@@ -24,7 +24,7 @@
 ;* V14 tmp3         [V14    ] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
 ;  V15 cse0         [V15,T08] (  6,  3   )     ref  ->  rdx         "CSE - aggressive"
 ;
-; Lcl frame size = 40
+; Lcl frame size = 56
 
 G_M44216_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
        push     r15
@@ -33,7 +33,7 @@ G_M44216_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
        push     rsi
        push     rbp
        push     rbx
-       sub      rsp, 40
+       sub      rsp, 56
        mov      rsi, rcx
        ; gcrRegs +[rsi]
        mov      ebx, edx
@@ -95,25 +95,29 @@ G_M44216_IG07:        ; gcrefRegs=00000040 {rsi}, byrefRegs=00000020 {rbp}, byre
        lea      ecx, [rdx+rdi]
        cmp      ecx, dword ptr [rsi+08H]
        ja       G_M44216_IG16
-       movzx    r8, word  ptr [r15]
-       mov      ecx, edx
-       imul     rcx, r8
-       add      rcx, rbp
-       ; byrRegs +[rcx]
-       mov      edx, edi
-       imul     r8, rdx
-       mov      rdx, r8
-       mov      qword ptr [rsp+20H], rdx
+       movzx    rcx, word  ptr [r15]
+       mov      edx, edx
+       imul     rdx, rcx
+       add      rdx, rbp
+       ; byrRegs +[rdx]
+       mov      bword ptr [rsp+28H], rdx
+       ; GC ptr vars +{V08}
+       mov      eax, edi
+       imul     rcx, rax
+       mov      qword ptr [rsp+30H], rcx
        test     dword ptr [r15], 0xD1FFAB1E
        je       SHORT G_M44216_IG10
-						;; size=78 bbWeight=1    PerfScore 20.75
-G_M44216_IG08:        ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref
+						;; size=80 bbWeight=1    PerfScore 21.50
+G_M44216_IG08:        ; gcVars=0000000000000400 {V08}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
        ; gcrRegs -[rsi]
-       ; byrRegs -[rbp]
+       ; byrRegs -[rdx rbp]
+       mov      rdx, rcx
        shr      rdx, 3
-						;; size=4 bbWeight=0.50 PerfScore 0.25
+       mov      rcx, bword ptr [rsp+28H]
+       ; byrRegs +[rcx]
+						;; size=12 bbWeight=0.50 PerfScore 0.88
 G_M44216_IG09:        ; , epilog, nogc, extend
-       add      rsp, 40
+       add      rsp, 56
        pop      rbx
        pop      rbp
        pop      rsi
@@ -123,22 +127,26 @@ G_M44216_IG09:        ; , epilog, nogc, extend
        tail.jmp [hackishMethodName]
        ; gcr arg pop 0
 						;; size=18 bbWeight=0.50 PerfScore 2.62
-G_M44216_IG10:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, gcvars, byref, isz
-       test     rdx, rdx
+G_M44216_IG10:        ; gcVars=0000000000000400 {V08}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, isz
+       ; byrRegs -[rcx]
+       test     rcx, rcx
        je       SHORT G_M44216_IG11
-       cmp      rdx, 768
+       cmp      rcx, 768
        ja       SHORT G_M44216_IG13
        xor      edx, edx
-       mov      r8, qword ptr [rsp+20H]
+       mov      rcx, bword ptr [rsp+28H]
+       ; byrRegs +[rcx]
+       mov      r8, qword ptr [rsp+30H]
+       ; GC ptr vars -{V08}
        call     CORINFO_HELP_MEMSET
        ; byrRegs -[rcx]
        ; gcr arg pop 0
-						;; size=26 bbWeight=0.50 PerfScore 2.38
+						;; size=31 bbWeight=0.50 PerfScore 2.88
 G_M44216_IG11:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
        nop      
 						;; size=1 bbWeight=0.50 PerfScore 0.12
 G_M44216_IG12:        ; , epilog, nogc, extend
-       add      rsp, 40
+       add      rsp, 56
        pop      rbx
        pop      rbp
        pop      rsi
@@ -147,12 +155,17 @@ G_M44216_IG12:        ; , epilog, nogc, extend
        pop      r15
        ret      
 						;; size=13 bbWeight=0.50 PerfScore 2.12
-G_M44216_IG13:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, gcvars, byref
+G_M44216_IG13:        ; gcVars=0000000000000400 {V08}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
+       ; GC ptr vars +{V08 V10}
+       mov      rdx, bword ptr [rsp+28H]
+       ; byrRegs +[rdx]
+       mov      rcx, rdx
        ; byrRegs +[rcx]
-       mov      rdx, qword ptr [rsp+20H]
-						;; size=5 bbWeight=0.50 PerfScore 0.50
+       mov      rdx, qword ptr [rsp+30H]
+       ; byrRegs -[rdx]
+						;; size=13 bbWeight=0.50 PerfScore 1.12
 G_M44216_IG14:        ; , epilog, nogc, extend
-       add      rsp, 40
+       add      rsp, 56
        pop      rbx
        pop      rbp
        pop      rsi
@@ -164,6 +177,7 @@ G_M44216_IG14:        ; , epilog, nogc, extend
 						;; size=18 bbWeight=0.50 PerfScore 2.62
 G_M44216_IG15:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
        ; byrRegs -[rcx]
+       ; GC ptr vars -{V08 V10}
        mov      ecx, 2
        call     [System.ThrowHelper:ThrowArgumentNullException(int)]
        ; gcr arg pop 0
@@ -175,7 +189,7 @@ G_M44216_IG16:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
        int3     
 						;; size=7 bbWeight=0    PerfScore 0.00

Thankfully not inside any inner loop or anything. We notice the following conflicts:

 DefList: {  }
 N297 (  1,  1) [000063] -----------                         ▌  LCL_VAR   long   V09 loc6         u:1 NA (last use) REG NA <l:$350, c:$351>
 
 DefList: {  }
 N299 (  1,  1) [000065] -c-----N---                         ▌  CNS_INT   long   3 REG NA $184
 Contained
 DefList: {  }
 N301 ( 22,  5) [000066] -----------                         ▌  RSZ       long   REG NA
 <RefPosition #135 @301 RefTypeUse <Ivl:9 V09> LCL_VAR BB15 regmask=[allInt] minReg=1 last wt=350.00>
 Interval 37: long RefPositions {} physReg:NA Preferences=[allInt]
 <RefPosition #136 @302 RefTypeDef <Ivl:37> RSZ BB15 regmask=[allInt] minReg=1 wt=200.00>
 Assigning related <I37> to <V09/L9>
 
 DefList: { N301.t66. RSZ }
 N303 (???,???) [000297] -----------                         ▌  PUTARG_REG long   REG rdx
 <RefPosition #137 @303 RefTypeFixedReg <Reg:rdx> BB15 regmask=[rdx] minReg=1 wt=50.00>
 <RefPosition #138 @303 RefTypeUse <Ivl:37> BB15 regmask=[rdx] minReg=1 last fixed wt=50.00>
 Interval 38: long RefPositions {} physReg:NA Preferences=[allInt]
 <RefPosition #139 @304 RefTypeFixedReg <Reg:rdx> BB15 regmask=[rdx] minReg=1 wt=50.00>
 <RefPosition #140 @304 RefTypeDef <Ivl:38> PUTARG_REG BB15 regmask=[rdx] minReg=1 fixed wt=200.00>
 
 DefList: { N303.t297. PUTARG_REG }
 N305 (  3,  2) [000062] -----------                         ▌  LCL_VAR   byref  V08 loc5         u:1 NA (last use) REG NA <l:$1c9, c:$1ca>
 
 DefList: { N303.t297. PUTARG_REG }
 N307 (???,???) [000298] -----------                         ▌  PUTARG_REG byref  REG rcx
+Last use of V08 between PUTARG and CALL. Removing occupied arg regs from preferences: [rdx]

...

 DefList: {  }
 N355 (  3,  2) [000058] -----------                         ▌  LCL_VAR   byref  V08 loc5         u:1 NA (last use) REG NA <l:$1c9, c:$1ca>
 
 DefList: {  }
 N357 (???,???) [000301] -----------                         ▌  PUTARG_REG byref  REG rcx
 <RefPosition #182 @357 RefTypeFixedReg <Reg:rcx> BB19 regmask=[rcx] minReg=1 wt=50.00>
 <RefPosition #183 @357 RefTypeUse <Ivl:8 V08> LCL_VAR BB19 regmask=[rcx] minReg=1 last fixed wt=250.00>
 Interval 41: byref RefPositions {} physReg:NA Preferences=[allInt]
 <RefPosition #184 @358 RefTypeFixedReg <Reg:rcx> BB19 regmask=[rcx] minReg=1 wt=50.00>
 <RefPosition #185 @358 RefTypeDef <Ivl:41> PUTARG_REG BB19 regmask=[rcx] minReg=1 fixed wt=200.00>
 
 DefList: { N357.t301. PUTARG_REG }
 N359 (  1,  1) [000201] -----------                         ▌  LCL_VAR   long   V09 loc6         u:1 NA (last use) REG NA <l:$350, c:$351>
 
 DefList: { N357.t301. PUTARG_REG }
 N361 (???,???) [000302] -----------                         ▌  PUTARG_REG long   REG rdx
+Last use of V09 between PUTARG and CALL. Removing occupied arg regs from preferences: [rcx]

Again this leads to some different allocation in earlier basic blocks that leads to a new resolution:

diff --git "a/.\\base.txt" "b/.\\diff.txt"
index 625a2ec..ecb5830 100644
--- "a/.\\base.txt"
+++ "b/.\\diff.txt"
@@ -100,14 +100,14 @@ def: {V07 V08 V09}
  in: {V02 V03 V05 V06}
 out: {V08 V09}
 Var=Reg beg of BB14: V02=rdi V05=r15 V06=rdx V03=rbp 
-Var=Reg end of BB14: V09=rdx V08=rcx 
+Var=Reg end of BB14: V09=rcx 
 
 BB15
 use: {V08 V09}
 def: {}
  in: {V08 V09}
 out: {}
-Var=Reg beg of BB15: V09=rdx V08=rcx 
+Var=Reg beg of BB15: V09=rcx 
 Var=Reg end of BB15: none
 
 BB16
@@ -115,23 +115,23 @@ use: {V09}
 def: {}
  in: {V08 V09}
 out: {V08 V09}
-Var=Reg beg of BB16: V09=rdx V08=rcx 
-Var=Reg end of BB16: V09=rdx V08=rcx 
+Var=Reg beg of BB16: V09=rcx 
+Var=Reg end of BB16: V09=rcx 
 
 BB17
 use: {V09}
 def: {}
  in: {V08 V09}
 out: {V08 V09}
-Var=Reg beg of BB17: V09=rdx V08=rcx 
-Var=Reg end of BB17: V08=rcx 
+Var=Reg beg of BB17: V09=rcx 
+Var=Reg end of BB17: none
 
 BB18
 use: {V08 V09}
 def: {}
  in: {V08 V09}
 out: {}
-Var=Reg beg of BB18: V08=rcx 
+Var=Reg beg of BB18: none
 Var=Reg end of BB18: none
 
 BB20
@@ -147,7 +147,7 @@ use: {V08 V09}
 def: {}
  in: {V08 V09}
 out: {}
-Var=Reg beg of BB19: V08=rcx 
+Var=Reg beg of BB19: V08=rdx 
 Var=Reg end of BB19: none
 
 BB02
@@ -168,4 +168,5 @@ Var=Reg end of BB13: none
 
 
 RESOLVING EDGES
-
+   BB19 top (BB17->BB19): move V08 from STK to rdx (Split)
+N001 (  3,  2) [000310] ----------z                  t310 =    LCL_VAR   byref  V08 loc5          rdx REG rdx

@kunalspathak
Copy link
Member

Can you paste the IR around the block that you highlighted for asp regression?

@kunalspathak
Copy link
Member

Oddly superpmi failed for windows-arm64 because it didn't find download the mch file properly?

@kunalspathak
Copy link
Member

Seems jitstressregs timed out for windows-arm64. Should we rerun it?

@jakobbotsch
Copy link
Member Author

Oddly superpmi failed for windows-arm64 because it didn't find download the mch file properly?

Yes, the collection run failed for win-arm64 yesterday.

Seems jitstressregs timed out for windows-arm64. Should we rerun it?

Retriggered that job.

@jakobbotsch
Copy link
Member Author

Can you paste the IR around the block that you highlighted for asp regression?

The IR for the block with the call looks like:

------------ BB07 [04D..05A) -> BB13 (cond), preds={BB05} succs={BB09,BB13}
N195 (???,???) [000187] -----------                            IL_OFFSET void   INLRT @ 0x04D[E-] REG NA
N197 (  3, 10) [000140] H----------                  t140 =    CNS_INT(h) long   0x1eeefc071d8 const ptr Fseq[hackishFieldName] REG r8 $104
                                                            ┌──▌  t140   long   
N199 (  5, 12) [000023] #---G------                   t23 =IND       ref    REG r8 $85
                                                            ┌──▌  t23    ref    
N201 (???,???) [000215] ----G------                  t215 =PUTARG_REG ref    REG r8
N203 (  1,  1) [000021] -----------                   t21 =    LCL_VAR   ref    V01 arg1         u:1 rdi (last use) REG rdi $81
                                                            ┌──▌  t21    ref    
N205 (???,???) [000216] -----------                  t216 =PUTARG_REG ref    REG rcx
N207 (  1,  1) [000022] ----------z                   t22 =    LCL_VAR   ref    V03 loc1         u:1 rdx (last use) REG rdx <l:$340, c:$83>
                                                            ┌──▌  t22    ref    
N209 (???,???) [000217] -----------                  t217 =PUTARG_REG ref    REG rdx
N211 (  1,  1) [000024] -----------                   t24 =    LCL_VAR   ref    V00 this         u:1 rsi REG rsi $80
                                                            ┌──▌  t24    ref    
N213 (???,???) [000218] -----------                  t218 =PUTARG_REG ref    REG r9
N215 (  3, 10) [000219] Hc---------                  t219 =    CNS_INT(h) long   0x7ff9cad63d98 ftn REG NA
                                                            ┌──▌  t219   long   
N217 (  5, 12) [000220] -c---------                  t220 =IND       long   REG NA
                                                            ┌──▌  t215   ref    arg2 in r8
                                                            ├──▌  t216   ref    arg0 in rcx
                                                            ├──▌  t217   ref    arg1 in rdx
                                                            ├──▌  t218   ref    arg3 in r9
                                                            ├──▌  t220   long   control expr
N219 ( 22, 24) [000025] --CXG------CALL      void   hackishModuleName.hackishMethodName REG NA $VN.Void
N221 (???,???) [000188] -----------                            IL_OFFSET void   INLRT @ 0x05A[E-] REG NA
N223 (  1,  1) [000167] -----------                  t167 =    LCL_VAR   ref    V00 this         u:1 rsi REG rsi $80
                                                            ┌──▌  t167   ref    
N225 (  2,  2) [000166] -c---------                  t166 =LEA(b+52) byref  REG NA
                                                            ┌──▌  t166   byref  
N227 (  4,  4) [000165] Vc--GO-----                  t165 =IND       int    REG NA $287
N229 (  1,  4) [000169] -c---------                  t169 =    CNS_INT   int    0x1600000 REG NA $48
                                                            ┌──▌  t165   int    
                                                            ├──▌  t169   int    
N231 (  8, 11) [000163] J---GO-N---TEST_EQ   void   REG NA
N233 ( 10, 13) [000162] ----GO-----JTRUE     void   REG NA $344

after LSRA

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Oct 7, 2022

Another timeout in the win-arm64 jitstressregs. Looks like CI is just too strapped for resources to run win-arm64 pri1 jobs. In any case I don't think we need to hold the PR up on it, I think the linux-arm64 coverage should be sufficient since the JITs are practically the same.

I have a small nit change that I will push, but do you have other any other feedback @kunalspathak? I think the regressions are few enough that we can see if we hit anything in perf triage and go from there.

@jakobbotsch
Copy link
Member Author

One thing I want to do before merging is also look into the linux-arm64 MinOpts TP diffs, which I haven't gotten around to yet.

@build-analysis build-analysis bot mentioned this pull request Oct 8, 2022
2 tasks
@jakobbotsch
Copy link
Member Author

I used @SingleAccretion's more detailed PIN tool to figure out where the instructions are spent in the linux-x64 MinOpts contexts, and it looks like majority of the new instructions are spent inside BuildUse. However, the only code added in there is the UpdatePreferencesOfDyingLocal call, and that call does not run in MinOpts contexts.

After some investigation it looks like MSVC has flipped some branches and is generating slightly different code for the function. So I'm going to trust that MSVC knows better there (besides, with native PGO/clang the function is presumably going to be compiled differently anyway).

For the curious, this is the diff in pseudocode for the compilations produced for BuildUse.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added some minor comment suggestions that you can do it in follow-up PR.

src/coreclr/jit/lsrabuild.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lsrabuild.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lsrabuild.cpp Outdated Show resolved Hide resolved
Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
@jakobbotsch
Copy link
Member Author

The failure looks like #74328

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants