JIT: Allow moving handler regions in fgRelocateEHRegions #101613

amanasifkhalid · 2024-04-26T16:43:06Z

Follow-up to #101611. fgRelocateEHRegions previously would not try to move handler regions due to implementation restrictions in fgDetermineFirstColdBlock that have since been lifted. This should only affect behavior on Windows x86 when we aren't using funclets.
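As a rough illustration of what "moving a handler region" means for block layout, here is a minimal sketch. It assumes the layout is a simple vector of block names and the handler occupies a contiguous index range, which it rotates to the end of the method. `Layout`, `relocateHandlerToEnd`, and the block names are hypothetical and are not the JIT's `BasicBlock` API. The real pass presumably also has to keep the EH table and fall-through flow consistent, which this ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model: a method's block layout is a vector of
// block names, and a handler region occupies the contiguous index range
// [handlerBegin, handlerEnd). Not the JIT's actual data structures.
using Layout = std::vector<std::string>;

// Move the handler region to the end of the layout. The handler's blocks stay
// contiguous and keep their relative order, and the blocks that followed the
// handler shift up to close the gap.
void relocateHandlerToEnd(Layout& layout, std::size_t handlerBegin, std::size_t handlerEnd)
{
    assert(handlerBegin <= handlerEnd && handlerEnd <= layout.size());
    // std::rotate makes layout[handlerEnd] the new element at handlerBegin,
    // which moves [handlerBegin, handlerEnd) behind everything after it.
    std::rotate(layout.begin() + handlerBegin, layout.begin() + handlerEnd, layout.end());
}
```

With the layout `try, catch1, catch2, loopBottom, ret` and the handler at indices [1, 3), the result is `try, loopBottom, ret, catch1, catch2`. This roughly mirrors how the catch block ends up after the epilog in the listings later in this thread.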

Also, I noticed we never set Compiler::fgCanRelocateEHRegions to false anywhere, so it seems safe to remove this member.

dotnet-policy-service · 2024-04-26T16:43:32Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid · 2024-04-26T19:44:25Z

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. The diffs look like an improvement. Much of the size savings seems to come from shorter jumps on non-exceptional paths when we move handlers further down the method. Most of the size regressions stem from new jumps, now that handlers no longer fall through to the non-exceptional successor, and from larger jumps to the handlers. Since we only move handlers that are marked as rarely run, I think we want to do this movement as a matter of principle.
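The size trade-off above mostly comes down to x86 branch encodings. A branch whose displacement fits in a signed byte uses the 2-byte short form. Otherwise it needs a 5-byte `jmp rel32` or a 6-byte `jcc rel32`. The helper below is a hypothetical sketch of that arithmetic, not the JIT emitter's branch-shortening logic.

```cpp
#include <cstdint>

// Encoded size in bytes of an x86 JMP/Jcc, choosing the short (rel8) form when
// the displacement fits in a signed byte.
//   JMP rel8 = EB cb    (2 bytes)    JMP rel32 = E9 cd     (5 bytes)
//   Jcc rel8 = 7x cb    (2 bytes)    Jcc rel32 = 0F 8x cd  (6 bytes)
// The displacement is measured from the end of the instruction, so for the
// short-form check it is relative to jumpOffset + 2.
int x86JumpSize(std::int64_t jumpOffset, std::int64_t targetOffset, bool conditional)
{
    const std::int64_t shortDisp = targetOffset - (jumpOffset + 2);
    if (shortDisp >= INT8_MIN && shortDisp <= INT8_MAX)
    {
        return 2; // short form fits
    }
    return conditional ? 6 : 5; // fall back to the rel32 form
}
```

For example, the `jmp SHORT G_M22950_IG10` at the end of the relocated catch block in the second listing below sits at offset 0x10E and targets 0xD2. That is a displacement of -62, so the jump stays in the 2-byte form.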

amanasifkhalid · 2024-04-26T20:47:24Z

The SPMI replay failure is a timeout; this change shouldn't affect x64 anyway.

AndyAyersMS · 2024-04-26T21:21:53Z

Probably worth running jitstress. Unfortunately, that is blocked by #101628.

amanasifkhalid · 2024-04-29T15:44:31Z

Looks like jitstress is no longer blocked (at least not on x86). I'll run it now.

amanasifkhalid · 2024-04-29T15:45:53Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

azure-pipelines · 2024-04-29T15:46:14Z

Azure Pipelines successfully started running 2 pipeline(s).

amanasifkhalid · 2024-05-02T16:56:10Z

@AndyAyersMS this test (Test35000) is the only one to fail in the outerloop pipelines: the NullReferenceException triggered in the try block is never caught by the catch block. I can repro the failure locally without any stress modes. If I replace the offending delegate invocation func(null, 1, 2, 3, 4, 5, 6, 7, 8) with a normal method call that throws, the failure doesn't repro, so it seems specific to exception handling with delegates. The codegen looks fine. Here's the listing without this change:

; Assembly listing for method Test35000:TestEntryPoint():int (Tier0-FullOpts)
; Emitting BLENDED_CODE for X86 with AVX - Windows
; Tier-0 switched to FullOpts code
; optimized code
; ebp based frame
; partially interruptible
; No PGO data
; 2 inlinees with PGO data; 8 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 loc0         [V00,T09] (  2,  2   )     ref  ->  eax         class-hnd single-def <System.Reflection.MethodInfo>
;  V01 loc1         [V01,T00] (  7,136   )     ref  ->  [ebp-0x28]  class-hnd EH-live single-def <System.Func`10[Test35000+TestData0,int,int,int,int,int,int,int,int,System.Object]>
;  V02 loc2         [V02,T02] (  2, 32   )     ref  ->  [ebp-0x2C]  class-hnd exact EH-live spill-single-def <Test35000+TestData0>
;  V03 loc3         [V03,T03] (  2, 32   )     ref  ->  [ebp-0x30]  class-hnd exact EH-live spill-single-def <Test35000+TestData1>
;  V04 loc4         [V04,T08] (  4,  2   )     int  ->  [ebp-0x20]  do-not-enreg[Z] EH-live
;  V05 loc5         [V05,T04] (  4, 25   )     int  ->  [ebp-0x24]  do-not-enreg[Z] EH-live
;  V06 loc6         [V06,T01] (  4,104   )     int  ->  edi        
;* V07 loc7         [V07    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <System.NullReferenceException>
;* V08 tmp0         [V08    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Test35000+TestData0>
;* V09 tmp1         [V09    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Test35000+TestData1>
;  V10 tmp2         [V10,T10] (  2,  0   )     ref  ->  ecx         class-hnd "impSpillSpecialSideEff" <System.NullReferenceException>
;  V11 tmp3         [V11,T06] (  3,  4.50)     ref  ->  eax         single-def "argument with side effect"
;  V12 EHSlots      [V12    ] (  1,  1   )  struct (16) [ebp-0x1C]  do-not-enreg[XS] must-init addr-exposed "lvaShadowSPslotsVar"
;  V13 rat0         [V13,T05] (  5,  7.50)     ref  ->  esi         "replacement local"
;  V14 rat1         [V14,T07] (  3,  2.50)     int  ->  ecx         "CSE for expectedClsNode"
;
; Lcl frame size = 40

G_M22950_IG01:  ;; offset=0x0000
       push     ebp
       mov      ebp, esp
       push     edi
       push     esi
       sub      esp, 40
       xor      eax, eax
       mov      dword ptr [ebp-0x1C], eax
       mov      dword ptr [ebp-0x18], eax
       mov      dword ptr [ebp-0x14], eax
       mov      dword ptr [ebp-0x10], eax
						;; size=22 bbWeight=1 PerfScore 7.75
G_M22950_IG02:  ;; offset=0x0016
       push     -1
       push     28
       push     0
       push     3
       push     0
       push     0
       mov      ecx, 0x872781C      ; 'TestData0'
       mov      edx, 0x8727830      ; 'MyMethod'
       call     [System.RuntimeType:GetMethodImplCommon(System.String,int,int,System.Reflection.Binder,int,System.Type[],System.Reflection.ParameterModifier[]):System.Reflection.MethodInfo:this]
       push     eax
       push     1
       mov      ecx, 0x8727850      ; 'System.Func`10[Test35000+TestData0,System.Int32,System.Int32,Sy'
       xor      edx, edx
       call     [System.Delegate:CreateDelegate(System.Type,System.Object,System.Reflection.MethodInfo,ubyte):System.Delegate]
       mov      esi, eax
       test     esi, esi
       je       SHORT G_M22950_IG05
						;; size=50 bbWeight=1 PerfScore 16.50
G_M22950_IG03:  ;; offset=0x0048
       mov      ecx, 0x96AD8F0      ; System.Func`10[Test35000+TestData0,int,int,int,int,int,int,int,int,System.Object]
       cmp      dword ptr [esi], ecx
       je       SHORT G_M22950_IG05
						;; size=9 bbWeight=0.50 PerfScore 2.12
G_M22950_IG04:  ;; offset=0x0051
       mov      edx, eax
       call     [CORINFO_HELP_CHKCASTANY]
       mov      esi, eax
						;; size=10 bbWeight=0.25 PerfScore 0.88
G_M22950_IG05:  ;; offset=0x005B
       mov      gword ptr [ebp-0x28], esi
       mov      ecx, 0x96AD688      ; Test35000+TestData0
       call     CORINFO_HELP_NEWSFAST
       mov      gword ptr [ebp-0x2C], eax
       mov      ecx, 0x96ADB78      ; Test35000+TestData1
       call     CORINFO_HELP_NEWSFAST
       mov      gword ptr [ebp-0x30], eax
       xor      edx, edx
       mov      dword ptr [ebp-0x20], edx
						;; size=34 bbWeight=1 PerfScore 6.75
G_M22950_IG06:  ;; offset=0x007D
       mov      dword ptr [ebp-0x24], edx
						;; size=3 bbWeight=1 PerfScore 1.00
G_M22950_IG07:  ;; offset=0x0080
       xor      edi, edi
						;; size=2 bbWeight=8 PerfScore 2.00
G_M22950_IG08:  ;; offset=0x0082
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       mov      edx, gword ptr [ebp-0x2C]
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       mov      edx, gword ptr [ebp-0x30]
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       inc      edi
       cmp      edi, 50
       jl       SHORT G_M22950_IG08
						;; size=56 bbWeight=32 PerfScore 944.00
G_M22950_IG09:  ;; offset=0x00BA
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       xor      edx, edx
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       jmp      SHORT G_M22950_IG11
						;; size=26 bbWeight=4 PerfScore 61.00
G_M22950_IG10:  ;; offset=0x00D4
       mov      ecx, eax
       mov      eax, dword ptr [ebp-0x20]
       inc      eax
       mov      dword ptr [ebp-0x20], eax
       call     [System.Console:WriteLine(System.Object)]
       call     CORINFO_HELP_ENDCATCH
       mov      esi, gword ptr [ebp-0x28]
						;; size=23 bbWeight=0 PerfScore 0.00
G_M22950_IG11:  ;; offset=0x00EB
       mov      eax, dword ptr [ebp-0x24]
       inc      eax
       mov      dword ptr [ebp-0x24], eax
       cmp      dword ptr [ebp-0x24], 10
       jl       SHORT G_M22950_IG07
						;; size=13 bbWeight=8 PerfScore 42.00
G_M22950_IG12:  ;; offset=0x00F8
       mov      eax, 100
       mov      ecx, 101
       cmp      dword ptr [ebp-0x20], 10
       cmovne   eax, ecx
						;; size=17 bbWeight=1 PerfScore 2.75
G_M22950_IG13:  ;; offset=0x0109
       lea      esp, [ebp-0x08]
       pop      esi
       pop      edi
       pop      ebp
       ret      
						;; size=7 bbWeight=1 PerfScore 3.00

; Total bytes of code 272, prolog size 22, PerfScore 1089.75, instruction count 102, allocated bytes for code 272 (MethodHash=0103a659) for method Test35000:TestEntryPoint():int (Tier0-FullOpts)
; ============================================================

And with this change:

; Assembly listing for method Test35000:TestEntryPoint():int (Tier0-FullOpts)
; Emitting BLENDED_CODE for X86 with AVX - Windows
; Tier-0 switched to FullOpts code
; optimized code
; ebp based frame
; partially interruptible
; No PGO data
; 2 inlinees with PGO data; 8 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 loc0         [V00,T09] (  2,  2   )     ref  ->  eax         class-hnd single-def <System.Reflection.MethodInfo>
;  V01 loc1         [V01,T00] (  7,136   )     ref  ->  [ebp-0x28]  class-hnd EH-live single-def <System.Func`10[Test35000+TestData0,int,int,int,int,int,int,int,int,System.Object]>
;  V02 loc2         [V02,T02] (  2, 32   )     ref  ->  [ebp-0x2C]  class-hnd exact EH-live spill-single-def <Test35000+TestData0>
;  V03 loc3         [V03,T03] (  2, 32   )     ref  ->  [ebp-0x30]  class-hnd exact EH-live spill-single-def <Test35000+TestData1>
;  V04 loc4         [V04,T08] (  4,  2   )     int  ->  [ebp-0x20]  do-not-enreg[Z] EH-live
;  V05 loc5         [V05,T04] (  4, 25   )     int  ->  [ebp-0x24]  do-not-enreg[Z] EH-live
;  V06 loc6         [V06,T01] (  4,104   )     int  ->  edi        
;* V07 loc7         [V07    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <System.NullReferenceException>
;* V08 tmp0         [V08    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Test35000+TestData0>
;* V09 tmp1         [V09    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Test35000+TestData1>
;  V10 tmp2         [V10,T10] (  2,  0   )     ref  ->  ecx         class-hnd "impSpillSpecialSideEff" <System.NullReferenceException>
;  V11 tmp3         [V11,T06] (  3,  4.50)     ref  ->  eax         single-def "argument with side effect"
;  V12 EHSlots      [V12    ] (  1,  1   )  struct (16) [ebp-0x1C]  do-not-enreg[XS] must-init addr-exposed "lvaShadowSPslotsVar"
;  V13 rat0         [V13,T05] (  5,  7.50)     ref  ->  esi         "replacement local"
;  V14 rat1         [V14,T07] (  3,  2.50)     int  ->  ecx         "CSE for expectedClsNode"
;
; Lcl frame size = 40

G_M22950_IG01:  ;; offset=0x0000
       push     ebp
       mov      ebp, esp
       push     edi
       push     esi
       sub      esp, 40
       xor      eax, eax
       mov      dword ptr [ebp-0x1C], eax
       mov      dword ptr [ebp-0x18], eax
       mov      dword ptr [ebp-0x14], eax
       mov      dword ptr [ebp-0x10], eax
						;; size=22 bbWeight=1 PerfScore 7.75
G_M22950_IG02:  ;; offset=0x0016
       push     -1
       push     28
       push     0
       push     3
       push     0
       push     0
       mov      ecx, 0x8FF7878      ; 'TestData0'
       mov      edx, 0x8FF788C      ; 'MyMethod'
       call     [System.RuntimeType:GetMethodImplCommon(System.String,int,int,System.Reflection.Binder,int,System.Type[],System.Reflection.ParameterModifier[]):System.Reflection.MethodInfo:this]
       push     eax
       push     1
       mov      ecx, 0x8FF78AC      ; 'System.Func`10[Test35000+TestData0,System.Int32,System.Int32,Sy'
       xor      edx, edx
       call     [System.Delegate:CreateDelegate(System.Type,System.Object,System.Reflection.MethodInfo,ubyte):System.Delegate]
       mov      esi, eax
       test     esi, esi
       je       SHORT G_M22950_IG05
						;; size=50 bbWeight=1 PerfScore 16.50
G_M22950_IG03:  ;; offset=0x0048
       mov      ecx, 0x9F7D5F0      ; System.Func`10[Test35000+TestData0,int,int,int,int,int,int,int,int,System.Object]
       cmp      dword ptr [esi], ecx
       je       SHORT G_M22950_IG05
						;; size=9 bbWeight=0.50 PerfScore 2.12
G_M22950_IG04:  ;; offset=0x0051
       mov      edx, eax
       call     [CORINFO_HELP_CHKCASTANY]
       mov      esi, eax
						;; size=10 bbWeight=0.25 PerfScore 0.88
G_M22950_IG05:  ;; offset=0x005B
       mov      gword ptr [ebp-0x28], esi
       mov      ecx, 0x9F7D388      ; Test35000+TestData0
       call     CORINFO_HELP_NEWSFAST
       mov      gword ptr [ebp-0x2C], eax
       mov      ecx, 0x9F7D878      ; Test35000+TestData1
       call     CORINFO_HELP_NEWSFAST
       mov      gword ptr [ebp-0x30], eax
       xor      edx, edx
       mov      dword ptr [ebp-0x20], edx
						;; size=34 bbWeight=1 PerfScore 6.75
G_M22950_IG06:  ;; offset=0x007D
       mov      dword ptr [ebp-0x24], edx
						;; size=3 bbWeight=1 PerfScore 1.00
G_M22950_IG07:  ;; offset=0x0080
       xor      edi, edi
						;; size=2 bbWeight=8 PerfScore 2.00
G_M22950_IG08:  ;; offset=0x0082
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       mov      edx, gword ptr [ebp-0x2C]
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       mov      edx, gword ptr [ebp-0x30]
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       inc      edi
       cmp      edi, 50
       jl       SHORT G_M22950_IG08
						;; size=56 bbWeight=32 PerfScore 944.00
G_M22950_IG09:  ;; offset=0x00BA
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       xor      edx, edx
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
						;; size=24 bbWeight=4 PerfScore 53.00
G_M22950_IG10:  ;; offset=0x00D2
       mov      eax, dword ptr [ebp-0x24]
       inc      eax
       mov      dword ptr [ebp-0x24], eax
       cmp      dword ptr [ebp-0x24], 10
       jl       SHORT G_M22950_IG07
						;; size=13 bbWeight=8 PerfScore 42.00
G_M22950_IG11:  ;; offset=0x00DF
       mov      eax, 100
       mov      ecx, 101
       cmp      dword ptr [ebp-0x20], 10
       cmovne   eax, ecx
						;; size=17 bbWeight=1 PerfScore 2.75
G_M22950_IG12:  ;; offset=0x00F0
       lea      esp, [ebp-0x08]
       pop      esi
       pop      edi
       pop      ebp
       ret      
						;; size=7 bbWeight=1 PerfScore 3.00
G_M22950_IG13:  ;; offset=0x00F7
       mov      ecx, eax
       mov      eax, dword ptr [ebp-0x20]
       inc      eax
       mov      dword ptr [ebp-0x20], eax
       call     [System.Console:WriteLine(System.Object)]
       call     CORINFO_HELP_ENDCATCH
       mov      esi, gword ptr [ebp-0x28]
       jmp      SHORT G_M22950_IG10
						;; size=25 bbWeight=0 PerfScore 0.00

; Total bytes of code 272, prolog size 22, PerfScore 1081.75, instruction count 102, allocated bytes for code 272 (MethodHash=0103a659) for method Test35000:TestEntryPoint():int (Tier0-FullOpts)
; ============================================================

We decided to move the catch block since it's cold, but kept the try block in the loop. The only codegen difference is that the try block now falls into the bottom loop block and the catch block has to jump to it, rather than the other way around (previously, the try jumped over the catch, and the catch fell into the bottom loop block). Considering the failure is specific to delegates, this looks like a runtime issue, right?

AndyAyersMS · 2024-05-02T17:09:07Z

I wonder if this is one of those cases where we have to put a NOP after the call, if it's at the end of the try range, or else we can't figure out if we were in the try or not.
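A simplified model of this concern follows. Suppose the runtime matches a frame against a try clause using the call's return address, and clauses are half-open ranges [tryStart, tryEnd). Then a call that is the last instruction of the try returns to exactly `tryEnd`, which falls outside the range. Both the lookup rule and the try start offset (0x80, G_M22950_IG07) are assumptions for illustration. The end offsets and the return address (0xD2) come from the listings in this thread.

```cpp
#include <cstdint>

// Assumed, simplified model (not the runtime's actual code): an EH clause
// covers the half-open code range [tryStart, tryEnd), and a caller frame is
// matched against clauses using the call's return address.
struct TryRange
{
    std::uint32_t tryStart; // offset of the first instruction in the try
    std::uint32_t tryEnd;   // offset just past the last instruction in the try
};

// Returns true if a call whose return address is `returnAddress` is treated
// as being inside the try under this model.
bool isInTry(const TryRange& range, std::uint32_t returnAddress)
{
    return returnAddress >= range.tryStart && returnAddress < range.tryEnd;
}
```

Under this model, two layouts contain the return address 0xD2: the original one, where the try ends with `jmp SHORT G_M22950_IG11` at 0xD4, and the NOP-padded one, which ends at 0xD3. The relocated layout without padding ends at 0xD2 and does not contain it, which matches the observed failure.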

When you did your experiment of calling a method directly, did the call end the try or was there another instruction afterwards?

amanasifkhalid · 2024-05-02T17:49:11Z

> When you did your experiment of calling a method directly, did the call end the try or was there another instruction afterwards?

The try block still ended with a call, which led me to think the codegen is fine. Still, let me try disabling the jump-to-next removal optimization and see if that fixes the failure locally.

AndyAyersMS · 2024-05-02T17:51:59Z

>> When you did your experiment of calling a method directly, did the call end the try or was there another instruction afterwards?

> The try block still ended with a call, which led me to think the codegen is fine. Though let me try disabling the jump-to-next removal optimization, and see if that fixes the failure locally...

clr-abi.md and the code say this NOP is only needed for x64. I wonder if it is also needed for x86, and we just never noticed because historically on x86 the catch was always placed right after the try, so the try always ended with something other than a call (such as a jump over the handler)?

amanasifkhalid · 2024-05-02T18:04:17Z

> The clr-abi.md and code say this NOP is only needed for x64. I wonder if it is also needed for x86 but since historically for x86 the catch was always just after the try, the try would always end with something other than a call, so we just never noticed?

I think you're right. Enabling the NOP check for x86 fixes this failure. New codegen:

; Assembly listing for method Test35000:TestEntryPoint():int (Tier0-FullOpts)
; Emitting BLENDED_CODE for X86 with AVX - Windows
; Tier-0 switched to FullOpts code
; optimized code
; ebp based frame
; partially interruptible
; No PGO data
; 2 inlinees with PGO data; 8 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 loc0         [V00,T09] (  2,  2   )     ref  ->  eax         class-hnd single-def <System.Reflection.MethodInfo>
;  V01 loc1         [V01,T00] (  7,136   )     ref  ->  [ebp-0x28]  class-hnd EH-live single-def <System.Func`10[Test35000+TestData0,int,int,int,int,int,int,int,int,System.Object]>
;  V02 loc2         [V02,T02] (  2, 32   )     ref  ->  [ebp-0x2C]  class-hnd exact EH-live spill-single-def <Test35000+TestData0>
;  V03 loc3         [V03,T03] (  2, 32   )     ref  ->  [ebp-0x30]  class-hnd exact EH-live spill-single-def <Test35000+TestData1>
;  V04 loc4         [V04,T08] (  4,  2   )     int  ->  [ebp-0x20]  do-not-enreg[Z] EH-live
;  V05 loc5         [V05,T04] (  4, 25   )     int  ->  [ebp-0x24]  do-not-enreg[Z] EH-live
;  V06 loc6         [V06,T01] (  4,104   )     int  ->  edi        
;* V07 loc7         [V07    ] (  0,  0   )     ref  ->  zero-ref    class-hnd <System.NullReferenceException>
;* V08 tmp0         [V08    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Test35000+TestData0>
;* V09 tmp1         [V09    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "NewObj constructor temp" <Test35000+TestData1>
;  V10 tmp2         [V10,T10] (  2,  0   )     ref  ->  ecx         class-hnd "impSpillSpecialSideEff" <System.NullReferenceException>
;  V11 tmp3         [V11,T06] (  3,  4.50)     ref  ->  eax         single-def "argument with side effect"
;  V12 EHSlots      [V12    ] (  1,  1   )  struct (16) [ebp-0x1C]  do-not-enreg[XS] must-init addr-exposed "lvaShadowSPslotsVar"
;  V13 rat0         [V13,T05] (  5,  7.50)     ref  ->  esi         "replacement local"
;  V14 rat1         [V14,T07] (  3,  2.50)     int  ->  ecx         "CSE for expectedClsNode"
;
; Lcl frame size = 40

G_M22950_IG01:  ;; offset=0x0000
       push     ebp
       mov      ebp, esp
       push     edi
       push     esi
       sub      esp, 40
       xor      eax, eax
       mov      dword ptr [ebp-0x1C], eax
       mov      dword ptr [ebp-0x18], eax
       mov      dword ptr [ebp-0x14], eax
       mov      dword ptr [ebp-0x10], eax
						;; size=22 bbWeight=1 PerfScore 7.75
G_M22950_IG02:  ;; offset=0x0016
       push     -1
       push     28
       push     0
       push     3
       push     0
       push     0
       mov      ecx, 0x810781C      ; 'TestData0'
       mov      edx, 0x8107830      ; 'MyMethod'
       call     [System.RuntimeType:GetMethodImplCommon(System.String,int,int,System.Reflection.Binder,int,System.Type[],System.Reflection.ParameterModifier[]):System.Reflection.MethodInfo:this]
       push     eax
       push     1
       mov      ecx, 0x8107850      ; 'System.Func`10[Test35000+TestData0,System.Int32,System.Int32,Sy'
       xor      edx, edx
       call     [System.Delegate:CreateDelegate(System.Type,System.Object,System.Reflection.MethodInfo,ubyte):System.Delegate]
       mov      esi, eax
       test     esi, esi
       je       SHORT G_M22950_IG05
						;; size=50 bbWeight=1 PerfScore 16.50
G_M22950_IG03:  ;; offset=0x0048
       mov      ecx, 0x908D8F0      ; System.Func`10[Test35000+TestData0,int,int,int,int,int,int,int,int,System.Object]
       cmp      dword ptr [esi], ecx
       je       SHORT G_M22950_IG05
						;; size=9 bbWeight=0.50 PerfScore 2.12
G_M22950_IG04:  ;; offset=0x0051
       mov      edx, eax
       call     [CORINFO_HELP_CHKCASTANY]
       mov      esi, eax
						;; size=10 bbWeight=0.25 PerfScore 0.88
G_M22950_IG05:  ;; offset=0x005B
       mov      gword ptr [ebp-0x28], esi
       mov      ecx, 0x908D688      ; Test35000+TestData0
       call     CORINFO_HELP_NEWSFAST
       mov      gword ptr [ebp-0x2C], eax
       mov      ecx, 0x908DB78      ; Test35000+TestData1
       call     CORINFO_HELP_NEWSFAST
       mov      gword ptr [ebp-0x30], eax
       xor      edx, edx
       mov      dword ptr [ebp-0x20], edx
						;; size=34 bbWeight=1 PerfScore 6.75
G_M22950_IG06:  ;; offset=0x007D
       mov      dword ptr [ebp-0x24], edx
						;; size=3 bbWeight=1 PerfScore 1.00
G_M22950_IG07:  ;; offset=0x0080
       xor      edi, edi
						;; size=2 bbWeight=8 PerfScore 2.00
G_M22950_IG08:  ;; offset=0x0082
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       mov      edx, gword ptr [ebp-0x2C]
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       mov      edx, gword ptr [ebp-0x30]
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       inc      edi
       cmp      edi, 50
       jl       SHORT G_M22950_IG08
						;; size=56 bbWeight=32 PerfScore 944.00
G_M22950_IG09:  ;; offset=0x00BA
       push     1
       push     2
       push     3
       push     4
       push     5
       push     6
       push     7
       push     8
       xor      edx, edx
       mov      ecx, gword ptr [esi+0x04]
       call     [esi+0x0C]System.Func`10[System.__Canon,int,int,int,int,int,int,int,int,System.__Canon]:Invoke(System.__Canon,int,int,int,int,int,int,int,int):System.__Canon:this
       nop      
						;; size=25 bbWeight=4 PerfScore 54.00
G_M22950_IG10:  ;; offset=0x00D3
       mov      eax, dword ptr [ebp-0x24]
       inc      eax
       mov      dword ptr [ebp-0x24], eax
       cmp      dword ptr [ebp-0x24], 10
       jl       SHORT G_M22950_IG07
						;; size=13 bbWeight=8 PerfScore 42.00
G_M22950_IG11:  ;; offset=0x00E0
       mov      eax, 100
       mov      ecx, 101
       cmp      dword ptr [ebp-0x20], 10
       cmovne   eax, ecx
						;; size=17 bbWeight=1 PerfScore 2.75
G_M22950_IG12:  ;; offset=0x00F1
       lea      esp, [ebp-0x08]
       pop      esi
       pop      edi
       pop      ebp
       ret      
						;; size=7 bbWeight=1 PerfScore 3.00
G_M22950_IG13:  ;; offset=0x00F8
       mov      ecx, eax
       mov      eax, dword ptr [ebp-0x20]
       inc      eax
       mov      dword ptr [ebp-0x20], eax
       call     [System.Console:WriteLine(System.Object)]
       call     CORINFO_HELP_ENDCATCH
       mov      esi, gword ptr [ebp-0x28]
       jmp      SHORT G_M22950_IG10
						;; size=25 bbWeight=0 PerfScore 0.00

; Total bytes of code 273, prolog size 22, PerfScore 1082.75, instruction count 103, allocated bytes for code 273 (MethodHash=0103a659) for method Test35000:TestEntryPoint():int (Tier0-FullOpts)
; ============================================================

Would you like me to turn it on in this PR, or a separate one? It might be interesting to see how many cases this fires for.

amanasifkhalid · 2024-05-02T20:01:27Z

> I wonder if it is also needed for x86 but since historically for x86 the catch was always just after the try, the try would always end with something other than a call, so we just never noticed?

I'm surprised we haven't run into the other cases yet, like a call instruction preceding the start of a try region. Since we haven't faced issues for those, should we be conservative in our fix by only emitting the NOP for the "call before EH region end" case for now?

AndyAyersMS · 2024-05-02T20:25:46Z

>> I wonder if it is also needed for x86 but since historically for x86 the catch was always just after the try, the try would always end with something other than a call, so we just never noticed?

> I'm surprised we haven't run into the other cases yet, like a call instruction preceding the start of a try region. Since we haven't faced issues for those, should we be conservative in our fix by only emitting the NOP for the "call before EH region end" case for now?

Good question. Seems like if the try-entering side was a problem we'd have already run into it somehow, since there is nothing preventing us from falling into a try?

AndyAyersMS · 2024-05-02T20:26:48Z

> Would you like me to turn it on in this PR, or a separate one? It might be interesting to see how many cases this fires for.

Our guess is that it won't ever fire without this PR. If you verify that, I don't see why a separate PR would be interesting.

amanasifkhalid · 2024-05-02T20:36:42Z

> Our guess is that it won't ever fire without this PR. If you verify that I don't see why a separate PR would be interesting.

When I experimented with calling a normal method that throws at the end of the try region, there was no code after the call instruction, yet the test didn't fail. So I suspect there are other methods that work today but will get NOPs inserted with the fix.

amanasifkhalid · 2024-05-03T19:16:29Z

I added a check to genCodeForBBList to emit a NOP on x86 if a try region with a catch handler ends with a call and falls into its non-EH successor. Without this PR's changes to fgRelocateEHRegions, this produces 40 diffs in the coreclr_tests collection. I'm guessing that in those cases we were either getting (un)lucky and never had to handle an exception thrown during the call, or, for some reason, the runtime didn't treat the call's return address as outside the EH region when handling exceptions.
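The condition described above can be summarized as a predicate. `BlockInfo`, its fields, and `needsNopAfterCall` are hypothetical names for illustration. They are not the actual genCodeForBBList code or `BasicBlock` members. The sketch models only the new x86 rule, not x64's existing padding rule.

```cpp
// Hypothetical, simplified description of a block at codegen time; these
// fields are illustrative and are not the JIT's BasicBlock members.
struct BlockInfo
{
    bool endsWithCall;            // last emitted instruction is a call
    bool isLastBlockOfTry;        // this block ends a try region
    bool tryHasCatchHandler;      // the enclosing try region has a catch handler
    bool fallsIntoNonEHSuccessor; // execution continues into the next block, outside the try
};

// Pad with a NOP so the call's return address stays inside the try range.
// Models only the x86 (non-funclet) condition described in the comment above.
bool needsNopAfterCall(const BlockInfo& block, bool targetIsX86)
{
    return targetIsX86 && block.endsWithCall && block.isLastBlockOfTry && block.tryHasCatchHandler &&
           block.fallsIntoNonEHSuccessor;
}
```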

I think this is the minimally impactful solution we can use to unblock this for now.

amanasifkhalid · 2024-05-03T21:32:30Z

/azp run runtime-coreclr outerloop

azure-pipelines · 2024-05-03T21:32:47Z

Azure Pipelines successfully started running 1 pipeline(s).

amanasifkhalid · 2024-05-06T19:33:10Z

@AndyAyersMS emitting a NOP before the end of the try region seemed to fix the failure. The Linux ARM64 failure is unrelated. Should we update the ABI documentation to cover this case?

AndyAyersMS · 2024-05-06T20:27:37Z

I would like to get confirmation from the runtime side that NOP padding for x86 EH makes sense.

@mangod9 who can we ask about this?

mangod9 · 2024-05-06T21:22:40Z

@janvorli.

…ntiguity later (#108914)

Part of #107749. `Compiler::fgMoveColdBlocks` currently moves cold try blocks to the end of their innermost regions. This is problematic for our 3-opt layout plans: when identifying a candidate span of blocks to reorder, assuming that all cold blocks are at the end of the method vastly simplifies our implementation. However, if we have EH regions with their own cold sections, `fgMoveColdBlocks` will interleave hot and cold blocks. To facilitate later layout passes, we can simplify `fgMoveColdBlocks` to naively move all cold blocks to the end of the method, regardless of EH region, and rely on a "fixup" pass for making EH regions contiguous again.

To start, I've tweaked `fgMoveColdBlocks` to break up try regions only. When handlers are placed in the funclet section, we don't need to do anything extra to get cold EH blocks out of the main method body's hot section. However, for jitted x86 code, we don't use the funclet model (yet), so cold handler blocks can still litter the main method body, hindering 3-opt's candidate space. I'd rather not expand on this PR's logic to rebuild handler regions if we can do something simpler, such as getting #101613 merged in, and using `fgRelocateEHRegions` to move all handlers to the end of the method under the assumption that they're cold (i.e. a pseudo-funclet region).

Moving try entry blocks in `fgMoveColdBlocks` proved painful enough that I think we're better off leaving them as-is. Leaving each try region's entry in-place gives us a nice breadcrumb for reinserting the remaining blocks, and it might be beneficial to leave these entries in the candidate span of blocks for 3-opt, so we can effectively move entire try regions just by moving the entry. For try regions that are entirely cold, I can look into calling `fgRelocateEHRegions` before `fgMoveColdBlocks` on all platforms to quickly get these out of the way.

All of this would be unnecessary if we could remove the VM's requirement of contiguous EH regions, and the codegen improvements would likely outweigh the additional VM complexity, though that's a conversation for another day.
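To see why a fixup pass is needed after naively moving cold blocks, here is a minimal sketch. It assumes each block records only its hotness and innermost try index and ignores nested regions. `Block`, `moveColdBlocksToEnd`, and `isTryContiguous` are hypothetical names, not the JIT's data structures. `std::stable_partition` stands in for the "move all cold blocks to the end" step, and the contiguity check flags regions that a fixup pass would have to repair.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model for illustration: each block records its id, whether it
// is cold, and the index of its innermost try region (-1 if none).
struct Block
{
    int id;
    bool isCold;
    int tryIndex;
};

// Naively move every cold block to the end of the method, preserving the
// relative order of the hot blocks and of the cold blocks.
void moveColdBlocksToEnd(std::vector<Block>& layout)
{
    std::stable_partition(layout.begin(), layout.end(), [](const Block& b) { return !b.isCold; });
}

// Returns true if all blocks belonging to try region `tryIndex` are adjacent.
// A fixup pass would be needed for any region where this returns false.
bool isTryContiguous(const std::vector<Block>& layout, int tryIndex)
{
    bool seen = false;
    bool ended = false;
    for (const Block& b : layout)
    {
        if (b.tryIndex == tryIndex)
        {
            if (ended)
            {
                return false; // the region resumes after a gap
            }
            seen = true;
        }
        else if (seen)
        {
            ended = true; // left the region after entering it
        }
    }
    return true;
}
```

Take a try region containing one hot and one cold block. After the partition, its cold block lands behind unrelated hot blocks, so the region is no longer contiguous.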

…ntiguity later (dotnet#108914)

Part of dotnet#107749. `Compiler::fgMoveColdBlocks` currently moves cold try blocks to the end of their innermost regions. This is problematic for our 3-opt layout plans: When identifying a candidate span of blocks to reorder, assuming that all cold blocks are at the end of the method vastly simplifies our implementation. However, if we have EH regions with their own cold sections, `fgMoveColdBlocks` will interleave hot and cold blocks. To facilitate later layout passes, we can simplify `fgMoveColdBlocks` to naively move all cold blocks to the end of the method, regardless of EH region, and rely on a "fixup" pass for making EH regions contiguous again.

To start, I've tweaked `fgMoveColdBlocks` to break up try regions only. When handlers are placed in the funclet section, we don't need to do anything extra to get cold EH blocks out of the main method body's hot section. However, for jitted x86 code, we don't use the funclet model (yet), so cold handler blocks can still litter the main method body, hindering 3-opt's candidate space. I'd rather not expand on this PR's logic to rebuild handler regions if we can do something simpler, such as getting dotnet#101613 merged in, and using `fgRelocateEHRegions` to move all handlers to the end of the method under the assumption that they're cold (i.e. a pseudo-funclet region).

Moving try entry blocks in `fgMoveColdBlocks` proved painful enough that I think we're better off leaving them as-is. Leaving each try region's entry in-place gives us a nice breadcrumb for reinserting the remaining blocks, and it might be beneficial to leave these entries in the candidate span of blocks for 3-opt, so we can effectively move entire try regions just by moving the entry. For try regions that are entirely cold, I can look into calling `fgRelocateEHRegions` before `fgMoveColdBlocks` on all platforms to quickly get these out of the way.

All of this would be unnecessary if we could remove the VM's requirement of contiguous EH regions, and the codegen improvements would likely outweigh the additional VM complexity, though that's a conversation for another day.
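To make the two-step plan above concrete, here is a minimal sketch on a toy block list: first move every cold block to the end of the method regardless of EH region (keeping try entries in place as breadcrumbs), then run a fixup pass that reinserts each try region's moved blocks right after the last block of that region still in the main part. This is an illustration under stated assumptions, not the JIT's implementation: `Block`, `moveColdBlocksToEnd`, `restoreTryContiguity`, and `tryRegionsContiguous` are hypothetical names, try regions are assumed not to be nested, and handler regions are ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a basic block (not the JIT's BasicBlock).
struct Block {
    int id;          // original layout position, only for readability
    int tryIndex;    // enclosing try region, or -1 if outside any try (no nesting assumed)
    bool cold;       // e.g. marked rarely run
    bool isTryEntry; // first block of its try region
};

// Step 1: move every cold block to the end of the method regardless of EH
// region. Try entries stay in place, even if cold, to serve as breadcrumbs.
std::vector<Block> moveColdBlocksToEnd(const std::vector<Block>& layout) {
    std::vector<Block> result;
    std::vector<Block> coldTail;
    for (const Block& b : layout) {
        if (b.cold && !b.isTryEntry) {
            coldTail.push_back(b);
        } else {
            result.push_back(b);
        }
    }
    result.insert(result.end(), coldTail.begin(), coldTail.end());
    return result;
}

// Step 2: fixup pass. Blocks moved by step 1 are identified by their flags.
// Each try region's moved blocks are reinserted right after the last block of
// that region still in the main part (at minimum, the region's entry).
std::vector<Block> restoreTryContiguity(const std::vector<Block>& layout) {
    std::vector<Block> mainPart;
    std::map<int, std::vector<Block>> movedByTry;
    std::vector<Block> movedOutsideTry;
    for (const Block& b : layout) {
        if (!(b.cold && !b.isTryEntry)) {
            mainPart.push_back(b);
        } else if (b.tryIndex < 0) {
            movedOutsideTry.push_back(b);
        } else {
            movedByTry[b.tryIndex].push_back(b);
        }
    }

    // Index in mainPart of the last block belonging to each try region.
    std::map<int, std::size_t> lastInMain;
    for (std::size_t i = 0; i < mainPart.size(); ++i) {
        if (mainPart[i].tryIndex >= 0) {
            lastInMain[mainPart[i].tryIndex] = i;
        }
    }

    std::vector<Block> result;
    for (std::size_t i = 0; i < mainPart.size(); ++i) {
        result.push_back(mainPart[i]);
        int t = mainPart[i].tryIndex;
        if (t >= 0 && lastInMain[t] == i) {
            auto it = movedByTry.find(t);
            if (it != movedByTry.end()) {
                result.insert(result.end(), it->second.begin(), it->second.end());
                movedByTry.erase(it);
            }
        }
    }
    // Every try kept its entry in place, so every moved try block found a home.
    assert(movedByTry.empty());
    // Cold blocks outside any try stay at the end of the method.
    result.insert(result.end(), movedOutsideTry.begin(), movedOutsideTry.end());
    return result;
}

// True if the blocks of each try region occupy consecutive positions.
bool tryRegionsContiguous(const std::vector<Block>& layout) {
    std::map<int, std::pair<std::size_t, std::size_t>> span; // first, last position
    std::map<int, std::size_t> sizes;
    for (std::size_t i = 0; i < layout.size(); ++i) {
        int t = layout[i].tryIndex;
        if (t < 0) continue;
        if (sizes.count(t) == 0) span[t].first = i;
        span[t].second = i;
        ++sizes[t];
    }
    for (const auto& kv : span) {
        if (kv.second.second - kv.second.first + 1 != sizes.at(kv.first)) return false;
    }
    return true;
}
```

For example, with blocks 0 (hot), 1 (hot try entry), 2 (cold, in the try), 3 (hot, in the try), 4 (cold), and 5 (hot), step 1 produces the order 0, 1, 3, 5, 2, 4, which splits the try region. The fixup pass then produces 0, 1, 3, 2, 5, 4: the try region is contiguous again, and the cold block outside any try stays at the end. Keeping the entry in place is what guarantees the fixup always has an anchor for each region.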

amanasifkhalid · 2025-03-10T18:19:51Z

Superseded by #113330

Move handler regions in fgRelocateEHRegions

8ef4cb6

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 26, 2024

dotnet-policy-service bot assigned amanasifkhalid Apr 26, 2024

Merge branch 'main' into relocate-eh-handlers

f81ebcc

build-analysis bot mentioned this pull request May 1, 2024

System.Numerics.Tensors.Tests.SingleGenericTensorPrimitives.SpanScalarDestination_SpecialValues fails #101721

Closed

Emit NOP after call at try region end

04f3fbf

amanasifkhalid added this to the 10.0.0 milestone Aug 12, 2024

amanasifkhalid mentioned this pull request Oct 16, 2024

JIT: Break up try regions in Compiler::fgMoveColdBlocks, and fix contiguity later #108914

Merged

amanasifkhalid closed this Mar 10, 2025

amanasifkhalid mentioned this pull request Mar 12, 2025

Test failure: Regressions/coreclr/GitHub_35000/test35000/test35000.cmd #113106

Closed
