
[Perf] Windows/x64: 3 Regressions on 1/12/2024 5:22:47 PM #27332

Closed
performanceautofiler bot opened this issue Jan 16, 2024 · 4 comments


performanceautofiler bot commented Jan 16, 2024

Run Information

Name          Value
Architecture  x64
OS            Windows 10.0.18362
Queue         TigerWindows
Baseline      22ba7d607bb1d9caa0db9afcdc47eb5cef641fcb
Compare       38f53749f0de7026340bb9df98d0495d9e3b0e64
Configs       CompilationMode:tiered, RunKind:micro

Regressions in System.Buffers.Text.Tests.Utf8ParserTests

Benchmark                                      Baseline   Test       Test/Base   Test Quality   Edge Detector
TryParseUInt64(value: 18446744073709551615)    20.96 ns   22.26 ns   1.06        0.02           False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Buffers.Text.Tests.Utf8ParserTests*'


System.Buffers.Text.Tests.Utf8ParserTests.TryParseUInt64(value: 18446744073709551615)


Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository



Regressions in Span.Sorting

Benchmark                    Baseline    Test        Test/Base   Test Quality   Edge Detector
QuickSortSpan(Size: 512)     11.66 μs    15.37 μs    1.32        0.11           True
BubbleSortArray(Size: 512)   199.72 μs   217.10 μs   1.09        0.08           False


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'


Span.Sorting.QuickSortSpan(Size: 512)


Span.Sorting.BubbleSortArray(Size: 512)



@DrewScoggins (Member)

@jakobbotsch (Member)

System.Buffers.Text.Tests.Utf8ParserTests.TryParseUInt64(value: 18446744073709551615)

Hot functions:

  • (78.49%) Utf8Parser.TryParseUInt64D (Tier-1)
    • Has diffs
  • (11.02%) Utf8Parser.TryParse (Tier-1)
    • No diffs
  • (6.14%) Utf8ParserTests.TryParseUInt64 (Tier-1)
    • No diffs
  • (1.92%) Runnable_0.WorkloadActionUnroll (FullOpt)
    • No diffs
Diffs

[System.Private.CoreLib]Utf8Parser.TryParseUInt64D(value class System.ReadOnlySpan`1<unsigned int8>,unsigned int64&,int32&)

 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are valid, and fgCalledCount is 20844
+; with Dynamic PGO: edge weights are valid, and fgCalledCount is 20472
 ; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
@@ -183,13 +183,13 @@ G_M27482_IG17:
 ;  V01 arg1         [V01,T06] (  4,  3   )   byref  ->  rdx         single-def
 ;  V02 arg2         [V02,T07] (  4,  3   )   byref  ->   r8         single-def
 ;  V03 loc0         [V03,T08] (  3,  3   )    long  ->   r9        
-;  V04 loc1         [V04,T01] (  9, 58.53)    long  ->   r9        
-;  V05 loc2         [V05,T00] ( 10, 78.39)     int  ->  rax        
+;  V04 loc1         [V04,T01] (  9, 59.53)    long  ->   r9        
+;  V05 loc2         [V05,T00] ( 10, 79.71)     int  ->  rax        
 ;  V06 loc3         [V06,T09] (  3,  0   )    long  ->  r11        
-;  V07 loc4         [V07,T02] (  5, 57.51)    long  ->  r11        
+;  V07 loc4         [V07,T02] (  5, 58.53)    long  ->  r11        
 ;  V08 OutArgs      [V08    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;  V09 tmp1         [V09,T04] (  4, 20.84)   byref  ->  r10         single-def "field V00._reference (fldOffset=0x0)" P-INDEP
-;  V10 tmp2         [V10,T03] (  5, 22.86)     int  ->  rcx         "field V00._length (fldOffset=0x8)" P-INDEP
+;  V09 tmp1         [V09,T04] (  4, 21.18)   byref  ->  r10         single-def "field V00._reference (fldOffset=0x0)" P-INDEP
+;  V10 tmp2         [V10,T03] (  5, 23.18)     int  ->  rcx         "field V00._length (fldOffset=0x8)" P-INDEP
 ;* V11 tmp3         [V11    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.ReadOnlySpan`1[ubyte]>
 ;
 ; Lcl frame size = 32
@@ -202,86 +202,85 @@ G_M27482_IG02:
        mov      r10, bword ptr [rcx]
        mov      ecx, dword ptr [rcx+0x08]
        test     ecx, ecx
-       je       G_M27482_IG12
+       je       G_M27482_IG10
        movzx    rax, byte  ptr [r10]
        add      eax, -48
        mov      r9d, eax
        cmp      r9d, 9
-       ja       G_M27482_IG12
+       ja       SHORT G_M27482_IG10
        mov      eax, 1
        cmp      ecx, 20
-       jl       SHORT G_M27482_IG09
-						;; size=44 bbWeight=1 PerfScore 10.50
+       jl       SHORT G_M27482_IG07
+       jmp      SHORT G_M27482_IG03
+       align    [0 bytes for IG03]
+						;; size=42 bbWeight=1 PerfScore 12.50
 G_M27482_IG03:
        cmp      eax, ecx
-       jae      SHORT G_M27482_IG06
-						;; size=4 bbWeight=19.84 PerfScore 24.80
+       jae      SHORT G_M27482_IG08
+						;; size=4 bbWeight=20.17 PerfScore 25.22
 G_M27482_IG04:
        mov      r11d, eax
        movzx    r11, byte  ptr [r10+r11]
        add      r11d, -48
        cmp      r11d, 9
-       ja       SHORT G_M27482_IG06
+       ja       SHORT G_M27482_IG08
        inc      eax
        mov      rbx, 0xD1FFAB1E
        cmp      r9, rbx
-       jae      SHORT G_M27482_IG08
-						;; size=35 bbWeight=18.84 PerfScore 103.64
+       jae      SHORT G_M27482_IG06
+						;; size=35 bbWeight=19.18 PerfScore 105.47
 G_M27482_IG05:
        lea      r9, [r9+4*r9]
        lea      r9, [r11+2*r9]
        jmp      SHORT G_M27482_IG03
-						;; size=10 bbWeight=17.86 PerfScore 53.58
+						;; size=10 bbWeight=18.17 PerfScore 54.52
 G_M27482_IG06:
+       mov      rbx, 0xD1FFAB1E
+       cmp      r9, rbx
+       jne      SHORT G_M27482_IG10
+       cmp      r11d, 5
+       ja       SHORT G_M27482_IG10
+       lea      r9, [r11-0x06]
+       jmp      SHORT G_M27482_IG03
+						;; size=27 bbWeight=1.00 PerfScore 5.27
+G_M27482_IG07:
+       cmp      eax, ecx
+       jb       SHORT G_M27482_IG13
+						;; size=4 bbWeight=0.01 PerfScore 0.01
+G_M27482_IG08:
        mov      dword ptr [r8], eax
        mov      qword ptr [rdx], r9
        mov      eax, 1
 						;; size=11 bbWeight=1 PerfScore 2.25
-G_M27482_IG07:
+G_M27482_IG09:
        add      rsp, 32
        pop      rbx
        ret      
 						;; size=6 bbWeight=1 PerfScore 1.75
-G_M27482_IG08:
-       mov      rbx, 0xD1FFAB1E
-       cmp      r9, rbx
-       jne      SHORT G_M27482_IG12
-       cmp      r11d, 5
-       ja       SHORT G_M27482_IG12
-       lea      r9, [r11-0x06]
-       jmp      SHORT G_M27482_IG03
-						;; size=27 bbWeight=0.98 PerfScore 5.16
-G_M27482_IG09:
-       cmp      eax, ecx
-       jb       SHORT G_M27482_IG11
-						;; size=4 bbWeight=0.02 PerfScore 0.03
 G_M27482_IG10:
-       jmp      SHORT G_M27482_IG06
-						;; size=2 bbWeight=0.01 PerfScore 0.01
-G_M27482_IG11:
-       mov      r11d, eax
-       movzx    r11, byte  ptr [r10+r11]
-       add      r11d, -48
-       cmp      r11d, 9
-       ja       SHORT G_M27482_IG06
-       lea      r9, [r9+4*r9]
-       lea      r9, [r11+2*r9]
-       inc      eax
-       jmp      SHORT G_M27482_IG09
-						;; size=30 bbWeight=0 PerfScore 0.00
-G_M27482_IG12:
        xor      eax, eax
        mov      dword ptr [r8], eax
 						;; size=5 bbWeight=0 PerfScore 0.00
-G_M27482_IG13:
+G_M27482_IG11:
        mov      qword ptr [rdx], rax
 						;; size=3 bbWeight=0 PerfScore 0.00
-G_M27482_IG14:
+G_M27482_IG12:
        add      rsp, 32
        pop      rbx
        ret      
 						;; size=6 bbWeight=0 PerfScore 0.00
+G_M27482_IG13:
+       mov      r11d, eax
+       movzx    r11, byte  ptr [r10+r11]
+       add      r11d, -48
+       cmp      r11d, 9
+       ja       SHORT G_M27482_IG08
+       lea      r9, [r9+4*r9]
+       lea      r9, [r11+2*r9]
+       inc      eax
+       jmp      SHORT G_M27482_IG07
+						;; size=30 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 192, prolog size 5, PerfScore 222.16, instruction count 59, allocated bytes for code 192 (MethodHash=069294a5) for method System.Buffers.Text.Utf8Parser:TryParseUInt64D(System.ReadOnlySpan`1[ubyte],byref,byref):ubyte (Tier1)
+; Total bytes of code 188, prolog size 5, PerfScore 227.43, instruction count 60, allocated bytes for code 192 (MethodHash=069294a5) for method System.Buffers.Text.Utf8Parser:TryParseUInt64D(System.ReadOnlySpan`1[ubyte],byref,byref):ubyte (Tier1)
 ; ============================================================
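Reading the TryParseUInt64D listing above: after the first digit, inputs of 20 or more bytes (`cmp ecx, 20`) go through a loop that multiplies the accumulator by 10 with two `lea` instructions and compares it (unsigned) against a large constant before each step. Assuming 0xD1FFAB1E is the placeholder that diffable disassembly prints in place of large immediates, that constant is presumably UInt64.MaxValue / 10, and the boundary block ending in `lea r9, [r11-0x06]` handles the last digit of values near UInt64.MaxValue. The benchmark input 18446744073709551615 reaches that block once per call, consistent with its bbWeight of about 1. In the Compare build this block moved from after the epilog (old IG08) to directly after the loop back-edge (new IG06), and a `jmp`/`align` pair was added ahead of the loop head. The Python sketch below models the listing's control flow; it is an illustration, not the CoreLib C# source, and it folds the separate unchecked loop for inputs shorter than 20 bytes into the checked one.

```python
U64_MAX = (1 << 64) - 1
OVERFLOW_LIMIT = U64_MAX // 10  # 1844674407370955161; assumed value behind 0xD1FFAB1E


def try_parse_uint64_d(source: bytes) -> tuple:
    """Return (success, value, bytes_consumed) for a leading run of ASCII digits."""
    # IG02: fail on empty input or a non-digit first byte.
    if not source or not 0 <= source[0] - 0x30 <= 9:
        return (False, 0, 0)
    value = source[0] - 0x30
    index = 1
    while index < len(source):
        digit = source[index] - 0x30      # add r11d, -48
        if not 0 <= digit <= 9:           # cmp r11d, 9 / ja: stop at first non-digit
            break
        index += 1                        # inc eax
        if value < OVERFLOW_LIMIT:        # cmp r9, rbx / jae (unsigned compare)
            value = value * 10 + digit    # lea r9,[r9+4*r9] / lea r9,[r11+2*r9]
        elif value == OVERFLOW_LIMIT and digit <= 5:
            # Boundary block: OVERFLOW_LIMIT * 10 == 2**64 - 6, so in a 64-bit
            # register this sum equals digit - 6, hence `lea r9, [r11-0x06]`.
            value = OVERFLOW_LIMIT * 10 + digit
        else:
            return (False, 0, 0)          # overflow: value and bytesConsumed zeroed
    return (True, value, index)


print(try_parse_uint64_d(b"18446744073709551615"))  # (True, 18446744073709551615, 20)
print(try_parse_uint64_d(b"18446744073709551616"))  # (False, 0, 0)
```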
 

Span.Sorting.QuickSortSpan(Size: 512)

Hot functions:

  • (100.00%) Sorting.TestQuickSortSpan (Tier-1)
    • Has diffs
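In the Tier1-OSR listings for this method, the locals appear to map as loc1 to r8, loc2 to rbx, loc3 to rax, loc4 to rdx and loc5 to r10: `inc ebx` advances the left index, `dec eax` retreats the right index, `cmp dword ptr [rsi+4*rcx], edx` compares an element with the pivot, and `cmp ebx, r8d` guards a final swap. That shape fits a quicksort that partitions around the last element and then recurses on both halves. The Python sketch below is a reconstruction from the disassembly, not the benchmark's actual C# source; a list plus (start, end) indices stands in for Span<int> and its slices.

```python
from typing import List, Optional


def quick_sort_span(data: List[int], start: int = 0, end: Optional[int] = None) -> None:
    """Sort data[start:end] in place; (start, end) plays the role of a Span slice."""
    if end is None:
        end = len(data)
    if end - start <= 1:              # spans of length 0 or 1 are already sorted
        return
    hi = end - 1                      # loc1: index of the pivot
    i, j = start, hi                  # loc2 / loc3: left and right scan indices
    pivot = data[hi]                  # loc4: last element used as the pivot
    while i < j:
        while i < j and data[i] <= pivot:
            i += 1                    # corresponds to `inc ebx`
        while j > i and data[j] >= pivot:
            j -= 1                    # corresponds to `dec eax`
        if i < j:
            data[i], data[j] = data[j], data[i]
    if i != hi:                       # corresponds to `cmp ebx, r8d` / `je`
        data[i], data[hi] = pivot, data[i]
    quick_sort_span(data, start, i)       # recurse on the left part
    quick_sort_span(data, i + 1, end)     # recurse on the right part


values = [5, 1, 4, 1, 5, 9, 2, 6]
quick_sort_span(values)
print(values)  # [1, 1, 2, 4, 5, 5, 6, 9]
```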
Diffs
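Across the two Tier1-OSR listings of TestQuickSortSpan below, the only instruction-level change is a new `align [11 bytes for IG03]`; every other hunk only changes block weights. The 11 bytes are consistent with padding the loop head to a 32-byte boundary. The arithmetic is sketched below, assuming that IG02 immediately follows the 77-byte prolog and that the boundary is 32 bytes (both assumptions, not stated in the listing).

```python
def align_padding(offset: int, boundary: int = 32) -> int:
    """Padding bytes needed to move `offset` up to the next multiple of `boundary`."""
    return -offset % boundary


# Byte counts taken from the Tier1-OSR listings.
PROLOG_SIZE = 77   # "prolog size 77"
IG02_SIZE = 8      # mov rsi / mov edi / jmp SHORT, before the align was added
OLD_TOTAL = 286    # "Total bytes of code 286"

loop_head = PROLOG_SIZE + IG02_SIZE     # assumes IG02 directly follows the prolog
pad = align_padding(loop_head)          # assumes a 32-byte alignment boundary
print(loop_head, pad, loop_head + pad)  # 85 11 96
print(OLD_TOTAL + pad)                  # 297
```

The result matches the reported growth of the method from 286 to 297 bytes.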

[MicroBenchmarks]Sorting.TestQuickSortSpan(value class System.Span`1<int32>)

 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are invalid, and fgCalledCount is 297.53
+; with Dynamic PGO: edge weights are invalid, and fgCalledCount is 299.35
 ; 1 inlinees with PGO data; 2 single block inlinees; 1 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 arg0         [V00,T14] (  5,  9   )   byref  ->  rcx         ld-addr-op single-def
 ;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref   
-;  V02 loc1         [V02,T08] (  3, 60.28)     int  ->   r8        
-;  V03 loc2         [V03,T00] ( 15,736.32)     int  ->  rbx        
-;  V04 loc3         [V04,T01] ( 10,706.76)     int  ->  rax        
-;  V05 loc4         [V05,T04] (  3,214.53)     int  ->  rdx        
-;  V06 loc5         [V06,T07] (  4, 64.11)     int  ->  r10        
+;  V02 loc1         [V02,T08] (  3, 59.91)     int  ->   r8        
+;  V03 loc2         [V03,T00] ( 15,733.84)     int  ->  rbx        
+;  V04 loc3         [V04,T01] ( 10,700.16)     int  ->  rax        
+;  V05 loc4         [V05,T04] (  3,213.29)     int  ->  rdx        
+;  V06 loc5         [V06,T07] (  4, 63.72)     int  ->  r10        
 ;  V07 OutArgs      [V07    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V08 tmp1         [V08    ] (  0,  0   )  struct (16) zero-ref    "spilled call-like call argument" <System.Span`1[int]>
 ;* V09 tmp2         [V09    ] (  0,  0   )  struct (16) zero-ref    "spilled call-like call argument" <System.Span`1[int]>
 ;* V10 tmp3         [V10    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
 ;* V11 tmp4         [V11    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
-;  V12 tmp5         [V12,T06] (  4, 81.31)     int  ->  rcx         "Inlining Arg"
+;  V12 tmp5         [V12,T06] (  4, 80.82)     int  ->  rcx         "Inlining Arg"
 ;* V13 tmp6         [V13    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
-;  V14 tmp7         [V14,T09] (  2, 40.65)   byref  ->  rax         single-def "Inlining Arg"
-;  V15 tmp8         [V15,T10] (  2, 40.65)     int  ->  rdi         "Inlining Arg"
+;  V14 tmp7         [V14,T09] (  2, 40.41)   byref  ->  rax         single-def "Inlining Arg"
+;  V15 tmp8         [V15,T10] (  2, 40.41)     int  ->  rdi         "Inlining Arg"
 ;* V16 tmp9         [V16    ] (  0,  0   )   byref  ->  zero-ref    "field V08._reference (fldOffset=0x0)" P-INDEP
 ;* V17 tmp10        [V17    ] (  0,  0   )     int  ->  zero-ref    "field V08._length (fldOffset=0x8)" P-INDEP
 ;* V18 tmp11        [V18    ] (  0,  0   )   byref  ->  zero-ref    "field V09._reference (fldOffset=0x0)" P-INDEP
 ;* V19 tmp12        [V19    ] (  0,  0   )     int  ->  zero-ref    "field V09._length (fldOffset=0x8)" P-INDEP
 ;* V20 tmp13        [V20,T15] (  0,  0   )   byref  ->  zero-ref    single-def "field V10._reference (fldOffset=0x0)" P-INDEP
-;  V21 tmp14        [V21,T12] (  2, 20.33)     int  ->  rcx         single-def "field V10._length (fldOffset=0x8)" P-INDEP
-;  V22 tmp15        [V22,T11] (  2, 20.33)   byref  ->  rax         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
-;  V23 tmp16        [V23,T13] (  2, 20.33)     int  ->  rdi         single-def "field V13._length (fldOffset=0x8)" P-INDEP
-;  V24 tmp17        [V24,T02] ( 12,348.29)   byref  ->  rsi         single-def "V00.[000..008)"
-;  V25 tmp18        [V25,T03] ( 10,302.24)     int  ->  rdi         single-def "V00.[008..012)"
-;  V26 tmp19        [V26    ] (  6,121.96)  struct (16) [rsp+0x20]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.Span`1[int]>
-;  V27 cse0         [V27,T05] (  6, 96.16)    long  ->  rcx         multi-def "CSE - aggressive"
+;  V21 tmp14        [V21,T12] (  2, 20.20)     int  ->  rcx         single-def "field V10._length (fldOffset=0x8)" P-INDEP
+;  V22 tmp15        [V22,T11] (  2, 20.20)   byref  ->  rax         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
+;  V23 tmp16        [V23,T13] (  2, 20.20)     int  ->  rdi         single-def "field V13._length (fldOffset=0x8)" P-INDEP
+;  V24 tmp17        [V24,T02] ( 12,346.23)   byref  ->  rsi         single-def "V00.[000..008)"
+;  V25 tmp18        [V25,T03] ( 10,300.47)     int  ->  rdi         single-def "V00.[008..012)"
+;  V26 tmp19        [V26    ] (  6,121.22)  struct (16) [rsp+0x20]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.Span`1[int]>
+;  V27 cse0         [V27,T05] (  6, 95.58)    long  ->  rcx         multi-def "CSE - aggressive"
 ;
 ; Lcl frame size = 48
 
@@ -356,14 +356,15 @@ G_M24415_IG02:
        mov      rsi, bword ptr [rcx]
        mov      edi, dword ptr [rcx+0x08]
        jmp      SHORT G_M24415_IG05
-						;; size=8 bbWeight=1 PerfScore 6.00
+       align    [11 bytes for IG03]
+						;; size=19 bbWeight=1 PerfScore 6.00
 G_M24415_IG03:
        inc      ebx
-						;; size=2 bbWeight=70.59 PerfScore 17.65
+						;; size=2 bbWeight=70.61 PerfScore 17.65
 G_M24415_IG04:
        cmp      ebx, eax
        jge      SHORT G_M24415_IG08
-						;; size=4 bbWeight=104.92 PerfScore 131.15
+						;; size=4 bbWeight=104.72 PerfScore 130.90
 G_M24415_IG05:
        cmp      ebx, edi
        jae      G_M24415_IG16
@@ -373,25 +374,25 @@ G_M24415_IG05:
 						;; size=15 bbWeight=100.00 PerfScore 550.00
 G_M24415_IG06:
        jmp      SHORT G_M24415_IG08
-						;; size=2 bbWeight=29.40 PerfScore 58.80
+						;; size=2 bbWeight=29.39 PerfScore 58.78
 G_M24415_IG07:
        dec      eax
-						;; size=2 bbWeight=82.48 PerfScore 20.62
+						;; size=2 bbWeight=81.43 PerfScore 20.36
 G_M24415_IG08:
        cmp      eax, ebx
        jle      SHORT G_M24415_IG10
-						;; size=4 bbWeight=116.80 PerfScore 146.00
+						;; size=4 bbWeight=115.54 PerfScore 144.43
 G_M24415_IG09:
        cmp      eax, edi
        jae      G_M24415_IG16
        mov      ecx, eax
        cmp      dword ptr [rsi+4*rcx], edx
        jge      SHORT G_M24415_IG07
-						;; size=15 bbWeight=106.64 PerfScore 586.51
+						;; size=15 bbWeight=105.44 PerfScore 579.93
 G_M24415_IG10:
        cmp      ebx, eax
        jge      SHORT G_M24415_IG12
-						;; size=4 bbWeight=34.32 PerfScore 42.90
+						;; size=4 bbWeight=34.11 PerfScore 42.64
 G_M24415_IG11:
        cmp      ebx, edi
        jae      G_M24415_IG16
@@ -405,11 +406,11 @@ G_M24415_IG11:
        mov      ecx, eax
        mov      dword ptr [rsi+4*rcx], r10d
        jmp      SHORT G_M24415_IG04
-						;; size=37 bbWeight=24.16 PerfScore 271.79
+						;; size=37 bbWeight=24.01 PerfScore 270.14
 G_M24415_IG12:
        cmp      ebx, r8d
        je       SHORT G_M24415_IG14
-						;; size=5 bbWeight=44.49 PerfScore 55.61
+						;; size=5 bbWeight=44.22 PerfScore 55.27
 G_M24415_IG13:
        cmp      ebx, edi
        jae      SHORT G_M24415_IG16
@@ -420,7 +421,7 @@ G_M24415_IG13:
        jae      SHORT G_M24415_IG16
        mov      ecx, r8d
        mov      dword ptr [rsi+4*rcx], r10d
-						;; size=25 bbWeight=7.90 PerfScore 55.27
+						;; size=25 bbWeight=7.85 PerfScore 54.93
 G_M24415_IG14:
        cmp      ebx, edi
        ja       SHORT G_M24415_IG17
@@ -440,7 +441,7 @@ G_M24415_IG14:
        lea      rcx, [rsp+0x20]
        call     [Span.Sorting:TestQuickSortSpan(System.Span`1[int])]
        nop      
-						;; size=62 bbWeight=10.16 PerfScore 157.54
+						;; size=62 bbWeight=10.10 PerfScore 156.58
 G_M24415_IG15:
        add      rsp, 248
        pop      rbx
@@ -448,7 +449,7 @@ G_M24415_IG15:
        pop      rdi
        pop      rbp
        ret      
-						;; size=12 bbWeight=10.16 PerfScore 33.03
+						;; size=12 bbWeight=10.10 PerfScore 32.83
 G_M24415_IG16:
        call     CORINFO_HELP_RNGCHKFAIL
 						;; size=5 bbWeight=0 PerfScore 0.00
@@ -457,7 +458,7 @@ G_M24415_IG17:
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 286, prolog size 77, PerfScore 2176.97, instruction count 84, allocated bytes for code 286 (MethodHash=e254a0a0) for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1-OSR)
+; Total bytes of code 297, prolog size 77, PerfScore 2165.64, instruction count 85, allocated bytes for code 297 (MethodHash=e254a0a0) for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1-OSR)
 ; ============================================================
 
 ; Assembly listing for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Instrumented Tier0)
@@ -765,38 +766,38 @@ G_M24415_IG25:
 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are invalid, and fgCalledCount is 163942.4
+; with Dynamic PGO: edge weights are invalid, and fgCalledCount is 217344
 ; 2 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 arg0         [V00,T14] (  5,  9   )   byref  ->  rcx         ld-addr-op single-def
 ;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref   
-;  V02 loc1         [V02,T08] (  3, 60.80)     int  ->   r8        
-;  V03 loc2         [V03,T00] ( 15,736.90)     int  ->  rbx        
-;  V04 loc3         [V04,T01] ( 10,702.08)     int  ->  rax        
-;  V05 loc4         [V05,T04] (  3,213.58)     int  ->  rdx        
-;  V06 loc5         [V06,T07] (  4, 64.23)     int  ->  r10        
+;  V02 loc1         [V02,T08] (  3, 61.03)     int  ->   r8        
+;  V03 loc2         [V03,T00] ( 15,738.42)     int  ->  rbx        
+;  V04 loc3         [V04,T01] ( 10,705.78)     int  ->  rax        
+;  V05 loc4         [V05,T04] (  3,213.96)     int  ->  rdx        
+;  V06 loc5         [V06,T07] (  4, 64.41)     int  ->  r10        
 ;  V07 OutArgs      [V07    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V08 tmp1         [V08    ] (  0,  0   )  struct (16) zero-ref    "spilled call-like call argument" <System.Span`1[int]>
 ;* V09 tmp2         [V09    ] (  0,  0   )  struct (16) zero-ref    "spilled call-like call argument" <System.Span`1[int]>
 ;* V10 tmp3         [V10    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
 ;* V11 tmp4         [V11    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
-;  V12 tmp5         [V12,T06] (  4, 82.69)     int  ->  rcx         "Inlining Arg"
+;  V12 tmp5         [V12,T06] (  4, 83.83)     int  ->  rcx         "Inlining Arg"
 ;* V13 tmp6         [V13    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
-;  V14 tmp7         [V14,T09] (  2, 41.34)   byref  ->  rax         single-def "Inlining Arg"
-;  V15 tmp8         [V15,T10] (  2, 41.34)     int  ->  rdi         "Inlining Arg"
+;  V14 tmp7         [V14,T09] (  2, 41.91)   byref  ->  rax         single-def "Inlining Arg"
+;  V15 tmp8         [V15,T10] (  2, 41.91)     int  ->  rdi         "Inlining Arg"
 ;* V16 tmp9         [V16    ] (  0,  0   )   byref  ->  zero-ref    "field V08._reference (fldOffset=0x0)" P-INDEP
 ;* V17 tmp10        [V17    ] (  0,  0   )     int  ->  zero-ref    "field V08._length (fldOffset=0x8)" P-INDEP
 ;* V18 tmp11        [V18    ] (  0,  0   )   byref  ->  zero-ref    "field V09._reference (fldOffset=0x0)" P-INDEP
 ;* V19 tmp12        [V19    ] (  0,  0   )     int  ->  zero-ref    "field V09._length (fldOffset=0x8)" P-INDEP
 ;* V20 tmp13        [V20,T15] (  0,  0   )   byref  ->  zero-ref    single-def "field V10._reference (fldOffset=0x0)" P-INDEP
-;  V21 tmp14        [V21,T12] (  2, 20.67)     int  ->  rcx         single-def "field V10._length (fldOffset=0x8)" P-INDEP
-;  V22 tmp15        [V22,T11] (  2, 20.67)   byref  ->  rax         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
-;  V23 tmp16        [V23,T13] (  2, 20.67)     int  ->  rdi         single-def "field V13._length (fldOffset=0x8)" P-INDEP
-;  V24 tmp17        [V24,T02] ( 12,347.66)   byref  ->  rsi         single-def "V00.[000..008)"
-;  V25 tmp18        [V25,T03] ( 10,301.79)     int  ->  rdi         single-def "V00.[008..012)"
-;  V26 tmp19        [V26    ] (  6,124.03)  struct (16) [rsp+0x20]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.Span`1[int]>
-;  V27 cse0         [V27,T05] (  6, 96.34)    long  ->  rcx         multi-def "CSE - aggressive"
+;  V21 tmp14        [V21,T12] (  2, 20.96)     int  ->  rcx         single-def "field V10._length (fldOffset=0x8)" P-INDEP
+;  V22 tmp15        [V22,T11] (  2, 20.96)   byref  ->  rax         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
+;  V23 tmp16        [V23,T13] (  2, 20.96)     int  ->  rdi         single-def "field V13._length (fldOffset=0x8)" P-INDEP
+;  V24 tmp17        [V24,T02] ( 12,348.63)   byref  ->  rsi         single-def "V00.[000..008)"
+;  V25 tmp18        [V25,T03] ( 10,302.75)     int  ->  rdi         single-def "V00.[008..012)"
+;  V26 tmp19        [V26    ] (  6,125.74)  struct (16) [rsp+0x20]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.Span`1[int]>
+;  V27 cse0         [V27,T05] (  6, 96.62)    long  ->  rcx         multi-def "CSE - aggressive"
 ;
 ; Lcl frame size = 48
 
@@ -818,14 +819,15 @@ G_M24415_IG02:
        mov      rsi, bword ptr [rcx]
        mov      edi, dword ptr [rcx+0x08]
        jmp      SHORT G_M24415_IG05
-						;; size=8 bbWeight=1 PerfScore 6.00
+       align    [11 bytes for IG03]
+						;; size=19 bbWeight=1 PerfScore 6.00
 G_M24415_IG03:
        inc      ebx
-						;; size=2 bbWeight=70.69 PerfScore 17.67
+						;; size=2 bbWeight=70.60 PerfScore 17.65
 G_M24415_IG04:
        cmp      ebx, eax
        jge      SHORT G_M24415_IG08
-						;; size=4 bbWeight=105.11 PerfScore 131.39
+						;; size=4 bbWeight=105.04 PerfScore 131.30
 G_M24415_IG05:
        cmp      ebx, edi
        jae      G_M24415_IG16
@@ -835,25 +837,25 @@ G_M24415_IG05:
 						;; size=15 bbWeight=100 PerfScore 550.00
 G_M24415_IG06:
        jmp      SHORT G_M24415_IG08
-						;; size=2 bbWeight=29.31 PerfScore 58.61
+						;; size=2 bbWeight=29.40 PerfScore 58.81
 G_M24415_IG07:
        dec      eax
-						;; size=2 bbWeight=81.59 PerfScore 20.40
+						;; size=2 bbWeight=82.53 PerfScore 20.63
 G_M24415_IG08:
        cmp      eax, ebx
        jle      SHORT G_M24415_IG10
-						;; size=4 bbWeight=116.00 PerfScore 145.00
+						;; size=4 bbWeight=116.97 PerfScore 146.21
 G_M24415_IG09:
        cmp      eax, edi
        jae      G_M24415_IG16
        mov      ecx, eax
        cmp      dword ptr [rsi+4*rcx], edx
        jge      SHORT G_M24415_IG07
-						;; size=15 bbWeight=105.56 PerfScore 580.56
+						;; size=15 bbWeight=105.91 PerfScore 582.49
 G_M24415_IG10:
        cmp      ebx, eax
        jge      SHORT G_M24415_IG12
-						;; size=4 bbWeight=34.42 PerfScore 43.02
+						;; size=4 bbWeight=34.45 PerfScore 43.06
 G_M24415_IG11:
        cmp      ebx, edi
        jae      G_M24415_IG16
@@ -867,11 +869,11 @@ G_M24415_IG11:
        mov      ecx, eax
        mov      dword ptr [rsi+4*rcx], r10d
        jmp      SHORT G_M24415_IG04
-						;; size=37 bbWeight=24.09 PerfScore 270.99
+						;; size=37 bbWeight=24.15 PerfScore 271.70
 G_M24415_IG12:
        cmp      ebx, r8d
        je       SHORT G_M24415_IG14
-						;; size=5 bbWeight=44.75 PerfScore 55.94
+						;; size=5 bbWeight=44.92 PerfScore 56.15
 G_M24415_IG13:
        cmp      ebx, edi
        jae      SHORT G_M24415_IG16
@@ -882,7 +884,7 @@ G_M24415_IG13:
        jae      SHORT G_M24415_IG16
        mov      ecx, r8d
        mov      dword ptr [rsi+4*rcx], r10d
-						;; size=25 bbWeight=8.03 PerfScore 56.18
+						;; size=25 bbWeight=8.06 PerfScore 56.39
 G_M24415_IG14:
        cmp      ebx, edi
        ja       SHORT G_M24415_IG17
@@ -902,7 +904,7 @@ G_M24415_IG14:
        lea      rcx, [rsp+0x20]
        call     [Span.Sorting:TestQuickSortSpan(System.Span`1[int])]
        nop      
-						;; size=62 bbWeight=10.34 PerfScore 160.20
+						;; size=62 bbWeight=10.48 PerfScore 162.41
 G_M24415_IG15:
        add      rsp, 248
        pop      rbx
@@ -910,7 +912,7 @@ G_M24415_IG15:
        pop      rdi
        pop      rbp
        ret      
-						;; size=12 bbWeight=10.34 PerfScore 33.59
+						;; size=12 bbWeight=10.48 PerfScore 34.05
 G_M24415_IG16:
        call     CORINFO_HELP_RNGCHKFAIL
 						;; size=5 bbWeight=0 PerfScore 0.00
@@ -919,7 +921,7 @@ G_M24415_IG17:
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 286, prolog size 77, PerfScore 2173.66, instruction count 84, allocated bytes for code 286 (MethodHash=e254a0a0) for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1-OSR)
+; Total bytes of code 297, prolog size 77, PerfScore 2182.06, instruction count 85, allocated bytes for code 297 (MethodHash=e254a0a0) for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1-OSR)
 ; ============================================================
 
 ; Assembly listing for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1)
@@ -929,38 +931,38 @@ G_M24415_IG17:
 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are invalid, and fgCalledCount is 3398656
+; with Dynamic PGO: edge weights are invalid, and fgCalledCount is 4527104
 ; 2 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 arg0         [V00,T05] (  4,  8   )   byref  ->  rcx         ld-addr-op single-def
 ;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref    single-def
-;  V02 loc1         [V02,T10] (  4,  1.88)     int  ->  rcx         single-def
-;  V03 loc2         [V03,T00] ( 19, 38.09)     int  ->  rdi        
-;  V04 loc3         [V04,T01] ( 13, 37.02)     int  ->  rax        
-;  V05 loc4         [V05,T04] (  4, 10.80)     int  ->  rdx         single-def
-;  V06 loc5         [V06,T07] (  4,  3.10)     int  ->   r8        
+;  V02 loc1         [V02,T10] (  4,  1.90)     int  ->  rcx         single-def
+;  V03 loc2         [V03,T00] ( 19, 38.00)     int  ->  rdi        
+;  V04 loc3         [V04,T01] ( 13, 37.05)     int  ->  rax        
+;  V05 loc4         [V05,T04] (  4, 10.78)     int  ->  rdx         single-def
+;  V06 loc5         [V06,T07] (  4,  3.09)     int  ->   r8        
 ;  V07 OutArgs      [V07    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V08 tmp1         [V08    ] (  0,  0   )  struct (16) zero-ref    "spilled call-like call argument" <System.Span`1[int]>
 ;* V09 tmp2         [V09    ] (  0,  0   )  struct (16) zero-ref    "spilled call-like call argument" <System.Span`1[int]>
 ;* V10 tmp3         [V10    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
 ;* V11 tmp4         [V11    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
-;  V12 tmp5         [V12,T06] (  4,  3.99)     int  ->  rcx         "Inlining Arg"
+;  V12 tmp5         [V12,T06] (  4,  4.02)     int  ->  rcx         "Inlining Arg"
 ;* V13 tmp6         [V13    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.Span`1[int]>
-;  V14 tmp7         [V14,T08] (  2,  1.99)   byref  ->  rax         single-def "Inlining Arg"
-;  V15 tmp8         [V15,T09] (  2,  1.99)     int  ->  rsi         "Inlining Arg"
-;  V16 tmp9         [V16,T02] ( 13, 18.22)   byref  ->  rbx         single-def "field V00._reference (fldOffset=0x0)" P-INDEP
-;  V17 tmp10        [V17,T03] ( 11, 16.62)     int  ->  rsi         single-def "field V00._length (fldOffset=0x8)" P-INDEP
+;  V14 tmp7         [V14,T08] (  2,  2.01)   byref  ->  rax         single-def "Inlining Arg"
+;  V15 tmp8         [V15,T09] (  2,  2.01)     int  ->  rsi         "Inlining Arg"
+;  V16 tmp9         [V16,T02] ( 13, 18.19)   byref  ->  rbx         single-def "field V00._reference (fldOffset=0x0)" P-INDEP
+;  V17 tmp10        [V17,T03] ( 11, 16.60)     int  ->  rsi         single-def "field V00._length (fldOffset=0x8)" P-INDEP
 ;* V18 tmp11        [V18    ] (  0,  0   )   byref  ->  zero-ref    "field V08._reference (fldOffset=0x0)" P-INDEP
 ;* V19 tmp12        [V19    ] (  0,  0   )     int  ->  zero-ref    "field V08._length (fldOffset=0x8)" P-INDEP
 ;* V20 tmp13        [V20    ] (  0,  0   )   byref  ->  zero-ref    "field V09._reference (fldOffset=0x0)" P-INDEP
 ;* V21 tmp14        [V21    ] (  0,  0   )     int  ->  zero-ref    "field V09._length (fldOffset=0x8)" P-INDEP
 ;* V22 tmp15        [V22,T14] (  0,  0   )   byref  ->  zero-ref    single-def "field V10._reference (fldOffset=0x0)" P-INDEP
-;  V23 tmp16        [V23,T12] (  2,  1.00)     int  ->  rcx         single-def "field V10._length (fldOffset=0x8)" P-INDEP
-;  V24 tmp17        [V24,T11] (  2,  1.00)   byref  ->  rax         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
-;  V25 tmp18        [V25,T13] (  2,  1.00)     int  ->  rsi         single-def "field V13._length (fldOffset=0x8)" P-INDEP
+;  V23 tmp16        [V23,T12] (  2,  1.01)     int  ->  rcx         single-def "field V10._length (fldOffset=0x8)" P-INDEP
+;  V24 tmp17        [V24,T11] (  2,  1.01)   byref  ->  rax         single-def "field V13._reference (fldOffset=0x0)" P-INDEP
+;  V25 tmp18        [V25,T13] (  2,  1.01)     int  ->  rsi         single-def "field V13._length (fldOffset=0x8)" P-INDEP
 ;* V26 tmp19        [V26    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref" <System.Span`1[int]>
-;  V27 tmp20        [V27    ] (  6,  5.98)  struct (16) [rsp+0x20]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.Span`1[int]>
+;  V27 tmp20        [V27    ] (  6,  6.04)  struct (16) [rsp+0x20]  do-not-enreg[XSF] must-init addr-exposed "by-value struct argument" <System.Span`1[int]>
 ;
 ; Lcl frame size = 48
 
@@ -977,7 +979,7 @@ G_M24415_IG02:
        mov      rbx, bword ptr [rcx]
        mov      esi, dword ptr [rcx+0x08]
        cmp      esi, 1
-       jg       G_M24415_IG17
+       jg       G_M24415_IG18
 						;; size=15 bbWeight=1 PerfScore 5.25
 G_M24415_IG03:
        add      rsp, 48
@@ -985,65 +987,70 @@ G_M24415_IG03:
        pop      rsi
        pop      rdi
        ret      
-						;; size=8 bbWeight=0.50 PerfScore 1.38
+						;; size=8 bbWeight=0.50 PerfScore 1.37
 G_M24415_IG04:
-       dec      eax
-						;; size=2 bbWeight=3.94 PerfScore 0.98
+       align    [6 bytes for IG05]
+						;; size=6 bbWeight=0.50 PerfScore 0.12
 G_M24415_IG05:
-       cmp      eax, edi
-       jle      SHORT G_M24415_IG07
-						;; size=4 bbWeight=5.60 PerfScore 6.99
+       dec      eax
+						;; size=2 bbWeight=3.96 PerfScore 0.99
 G_M24415_IG06:
+       cmp      eax, edi
+       jle      SHORT G_M24415_IG08
+						;; size=4 bbWeight=5.62 PerfScore 7.02
+G_M24415_IG07:
        cmp      eax, esi
-       jae      G_M24415_IG18
+       jae      G_M24415_IG19
        mov      r8d, eax
        cmp      dword ptr [rbx+4*r8], edx
-       jge      SHORT G_M24415_IG04
-						;; size=17 bbWeight=5.09 PerfScore 28.00
-G_M24415_IG07:
-       cmp      edi, eax
-       jge      SHORT G_M24415_IG09
-						;; size=4 bbWeight=1.66 PerfScore 2.08
+       jge      SHORT G_M24415_IG05
+						;; size=17 bbWeight=5.08 PerfScore 27.97
 G_M24415_IG08:
+       cmp      edi, eax
+       jge      SHORT G_M24415_IG10
+						;; size=4 bbWeight=1.65 PerfScore 2.07
+G_M24415_IG09:
        cmp      edi, esi
-       jae      G_M24415_IG18
+       jae      G_M24415_IG19
        mov      r8d, edi
        mov      r8d, dword ptr [rbx+4*r8]
        mov      r10d, edi
        cmp      eax, esi
-       jae      G_M24415_IG18
+       jae      G_M24415_IG19
        mov      r9d, eax
        mov      r9d, dword ptr [rbx+4*r9]
        mov      dword ptr [rbx+4*r10], r9d
        mov      r10d, eax
        mov      dword ptr [rbx+4*r10], r8d
-						;; size=44 bbWeight=1.16 PerfScore 11.04
-G_M24415_IG09:
-       cmp      edi, eax
-       jge      SHORT G_M24415_IG13
-						;; size=4 bbWeight=2.16 PerfScore 2.70
+						;; size=44 bbWeight=1.16 PerfScore 11.01
 G_M24415_IG10:
        cmp      edi, eax
-       jge      SHORT G_M24415_IG05
-						;; size=4 bbWeight=5.07 PerfScore 6.34
+       jge      SHORT G_M24415_IG14
+       jmp      SHORT G_M24415_IG11
+       align    [3 bytes for IG11]
+						;; size=9 bbWeight=2.16 PerfScore 7.55
 G_M24415_IG11:
+       cmp      edi, eax
+       jge      SHORT G_M24415_IG06
+						;; size=4 bbWeight=5.04 PerfScore 6.30
+G_M24415_IG12:
        cmp      edi, esi
-       jae      G_M24415_IG18
+       jae      G_M24415_IG19
        mov      r8d, edi
        cmp      dword ptr [rbx+4*r8], edx
-       jg       SHORT G_M24415_IG05
-						;; size=17 bbWeight=4.82 PerfScore 26.53
-G_M24415_IG12:
-       inc      edi
-       jmp      SHORT G_M24415_IG10
-						;; size=4 bbWeight=3.41 PerfScore 7.67
+       jg       SHORT G_M24415_IG06
+						;; size=17 bbWeight=4.80 PerfScore 26.41
 G_M24415_IG13:
-       cmp      edi, ecx
-       je       SHORT G_M24415_IG15
-						;; size=4 bbWeight=0.50 PerfScore 0.62
+       inc      edi
+       jmp      SHORT G_M24415_IG11
+						;; size=4 bbWeight=3.39 PerfScore 7.63
 G_M24415_IG14:
+       cmp      edi, ecx
+       je       SHORT G_M24415_IG16
+						;; size=4 bbWeight=0.50 PerfScore 0.63
+G_M24415_IG15:
        cmp      edi, esi
-       jae      SHORT G_M24415_IG18
+       jae      SHORT G_M24415_IG19
        mov      r8d, edi
        mov      r8d, dword ptr [rbx+4*r8]
        mov      eax, edi
@@ -1051,9 +1058,9 @@ G_M24415_IG14:
        mov      ecx, ecx
        mov      dword ptr [rbx+4*rcx], r8d
 						;; size=22 bbWeight=0.39 PerfScore 2.32
-G_M24415_IG15:
+G_M24415_IG16:
        cmp      edi, esi
-       ja       SHORT G_M24415_IG19
+       ja       SHORT G_M24415_IG20
        mov      ecx, edi
        mov      bword ptr [rsp+0x20], rbx
        mov      dword ptr [rsp+0x28], ecx
@@ -1061,7 +1068,7 @@ G_M24415_IG15:
        call     [Span.Sorting:TestQuickSortSpan(System.Span`1[int])]
        lea      ecx, [rdi+0x01]
        cmp      ecx, esi
-       ja       SHORT G_M24415_IG19
+       ja       SHORT G_M24415_IG20
        mov      eax, ecx
        lea      rax, bword ptr [rbx+4*rax]
        sub      esi, ecx
@@ -1070,30 +1077,30 @@ G_M24415_IG15:
        lea      rcx, [rsp+0x20]
        call     [Span.Sorting:TestQuickSortSpan(System.Span`1[int])]
        nop      
-						;; size=62 bbWeight=0.50 PerfScore 7.73
-G_M24415_IG16:
+						;; size=62 bbWeight=0.50 PerfScore 7.80
+G_M24415_IG17:
        add      rsp, 48
        pop      rbx
        pop      rsi
        pop      rdi
        ret      
-						;; size=8 bbWeight=0.50 PerfScore 1.37
-G_M24415_IG17:
+						;; size=8 bbWeight=0.50 PerfScore 1.38
+G_M24415_IG18:
        lea      ecx, [rsi-0x01]
        xor      edi, edi
        mov      eax, ecx
        mov      edx, eax
        mov      edx, dword ptr [rbx+4*rdx]
-       jmp      G_M24415_IG09
-						;; size=17 bbWeight=0.50 PerfScore 2.62
-G_M24415_IG18:
+       jmp      G_M24415_IG10
+						;; size=17 bbWeight=0.50 PerfScore 2.64
+G_M24415_IG19:
        call     CORINFO_HELP_RNGCHKFAIL
 						;; size=5 bbWeight=0 PerfScore 0.00
-G_M24415_IG19:
+G_M24415_IG20:
        call     [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
        int3     
 						;; size=7 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 267, prolog size 19, PerfScore 145.83, instruction count 91, allocated bytes for code 267 (MethodHash=e254a0a0) for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1)
+; Total bytes of code 278, prolog size 19, PerfScore 151.75, instruction count 94, allocated bytes for code 278 (MethodHash=e254a0a0) for method Span.Sorting:TestQuickSortSpan(System.Span`1[int]) (Tier1)
 ; ============================================================
 

Span.Sorting.BubbleSortArray(Size: 512)

Hot functions:

  • (97.42%) Sorting.TestBubbleSortArray (Tier-1)
    • Has diffs
Diffs

[MicroBenchmarks]Sorting.TestBubbleSortArray(int32[])

 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are valid, and fgCalledCount is 10775.77
+; with Dynamic PGO: edge weights are valid, and fgCalledCount is 21884.61
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T01] ( 13,397.85)     ref  ->  rcx         class-hnd single-def <int[]>
-;  V01 loc0         [V01,T10] (  4, 98.13)   ubyte  ->   r8        
-;  V02 loc1         [V02,T07] (  4,195.85)     int  ->   r9        
+;  V00 arg0         [V00,T01] ( 13,395.90)     ref  ->  rcx         class-hnd single-def <int[]>
+;  V01 loc0         [V01,T10] (  4, 97.15)   ubyte  ->   r8        
+;  V02 loc1         [V02,T07] (  4,193.90)     int  ->   r9        
 ;  V03 loc2         [V03,T09] (  7,100.41)     int  ->  rdx        
-;  V04 loc3         [V04,T00] ( 13,497.84)     int  ->  rax        
+;  V04 loc3         [V04,T00] ( 13,496.88)     int  ->  rax        
 ;  V05 OutArgs      [V05    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;  V06 tmp1         [V06,T03] (  4,391.71)     int  ->  rsi         "Strict ordering of exceptions for Array store"
-;  V07 cse0         [V07,T04] (  3,294.75)     int  ->  rsi         "CSE - aggressive"
-;  V08 cse1         [V08,T12] (  3,  2.98)     int  ->  rsi         "CSE - aggressive"
-;  V09 cse2         [V09,T13] (  3,  2.98)     int  ->   r9         "CSE - aggressive"
-;  V10 cse3         [V10,T05] (  3,294.75)     int  ->   r9         "CSE - aggressive"
-;  V11 cse4         [V11,T06] (  3,294.75)    long  ->  rbx         "CSE - aggressive"
-;  V12 cse5         [V12,T14] (  3,  2.98)    long  ->  rbx         "CSE - aggressive"
+;  V06 tmp1         [V06,T03] (  4,387.80)     int  ->  rsi         "Strict ordering of exceptions for Array store"
+;  V07 cse0         [V07,T04] (  3,293.78)     int  ->  rsi         "CSE - aggressive"
+;  V08 cse1         [V08,T12] (  3,  2.97)     int  ->  rsi         "CSE - aggressive"
+;  V09 cse2         [V09,T13] (  3,  2.97)     int  ->   r9         "CSE - aggressive"
+;  V10 cse3         [V10,T05] (  3,293.78)     int  ->   r9         "CSE - aggressive"
+;  V11 cse4         [V11,T06] (  3,293.78)    long  ->  rbx         "CSE - aggressive"
+;  V12 cse5         [V12,T14] (  3,  2.97)    long  ->  rbx         "CSE - aggressive"
 ;  V13 cse6         [V13,T02] (  4,395.60)     int  ->  r11         "CSE - aggressive"
 ;  V14 cse7         [V14,T08] (  6,101.10)     int  ->  r10         hoist multi-def "CSE - aggressive"
 ;  V15 cse8         [V15,T11] (  4,  4.00)     int  ->  rax         "CSE - aggressive"
-;  V16 cse9         [V16,T15] (  3,  2.98)    long  ->  r11         "CSE - aggressive"
+;  V16 cse9         [V16,T15] (  3,  2.97)    long  ->  r11         "CSE - aggressive"
 ;
 ; Lcl frame size = 40
 
@@ -525,13 +525,13 @@ G_M37703_IG06:
        mov      esi, dword ptr [rcx+4*rbx+0x10]
        cmp      r9d, esi
        jle      SHORT G_M37703_IG08
-						;; size=29 bbWeight=98.90 PerfScore 741.74
+						;; size=29 bbWeight=98.90 PerfScore 741.75
 G_M37703_IG07:
        mov      r8d, eax
        mov      dword ptr [rcx+4*r8+0x10], esi
        mov      dword ptr [rcx+4*rbx+0x10], r9d
        mov      r8d, 1
-						;; size=19 bbWeight=96.95 PerfScore 242.37
+						;; size=19 bbWeight=95.98 PerfScore 239.95
 G_M37703_IG08:
        mov      eax, r11d
        cmp      eax, edx
@@ -560,7 +560,7 @@ G_M37703_IG12:
        mov      dword ptr [rcx+4*r11+0x10], esi
        mov      dword ptr [rcx+4*rbx+0x10], r9d
        mov      r8d, 1
-						;; size=16 bbWeight=0.98 PerfScore 2.20
+						;; size=16 bbWeight=0.97 PerfScore 2.18
 G_M37703_IG13:
        cmp      eax, edx
        jl       SHORT G_M37703_IG11
@@ -585,7 +585,7 @@ G_M37703_IG17:
        ret      
 						;; size=11 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 222, prolog size 44, PerfScore 1180.20, instruction count 66, allocated bytes for code 222 (MethodHash=e4a36cb8) for method Span.Sorting:TestBubbleSortArray(int[]) (Tier1-OSR)
+; Total bytes of code 222, prolog size 44, PerfScore 1177.77, instruction count 66, allocated bytes for code 222 (MethodHash=e4a36cb8) for method Span.Sorting:TestBubbleSortArray(int[]) (Tier1-OSR)
 ; ============================================================
 
 ; Assembly listing for method Span.Sorting:TestBubbleSortArray(int[]) (Tier1)
@@ -598,22 +598,22 @@ G_M37703_IG17:
 ; with Blended PGO: edge weights are valid, and fgCalledCount is 100
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T02] ( 11,1960758.02)     ref  ->  rcx         class-hnd single-def <int[]>
-;  V01 loc0         [V01,T10] (  4, 485234.21)   ubyte  ->   r8        
-;  V02 loc1         [V02,T07] (  4, 966468.42)     int  ->   r9        
-;  V03 loc2         [V03,T09] (  7, 501142.29)     int  ->  rdx        
-;  V04 loc3         [V04,T00] ( 12,2477778.84)     int  ->  r10        
+;  V00 arg0         [V00,T02] ( 11,1962257.78)     ref  ->  rcx         class-hnd single-def <int[]>
+;  V01 loc0         [V01,T10] (  4, 484257.34)   ubyte  ->   r8        
+;  V02 loc1         [V02,T07] (  4, 964514.67)     int  ->   r9        
+;  V03 loc2         [V03,T09] (  7, 502869.05)     int  ->  rdx        
+;  V04 loc3         [V04,T00] ( 12,2483726.26)     int  ->  r10        
 ;  V05 OutArgs      [V05    ] (  1,      1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;  V06 tmp1         [V06,T03] (  4,1932936.85)     int  ->  rsi         "Strict ordering of exceptions for Array store"
-;  V07 cse0         [V07,T04] (  3,1462745.60)     int  ->  rsi         "CSE - aggressive"
-;  V08 cse1         [V08,T12] (  3,  14775.21)     int  ->  rsi         "CSE - aggressive"
-;  V09 cse2         [V09,T13] (  3,  14775.21)     int  ->   r9         "CSE - aggressive"
-;  V10 cse3         [V10,T05] (  3,1462745.60)     int  ->   r9         "CSE - aggressive"
-;  V11 cse4         [V11,T06] (  3,1462745.60)    long  ->  rbx         "CSE - aggressive"
-;  V12 cse5         [V12,T14] (  3,  14775.21)    long  ->  rbx         "CSE - aggressive"
-;  V13 cse6         [V13,T01] (  4,1968687.46)     int  ->  r11         "CSE - aggressive"
-;  V14 cse7         [V14,T08] (  6, 503114.72)     int  ->  rax         "CSE - aggressive"
-;  V15 cse8         [V15,T11] (  4,  19885.73)     int  ->  r11         "CSE - aggressive"
+;  V06 tmp1         [V06,T03] (  4,1929029.35)     int  ->  rsi         "Strict ordering of exceptions for Array store"
+;  V07 cse0         [V07,T04] (  3,1465197.47)     int  ->  rsi         "CSE - aggressive"
+;  V08 cse1         [V08,T12] (  3,  14799.97)     int  ->  rsi         "CSE - aggressive"
+;  V09 cse2         [V09,T13] (  3,  14799.97)     int  ->   r9         "CSE - aggressive"
+;  V10 cse3         [V10,T05] (  3,1465197.47)     int  ->   r9         "CSE - aggressive"
+;  V11 cse4         [V11,T06] (  3,1465197.47)    long  ->  rbx         "CSE - aggressive"
+;  V12 cse5         [V12,T14] (  3,  14799.97)    long  ->  rbx         "CSE - aggressive"
+;  V13 cse6         [V13,T01] (  4,1975525.42)     int  ->  r11         "CSE - aggressive"
+;  V14 cse7         [V14,T08] (  6, 504858.75)     int  ->  rax         "CSE - aggressive"
+;  V15 cse8         [V15,T11] (  4,  19954.80)     int  ->  r11         "CSE - aggressive"
 ;
 ; Lcl frame size = 40
 
@@ -637,32 +637,34 @@ G_M37703_IG04:
        jl       SHORT G_M37703_IG09
        jmp      SHORT G_M37703_IG05
        align    [0 bytes for IG05]
-						;; size=6 bbWeight=997.99 PerfScore 3243.48
+						;; size=6 bbWeight=998.00 PerfScore 3243.50
 G_M37703_IG05:
        mov      r9d, r10d
        mov      r9d, dword ptr [rcx+4*r9+0x10]
        lea      r11d, [r10+0x01]
        cmp      r11d, eax
        jae      SHORT G_M37703_IG14
+		  ;; NOP compensation instructions of 4 bytes.
        mov      ebx, r11d
        mov      esi, dword ptr [rcx+4*rbx+0x10]
        cmp      r9d, esi
        jle      SHORT G_M37703_IG07
-						;; size=29 bbWeight=492171.87 PerfScore 3691289.00
+						;; size=33 bbWeight=493881.35 PerfScore 3704110.15
 G_M37703_IG06:
        mov      r8d, r10d
        mov      dword ptr [rcx+4*r8+0x10], esi
        mov      dword ptr [rcx+4*rbx+0x10], r9d
        mov      r8d, 1
-						;; size=19 bbWeight=478401.87 PerfScore 1196004.67
+						;; size=19 bbWeight=477434.76 PerfScore 1193586.91
 G_M37703_IG07:
        mov      r10d, r11d
        cmp      r10d, edx
        jl       SHORT G_M37703_IG05
-						;; size=8 bbWeight=492171.87 PerfScore 738257.80
+						;; size=8 bbWeight=493881.35 PerfScore 740822.03
 G_M37703_IG08:
        jmp      SHORT G_M37703_IG12
-						;; size=2 bbWeight=997.99 PerfScore 1995.99
+       align    [2 bytes for IG09]
+						;; size=4 bbWeight=998.00 PerfScore 1996.00
 G_M37703_IG09:
        cmp      r10d, eax
        jae      SHORT G_M37703_IG14
@@ -675,18 +677,18 @@ G_M37703_IG09:
        mov      esi, dword ptr [rcx+4*rbx+0x10]
        cmp      r9d, esi
        jle      SHORT G_M37703_IG11
-						;; size=34 bbWeight=4971.43 PerfScore 43500.04
+						;; size=34 bbWeight=4988.70 PerfScore 43651.13
 G_M37703_IG10:
        mov      r8d, r10d
        mov      dword ptr [rcx+4*r8+0x10], esi
        mov      dword ptr [rcx+4*rbx+0x10], r9d
        mov      r8d, 1
-						;; size=19 bbWeight=4832.34 PerfScore 12080.86
+						;; size=19 bbWeight=4822.57 PerfScore 12056.43
 G_M37703_IG11:
        mov      r10d, r11d
        cmp      r10d, edx
        jl       SHORT G_M37703_IG09
-						;; size=8 bbWeight=4971.43 PerfScore 7457.15
+						;; size=8 bbWeight=4988.70 PerfScore 7483.05
 G_M37703_IG12:
        dec      edx
        test     r8d, r8d
@@ -703,6 +705,6 @@ G_M37703_IG14:
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 175, prolog size 6, PerfScore 5697103.47, instruction count 57, allocated bytes for code 175 (MethodHash=e4a36cb8) for method Span.Sorting:TestBubbleSortArray(int[]) (Tier1)
+; Total bytes of code 181, prolog size 6, PerfScore 5710224.30, instruction count 58, allocated bytes for code 181 (MethodHash=e4a36cb8) for method Span.Sorting:TestBubbleSortArray(int[]) (Tier1)
 ; ============================================================
 

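To gauge the size of these codegen changes, the method-level totals in the `; Total bytes of code ...` lines above can be compared directly. The Python sketch below does that. The figures are copied verbatim from those lines. The names `TOTALS`, `pct_change`, and `summarize` are illustrative helpers, not part of any dotnet tooling. PerfScore is the JIT's static cost estimate rather than a measured run time, so these deltas only show how the generated code changed. They are not a measure of the benchmark regression itself.

```python
# Method-level totals copied from the "; Total bytes of code ..." lines above,
# stored as (baseline, compare) pairs. PerfScore is the JIT's static estimate,
# not a measured run time.
TOTALS = {
    "TestQuickSortSpan (Tier1)": {
        "bytes": (267, 278),
        "instructions": (91, 94),
        "perfscore": (145.83, 151.75),
    },
    "TestBubbleSortArray (Tier1-OSR)": {
        "bytes": (222, 222),
        "instructions": (66, 66),
        "perfscore": (1180.20, 1177.77),
    },
    "TestBubbleSortArray (Tier1)": {
        "bytes": (175, 181),
        "instructions": (57, 58),
        "perfscore": (5697103.47, 5710224.30),
    },
}


def pct_change(base: float, diff: float) -> float:
    """Relative change from base to diff, in percent."""
    return (diff - base) / base * 100.0


def summarize(totals):
    """Flatten the table into (method, metric, base, diff, percent) rows."""
    rows = []
    for method, metrics in totals.items():
        for metric, (base, diff) in metrics.items():
            rows.append((method, metric, base, diff, pct_change(base, diff)))
    return rows


if __name__ == "__main__":
    for method, metric, base, diff, pct in summarize(TOTALS):
        print(f"{method:32} {metric:12} {base:>14} -> {diff:<14} {pct:+.2f}%")
```

For `TestQuickSortSpan`, the inserted loop-alignment padding and the extra blocks grow the method by 11 bytes, about 4.1%. For the Tier1 `TestBubbleSortArray`, the growth is 6 bytes.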
@jakobbotsch
Member

jakobbotsch commented Aug 16, 2024

These look like they are fixed now (I can't close the issue).

@tannergooding
Member

Closing per the above
