tomeksowi
Member

Don't allocate a temp register; use `sh2add`.

Part of #84834, cc @dotnet/samsung

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 26, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 26, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tomeksowi
Member Author

No regressions. 2 fewer instructions and 1 fewer register per switch site.
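
For readers unfamiliar with the Zba instruction in the diffs below, here is a minimal sketch of why `sh2add.uw` can stand in for the old `zext.w` / `slli` / `add` sequence. It models 64-bit registers with Python integers; the function names and the sample values are hypothetical and are not taken from the JIT sources.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def baseline_entry_address(table_base: int, index: int) -> int:
    """Old sequence: zext.w, then slli into a scratch register, then add."""
    index = index & MASK32                   # zext.w  idx, idx
    scratch = (index << 2) & MASK64          # slli    tmp, idx, 2
    return (table_base + scratch) & MASK64   # add     base, base, tmp

def sh2add_uw(rs1: int, rs2: int) -> int:
    """sh2add.uw rd, rs1, rs2: rd = rs2 + (zero-extended low 32 bits of rs1 << 2)."""
    return (rs2 + ((rs1 & MASK32) << 2)) & MASK64

# Hypothetical values: the upper 32 bits of the index register hold garbage,
# which both forms ignore.
table_base = 0x0000_7F00_0000_1000
index = 0xDEAD_BEEF_0000_0003
assert baseline_entry_address(table_base, index) == sh2add_uw(index, table_base)
```

The single instruction folds the zero-extension, the shift by 2 (4-byte table entries), and the add, so no scratch register is needed for the scaled index.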

Diffs are based on 36,155 contexts (12,724 MinOpts, 23,431 FullOpts).

Overall (-800 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| linux.riscv64.Checked.3.mch | 44,453,752 | -800 | -0.01% |

MinOpts (-80 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| linux.riscv64.Checked.3.mch | 26,497,332 | -80 | -0.01% |

FullOpts (-720 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| linux.riscv64.Checked.3.mch | 17,956,420 | -720 | -0.01% |

Example diffs
linux.riscv64.Checked.3.mch
-8 (-3.08%) : 13600.dasm - JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
@@ -45,20 +45,18 @@ G_M14747_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             bltu           ra, t6, G_M14747_IG07
 						;; size=12 bbWeight=1 PerfScore 4.50
 G_M14747_IG06:        ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            zext.w         a0, a0
             auipc          t6, 0xD1FFAB1E
             addi           a1, t6, 0xD1FFAB1E
-            slli           a2, a0, 2
-            add            a1, a1, a2
+            sh2add.uw      a1, a0, a1
             lw             a1, 0xD1FFAB1E(a1)
             lui            t6, 0xD1FFAB1E
             addi           t6, t6, 0xD1FFAB1E
-            lui            a2, 0xD1FFAB1E
-            slli           a2, a2, 20
-            add            a2, a2, t6
-            add            a1, a1, a2
+            lui            a0, 0xD1FFAB1E
+            slli           a0, a0, 20
+            add            a0, a0, t6
+            add            a1, a1, a0
             jr             a1
-						;; size=52 bbWeight=0.94 PerfScore 13.65
+						;; size=44 bbWeight=0.94 PerfScore 12.71
 G_M14747_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             sext.w         a0, zero
 						;; size=4 bbWeight=0.50 PerfScore 0.25
@@ -138,7 +136,7 @@ RWD00  	dd	G_M14747_IG18 - G_M14747_IG02
        	dd	G_M14747_IG07 - G_M14747_IG02
 
 
-; Total bytes of code 260, prolog size 16, PerfScore 66.15, instruction count 49, allocated bytes for code 260 (MethodHash=8f51c664) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
+; Total bytes of code 252, prolog size 16, PerfScore 65.21, instruction count 47, allocated bytes for code 252 (MethodHash=8f51c664) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -149,7 +147,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 65 (0x00041) Actual length = 260 (0x000104)
+  Function Length   : 63 (0x0003f) Actual length = 252 (0x0000fc)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-2.53%) : 13619.dasm - JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
@@ -45,20 +45,18 @@ G_M2782_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             bltu           ra, t6, G_M2782_IG08
 						;; size=12 bbWeight=1 PerfScore 4.50
 G_M2782_IG06:        ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            zext.w         a0, a0
             auipc          t6, 0xD1FFAB1E
             addi           a1, t6, 0xD1FFAB1E
-            slli           a2, a0, 2
-            add            a1, a1, a2
+            sh2add.uw      a1, a0, a1
             lw             a1, 0xD1FFAB1E(a1)
             lui            t6, 0xD1FFAB1E
             addi           t6, t6, 0xD1FFAB1E
-            lui            a2, 0xD1FFAB1E
-            slli           a2, a2, 20
-            add            a2, a2, t6
-            add            a1, a1, a2
+            lui            a0, 0xD1FFAB1E
+            slli           a0, a0, 20
+            add            a0, a0, t6
+            add            a1, a1, a0
             jr             a1
-						;; size=52 bbWeight=0.94 PerfScore 13.65
+						;; size=44 bbWeight=0.94 PerfScore 12.71
 G_M2782_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             addiw          a0, zero, 0xD1FFAB1E
             slli           a0, a0, 52
@@ -175,7 +173,7 @@ RWD152 	dq	3F8111111110F30Ch
 RWD160 	dq	BFC5555555555543h
 
 
-; Total bytes of code 316, prolog size 16, PerfScore 80.90, instruction count 59, allocated bytes for code 316 (MethodHash=0386f521) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
+; Total bytes of code 308, prolog size 16, PerfScore 79.96, instruction count 57, allocated bytes for code 308 (MethodHash=0386f521) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -186,7 +184,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 79 (0x0004f) Actual length = 316 (0x00013c)
+  Function Length   : 77 (0x0004d) Actual length = 308 (0x000134)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-2.33%) : 23707.dasm - System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
@@ -47,20 +47,18 @@ G_M63141_IG03:        ; bbWeight=0.50, gcrefRegs=0600 {s1 a0}, byrefRegs=0000 {}
             bltu           ra, t6, G_M63141_IG05
 						;; size=32 bbWeight=0.50 PerfScore 7.00
 G_M63141_IG04:        ; bbWeight=0.45, gcrefRegs=0200 {s1}, byrefRegs=0000 {}, byref
-            zext.w         a1, a1
             auipc          t6, 0xD1FFAB1E
             addi           a0, t6, 0xD1FFAB1E
-            slli           a2, a1, 2
-            add            a0, a0, a2
+            sh2add.uw      a0, a1, a0
             lw             a0, 0xD1FFAB1E(a0)
             lui            t6, 0xD1FFAB1E
             addi           t6, t6, 0xD1FFAB1E
-            lui            a2, 0xD1FFAB1E
-            slli           a2, a2, 20
-            add            a2, a2, t6
-            add            a0, a0, a2
+            lui            a1, 0xD1FFAB1E
+            slli           a1, a1, 20
+            add            a1, a1, t6
+            add            a0, a0, a1
             jr             a0
-						;; size=52 bbWeight=0.45 PerfScore 6.53
+						;; size=44 bbWeight=0.45 PerfScore 6.07
 G_M63141_IG05:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcrRegs -[s1]
             addi           a0, zero, 0xD1FFAB1E
@@ -161,7 +159,7 @@ RWD00  	dd	G_M63141_IG17 - G_M63141_IG02
        	dd	G_M63141_IG07 - G_M63141_IG02
 
 
-; Total bytes of code 344, prolog size 20, PerfScore 70.21, instruction count 69, allocated bytes for code 344 (MethodHash=33a2095a) for method System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
+; Total bytes of code 336, prolog size 20, PerfScore 69.76, instruction count 67, allocated bytes for code 336 (MethodHash=33a2095a) for method System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -172,7 +170,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 86 (0x00056) Actual length = 344 (0x000158)
+  Function Length   : 84 (0x00054) Actual length = 336 (0x000150)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 36016.dasm - (dynamicClass):IL_STUB_PInvoke(ptr,byref):int (FullOpts)

No diffs found?

+0 (0.00%) : 36000.dasm - (dynamicClass):IL_STUB_PInvoke(nint,int,byref):int (FullOpts)

No diffs found?

+0 (0.00%) : 35936.dasm - Microsoft.Diagnostics.Tracing.TraceEventDispatcher:Dispose(bool):this (FullOpts)

No diffs found?

Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| linux.riscv64.Checked.3.mch | 2,209 | 88 | 0 | 2,121 | -800 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| linux.riscv64.Checked.3.mch | 2,209 | 88 | 0 | 2,121 | -0.28% | 0.00% | -0.0010% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| linux.riscv64.Checked.3.mch | 36,155 | 12,724 | 23,431 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

@risc-vv

risc-vv commented Jun 26, 2025

RISC-V Release-CLR-QEMU: 9085 / 9116 (99.66%)
=======================
      passed: 9085
      failed: 2
     skipped: 600
      killed: 29
------------------------
 TOTAL tests: 9716
VIRTUAL time: 35h 15min 10s 40ms
   REAL time: 36min 5s 631ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9086 / 9116 (99.67%)
=======================
      passed: 9086
      failed: 2
     skipped: 600
      killed: 28
------------------------
 TOTAL tests: 9716
VIRTUAL time: 10h 44min 2s 950ms
   REAL time: 43min 33s 811ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283633 / 284704 (99.62%)
=======================
      passed: 283633
      failed: 1065
     skipped: 39
      killed: 6
------------------------
 TOTAL tests: 284743
VIRTUAL time: 30h 58min 43s 820ms
   REAL time: 1h 12min 1s 412ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 510214 / 511957 (99.66%)
=======================
      passed: 510214
      failed: 1735
     skipped: 39
      killed: 8
------------------------
 TOTAL tests: 511996
VIRTUAL time: 21h 56min 18s 724ms
   REAL time: 2h 15min 0s 159ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: cff502802d17caf12bc33629e8313fec47914644
CI: 01a69d638efe9b8b18c2888449be98b4a6cde211
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@risc-vv

risc-vv commented Jul 1, 2025

RISC-V Release-CLR-QEMU: 9078 / 9108 (99.67%)
=======================
      passed: 9078
      failed: 2
     skipped: 600
      killed: 28
------------------------
 TOTAL tests: 9708
VIRTUAL time: 35h 11min 27s 504ms
   REAL time: 36min 0s 327ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9079 / 9109 (99.67%)
=======================
      passed: 9079
      failed: 2
     skipped: 600
      killed: 28
------------------------
 TOTAL tests: 9709
VIRTUAL time: 10h 30min 49s 974ms
   REAL time: 42min 44s 769ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283909 / 284982 (99.62%)
=======================
      passed: 283909
      failed: 1068
     skipped: 39
      killed: 5
------------------------
 TOTAL tests: 285021
VIRTUAL time: 29h 57min 4s 636ms
   REAL time: 1h 11min 37s 490ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 509103 / 510859 (99.66%)
=======================
      passed: 509103
      failed: 1747
     skipped: 39
      killed: 9
------------------------
 TOTAL tests: 510898
VIRTUAL time: 22h 59min 4s 359ms
   REAL time: 2h 11min 59s 611ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: cff502802d17caf12bc33629e8313fec47914644
CI: 8669404f3505d3b0b74fcc8f425a1ff735d746cc
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@JulieLeeMSFT JulieLeeMSFT requested a review from jakobbotsch July 28, 2025 13:36
@JulieLeeMSFT
Member

@jakobbotsch, PTAL and decide whether we need to take this for .NET 10 or can push it out to .NET 11.

@jakobbotsch jakobbotsch merged commit cd63792 into dotnet:main Aug 4, 2025
112 of 120 checks passed
tomeksowi added a commit to tomeksowi/runtime that referenced this pull request Aug 6, 2025
Regression after dotnet#117048, using `idxReg` as temp results in early clobber if it's used later
tomeksowi added a commit to tomeksowi/runtime that referenced this pull request Aug 6, 2025
Regression after dotnet#117048, using `idxReg` as temp clobbered the value if it was used later
jakobbotsch pushed a commit that referenced this pull request Aug 6, 2025
Regression after #117048, using `idxReg` as temp clobbered the value if it was used later
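
The hazard described in these follow-up commits can be illustrated with a toy register model. This is only a sketch under stated assumptions: the `table_jump` helper, the register dictionary, and the sample offsets are hypothetical and do not reproduce the JIT's code or the actual fix.

```python
MASK32 = 0xFFFF_FFFF

def table_jump(regs, idx_reg, target_reg, scratch_reg, offsets, code_base):
    """Toy model of a jump-table switch; returns the computed jump target."""
    # sh2add.uw + lw: load the table entry selected by the 32-bit index.
    regs[target_reg] = offsets[regs[idx_reg] & MASK32]
    # lui/slli/add: materialize the base address in the scratch register.
    regs[scratch_reg] = code_base
    # add + jr: final target = table entry + base address.
    regs[target_reg] += regs[scratch_reg]
    return regs[target_reg]

offsets = [0x00, 0x10, 0x20]   # hypothetical table of code offsets
code_base = 0x4000             # hypothetical base address

# Index register reused as scratch: the jump is correct, but the index is gone.
reused = {"a0": 2, "a1": 0, "a2": 0}
assert table_jump(reused, "a0", "a1", "a0", offsets, code_base) == 0x4020
assert reused["a0"] == code_base   # the original index 2 was overwritten

# Separate scratch register: the index survives for any later use.
separate = {"a0": 2, "a1": 0, "a2": 0}
assert table_jump(separate, "a0", "a1", "a2", offsets, code_base) == 0x4020
assert separate["a0"] == 2
```

In both cases the jump target is the same; the difference only shows up when something after the switch still reads the index register.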
@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2025

Labels

- arch-riscv: Related to the RISC-V architecture
- area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
- community-contribution: Indicates that the PR has been added by a community member


6 participants