[RISC-V] Optimize switch table #117048
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
No regressions: two fewer instructions and one fewer register per switch site.
Diffs are based on 36,155 contexts (12,724 MinOpts, 23,431 FullOpts).
Overall (-800 bytes)
MinOpts (-80 bytes)
FullOpts (-720 bytes)
Example diffs
linux.riscv64.Checked.3.mch
-8 (-3.08%) : 13600.dasm - JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
@@ -45,20 +45,18 @@ G_M14747_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
bltu ra, t6, G_M14747_IG07
;; size=12 bbWeight=1 PerfScore 4.50
G_M14747_IG06: ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- zext.w a0, a0
auipc t6, 0xD1FFAB1E
addi a1, t6, 0xD1FFAB1E
- slli a2, a0, 2
- add a1, a1, a2
+ sh2add.uw a1, a0, a1
lw a1, 0xD1FFAB1E(a1)
lui t6, 0xD1FFAB1E
addi t6, t6, 0xD1FFAB1E
- lui a2, 0xD1FFAB1E
- slli a2, a2, 20
- add a2, a2, t6
- add a1, a1, a2
+ lui a0, 0xD1FFAB1E
+ slli a0, a0, 20
+ add a0, a0, t6
+ add a1, a1, a0
jr a1
- ;; size=52 bbWeight=0.94 PerfScore 13.65
+ ;; size=44 bbWeight=0.94 PerfScore 12.71
G_M14747_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sext.w a0, zero
;; size=4 bbWeight=0.50 PerfScore 0.25
@@ -138,7 +136,7 @@ RWD00 dd G_M14747_IG18 - G_M14747_IG02
dd G_M14747_IG07 - G_M14747_IG02
-; Total bytes of code 260, prolog size 16, PerfScore 66.15, instruction count 49, allocated bytes for code 260 (MethodHash=8f51c664) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
+; Total bytes of code 252, prolog size 16, PerfScore 65.21, instruction count 47, allocated bytes for code 252 (MethodHash=8f51c664) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(float,float,byte):float (FullOpts)
; ============================================================
Unwind Info:
@@ -149,7 +147,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 65 (0x00041) Actual length = 260 (0x000104)
+ Function Length : 63 (0x0003f) Actual length = 252 (0x0000fc)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

-8 (-2.53%) : 13619.dasm - JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
@@ -45,20 +45,18 @@ G_M2782_IG05: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
bltu ra, t6, G_M2782_IG08
;; size=12 bbWeight=1 PerfScore 4.50
G_M2782_IG06: ; bbWeight=0.94, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- zext.w a0, a0
auipc t6, 0xD1FFAB1E
addi a1, t6, 0xD1FFAB1E
- slli a2, a0, 2
- add a1, a1, a2
+ sh2add.uw a1, a0, a1
lw a1, 0xD1FFAB1E(a1)
lui t6, 0xD1FFAB1E
addi t6, t6, 0xD1FFAB1E
- lui a2, 0xD1FFAB1E
- slli a2, a2, 20
- add a2, a2, t6
- add a1, a1, a2
+ lui a0, 0xD1FFAB1E
+ slli a0, a0, 20
+ add a0, a0, t6
+ add a1, a1, a0
jr a1
- ;; size=52 bbWeight=0.94 PerfScore 13.65
+ ;; size=44 bbWeight=0.94 PerfScore 12.71
G_M2782_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
addiw a0, zero, 0xD1FFAB1E
slli a0, a0, 52
@@ -175,7 +173,7 @@ RWD152 dq 3F8111111110F30Ch
RWD160 dq BFC5555555555543h
-; Total bytes of code 316, prolog size 16, PerfScore 80.90, instruction count 59, allocated bytes for code 316 (MethodHash=0386f521) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
+; Total bytes of code 308, prolog size 16, PerfScore 79.96, instruction count 57, allocated bytes for code 308 (MethodHash=0386f521) for method JIT.HardwareIntrinsics.Arm.Helpers:TrigonometricMultiplyAddCoefficient(double,double,byte):double (FullOpts)
; ============================================================
Unwind Info:
@@ -186,7 +184,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 79 (0x0004f) Actual length = 316 (0x00013c)
+ Function Length : 77 (0x0004d) Actual length = 308 (0x000134)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

-8 (-2.33%) : 23707.dasm - System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
@@ -47,20 +47,18 @@ G_M63141_IG03: ; bbWeight=0.50, gcrefRegs=0600 {s1 a0}, byrefRegs=0000 {}
bltu ra, t6, G_M63141_IG05
;; size=32 bbWeight=0.50 PerfScore 7.00
G_M63141_IG04: ; bbWeight=0.45, gcrefRegs=0200 {s1}, byrefRegs=0000 {}, byref
- zext.w a1, a1
auipc t6, 0xD1FFAB1E
addi a0, t6, 0xD1FFAB1E
- slli a2, a1, 2
- add a0, a0, a2
+ sh2add.uw a0, a1, a0
lw a0, 0xD1FFAB1E(a0)
lui t6, 0xD1FFAB1E
addi t6, t6, 0xD1FFAB1E
- lui a2, 0xD1FFAB1E
- slli a2, a2, 20
- add a2, a2, t6
- add a0, a0, a2
+ lui a1, 0xD1FFAB1E
+ slli a1, a1, 20
+ add a1, a1, t6
+ add a0, a0, a1
jr a0
- ;; size=52 bbWeight=0.45 PerfScore 6.53
+ ;; size=44 bbWeight=0.45 PerfScore 6.07
G_M63141_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcrRegs -[s1]
addi a0, zero, 0xD1FFAB1E
@@ -161,7 +159,7 @@ RWD00 dd G_M63141_IG17 - G_M63141_IG02
dd G_M63141_IG07 - G_M63141_IG02
-; Total bytes of code 344, prolog size 20, PerfScore 70.21, instruction count 69, allocated bytes for code 344 (MethodHash=33a2095a) for method System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
+; Total bytes of code 336, prolog size 20, PerfScore 69.76, instruction count 67, allocated bytes for code 336 (MethodHash=33a2095a) for method System.Xml.XPath.XNodeNavigator:get_NodeType():int:this (FullOpts)
; ============================================================
Unwind Info:
@@ -172,7 +170,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 86 (0x00056) Actual length = 344 (0x000158)
+ Function Length : 84 (0x00054) Actual length = 336 (0x000150)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

+0 (0.00%) : 36016.dasm - (dynamicClass):IL_STUB_PInvoke(ptr,byref):int (FullOpts)
No diffs found?
+0 (0.00%) : 36000.dasm - (dynamicClass):IL_STUB_PInvoke(nint,int,byref):int (FullOpts)
No diffs found?
+0 (0.00%) : 35936.dasm - Microsoft.Diagnostics.Tracing.TraceEventDispatcher:Dispose(bool):this (FullOpts)
No diffs found?
RISC-V Release-CLR-QEMU: 9085 / 9116 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-VF2: 9086 / 9116 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 283633 / 284704 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 510214 / 511957 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-QEMU: 9078 / 9108 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-VF2: 9079 / 9109 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 283909 / 284982 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 509103 / 510859 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
@jakobbotsch, PTAL: do we need to take this for .NET 10, or can it be pushed out to .NET 11?
Regression after #117048: using `idxReg` as a temp clobbered its value when it was still used later.
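To make the hazard concrete, here is a rough Python model of the dispatch sequence seen in the example diffs, where the index register (`a0` in the first diff) is overwritten by the `lui`/`slli`/`add` constant materialization right after `sh2add.uw` has consumed it. The register names, `TABLE_BASE`, `CODE_BASE`, `OFFSETS`, and the `dispatch` helper are all illustrative assumptions, not JIT code, and the sketch does not claim to show how the follow-up fix was implemented.

```python
MASK32 = (1 << 32) - 1

# Hypothetical values standing in for the relocated constants (0xD1FFAB1E
# placeholders in the disassembly).
TABLE_BASE = 0x2000
CODE_BASE = 0x4000_0000
OFFSETS = [0x10, 0x40, 0x80]  # jump table entries: target - code base


def dispatch(regs, idx, dst, tmp):
    """Model the switch dispatch; returns a copy of the register state."""
    regs = dict(regs)
    regs[dst] = TABLE_BASE                               # auipc + addi
    regs[dst] += (regs[idx] & MASK32) << 2               # sh2add.uw dst, idx, dst
    regs[dst] = OFFSETS[(regs[dst] - TABLE_BASE) // 4]   # lw dst, 0(dst)
    regs[tmp] = CODE_BASE                                # lui/slli/add into tmp
    regs[dst] += regs[tmp]                               # add dst, dst, tmp
    return regs                                          # jr dst


start = {"a0": 1, "a1": 0, "a2": 0}

# Reusing the index register as the temp: the jump target is still right,
# but a0 no longer holds the switch value afterwards.
clobbering = dispatch(start, idx="a0", dst="a1", tmp="a0")

# Using a separate temp keeps the index intact for later uses.
preserving = dispatch(start, idx="a0", dst="a1", tmp="a2")
```

Both variants compute the same jump target, which is why the problem only shows up when a jump target (or anything after the switch) still reads the index register.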
Don't allocate a temp register; use `sh2add`.
Part of #84834, cc @dotnet/samsung
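As a sanity check on why the replacement is safe, the sketch below models the three instructions removed in the diffs (`zext.w`, `slli`, `add`) against the single Zba instruction `sh2add.uw`, which computes `rs2 + (zext32(rs1) << 2)`. The helper names and the 64-bit masking are assumptions made for this illustration; they are not part of the JIT.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def zext_w(rs):
    """zext.w rd, rs: keep the low 32 bits, clear the upper 32."""
    return rs & MASK32


def slli(rs, shamt):
    """slli rd, rs, shamt: logical left shift, truncated to 64 bits."""
    return (rs << shamt) & MASK64


def add(rs1, rs2):
    """add rd, rs1, rs2: 64-bit wrapping addition."""
    return (rs1 + rs2) & MASK64


def sh2add_uw(rs1, rs2):
    """Zba sh2add.uw rd, rs1, rs2: rd = rs2 + (zext32(rs1) << 2)."""
    return (rs2 + ((rs1 & MASK32) << 2)) & MASK64


def old_entry_address(index, table):
    """Old sequence: three instructions and one extra temp register."""
    idx = zext_w(index)   # zext.w idx, idx
    tmp = slli(idx, 2)    # slli   tmp, idx, 2
    return add(table, tmp)  # add  table, table, tmp


def new_entry_address(index, table):
    """New sequence: one instruction, no temp."""
    return sh2add_uw(index, table)  # sh2add.uw table, idx, table
```

Two instructions of 4 bytes each are saved per switch site, which lines up with the `size=52` to `size=44` change in the example diffs.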