Use ldp/stp with SIMD registers on Arm64 #84135
Use pairwise load/stores for

1. the instructions using SIMD registers
```
ldr q1, [x0, #0x20]
ldr q2, [x0, #0x30]
=>
ldp q1, q2, [x0, #0x20]
```
2. the instructions using base and base plus immediate offset format
```
ldr w1, [x20]
ldr w2, [x20, #0x04]
=>
ldp w1, w2, [x20]

ldr q1, [x0]
ldr q2, [x0, #0x10]
=>
ldp q1, q2, [x0]
```
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
(Fixes #83773)
(Fixes #35133)
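The two rewrites in the description come down to a legality check on a pair of neighbouring accesses. The C++ sketch below is a simplified model of such a check, not the code in `emitarm64.cpp`: the `MemAccess` struct and the `canFormPair` function are hypothetical names introduced here, only ascending offsets are considered, and the immediate range assumed is that of the A64 `ldp`/`stp` signed-offset form (a 7-bit signed immediate scaled by the access size).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a single-register load/store; not the JIT's instrDesc.
struct MemAccess
{
    bool     isLoad;  // true for ldr, false for str
    bool     isSimd;  // true when the data register is s/d/q, false for w/x
    int      baseReg; // base register number, e.g. 0 for x0
    int      dataReg; // number of the register being loaded or stored
    int64_t  offset;  // byte offset from the base register
    unsigned size;    // access size in bytes: 4 (w/s), 8 (x/d) or 16 (q)
};

// Returns true if `first` immediately followed by `second` could be expressed
// as one ldp/stp addressed at `first.offset` (ascending order only).
bool canFormPair(const MemAccess& first, const MemAccess& second)
{
    assert(first.size == 4 || first.size == 8 || first.size == 16);

    // Both halves must be the same kind of access on the same register class.
    if (first.isLoad != second.isLoad || first.isSimd != second.isSimd)
        return false;
    if (first.size != second.size || first.baseReg != second.baseReg)
        return false;

    // The second access must sit directly after the first in memory.
    const int64_t scale = static_cast<int64_t>(first.size);
    if (second.offset != first.offset + scale)
        return false;

    // ldp/stp encode a 7-bit signed immediate scaled by the access size.
    if (first.offset % scale != 0)
        return false;
    const int64_t imm = first.offset / scale;
    if (imm < -64 || imm > 63)
        return false;

    if (first.isLoad)
    {
        // ldp with identical destination registers is not a valid encoding to rely on.
        if (first.dataReg == second.dataReg)
            return false;
        // If the first load overwrites the base, the second load reads a new address.
        if (!first.isSimd && first.dataReg == first.baseReg)
            return false;
    }
    return true;
}
```

Under this model, `ldr q1, [x0, #0x20]` followed by `ldr q2, [x0, #0x30]` qualifies, whereas `ldr x0, [x0]` followed by `ldr x1, [x0, #0x08]` does not, because the first load overwrites the base register.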
Not sure about the potential GC holes or how to confirm that yet. The following SPMI asmdiffs summary shows multiple matches, as expected.
Diffs are based on 1,469,735 contexts (402,470 MinOpts, 1,067,265 FullOpts).
MISSED contexts: 3 (0.00%)
Overall (-769,712 bytes)
MinOpts (+0 bytes)
FullOpts (-769,712 bytes)
Example diffs
benchmarks.run.linux.arm64.checked.mch
-4 (-16.67%) : 2709.dasm - System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
@@ -20,15 +20,14 @@ G_M30325_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M30325_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str x1, [x0]
- str x2, [x0, #0x08]
- ;; size=8 bbWeight=1 PerfScore 2.00
+ stp x1, x2, [x0]
+ ;; size=4 bbWeight=1 PerfScore 1.00
G_M30325_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
; ============================================================
Unwind Info:
@@ -39,7 +38,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 925.dasm - System.Reflection.Emit.OpCode:.ctor(int,int):this
@@ -19,15 +19,14 @@ G_M55742_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M55742_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str w1, [x0]
- str w2, [x0, #0x04]
- ;; size=8 bbWeight=1 PerfScore 2.00
+ stp w1, w2, [x0]
+ ;; size=4 bbWeight=1 PerfScore 1.00
G_M55742_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=9e0a2641) for method System.Reflection.Emit.OpCode:.ctor(int,int):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=9e0a2641) for method System.Reflection.Emit.OpCode:.ctor(int,int):this
; ============================================================
Unwind Info:
@@ -38,7 +37,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-16.67%) : 25440.dasm - System.Numerics.Tests.Perf_Matrix4x4:CreateRotationXWithCenterBenchmark():System.Numerics.Matrix4x4:this
@@ -38,11 +38,9 @@ G_M63428_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0100 {x8}, byre
ldr q17, [@RWD16]
ldr q18, [@RWD32]
ldr q19, [@RWD48]
- str q19, [x8]
- str q16, [x8, #0x10]
- str q17, [x8, #0x20]
- str q18, [x8, #0x30]
- ;; size=32 bbWeight=1 PerfScore 12.00
+ stp q19, q16, [x8]
+ stp q17, q18, [x8, #0x20]
+ ;; size=24 bbWeight=1 PerfScore 10.00
G_M63428_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
@@ -53,7 +51,7 @@ RWD32 dq 0000000000000000h, 3F80000000000000h
RWD48 dq 000000003F800000h, 0000000000000000h
-; Total bytes of code 48, prolog size 8, PerfScore 20.30, instruction count 12, allocated bytes for code 48 (MethodHash=0a46083b) for method System.Numerics.Tests.Perf_Matrix4x4:CreateRotationXWithCenterBenchmark():System.Numerics.Matrix4x4:this
+; Total bytes of code 40, prolog size 8, PerfScore 17.50, instruction count 10, allocated bytes for code 40 (MethodHash=0a46083b) for method System.Numerics.Tests.Perf_Matrix4x4:CreateRotationXWithCenterBenchmark():System.Numerics.Matrix4x4:this
; ============================================================
Unwind Info:
@@ -64,7 +62,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 12 (0x0000c) Actual length = 48 (0x000030)
+ Function Length : 10 (0x0000a) Actual length = 40 (0x000028)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 21886.dasm - System.Xml.XmlBinaryWriter:SetOutput(System.IO.Stream,System.Xml.IXmlDictionary,System.Xml.XmlBinaryWriterSession,bool):this
@@ -86,8 +86,8 @@ G_M34423_IG04: ; bbWeight=1, gcrefRegs=780000 {x19 x20 x21 x22}, byrefReg
strb wzr, [x0, #0x26]
add x14, x0, #64
; byrRegs +[x14]
- str xzr, [x14]
- stp xzr, xzr, [x14, #0x08]
+ stp xzr, xzr, [x14]
+ str xzr, [x14, #0x10]
movn w14, #0
; byrRegs -[x14]
str w14, [x0, #0x38]
+0 (0.00%) : 20223.dasm - System.Text.Json.JsonDocument:TryGetValue(int,byref):bool:this
@@ -150,8 +150,8 @@ G_M19143_IG05: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=300000 {x20
blr x1
cmp w22, #12
blt G_M19143_IG15
- ldr w22, [x21]
- ldp w23, w1, [x21, #0x04]
+ ldp w22, w23, [x21]
+ ldr w1, [x21, #0x08]
lsr w1, w1, #28
uxtb w1, w1
cmp w1, #8
+0 (0.00%) : 32319.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:CompileSynthesizedMethods(Microsoft.CodeAnalysis.CSharp.TypeCompilationState):this
@@ -188,10 +188,10 @@ G_M26982_IG06: ; bbWeight=4, gcVars=00000000000000400000000401000010 {V00
add x14, x14, #16
add x14, x15, x14
; byrRegs +[x14]
- ldr x20, [x14]
- ; gcrRegs +[x20]
- ldp x21, x22, [x14, #0x08]
- ; gcrRegs +[x21-x22]
+ ldp x20, x21, [x14]
+ ; gcrRegs +[x20-x21]
+ ldr x22, [x14, #0x10]
+ ; gcrRegs +[x22]
add x14, x4, #40
mov x15, x22
bl CORINFO_HELP_ASSIGN_REF
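The +0 byte diffs above (for example `XmlBinaryWriter:SetOutput`, where `str` + `stp [x14, #0x08]` becomes `stp [x14]` + `str [x14, #0x10]`) are what one would expect from a left-to-right pairing that can now start at the base-only access. The C++ sketch below is only a model of that behaviour under stated assumptions: `pairGreedily` is a hypothetical function, every access is a store of the same size to one base register, and offset 0 is assumed to be the only access emitted in the base-only format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: greedily pairs consecutive same-size stores to one base
// register, scanning left to right. Each result entry names the emitted
// instruction and the byte offset it addresses, e.g. "stp@0" or "str@16".
std::vector<std::string> pairGreedily(const std::vector<int64_t>& offsets,
                                      int64_t                     size,
                                      bool                        allowMixedFormats)
{
    assert(size == 4 || size == 8 || size == 16);

    std::vector<std::string> emitted;
    std::size_t              i = 0;
    while (i < offsets.size())
    {
        bool canPair = false;
        if (i + 1 < offsets.size() && offsets[i + 1] == offsets[i] + size)
        {
            // Assumption: offset 0 uses the base-only format, any other
            // offset uses the base plus immediate format.
            const bool sameFormat = (offsets[i] == 0) == (offsets[i + 1] == 0);
            canPair               = sameFormat || allowMixedFormats;
        }

        if (canPair)
        {
            emitted.push_back("stp@" + std::to_string(offsets[i]));
            i += 2; // both stores are covered by the pair
        }
        else
        {
            emitted.push_back("str@" + std::to_string(offsets[i]));
            i += 1;
        }
    }
    return emitted;
}
```

With offsets `{0, 8, 16}` and 8-byte stores, requiring identical formats leaves the first store alone and pairs the last two, while allowing mixed formats pairs the first two and leaves the third alone. The instruction count is the same either way, which is consistent with the unchanged code size in these diffs.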
libraries_tests.pmi.linux.arm64.checked.mch
-8 (-20.00%) : 128351.dasm - Microsoft.CodeAnalysis.Checksum+HashData:FromPointer(ulong):Microsoft.CodeAnalysis.Checksum+HashData
@@ -26,19 +26,17 @@ G_M44009_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M44009_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0100 {x8}, byref
; byrRegs +[x8]
- ldr x1, [x0]
- ldr x2, [x0, #0x08]
+ ldp x1, x2, [x0]
ldr w0, [x0, #0x10]
- str x1, [x8]
- str x2, [x8, #0x08]
+ stp x1, x2, [x8]
str w0, [x8, #0x10]
- ;; size=24 bbWeight=1 PerfScore 12.00
+ ;; size=16 bbWeight=1 PerfScore 9.00
G_M44009_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 40, prolog size 8, PerfScore 19.50, instruction count 10, allocated bytes for code 40 (MethodHash=a99b5416) for method Microsoft.CodeAnalysis.Checksum+HashData:FromPointer(ulong):Microsoft.CodeAnalysis.Checksum+HashData
+; Total bytes of code 32, prolog size 8, PerfScore 15.70, instruction count 8, allocated bytes for code 32 (MethodHash=a99b5416) for method Microsoft.CodeAnalysis.Checksum+HashData:FromPointer(ulong):Microsoft.CodeAnalysis.Checksum+HashData
; ============================================================
Unwind Info:
@@ -49,7 +47,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 10 (0x0000a) Actual length = 40 (0x000028)
+ Function Length : 8 (0x00008) Actual length = 32 (0x000020)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 3265.dasm - System.Text.Json.Serialization.Tests.Point_2D_Struct_WithMultipleAttributes_OneNonPublic:.ctor(int):this
@@ -19,15 +19,14 @@ G_M61621_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M61621_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str w1, [x0]
- str wzr, [x0, #0x04]
- ;; size=8 bbWeight=1 PerfScore 2.00
+ stp w1, wzr, [x0]
+ ;; size=4 bbWeight=1 PerfScore 1.00
G_M61621_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=8aeb0f4a) for method System.Text.Json.Serialization.Tests.Point_2D_Struct_WithMultipleAttributes_OneNonPublic:.ctor(int):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=8aeb0f4a) for method System.Text.Json.Serialization.Tests.Point_2D_Struct_WithMultipleAttributes_OneNonPublic:.ctor(int):this
; ============================================================
Unwind Info:
@@ -38,7 +37,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 156801.dasm - SerializationTestTypes.KeyValue`2[long,System.Nullable`1[int]]:.ctor(long,System.Nullable`1[int]):this
@@ -19,15 +19,14 @@ G_M24332_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M24332_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str x1, [x0]
- str x2, [x0, #0x08]
- ;; size=8 bbWeight=1 PerfScore 2.00
+ stp x1, x2, [x0]
+ ;; size=4 bbWeight=1 PerfScore 1.00
G_M24332_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=e0c2a0f3) for method SerializationTestTypes.KeyValue`2[long,System.Nullable`1[int]]:.ctor(long,System.Nullable`1[int]):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=e0c2a0f3) for method SerializationTestTypes.KeyValue`2[long,System.Nullable`1[int]]:.ctor(long,System.Nullable`1[int]):this
; ============================================================
Unwind Info:
@@ -38,7 +37,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 166976.dasm - Microsoft.CodeQuality.Analyzers.ApiDesignGuidelines.IdentifiersShouldHaveCorrectSuffixAnalyzer:.ctor():this
@@ -40,10 +40,10 @@ G_M60409_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movz x0, #0xD1FFAB1E // data for <unknown class>:<unknown field>
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
- ldr x20, [x0]
- ; gcrRegs +[x20]
- ldp x21, x22, [x0, #0x08]
- ; gcrRegs +[x21-x22]
+ ldp x20, x21, [x0]
+ ; gcrRegs +[x20-x21]
+ ldr x22, [x0, #0x10]
+ ; gcrRegs +[x22]
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
+0 (0.00%) : 210048.dasm - System.Net.Http.Tests.StreamToStreamCopyTest+d__5:MoveNext():this
@@ -605,8 +605,8 @@ G_M59861_IG08: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=80000 {x19}, by
G_M59861_IG09: ; bbWeight=1.00, gcrefRegs=0000 {}, byrefRegs=80000 {x19}, byref, isz
movn w14, #1
str w14, [x19, #0x18]
- str xzr, [x19]
- stp xzr, xzr, [x19, #0x08]
+ stp xzr, xzr, [x19]
+ str xzr, [x19, #0x10]
add x14, x19, #32
; byrRegs +[x14]
ldr x15, [x14]
@@ -802,8 +802,8 @@ G_M59861_IG21: ; bbWeight=0, gcVars=0000000000000001 {V00}, gcrefRegs=000
ldr x19, [fp, #0x10] // [V00 this]
; byrRegs +[x19]
str w0, [x19, #0x18]
- str xzr, [x19]
- stp xzr, xzr, [x19, #0x08]
+ stp xzr, xzr, [x19]
+ str xzr, [x19, #0x10]
add x0, x19, #32
; byrRegs +[x0]
movz x2, #0xD1FFAB1E // code for <unknown method>
+0 (0.00%) : 239744.dasm - System.Security.Cryptography.Pkcs.Tests.CryptographicAttributeObjectCollectionTests:CopyExceptions()
@@ -142,10 +142,10 @@ G_M45722_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
movz x0, #0xD1FFAB1E // data for <unknown class>:<unknown field>
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
- ldr x20, [x0]
- ; gcrRegs +[x20]
- ldp x21, x22, [x0, #0x08]
- ; gcrRegs +[x21-x22]
+ ldp x20, x21, [x0]
+ ; gcrRegs +[x20-x21]
+ ldr x22, [x0, #0x10]
+ ; gcrRegs +[x22]
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
libraries.crossgen2.linux.arm64.checked.mch
-8 (-25.00%) : 34883.dasm - System.Numerics.Quaternion:.ctor(float,float,float,float):this
@@ -22,17 +22,15 @@ G_M64168_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M64168_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str s0, [x0]
- str s1, [x0, #0x04]
- str s2, [x0, #0x08]
- str s3, [x0, #0x0C]
- ;; size=16 bbWeight=1 PerfScore 4.00
+ stp s0, s1, [x0]
+ stp s2, s3, [x0, #0x08]
+ ;; size=8 bbWeight=1 PerfScore 2.00
G_M64168_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=c0090557) for method System.Numerics.Quaternion:.ctor(float,float,float,float):this
+; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=c0090557) for method System.Numerics.Quaternion:.ctor(float,float,float,float):this
; ============================================================
Unwind Info:
@@ -43,7 +41,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 8 (0x00008) Actual length = 32 (0x000020)
+ Function Length : 6 (0x00006) Actual length = 24 (0x000018)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-25.00%) : 169952.dasm - System.Drawing.RectangleF:.ctor(float,float,float,float):this
@@ -22,17 +22,15 @@ G_M45207_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M45207_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str s0, [x0]
- str s1, [x0, #0x04]
- str s2, [x0, #0x08]
- str s3, [x0, #0x0C]
- ;; size=16 bbWeight=1 PerfScore 4.00
+ stp s0, s1, [x0]
+ stp s2, s3, [x0, #0x08]
+ ;; size=8 bbWeight=1 PerfScore 2.00
G_M45207_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
+; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
; ============================================================
Unwind Info:
@@ -43,7 +41,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 8 (0x00008) Actual length = 32 (0x000020)
+ Function Length : 6 (0x00006) Actual length = 24 (0x000018)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-25.00%) : 169953.dasm - System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
@@ -25,17 +25,15 @@ G_M36094_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M36094_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str s0, [x0]
- str s1, [x0, #0x04]
- str s2, [x0, #0x08]
- str s3, [x0, #0x0C]
- ;; size=16 bbWeight=1 PerfScore 4.00
+ stp s0, s1, [x0]
+ stp s2, s3, [x0, #0x08]
+ ;; size=8 bbWeight=1 PerfScore 2.00
G_M36094_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
+; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
; ============================================================
Unwind Info:
@@ -46,7 +44,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 8 (0x00008) Actual length = 32 (0x000020)
+ Function Length : 6 (0x00006) Actual length = 24 (0x000018)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 65343.dasm - Microsoft.CodeAnalysis.CSharp.ForEachStatementInfo:GetHashCode():int:this
@@ -58,12 +58,12 @@ G_M41916_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
; byrRegs +[x19]
;; size=28 bbWeight=1 PerfScore 6.00
G_M41916_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=80000 {x19}, byref
- ldr x20, [x19]
- ; gcrRegs +[x20]
- ldp x21, x22, [x19, #0x08]
- ; gcrRegs +[x21-x22]
- ldp x23, x24, [x19, #0x18]
- ; gcrRegs +[x23-x24]
+ ldp x20, x21, [x19]
+ ; gcrRegs +[x20-x21]
+ ldp x22, x23, [x19, #0x10]
+ ; gcrRegs +[x22-x23]
+ ldr x24, [x19, #0x20]
+ ; gcrRegs +[x24]
;; size=12 bbWeight=1 PerfScore 11.00
G_M41916_IG03: ; bbWeight=1, nogc, extend
add x0, x19, #24
+0 (0.00%) : 136256.dasm - System.Text.RegularExpressions.RegexParser:.ctor(System.String,int,System.Globalization.CultureInfo,System.Collections.Hashtable,int,System.Collections.Hashtable,System.Span`1[int]):this
@@ -96,9 +96,9 @@ G_M19169_IG04: ; bbWeight=1, extend
blr x12
ldr x12, [x13], #0x08
str x12, [x14], #0x08
- str xzr, [x0]
- stp xzr, xzr, [x0, #0x08]
- stp xzr, xzr, [x0, #0x18]
+ stp xzr, xzr, [x0]
+ stp xzr, xzr, [x0, #0x10]
+ str xzr, [x0, #0x20]
str wzr, [x0, #0x58]
stp wzr, wzr, [x0, #0x60]
str wzr, [x0, #0x68]
+0 (0.00%) : 173248.dasm - ILCompiler.Diagnostics.PerfMapWriter+PerfmapTokensForTarget:Equals(System.Object):bool:this
@@ -65,9 +65,9 @@ G_M34908_IG04: ; bbWeight=0.25, gcrefRegs=80000 {x19}, byrefRegs=100000 {
G_M34908_IG05: ; bbWeight=0.50, gcrefRegs=80000 {x19}, byrefRegs=100000 {x20}, byref, isz
add x11, x19, #8
; byrRegs +[x11]
- ldr w19, [x11]
+ ldp w19, w21, [x11]
; gcrRegs -[x19]
- ldp w21, w22, [x11, #0x04]
+ ldr w22, [x11, #0x08]
adrp x11, [HIGH RELOC #0xD1FFAB1E] // function address
; byrRegs -[x11]
add x11, x11, [LOW RELOC #0xD1FFAB1E]
libraries.pmi.linux.arm64.checked.mch
-8 (-25.00%) : 250607.dasm - System.Drawing.RectangleF:.ctor(float,float,float,float):this
@@ -21,17 +21,15 @@ G_M45207_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M45207_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str s0, [x0]
- str s1, [x0, #0x04]
- str s2, [x0, #0x08]
- str s3, [x0, #0x0C]
- ;; size=16 bbWeight=1 PerfScore 4.00
+ stp s0, s1, [x0]
+ stp s2, s3, [x0, #0x08]
+ ;; size=8 bbWeight=1 PerfScore 2.00
G_M45207_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
+; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
; ============================================================
Unwind Info:
@@ -42,7 +40,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 8 (0x00008) Actual length = 32 (0x000020)
+ Function Length : 6 (0x00006) Actual length = 24 (0x000018)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-25.00%) : 250608.dasm - System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
@@ -24,17 +24,15 @@ G_M36094_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M36094_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str s0, [x0]
- str s1, [x0, #0x04]
- str s2, [x0, #0x08]
- str s3, [x0, #0x0C]
- ;; size=16 bbWeight=1 PerfScore 4.00
+ stp s0, s1, [x0]
+ stp s2, s3, [x0, #0x08]
+ ;; size=8 bbWeight=1 PerfScore 2.00
G_M36094_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
+; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
; ============================================================
Unwind Info:
@@ -45,7 +43,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 8 (0x00008) Actual length = 32 (0x000020)
+ Function Length : 6 (0x00006) Actual length = 24 (0x000018)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-20.00%) : 246149.dasm - System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])
@@ -29,20 +29,18 @@ G_M11325_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M11325_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0003 {x0 x1}, byref
; byrRegs +[x0-x1]
- ldr x2, [x0]
- ldr x0, [x0, #0x08]
+ ldp x2, x0, [x0]
; byrRegs -[x0]
rev x2, x2
rev x0, x0
- str x0, [x1]
- str x2, [x1, #0x08]
- ;; size=24 bbWeight=1 PerfScore 9.00
+ stp x0, x2, [x1]
+ ;; size=16 bbWeight=1 PerfScore 6.00
G_M11325_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 40, prolog size 8, PerfScore 16.50, instruction count 10, allocated bytes for code 40 (MethodHash=ae9bd3c2) for method System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])
+; Total bytes of code 32, prolog size 8, PerfScore 12.70, instruction count 8, allocated bytes for code 32 (MethodHash=ae9bd3c2) for method System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])
; ============================================================
Unwind Info:
@@ -53,7 +51,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 10 (0x0000a) Actual length = 40 (0x000028)
+ Function Length : 8 (0x00008) Actual length = 32 (0x000020)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 219072.dasm - Microsoft.Cci.FullMetadataWriter:CreateReferenceVisitor():Microsoft.Cci.ReferenceIndexer:this
@@ -89,10 +89,10 @@ G_M64343_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
; byrRegs -[x14]
add x0, x19, #0xD1FFAB1E
; byrRegs +[x0]
- ldr x21, [x0]
- ; gcrRegs +[x21]
- ldp x23, x24, [x0, #0x08]
- ; gcrRegs +[x23-x24]
+ ldp x21, x23, [x0]
+ ; gcrRegs +[x21 x23]
+ ldr x24, [x0, #0x10]
+ ; gcrRegs +[x24]
movz x25, #0xD1FFAB1E
movk x25, #0xD1FFAB1E LSL #16
movk x25, #0xD1FFAB1E LSL #32
+0 (0.00%) : 241792.dasm - System.Formats.Cbor.CborWriter+KeyValuePairEncodingRange:.ctor(int,int,int):this
@@ -20,8 +20,8 @@ G_M54047_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M54047_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str w1, [x0]
- stp w2, w3, [x0, #0x04]
+ stp w1, w2, [x0]
+ str w3, [x0, #0x08]
;; size=8 bbWeight=1 PerfScore 2.00
G_M54047_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
+0 (0.00%) : 256640.dasm - System.IO.Pipelines.PipeAwaitable:ExtractCompletion(byref):this
@@ -39,10 +39,10 @@ G_M12398_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=24 bbWeight=1 PerfScore 5.50
G_M12398_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0003 {x0 x1}, byref, isz
; byrRegs +[x0-x1]
- ldr x14, [x0]
- ; gcrRegs +[x14]
- ldp x13, x12, [x0, #0x08]
- ; gcrRegs +[x12-x13]
+ ldp x14, x13, [x0]
+ ; gcrRegs +[x13-x14]
+ ldr x12, [x0, #0x10]
+ ; gcrRegs +[x12]
cbnz x12, G_M12398_IG04
;; size=12 bbWeight=1 PerfScore 8.00
G_M12398_IG03: ; bbWeight=0.50, gcrefRegs=7000 {x12 x13 x14}, byrefRegs=0003 {x0 x1}, byref
@@ -71,8 +71,8 @@ G_M12398_IG07: ; bbWeight=0.50, gcrefRegs=F000 {x12 x13 x14 x15}, byrefRe
;; size=4 bbWeight=0.50 PerfScore 1.50
G_M12398_IG08: ; bbWeight=1, gcrefRegs=1E000 {x13 x14 x15 xip0}, byrefRegs=0003 {x0 x1}, byref, isz
; gcrRegs -[x12]
- str xzr, [x0]
- stp xzr, xzr, [x0, #0x08]
+ stp xzr, xzr, [x0]
+ str xzr, [x0, #0x10]
cbnz x14, G_M12398_IG10
;; size=12 bbWeight=1 PerfScore 3.00
G_M12398_IG09: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0002 {x1}, byref
coreclr_tests.run.linux.arm64.checked.mch
-20 (-33.33%) : 243626.dasm - testout1+VT_0_4_4:.ctor(int):this
@@ -19,23 +19,18 @@ G_M41861_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M41861_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
fmov d16, #1.0000
- str d16, [x0]
- str d16, [x0, #0x08]
- str d16, [x0, #0x10]
- str d16, [x0, #0x18]
- str d16, [x0, #0x20]
- str d16, [x0, #0x28]
- str d16, [x0, #0x30]
- str d16, [x0, #0x38]
- str d16, [x0, #0x40]
- str d16, [x0, #0x48]
- ;; size=44 bbWeight=1 PerfScore 10.50
+ stp d16, d16, [x0]
+ stp d16, d16, [x0, #0x10]
+ stp d16, d16, [x0, #0x20]
+ stp d16, d16, [x0, #0x30]
+ stp d16, d16, [x0, #0x40]
+ ;; size=24 bbWeight=1 PerfScore 5.50
G_M41861_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 60, prolog size 8, PerfScore 20.00, instruction count 15, allocated bytes for code 60 (MethodHash=e0945c7a) for method testout1+VT_0_4_4:.ctor(int):this
+; Total bytes of code 40, prolog size 8, PerfScore 13.00, instruction count 10, allocated bytes for code 40 (MethodHash=e0945c7a) for method testout1+VT_0_4_4:.ctor(int):this
; ============================================================
Unwind Info:
@@ -46,7 +41,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 15 (0x0000f) Actual length = 60 (0x00003c)
+ Function Length : 10 (0x0000a) Actual length = 40 (0x000028)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-16 (-30.77%) : 243598.dasm - testout1+VT_0_7_8:.ctor(int):this
@@ -19,21 +19,17 @@ G_M55818_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M55818_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
fmov d16, #1.0000
- str d16, [x0]
- str d16, [x0, #0x08]
- str d16, [x0, #0x10]
- str d16, [x0, #0x18]
- str d16, [x0, #0x20]
- str d16, [x0, #0x28]
- str d16, [x0, #0x30]
- str d16, [x0, #0x38]
- ;; size=36 bbWeight=1 PerfScore 8.50
+ stp d16, d16, [x0]
+ stp d16, d16, [x0, #0x10]
+ stp d16, d16, [x0, #0x20]
+ stp d16, d16, [x0, #0x30]
+ ;; size=20 bbWeight=1 PerfScore 4.50
G_M55818_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 52, prolog size 8, PerfScore 17.20, instruction count 13, allocated bytes for code 52 (MethodHash=a1af25f5) for method testout1+VT_0_7_8:.ctor(int):this
+; Total bytes of code 36, prolog size 8, PerfScore 11.60, instruction count 9, allocated bytes for code 36 (MethodHash=a1af25f5) for method testout1+VT_0_7_8:.ctor(int):this
; ============================================================
Unwind Info:
@@ -44,7 +40,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 13 (0x0000d) Actual length = 52 (0x000034)
+ Function Length : 9 (0x00009) Actual length = 36 (0x000024)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-16 (-28.57%) : 243629.dasm - testout1+VT_0_4_1:.ctor(int):this
@@ -19,22 +19,18 @@ G_M56448_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M56448_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
fmov d16, #1.0000
- str d16, [x0]
- str d16, [x0, #0x08]
- str d16, [x0, #0x10]
- str d16, [x0, #0x18]
- str d16, [x0, #0x20]
- str d16, [x0, #0x28]
- str d16, [x0, #0x30]
- str d16, [x0, #0x38]
+ stp d16, d16, [x0]
+ stp d16, d16, [x0, #0x10]
+ stp d16, d16, [x0, #0x20]
+ stp d16, d16, [x0, #0x30]
str d16, [x0, #0x40]
- ;; size=40 bbWeight=1 PerfScore 9.50
+ ;; size=24 bbWeight=1 PerfScore 5.50
G_M56448_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 56, prolog size 8, PerfScore 18.60, instruction count 14, allocated bytes for code 56 (MethodHash=c6a1237f) for method testout1+VT_0_4_1:.ctor(int):this
+; Total bytes of code 40, prolog size 8, PerfScore 13.00, instruction count 10, allocated bytes for code 40 (MethodHash=c6a1237f) for method testout1+VT_0_4_1:.ctor(int):this
; ============================================================
Unwind Info:
@@ -45,7 +41,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 14 (0x0000e) Actual length = 56 (0x000038)
+ Function Length : 10 (0x0000a) Actual length = 40 (0x000028)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 388608.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.SimpleTernaryOpTest__FusedMultiplyAddBySelectedScalar_Vector128_Single_Vector128_Single_3:.ctor():this
@@ -350,9 +350,10 @@ G_M34739_IG03: ; bbWeight=4, isz, extend
cmp w0, #3
bls G_M34739_IG05
str s0, [x21, #0x1C]
- ldr x21, [x20]
- ldp x22, x20, [x20, #0x08]
- ; gcrRegs +[x20 x22]
+ ldp x21, x22, [x20]
+ ; gcrRegs +[x22]
+ ldr x20, [x20, #0x10]
+ ; gcrRegs +[x20]
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
+0 (0.00%) : 454528.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.SimpleTernaryOpTest__MultiplyBySelectedScalarWideningUpperAndSubtract_Vector128_UInt32_Vector64_UInt32_1:.ctor():this
@@ -268,9 +268,10 @@ G_M34358_IG02: ; bbWeight=4, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
cmp w1, #1
bls G_M34358_IG05
str w0, [x21, #0x14]
- ldr x21, [x20]
- ldp x22, x20, [x20, #0x08]
- ; gcrRegs +[x20 x22]
+ ldp x21, x22, [x20]
+ ; gcrRegs +[x22]
+ ldr x20, [x20, #0x10]
+ ; gcrRegs +[x20]
;; size=764 bbWeight=4 PerfScore 1086.00
G_M34358_IG03: ; bbWeight=4, extend
movz x0, #0xD1FFAB1E
+0 (0.00%) : 458240.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.SimpleTernaryOpTest__MultiplySubtractByScalar_Vector64_Int16:.ctor():this
@@ -350,9 +350,10 @@ G_M3199_IG03: ; bbWeight=4, isz, extend
cmp w1, #3
bls G_M3199_IG05
strh w0, [x21, #0x16]
- ldr x21, [x20]
- ldp x22, x20, [x20, #0x08]
- ; gcrRegs +[x20 x22]
+ ldp x21, x22, [x20]
+ ; gcrRegs +[x22]
+ ldr x20, [x20, #0x10]
+ ; gcrRegs +[x20]
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
@SwapnilGaikwad do you want us to kick off the various jitstress/gcstress jobs?
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr jitstress
Azure Pipelines successfully started running 1 pipeline(s).
Nice diffs.
src/coreclr/jit/emitarm64.cpp
return eRO_none;
}

- if (lastInsFmt != fmt)
+ if (lastInsFmt != fmt && !(lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) &&
Wondering why we have to add additional checks for `IF_LS_2B` and `IF_LS_2A`? Are they specifically because we are adding vector register support? Why were they not needed previously?
So the issue here is that we use `IF_LS_2A` for base (no offset) and `IF_LS_2B` for base + offset? Presumably imm/prevImm are correctly zero for 2A?
Would it be easier to read inverted? i.e.,
const bool compatibleFmt = (lastInsFmt == fmt) || (lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) || (lastInsFmt == IF_LS_2A && fmt == IF_LS_2B);
if (!compatibleFmt) {... return eRO_none; }
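The inverted check suggested above can be sketched in isolation as follows. This is a minimal illustration, not the code in `emitarm64.cpp`: the `InsFormat` enum, the `IF_SOMETHING_ELSE` placeholder, and the `AreFormatsCompatible` helper are hypothetical names introduced here; only `IF_LS_2A`, `IF_LS_2B`, `lastInsFmt`, and `fmt` come from the discussion.

```cpp
// Hypothetical stand-in for the JIT's instruction formats; only the two
// formats discussed in this thread are modelled.
enum InsFormat
{
    IF_LS_2A,         // base register only, e.g.     ldr x1, [x0]
    IF_LS_2B,         // base + immediate offset, e.g. ldr x1, [x0, #0x08]
    IF_SOMETHING_ELSE // placeholder for every other format (assumption)
};

// Mirrors the reviewer's suggested expression: two consecutive instructions are
// candidates for pairing when their formats are equal, or when one is the
// "no offset" form and the other is the "base + offset" form.
// Any further conditions the real peephole applies are outside this sketch.
bool AreFormatsCompatible(InsFormat lastInsFmt, InsFormat fmt)
{
    const bool compatibleFmt = (lastInsFmt == fmt) ||
                               (lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) ||
                               (lastInsFmt == IF_LS_2A && fmt == IF_LS_2B);
    return compatibleFmt;
}
```

Naming the positive condition keeps the early-out readable: the caller only needs `if (!AreFormatsCompatible(lastInsFmt, fmt)) { return eRO_none; }`.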
Are there any general register (non-Vector) diffs from just this change?
> So the issue here is that we use `IF_LS_2A` for base (no offset) and `IF_LS_2B` for base + offset?
Sure, but we don't do it for GPR?
Sorry that the title of the PR doesn't specify the full functionality. The explicit format check lets us catch consecutive ldr/str instructions where one uses an offset and the other does not. This applies to both general-purpose and SIMD/vector registers.
> Are there any general register (non-Vector) diffs from just this change?
Yup, there are multiple such changes, e.g.:
-4 (-16.67%) : 2709.dasm - System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
@@ -20,15 +20,14 @@ G_M30325_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=8 bbWeight=1 PerfScore 1.50
G_M30325_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
; byrRegs +[x0]
- str x1, [x0]
- str x2, [x0, #0x08]
- ;; size=8 bbWeight=1 PerfScore 2.00
+ stp x1, x2, [x0]
+ ;; size=4 bbWeight=1 PerfScore 1.00
G_M30325_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
; ============================================================
> Sure, but we don't do it for GPR?
Yup, I think it was missed previously.
Matching consecutive ldr/str instructions with mixed formats lets us optimise further than the previous optimisation allowed, e.g.:
Previously, the following sequence
str s0, [x0]
str s1, [x0, #0x04]
str s2, [x0, #0x08]
str s3, [x0, #0x0C]
may have been optimised to
str s0, [x0]
stp s1, s2, [x0, #0x04]
str s3, [x0, #0x0C]
but now would be optimised to
stp s0, s1, [x0]
stp s2, s3, [x0, #0x08]
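The before/after behaviour in this example can be reproduced with a small model of a greedy pairing pass. This is only a sketch under stated assumptions: `Store`, `Emitted`, and `Emit` are hypothetical names, every store is assumed to be 4 bytes wide and to use the same base register, and the only rule modelled is that a new store may merge with the immediately preceding unpaired store. The real peephole in `emitarm64.cpp` checks more than this.

```cpp
#include <vector>

// Simplified model of one 4-byte store (str s<reg>, [x0, #imm]).
struct Store
{
    int  reg;       // source register number
    int  imm;       // byte offset from the base register
    bool hasOffset; // false models IF_LS_2A ([x0]); true models IF_LS_2B ([x0, #imm])
};

// One emitted instruction: a single str, or an stp once a second store is merged in.
struct Emitted
{
    bool isPair;
    int  reg1;
    int  reg2;      // only meaningful when isPair is true
    int  imm;
    bool hasOffset;
};

// Greedy pass: each store may merge only with the last emitted instruction,
// and only if that instruction is still an unpaired str at the adjacent offset.
// allowMixedFormats == false models the old "formats must be equal" check;
// allowMixedFormats == true models accepting a 2A/2B mix.
std::vector<Emitted> Emit(const std::vector<Store>& stores, bool allowMixedFormats)
{
    const int size = 4; // assumed access size in bytes
    std::vector<Emitted> out;
    for (const Store& s : stores)
    {
        if (!out.empty() && !out.back().isPair)
        {
            Emitted&   prev          = out.back();
            const bool compatibleFmt = (prev.hasOffset == s.hasOffset) || allowMixedFormats;
            if (compatibleFmt && (s.imm == prev.imm + size))
            {
                prev.isPair = true; // str + str becomes stp
                prev.reg2   = s.reg;
                continue;
            }
        }
        out.push_back({false, s.reg, 0, s.imm, s.hasOffset});
    }
    return out;
}
```

Feeding it the four stores above (`{0, 0, false}`, `{1, 4, true}`, `{2, 8, true}`, `{3, 12, true}`) gives `str`, `stp`, `str` when mixed formats are rejected and two `stp` instructions when they are accepted, because the first store is the only one in the base-only format.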
> Would it be easier to read inverted?
Sure, this is more readable. Done 👍
All the gcstress failures are pre-existing ones.
LGTM. Thanks for your contributions!
Use pairwise load/stores for
(Fixes #83773)
(Fixes #35133)
Contributes to #35133. We still need to fix #81278 to cover all the cases.