-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unroll Buffer.Memmove for arm64 #83740
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsFollow up to #83638 to do the same on arm64. void CopySpan(Span<byte> dst, ReadOnlySpan<byte> src) =>
src.Slice(0, 10).TryCopyTo(dst); AltJit-arm64: ; Method Prog:CopySpan(System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]):this
G_M41396_IG01:
A9BF7BFD stp fp, lr, [sp, #-0x10]!
910003FD mov fp, sp
G_M41396_IG02:
7100289F cmp w4, #10
54000123 blo G_M41396_IG05
7100285F cmp w2, #10
540000A3 blo G_M41396_IG04
G_M41396_IG03:
F9400060 ldr x0, [x3] ;; <---
F8402062 ldr x2, [x3, #0x02] ;; <---
F9000020 str x0, [x1] ;; <---
F8002022 str x2, [x1, #0x02] ;; <---
G_M41396_IG04:
A8C17BFD ldp fp, lr, [sp], #0x10
D65F03C0 ret lr
G_M41396_IG05:
D2960C00 movz x0, #0xB060 // code for System.ThrowHelper:ThrowArgumentOutOfRangeException()
F2A90A80 movk x0, #0x4854 LSL #16
F2CFFF20 movk x0, #0x7FF9 LSL #32
F9400000 ldr x0, [x0]
D63F0000 blr x0
D43E0000 brk_windows #0
; Total bytes of code: 72
|
Add 3xSIMD unrolling tests
PTAL @dotnet/jit-contrib @jakobbotsch @SingleAccretion (since you reviewed the previous version) 🙂 It is possible some parts can be shared but it doesn't look to be a lot of code to bother probably and there are still differences. @MichalPetryka volanteered to address some TODO-CQ on both x64 and arm64. |
/azp list |
This comment was marked as outdated.
This comment was marked as outdated.
/azp run runtime-coreclr r2r, runtime-coreclr crossgen2, runtime-coreclr jitstress2-jitstressregs, runtime-coreclr jitstressregs, runtime-coreclr jitstress |
Azure Pipelines successfully started running 5 pipeline(s). |
src/coreclr/jit/gentree.cpp
Outdated
assert((cur != nullptr) && "Not enough user arguments in GetUserArgByIndex"); | ||
for (unsigned i = 0; i < index || cur->IsArgAddedLate();) | ||
{ | ||
if (!cur->IsArgAddedLate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use cur->GetWellKnownArg() != WellKnownArg::None
instead? (And then the Remarks "The current implementation doesn't..." can be removed?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would ignore the 'this' arg and also ShiftLow/ShiftHigh for x86 shift helpers are arguably "user args" (in that they manifest in the IL). Maybe we should add an IsUserArg()
or IsILArg()
that has these special cases and use it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added IsUserArg()
@jakobbotsch @BruceForstall I've addressed your feedback |
SPMI diffs (didn't expect to see them, but looks like Kunal triggered a new SPMI collection yesterday) Size regressions are expected in cases where we unroll large blocks, didn't expect to see TP improvements |
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
@a74nh It looks like this PR introduces a lot more opt. opportunities for #83773 just thought you might want to know 🙂 e.g. @@ -156,27 +156,24 @@ G_M51463_IG11: ; bbWeight=0.50, gcrefRegs=100000 {x20}, byrefRegs=0000 {}
mov w0, #25
bl <unknown method>
; gcrRegs +[x0]
- mov x19, x0
- ; gcrRegs +[x19]
- ldrsb wzr, [x19]
- add x0, x19, #12
- ; gcrRegs -[x0]
- ; byrRegs +[x0]
- add x1, x20, #12
+ ldrsb wzr, [x0]
+ add x1, x0, #12
; byrRegs +[x1]
- mov x2, #50
- movz x3, #0xD1FFAB1E // code for <unknown method>
- movk x3, #0xD1FFAB1E LSL #16
- movk x3, #1 LSL #32
- ldr x3, [x3]
- blr x3
- ; gcrRegs -[x20]
- ; byrRegs -[x0-x1]
- mov x20, x19
- ; gcrRegs +[x20]
- ;; size=52 bbWeight=0.50 PerfScore 6.25
+ add x2, x20, #12
+ ; byrRegs +[x2]
+ ldr q16, [x2]
+ ldr q17, [x2, #0x10]
+ ldr q18, [x2, #0x20]
+ ldr q19, [x2, #0x22]
+ str q16, [x1]
+ str q17, [x1, #0x10]
+ str q18, [x1, #0x20]
+ str q19, [x1 |
Yes, I could emit |
Failures: #83655 |
Follow up to #83638 to do the same on arm64.
AltJit-arm64:
Range: 1..64 bytes (x64 handles 1..128 bytes or 1..256 bytes if we enable AVX-512)