-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lower GetElement on arm64 to the correct access sequence #104288
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
cc4ebd7
to
2705326
Compare
a3b0300
to
95fdd44
Compare
0d4e0f3
to
a6a4b0e
Compare
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
263de58
to
244c8da
Compare
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
CC. @dotnet/jit-contrib, This is the "proper" fix for #104232, which was worked around in #104264. No TP diff, but good asmdiff, such as for Linux Arm64:
It looks like there's more improvements to be had, but those can be handled separately I think. An example is we're generating: - ldr q16, [fp, #-0x68] // [V98 tmp78]
- dup s16, v16.s[1]
- ldr q18, [fp, #-0x68] // [V98 tmp78]
- dup s18, v18.s[2]
- ldr q19, [fp, #-0x68] // [V98 tmp78]
- dup s19, v19.s[3]
+ sub x1, fp, #104 // [V98 tmp78]
+ ; byrRegs +[x1]
+ ldr s16, [x1, #0x04]
+ sub x1, fp, #104 // [V98 tmp78]
+ ldr s18, [x1, #0x08]
+ sub x1, fp, #104 // [V98 tmp78]
+ ldr s19, [x1, #0x0C] This is a net improvement as we're only loading scalars, rather than repeatedly loading vectors, but it does represent a case that probably should've been represented as: ldr s16, [fp, #-0x64]
ldr s18, [fp, #-0x60]
ldr s19, [fp, #-0x5C] I think there's just a general optimization missing here when the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Unlike xarch where this could be handled a bit more trivially in codegen, Arm64 isn't as flexible and so we get better and more correct codegen by explicitly lowering to the correct sequence instead.