ARM64: loop array indexing inefficiencies #34810
Comments
A simple optimization, in which a STR or LDR immediately followed by an addition to the address variable is transformed so that the STR/LDR instruction subsumes the addition, was discussed here in the context of the intrinsics. It looks like what you are suggesting would be a sequence of transformations, loop induction variable strength reduction, where we either wouldn't maintain cc @AndyAyersMS
Just verified that gcc does that optimization but clang doesn't.
Clang uses a more complicated addressing mode, but it is equally valid. (your example is missing an
Of course, simpler addressing modes are always preferred :)
Note that we have to be careful with ref/byref creation and reporting. E.g., hoisting
The PR where this comment was introduced: dotnet/coreclr#17524
Right, if we have an address computation where the full computation tree has a mixture of positive and negative adjustments to the address, we need to be careful not to reassociate too broadly; all the intermediate results must be addresses within the bounds of the parent object.
Given that ARM doesn't have base + scaled index + offset addressing mode, it seems like we really need to be able to hoist
Definitely, this will not happen in .NET 6.0.
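To make the reassociation concern concrete, here is a minimal sketch in C. It models addresses as byte offsets into a parent object and checks every intermediate sum. The helper names (`offset_in_object`, `all_intermediates_in_bounds`) and the strict "inside the object" rule are assumptions made for illustration; they are not JIT code and not the runtime's actual byref rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if `offset` lies inside an object of `size` bytes.
 * Simplifying assumption: only offsets in [0, size) count as valid. */
static bool offset_in_object(int64_t offset, int64_t size)
{
    return offset >= 0 && offset < size;
}

/* Hypothetical helper: applies the adjustments to `start` from left to right
 * and returns true only if every intermediate sum stays inside the object. */
bool all_intermediates_in_bounds(int64_t start, const int64_t *adjustments,
                                 size_t count, int64_t size)
{
    int64_t offset = start;
    assert(adjustments != NULL || count == 0);
    if (!offset_in_object(offset, size)) {
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        offset += adjustments[i];
        if (!offset_in_object(offset, size)) {
            return false; /* this ordering creates an out-of-object address */
        }
    }
    return true;
}
```

For a 64-byte object and a starting offset of 8, the adjustments `{+32, -16}` pass through offsets 40 and 24, so the check succeeds. The reassociated order `{-16, +32}` reaches the same final offset, 24, but passes through -8 first, so the check fails. That second ordering is the kind of intermediate result the comment above warns against.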
I think it is worth moving this to 7.0, as I'd expect noticeable perf improvements from it: I tried to implement it via https://github.com/dotnet/runtime/pull/60085/files and even emitted something similar, but it needs more work.
I made some progress on this and am re-assigning it to myself, if you don't mind.
@EgorBo you said that you completed this work. Can you link your PR and close this issue?
Yes, I believe this can be closed via a series of PRs for addressing modes, mainly and follow ups:
The line
arr[i] = 1
generates the following code to calculate the address of the element and store the value:
vs. how x64 generates:
The ARM64 pattern can be optimized to use post-index addressing mode using:
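The issue's original listings did not survive here, so the following is only an illustrative sketch in C of the source-level shape of the transformation, not the JIT's actual output. The function names are hypothetical. The indexed loop recomputes `base + i * 4` for every store. The pointer-bump loop keeps a running address, which an ARM64 backend may (but is not guaranteed to) lower to a post-index store of the form `str w2, [x0], #4`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed form: each iteration computes arr + i * sizeof(int32_t) again. */
void fill_indexed(int32_t *arr, size_t len, int32_t value)
{
    assert(arr != NULL);
    for (size_t i = 0; i < len; i++) {
        arr[i] = value;
    }
}

/* Strength-reduced form: the induction variable is the address itself.
 * The store followed by the pointer increment matches what a post-index
 * store does in one instruction (e.g. `str w2, [x0], #4`; the register
 * names are illustrative only). */
void fill_post_index(int32_t *arr, size_t len, int32_t value)
{
    assert(arr != NULL);
    int32_t *p = arr;
    int32_t *end = arr + len; /* one past the last element */
    while (p != end) {
        *p++ = value;
    }
}
```

Both functions write the same values. The difference is only in how the element address is derived on each iteration, which is where the extra ARM64 address-calculation instruction per iteration would be saved.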
category:cq
theme:optimization
skill-level:intermediate
cost:medium