ARM64: loop array indexing inefficiencies #34810

kunalspathak · 2020-04-10T08:25:40Z

public int Test()
{
    int[] arr = new int[10];
    int i = 0;
    while (i < 9)
    {
        if (i >= 2) 
        {
            arr[i] = 1;  // <---- IG04
        }
        i++;
    }
    return 0;
}

The line arr[i] = 1 generates the following code to compute the address of the element before storing the value.

...
G_M8556_IG04:
        93407C22          sxtw    x2, x1
        D37EF442          lsl     x2, x2, #2
        91004042          add     x2, x2, #16
        52800023          mov     w3, #1
        B8226803          str     w3, [x0, x2]
...

vs. how x64 generates:

G_M27956_IG04:
       4863CA               movsxd   rcx, edx
       C744881001000000     mov      dword ptr [rax+4*rcx+16], 1
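Both listings compute the same address: the array reference, plus 16 bytes to reach the first element, plus the index scaled by 4. A minimal C sketch of that arithmetic follows. The 16-byte data offset and 4-byte element size are read off the listings above; `element_byte_offset` is a hypothetical helper name used only for illustration, not a JIT function.

```c
#include <stdint.h>

/* Hypothetical helper (not a JIT API): byte offset of arr[i] from the
   array reference, mirroring the ARM64 sequence step by step. */
static int64_t element_byte_offset(int32_t i)
{
    int64_t offset = (int64_t)i;   /* sxtw x2, x1      : sign-extend the index   */
    offset *= 4;                   /* lsl  x2, x2, #2  : scale by the element size */
    offset += 16;                  /* add  x2, x2, #16 : skip to the first element */
    return offset;                 /* str  w3, [x0, x2] stores at base + offset    */
}
```

x64 folds all three steps into the single operand [rax+4*rcx+16], which is why it needs only the sign extension as a separate instruction.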

The ARM64 pattern can be optimized to use the post-index addressing mode, for example:

# x1 contains <<base address of arr>>+16
mov w0, 1
str w0, [x1], 4

category:cq
theme:optimization
skill-level:intermediate
cost:medium


kunalspathak · 2020-04-10T08:26:11Z

@TamarChristinaArm , @BruceForstall

BruceForstall · 2020-04-10T17:34:22Z

A simpler optimization, where a STR or LDR immediately followed by an addition to the address variable is transformed so that the STR/LDR instruction subsumes the addition, was discussed here in the context of the intrinsics.

It looks like what you are suggesting would require a sequence of transformations, namely loop induction variable strength reduction, where we either wouldn't maintain i separately, or would maintain both i and <array base> + 16 + i * 4 in the loop.
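As a rough illustration of the first variant (not maintaining i separately), here is a C sketch of the same loop before and after strength reduction. It assumes `arr` already points at the first element and ignores everything GC-related; the function names and `LEN` are hypothetical.

```c
#include <stdint.h>

enum { LEN = 10 };

/* Indexed form: the element address is recomputed from i on every store. */
static void fill_indexed(int32_t *arr)
{
    for (int i = 0; i < LEN - 1; i++) {
        if (i >= 2) {
            arr[i] = 1;
        }
    }
}

/* Strength-reduced form: i is gone; a pointer induction variable
   advances by one element per iteration instead. */
static void fill_strength_reduced(int32_t *arr)
{
    int32_t *p = arr + 2;             /* first element actually written    */
    int32_t *end = arr + (LEN - 1);   /* one past the last element written */
    while (p != end) {
        *p++ = 1;                     /* store, then advance the pointer   */
    }
}
```

The store-then-advance statement in the second loop is the shape that a post-index store such as str w0, [x1], 4 can express in one instruction.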

cc @AndyAyersMS

kunalspathak · 2020-04-10T18:57:29Z

Just verified that gcc seems to do that optimization but clang doesn't.
https://godbolt.org/z/Wp9Xhu

TamarChristinaArm · 2020-04-14T14:09:09Z

> Just verified that gcc seems to do that optimization but clang doesn't.
> https://godbolt.org/z/Wp9Xhu

Clang uses a more complicated addressing mode, but one that is equally valid. (Your example is missing an -O1.)

        str     w8, [x9, x8, lsl #2]
        add     x8, x8, #1              // =1
        cmp     x8, #10                 // =10
        b.ne    .LBB0_1

Of course simpler addressing modes are always preferred :)

BruceForstall · 2020-04-23T02:32:54Z

Note that we have to be careful with ref/byref creation and reporting. E.g., hoisting <array base> + 16 out of the loop to create a pointer to the array element base would create a byref pointer that needs to be reported. Note the comment in fgMorphArrayIndex:

// Be careful to only create the byref pointer when the full index expression is added to the array reference.
// We don't want to create a partial byref address expression that doesn't include the full index offset:
// a byref must point within the containing object. It is dangerous (especially when optimizations come into
// play) to create a "partial" byref that doesn't point exactly to the correct object; there is risk that
// the partial byref will not point within the object, and thus not get updated correctly during a GC.
// This is mostly a risk in fully-interruptible code regions.

BruceForstall · 2020-04-23T17:26:00Z

The PR where this comment was introduced: dotnet/coreclr#17524

AndyAyersMS · 2020-04-23T17:51:24Z

Right, if we have an address computation where the full computation tree has a mixture of positive and negative adjustments to the address, we need to be careful not to reassociate too broadly; all the intermediate results must be addresses within the bounds of the parent object.
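To make the constraint concrete, here is a small C sketch that works on plain byte offsets rather than real pointers. It applies a list of adjustments in a given order and reports whether every intermediate result stays inside the object. `all_intermediates_in_bounds` is a hypothetical helper, and treating the valid range as [0, object_size) is a simplifying assumption for illustration, not a statement of the runtime's exact rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check (not a JIT function): apply byte adjustments in order,
   starting at the object's base, and report whether every intermediate
   offset stays inside [0, object_size). */
static bool all_intermediates_in_bounds(const int64_t *adjust, size_t count,
                                        int64_t object_size)
{
    int64_t offset = 0;
    for (size_t k = 0; k < count; k++) {
        offset += adjust[k];
        if (offset < 0 || offset >= object_size) {
            return false;   /* a byref formed here would point outside the object */
        }
    }
    return true;
}
```

For a 56-byte object (16 bytes before the first element plus ten 4-byte elements), the adjustments {16, 28, -20} pass through offsets 16, 44 and 24, all inside the object. Reordering the same terms as {-20, 16, 28} reaches the same final offset, but the first intermediate is -20, which is the kind of partial address the comments above warn about.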

BruceForstall · 2020-04-23T19:57:51Z

Given that ARM doesn't have a base + scaled index + offset addressing mode, it seems like we really need to be able to hoist <object base> [ref] + <array first element offset> [native int] out of a loop as a byref.
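A rough C model of that hoisting, assuming the array object is a flat byte buffer with 16 bytes before the first 4-byte element (the layout implied by the listings at the top of the issue). The constant and function names are hypothetical, and the sketch says nothing about how the hoisted pointer would be reported to the GC.

```c
#include <stdint.h>
#include <string.h>

enum { DATA_OFFSET = 16, ELEM_SIZE = 4, LEN = 10 };

/* obj models the whole array object: DATA_OFFSET bytes of header
   followed by LEN 4-byte elements. */
static void fill_hoisted(unsigned char *obj)
{
    /* Loop-invariant part, computed once outside the loop:
       <object base> + <array first element offset>. */
    unsigned char *first_elem = obj + DATA_OFFSET;
    const int32_t one = 1;

    for (int i = 0; i < LEN - 1; i++) {
        if (i >= 2) {
            /* Only base + scaled index remains inside the loop. */
            memcpy(first_elem + (size_t)i * ELEM_SIZE, &one, sizeof one);
        }
    }
}
```

With the constant offset hoisted, the remaining in-loop address is base + index * 4, which is the form the scaled-register store in Clang's output above handles in one instruction.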

kunalspathak · 2021-06-04T18:32:05Z

Definitely, this will not happen in .NET 6.0.

EgorBo · 2021-10-07T13:00:04Z

I think it is worth moving this to 7.0, as I'd expect noticeable perf improvements from it.

I tried to implement it via https://github.com/dotnet/runtime/pull/60085/files and even emitted something similar but it needs more work.

EgorBo · 2021-11-01T08:24:09Z

I made some progress on this and am reassigning it to myself, if you don't mind.

JulieLeeMSFT · 2022-02-23T18:18:35Z

@EgorBo you said that you completed this work. Can you link your PR and close this issue?

EgorBo · 2022-02-24T13:43:07Z

Yes, I believe this can be closed via a series of PRs for addressing modes, mainly

[arm64] JIT: Enable CSE/hoisting for "arrayBase + elementOffset" #61293

and follow ups:

Dotnet-GitSync-Bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and untriaged (New issue has not been triaged by the area owner) labels Apr 10, 2020

kunalspathak added the arch-arm64 label Apr 10, 2020

BruceForstall removed the untriaged (New issue has not been triaged by the area owner) label Apr 13, 2020

BruceForstall added this to the Future milestone Apr 13, 2020

BruceForstall changed the title from "ARM64: Use post-index addressing mode to access array elements" to "ARM64: loop array indexing inefficiencies" Apr 23, 2020

BruceForstall mentioned this issue Apr 29, 2020

Code inefficiencies in loop array indexing #35618

Closed

kunalspathak mentioned this issue May 5, 2020

Improving ARM64 Performance in .NET 5.0 – Closing the gap with x64 #35853

Closed

BruceForstall mentioned this issue Oct 17, 2020

Improve JIT loop optimizations (.NET 6) #43549

Closed

25 tasks

BruceForstall added the JitUntriaged (CLR JIT issues needing additional triage) label Oct 28, 2020

BruceForstall modified the milestones: Future, 6.0.0 Nov 25, 2020

BruceForstall removed the JitUntriaged (CLR JIT issues needing additional triage) label Nov 25, 2020

JulieLeeMSFT assigned kunalspathak Mar 23, 2021

JulieLeeMSFT added the needs-further-triage (Issue has been initially triaged, but needs deeper consideration or reconsideration) label Mar 23, 2021

kunalspathak modified the milestones: 6.0.0, Future Jun 4, 2021

JulieLeeMSFT removed the needs-further-triage (Issue has been initially triaged, but needs deeper consideration or reconsideration) label Jun 7, 2021

BruceForstall mentioned this issue Jul 6, 2021

Improve JIT loop optimizations (.NET 7) #55235

Closed

5 tasks

BruceForstall removed this from the Future milestone Oct 7, 2021

BruceForstall added this to the 7.0.0 milestone Oct 7, 2021

EgorBo assigned EgorBo and unassigned kunalspathak Nov 1, 2021

kunalspathak mentioned this issue Nov 8, 2021

[arm64] JIT: Enable CSE/hoisting for "arrayBase + elementOffset" #61293

Merged

BruceForstall mentioned this issue Feb 15, 2022

Improve JIT loop optimizations #65342

Open

20 tasks

EgorBo closed this as completed Feb 24, 2022

ghost locked as resolved and limited conversation to collaborators Mar 26, 2022

JulieLeeMSFT added this to .NET Core CodeGen Jun 5, 2024

JulieLeeMSFT moved this to Done in .NET Core CodeGen Jun 5, 2024
