ARM64: Avoid LEA for volatile IND #64354

EgorBo · 2022-01-26T21:25:16Z

Follow up to #62895
e.g.:

volatile int f1;
volatile int f2;

void Test()
{
    f1 = 1;
    f2 = 1;
}

Codgen diff:

; Method Runtime:Test():this
G_M53852_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M53852_IG02:              ;; offset=0008H
-       52800021          mov     w1, #1
-       D5033BBF          dmb     ish
-       B9000801          str     w1, [x0,#8]
-       D5033BBF          dmb     ish
-       B9000C01          str     w1, [x0,#12]
-						;; bbWeight=1    PerfScore 22.50
+       91002001          add     x1, x0, #8
+       52800022          mov     w2, #1
+       889FFC22          stlr    w2, [x1]
+       91003000          add     x0, x0, #12
+       889FFC02          stlr    w2, [x0]
+						;; bbWeight=1    PerfScore 3.50

G_M53852_IG03:              ;; offset=001CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 36

Same byte size, but better PerfScore (no memory barrier)

ghost · 2022-01-26T21:25:23Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Follow up to #62895
e.g.:

volatile int f1;
volatile int f2;

void Test()
{
    f1 = 1;
    f2 = 1;
}

Codgen diff:

; Method Runtime:Test():this
G_M53852_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M53852_IG02:              ;; offset=0008H
-       52800021          mov     w1, #1
-       D5033BBF          dmb     ish
-       B9000801          str     w1, [x0,#8]
-       D5033BBF          dmb     ish
-       B9000C01          str     w1, [x0,#12]
-						;; bbWeight=1    PerfScore 22.50
+       91002001          add     x1, x0, #8
+       52800022          mov     w2, #1
+       889FFC22          stlr    w2, [x1]
+       91003000          add     x0, x0, #12
+       889FFC02          stlr    w2, [x0]
+						;; bbWeight=1    PerfScore 3.50

G_M53852_IG03:              ;; offset=001CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 36

Same byte size, but better PerfScore (no memory barrier)

Author:	EgorBo
Assignees:	EgorBo
Labels:	`area-CodeGen-coreclr`
Milestone:	-

VSadov · 2022-01-27T02:29:11Z

resulting assembly looks correct and better than before

…-addrmode-volatile

EgorBo · 2022-01-27T12:47:18Z

@dotnet/jit-contrib PTAL

There are some size regressions (https://dev.azure.com/dnceng/public/_build/results?buildId=1575482&view=ms.vss-build-web.run-extensions-tab) due to additional instructions to prepare addresses instead of addressing modes, but PerfScore diffs are much better, e.g. libraries.pmi:

Summary of Perf Score diffs:
(Lower is better)

Total PerfScoreUnits of base: 106196855193729.92
Total PerfScoreUnits of diff: 106196773347615.31
Total PerfScoreUnits of delta: -81846114.64 (-0.00 % of base)
Total relative delta: -650.56

NOTE x64/x86 diffs are unrelated (not reproduceable locally) - spmi job picked a wrong baseline jit for them it seems

src/coreclr/jit/lower.cpp

kunalspathak · 2022-01-27T15:28:35Z

NOTE x64/x86 diffs are unrelated (not reproduceable locally) - spmi job picked a wrong baseline jit for them it seems

It picks the same baseline for x64/x86/arm/arm64...so all the diffs are related to that baseline. The question is why it picked wrong baseline, perhaps we didn't catch up or something in jitrollingbuild? @BruceForstall - any idea?

Current hash: 33daf4a951aeed31c8a6128d1822e3a4d24e72b7

Invoking: git merge-base 33daf4a951aeed31c8a6128d1822e3a4d24e72b7 origin/main

Baseline hash: ac0a35b063c7767618468636b782b19a632393b2

Invoking: git log --pretty=format:%H ac0a35b063c7767618468636b782b19a632393b2 -20 -- src/coreclr/jit/*

try 1: 7fcd1f6e45d937b9835c6cf2b0a2f76e0c180e87

try 2: cbc4cd873ed1f2362265e702dc7dfeaa76c5ec16

Warning: the baseline found is not built with the first git hash with JIT code changes; there may be extraneous diffs

I see that arm asmdiffs timesout which we are seeing more recently.

https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-64354-merge-09dda46f624849feb7/unix-arm/1/console.9af3ba57.log?sv=2019-07-07&se=2022-02-16T09%3A14%3A36Z&sr=c&sp=rl&sig=AT7FG1c8tKPdDsxHJBjql8EFj1a2cu6kox%2FzpwXp0Ws%3D

I would probably rebase and double check with official baseline to make sure there is nothing suspicious.

…-addrmode-volatile

EgorBo · 2022-01-27T16:04:52Z

Yeah I noticed it in my other PR today, some superpmi job silently fails and then displays some previous diffs or something like that.

Rebased.

kunalspathak · 2022-01-27T16:14:34Z

Also that might explain the huge diffs followed by timeouts that many of us are seeing.

src/coreclr/jit/lower.cpp

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>

echesakov · 2022-01-27T16:35:33Z

It seems that the snippet

add     x1, x0, #8
mov     w2, #1
stlr    w2, [x1]

can be optimized even further on targets supporting ARMv8.4

mov     w2, #1
stlur    w2, [x0, #8]

@EgorBo As a general thought, I wonder if in

void Test()
{
    f1 = 1;
    f2 = 1;
}

the first write needs to be store-release?

Shouldn't the following be sufficient?

Store(&f1, 1);
StoreRelease(&f2, 1);

EgorBo · 2022-01-27T16:50:21Z

can be optimized even further on targets supporting ARMv8.4

Good idea! Will file an issue, not sure even my M1 supports it 😄

@EgorBo As a general thought, I wonder if in

I am not sure why I used two variables in that snippet, the issue reproduces even with a single.
Overall you're right, we can relax some barriers in theory for combinations of volatile operations, but I'd prefer not doing that as part of this PR

EgorBo · 2022-01-27T18:50:15Z

@kunalspathak no irrelevant diffs after rebase: https://dev.azure.com/dnceng/public/_build/results?buildId=1576187&view=ms.vss-build-web.run-extensions-tab

kunalspathak

LGTM. Thanks!

BruceForstall · 2022-01-27T19:29:14Z

It picks the same baseline for x64/x86/arm/arm64...so all the diffs are related to that baseline. The question is why it picked wrong baseline, perhaps we didn't catch up or something in jitrollingbuild? @BruceForstall - any idea?

The jitrollingbuild pipeline is set to build every time the JIT changes, and not batch changes:

trigger:
  batch: false
  branches:
    include:
    - main
  paths:
    include:
    - src/coreclr/jit/*
    - src/coreclr/inc/jiteeversionguid.h

However, it appears that AzDO doesn't kick off the build immediately; I noticed a delay of about 10 minutes for this particular change. In a few previous cases for JIT changes, there were also delays before the AzDO job kicked off, and in those cases, another change was merged before the kickoff, and AzDO built that change, not the JIT change that kicked off the job. Because of this, when we upload the JIT to Azure Storage, we use a git hash of a non-JIT change, and when superpmi.py looks for a baseline JIT, it doesn't find the hash of the JIT that should have been built.

Summary: there is an assumption that the pipeline builds every JIT change at the git hash of the JIT change. That doesn't appear to be the case with AzDO. I'm not sure if there's a config to force that to happen. E.g., what if 2 JIT changes were merged at almost exactly the same time? AzDO should build both of them, but does it?

Possibly we could change the jitrollingbuild.py upload script to walk back the git log for the built change and use the first hash that actually has JIT changes in it, to work around this problem.

BruceForstall · 2022-01-27T19:32:09Z

Added #64392

Avoid LEA for volatile IND

27e7eb1

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 26, 2022

ghost assigned EgorBo Jan 26, 2022

EgorBo added 2 commits January 27, 2022 11:33

Update lower.cpp

b2b4a3a

Merge branch 'main' of https://github.com/dotnet/runtime into jit-arm…

c4cba34

…-addrmode-volatile

SingleAccretion reviewed Jan 27, 2022

View reviewed changes

src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved

Merge branch 'main' of https://github.com/dotnet/runtime into jit-arm…

126ba2b

…-addrmode-volatile

kunalspathak reviewed Jan 27, 2022

View reviewed changes

src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved

Update src/coreclr/jit/lower.cpp

12132eb

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>

Address feedback

e1e21cb

kunalspathak approved these changes Jan 27, 2022

View reviewed changes

EgorBo merged commit 5409d71 into dotnet:main Jan 27, 2022

BruceForstall mentioned this pull request Jan 27, 2022

JIT rolling build can build and upload non-JIT changes #64392

Closed

sbomer mentioned this pull request Jan 28, 2022

[OSX/Linux][arm64] Assertion failed 'addOp2Type != TYP_BYREF && addOp2Type != TYP_REF' #64417

Closed

This was referenced Jan 28, 2022

JIT: Don't avoid LEA for byrefs (Fixes rolling builds) #64444

Merged

ARM64: Use STLUR/LDUR on arm64-v8.4+ #64457

Closed

EgorBo mentioned this pull request Feb 9, 2022

AddRemoveFromDifferentThreads<string>.ConcurrentStack benchmark hangs on ARM64 #64980

Closed

ghost locked as resolved and limited conversation to collaborators Feb 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARM64: Avoid LEA for volatile IND #64354

ARM64: Avoid LEA for volatile IND #64354

EgorBo commented Jan 26, 2022

ghost commented Jan 26, 2022

VSadov commented Jan 27, 2022

EgorBo commented Jan 27, 2022 •

edited

Loading

kunalspathak commented Jan 27, 2022

EgorBo commented Jan 27, 2022

kunalspathak commented Jan 27, 2022

echesakov commented Jan 27, 2022 •

edited

Loading

EgorBo commented Jan 27, 2022

EgorBo commented Jan 27, 2022

kunalspathak left a comment

BruceForstall commented Jan 27, 2022

BruceForstall commented Jan 27, 2022

ARM64: Avoid LEA for volatile IND #64354

ARM64: Avoid LEA for volatile IND #64354

Conversation

EgorBo commented Jan 26, 2022

ghost commented Jan 26, 2022

VSadov commented Jan 27, 2022

EgorBo commented Jan 27, 2022 • edited Loading

kunalspathak commented Jan 27, 2022

EgorBo commented Jan 27, 2022

kunalspathak commented Jan 27, 2022

echesakov commented Jan 27, 2022 • edited Loading

EgorBo commented Jan 27, 2022

EgorBo commented Jan 27, 2022

kunalspathak left a comment

Choose a reason for hiding this comment

BruceForstall commented Jan 27, 2022

BruceForstall commented Jan 27, 2022

EgorBo commented Jan 27, 2022 •

edited

Loading

echesakov commented Jan 27, 2022 •

edited

Loading