Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64: Memory barrier improvements #62895

Merged
merged 4 commits into from
Jan 5, 2022
Merged

Conversation

kunalspathak
Copy link
Member

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 16, 2021
@ghost ghost assigned kunalspathak Dec 16, 2021
@ghost
Copy link

ghost commented Dec 16, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@alexrp
Copy link
Contributor

alexrp commented Dec 16, 2021

I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64:

Basically, some of the instructions don't quite have the semantics you might expect.

@kunalspathak
Copy link
Member Author

I see approx. 0.58% and 0.28% improvement in RPS in mvc and json benchmarks respectively. I believe these are in error range.

@kunalspathak
Copy link
Member Author

I haven't reviewed this PR in detail, but I would just advise caution when it comes to memory barriers on ARM64:

Thanks @alexrp . This PR just extends the optimal memory barriers for volatile keyword and for one scenario where we can use dmb ishst.

@kunalspathak kunalspathak marked this pull request as ready for review January 3, 2022 20:32
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@EgorBo
Copy link
Member

EgorBo commented Jan 3, 2022

LGTM, @VSadov could you please take a quick look if you have time
tldr:

volatileVariable = 42;

used to emit a full memory barrier, now it emits a store-only one.

Also, for most cases stores/loads for variables marked as "volatile" now actually don't emit memory barriers at all and use e.g. stlr instead in case of store, basically the same what Volatile.Write emits today, see #60232

@@ -3280,7 +3280,7 @@ void CodeGen::genCodeForStoreInd(GenTreeStoreInd* tree)
else
{
// issue a full memory barrier before a volatile StInd
instGen_MemoryBarrier();
instGen_MemoryBarrier(BARRIER_STORE_ONLY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the typical scenario when this branch is taken? When value is a struct?

Copy link
Member

@VSadov VSadov Jan 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand the purpose of this barrier is to have release semantics for storing into volatile variable that does not fit into a single register (otherwise stlr could be used).

I think this needs a full barrier, since unlike stlr, dmb ishst only waits for stores in progress and has no effect on loads.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, ldar can be replaced with ldr; dmb ishld , but there is no such equivalency between stlr and dmb ishst; str because ishst is too weak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://developer.arm.com/documentation/100941/0100/Barriers, trying to understand what you stated.

image

image

Basically, for ishst, loads can still be reordered around barrier and hence it is weaker than the ishld where loads/stores need to wait till the barrier is complete.

If we change from full barrier to ishst, we might have a load that should have been completed but got reordered and might end up reading the wrong value (pre-updated value). Is my understanding correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, loads that appear after ishst, in program order, may speculatively happen ahead of the store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static volatile int x;
static volatile int y;

static int xx;
static int yy;

---- one thread:
x = 42;
yy = y;

--- another thread:
y = 42;
xx = x;

Can both xx and yy end up 0 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I will revert the change related to shst then.

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@EgorBo
Copy link
Member

EgorBo commented Jan 20, 2022

Improvement on ubuntu-arm64: dotnet/perf-autofiling-issues#2981
and win-arm64: dotnet/perf-autofiling-issues#2977

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants