
Implement volatile barrier APIs #107843

Merged (17 commits) on Oct 17, 2024
Conversation

hamarb123 (Contributor) commented Sep 15, 2024

Fixes #98837

This implements the proposed Read-ReadWrite and ReadWrite-Write barriers. Note: I haven't implemented any tests yet.
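For context, a minimal sketch of the API surface being added, using the names from the proposal in #98837; the exact attributes and implementation in the PR may differ:

```csharp
namespace System.Threading
{
    public static partial class Volatile
    {
        // Read-ReadWrite barrier: preceding reads complete before any subsequent read or write.
        public static void ReadBarrier() { }   // body elided; implemented as a JIT intrinsic in the real change

        // ReadWrite-Write barrier: preceding reads and writes complete before any subsequent write.
        public static void WriteBarrier() { }  // body elided; implemented as a JIT intrinsic in the real change
    }
}
```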

/cc @jkotas @VSadov @kouvel

The dotnet-issue-labeler bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and new-api-needs-documentation labels on Sep 15, 2024.
Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or changes public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments so the PR reviewers can sign off on that change.
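For illustration only (not the PR's actual source), a triple-slash comment on one of the new public APIs might look roughly like this; the remarks reflect the platform behavior discussed later in this thread and are an assumption here:

```csharp
namespace System.Threading
{
    public static partial class Volatile
    {
        /// <summary>
        /// Ensures that preceding reads and writes complete before any write
        /// that follows this call (a ReadWrite-Write barrier).
        /// </summary>
        /// <remarks>
        /// Expected to be a compiler-only fence on x86/x64 and a hardware fence on arm64.
        /// </remarks>
        public static void WriteBarrier() { } // body elided in this sketch
    }
}
```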


The dotnet-policy-service bot added the community-contribution label (indicates that the PR has been added by a community member) on Sep 15, 2024.
VSadov (Member) commented Sep 16, 2024

> my understanding of the x86 memory model is that it allows reads to be reordered after writes (when the addresses differ), which is not supposed to be allowed across a Volatile.WriteBarrier with the ReadWrite-Write model, if I'm understanding correctly

Reads are not reordered after writes on x86.
Reads can happen earlier than preceding writes (i.e. prefetch), and to prevent that you'd indeed need a full fence, but that is not something that WriteBarrier needs to guarantee.

In short - WriteBarrier needs to wait for reads/writes in progress to complete before allowing more writes.

  • on x86/x64 Volatile.WriteBarrier is just a compiler fence, similar to Volatile.ReadBarrier.
  • on arm it is a full fence (dmb[ish]). Sadly, dmb.st does not wait for reads.
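As an editorial illustration of the guarantee described above (hypothetical fields and method names, not code from this PR): a publish/consume pattern where WriteBarrier must let every prior read and write complete before the flag write that follows it, and ReadBarrier keeps the flag read before the payload read.

```csharp
using System.Threading;

static class PublishExample
{
    static int s_data;
    static bool s_ready;

    static void Publish(int value)
    {
        s_data = value;           // plain write of the payload
        Volatile.WriteBarrier();  // ReadWrite-Write: the write above (and any prior reads)
                                  // must complete before the write below
        s_ready = true;           // flag write; on x86/x64 this barrier is only a compiler fence
    }

    static bool TryConsume(out int value)
    {
        if (s_ready)
        {
            Volatile.ReadBarrier(); // Read-ReadWrite: the flag read above must complete
                                    // before the payload read below
            value = s_data;
            return true;
        }
        value = 0;
        return false;
    }
}
```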

hamarb123 (Contributor, author) commented Sep 16, 2024

Hmm, I still don't understand. What's the difference between:

```
...x
Volatile.ReadBarrier(); //emits nothing
Volatile.WriteBarrier(); //emits nothing
...y
```

And

```
...x
Interlocked.MemoryBarrier(); //emits lock ...
...y
```

@VSadov can you give me an example of some x & y where the behaviour is allowed to be different so I can see an example of what I'm not understanding?

Edit: I think I understand now (leaving this here for my future reference)

```
read a
write b
Volatile.ReadBarrier(); Volatile.WriteBarrier(); //or swapped
read c
write d
```

could be reordered to: read a, read c, write b, write d, whereas Interlocked.MemoryBarrier() would obviously prevent this reordering.
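A hedged sketch of that reordering as a store-load litmus test (hypothetical fields x and y, not from the PR): with only the two half barriers, each thread's write may still be reordered after its read, so both threads can return 0; replacing the pair with Interlocked.MemoryBarrier() rules that outcome out.

```csharp
using System.Threading;

static class StoreLoadLitmus
{
    static int x, y;

    static int Thread1()
    {
        x = 1;                    // write b
        Volatile.WriteBarrier();  // orders the write above before later *writes* (there are none here)
        Volatile.ReadBarrier();   // orders earlier *reads* before the read below (there are none earlier)
        return y;                 // read c: neither barrier orders the write to x before this read
    }

    static int Thread2()          // symmetric
    {
        y = 1;
        Volatile.WriteBarrier();
        Volatile.ReadBarrier();
        return x;
    }

    // With only the half barriers, (Thread1() == 0 && Thread2() == 0) is a permitted outcome.
    // Substituting Interlocked.MemoryBarrier() for the pair forbids it (full Store-Load ordering).
}
```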

hamarb123 (Contributor, author) commented Sep 16, 2024

> on arm it is a full fence (dmb[ish]). Sadly, dmb.st does not wait for reads.

Would dmb ishst + dmb ishld be enough? (idk if this is actually faster anyway, but maybe it is?)
It would give Load-Load, Load-Store, and Store-Store guarantees according to this.

Using my example from earlier: read a, write b, barrier/s, read c, write d (where these represent arbitrary quantities of reads & writes in any order):

  • Volatile.WriteBarrier requires: a,b before d
  • dmb ishst gives b before d
  • dmb ishld gives a before c,d

So it would seem to me as though dmb ishst + dmb ishld (in either order) should theoretically be enough. Whether it's faster than just a dmb ish is another question obviously (one that is only really relevant if it's a valid approach anyway). If there's something wrong with my analysis, please let me know :)

VSadov (Member) commented Sep 16, 2024

> Would dmb ishst + dmb ishld be enough? (idk if this is actually faster anyway, but maybe it is?)

At first glance it seems that the combination is as good as a full barrier.
If the cost of a full barrier could be reduced by doing two half barriers instead, I'd think that is how hardware would do it, so likely it is not faster.

hamarb123 (Contributor, author) commented Sep 16, 2024

> At first glance it seems that the combination is as good as a full barrier. If the cost of a full barrier could be reduced by doing two half barriers instead, I'd think that is how hardware would do it, so likely it is not faster.

It's actually not as strong as a full barrier, since it doesn't give b before c, which I think is the same ordering that x86 doesn't give by default, based on what you were saying.

VSadov (Member) commented Sep 16, 2024

> At first glance it seems that the combination is as good as a full barrier. If the cost of a full barrier could be reduced by doing two half barriers instead, I'd think that is how hardware would do it, so likely it is not faster.

> It's actually not as strong as a full barrier, since it doesn't give b before c, which I think is the same ordering that x86 doesn't give by default, based on what you were saying.

Ah, right, it still does not order Store-Load. It could be cheaper then, since it guarantees less.

hamarb123 (Contributor, author) commented Sep 19, 2024

> Ah, right, it still does not order Store-Load. It could be cheaper then, since it guarantees less.

I did some testing, based on the code I gave in the use-case section of my API proposal issue (converted to C++), on an M-series MacBook, and got about a 1.4% regression overall (don't interpret that as the pair being exactly 1.4% slower than just dmb ish, since there is obviously other code around the dmb instructions; it's just most likely not faster, I think). Testing ran for about 25 minutes total. So there's probably no point pursuing this idea further. Not overly surprised, but anyway.

JulieLeeMSFT (Member)

@VSadov, could you please review the last comment from the author?

VSadov (Member) commented Oct 14, 2024

@JulieLeeMSFT I will take a look

VSadov (Member) commented Oct 15, 2024

The failures in superpmi are because the change is compat-breaking for the old corelib. Thus the new JIT is not a drop-in replacement for the old one.

We need to update JITEEVersionIdentifier guid in jiteeversionguid.h to signal that.

jakobbotsch (Member)

> We need to update JITEEVersionIdentifier guid in jiteeversionguid.h to signal that.

FWIW, you can do this simply by running the ThunkGenerator (https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/ThunkGenerator/gen.sh or https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/ThunkGenerator/gen.bat)

VSadov (Member) left a review comment:

LGTM! Thanks!!

hamarb123 (Contributor, author) commented Oct 15, 2024

🎉 btw, I haven't added tests for the methods - is this a problem? If so, please give me an idea of what I should write for such a test 😆

VSadov (Member) commented Oct 15, 2024

> 🎉 btw, I haven't added tests for the methods - is this a problem? If so, please give me an idea of what I should write for such a test 😆

  • The read barrier will have good coverage via the cast cache. The cache gets a lot of traffic, so it will work better than any targeted test.
  • The write barrier is not emitted on x64 and emits a full fence on arm64. I think some test can be written for this, but having something that would reliably catch regressions would be difficult.
    I think we can rely on the fact that a self-referential intrinsic must expand to something, and the chances that it regresses and starts doing nothing are low, so probably just taking a look at the native codegen (to be sure the fences are there) would be sufficient.
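As an editorial sketch of the kind of smoke test discussed above (xunit-style, hypothetical test names; it exercises the intended usage but, as noted, cannot by itself prove the fences are emitted):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Xunit;

public class VolatileBarrierSmokeTests
{
    [Fact]
    public void BarriersAreCallable()
    {
        // Sanity check that the new intrinsics expand to something executable.
        Volatile.ReadBarrier();
        Volatile.WriteBarrier();
    }

    [Fact]
    public async Task PublishConsumePattern()
    {
        int data = 0;
        bool ready = false;

        var consumer = Task.Run(() =>
        {
            // Spin on an acquiring read; the ReadBarrier afterwards is redundant here,
            // but exercises the new API on the consumer side.
            while (!Volatile.Read(ref ready)) { }
            Volatile.ReadBarrier();
            Assert.Equal(42, data);
        });

        data = 42;
        Volatile.WriteBarrier();        // order the data write before the flag write
        Volatile.Write(ref ready, true);

        await consumer;
    }
}
```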

EgorBo (Member) commented Oct 15, 2024

@VSadov btw, I was profiling an app recently (OrchardCMS) on linux-arm64, and the various "Barrier speculatively executed" PMU counters were mostly pointing to CastCache as the main offender 🙂 E.g.:

[Screenshot (2024-10-16): profiler output for the dmb_spec event]

hamarb123 (Contributor, author) commented Oct 16, 2024

What else needs to be done here for merging (other than fixing up the jit-ee version guid just before merging)? Do we need other reviews? Thanks.

VSadov (Member) commented Oct 16, 2024

@dotnet/jit-contrib any concerns/comments on this change? I think this is ready for merging.

We need a new guid though as there was a conflicting change.

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), community-contribution (indicates that the PR has been added by a community member), new-api-needs-documentation
Projects: none yet
Successfully merging this pull request may close: [API Proposal]: Volatile barrier APIs
5 participants