
How should we expose atomic load/store on targets that don't support full atomics #99668

Open
Amanieu opened this issue Jul 24, 2022 · 51 comments
Labels
A-atomic Area: Atomics, barriers, and sync primitives O-bare-metal Target: Rust without an operating system P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Amanieu
Member

Amanieu commented Jul 24, 2022

Some embedded targets (thumbv6m, riscv32i) don't support general atomic operations (compare_exchange, fetch_add, etc.) but do support atomic load and store operations. We previously supported this on thumbv6m (but not riscv32i, see #98333) by cfging out most of the methods on Atomic* except for load and store. However, recent changes to LLVM (#99595) cause it to emit calls to libatomic for load and store (it already does this for the other atomic operations).

It seems that LLVM's support for lowering atomic loads and stores on ARM was an accident that is being reverted. However, the reason for this revert is that the directly lowered load/store cannot interoperate correctly with the other atomic operations, which are lowered to libcalls. This concern doesn't apply to Rust since we don't expose emulated (read: not lock-free) atomics, so there is no interoperability concern (except maybe for FFI with C that uses atomics?).

I see 2 ways we can move forward:

  1. Continue supporting atomic load/store on such targets by lowering them in rustc to a volatile load/store. This will avoid breakage in the embedded Rust ecosystem.
  2. Remove support for atomic load/store on targets that don't support full atomics. Users are expected to switch to using volatile load/store instead.
@Amanieu Amanieu added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Jul 24, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 24, 2022
@jamesmunns
Member

Just wanted to add my $0.02: option two, "Remove support for atomic load/store on targets that don't support full atomics", would be a breaking change to (at least) one Tier 2 target: thumbv6m-none-eabi.

It would break a non-trivial number of embedded projects (there are lots of Cortex-M0+ cores out there, including the very popular RP2040), and might not be picked up by a crater run, as crates may not exercise specific targets during the run.

@jamesmunns
Member

Also just as a note, we might want to check that armv5te still builds with these relevant changes. It has "native" load/stores, but no CAS operations. These are provided by an OS-level polyfill for the linux target.

It's likely their load/stores will also be affected by any change here.

@Amanieu
Member Author

Amanieu commented Jul 24, 2022

AFAIK armv5te should not be affected by this change, but we should double check with the upgrade to LLVM 15.

@adamgreig
Member

Currently the core::ptr docs say:

All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory. Notice that this explicitly includes read_volatile and write_volatile: Volatile accesses cannot be used for inter-thread synchronization.

One downside of requiring users on these platforms to use volatile ops instead is that (in Rust) volatile is a property of an access (so a static UnsafeCell/static mut could still be accessed with a non-volatile operation) while the Atomic types only have atomic operations (via AtomicU8, AtomicBool, etc), and I don't think there's any appetite for VolatileU32 or similar.

There is the atomic-polyfill crate which is quite popular in embedded Rust and provides CAS operations on thumbv6 etc while directly forwarding core::sync::atomic on other platforms. If option 2 was chosen, probably this crate could help smooth out a transition.

@taiki-e
Member

taiki-e commented Jul 24, 2022

the atomic-polyfill crate

That is unsound on multi-core systems. (See tokio-rs/bytes#461 (comment) for an approach to handle such cases soundly.)

@adamgreig
Member

adamgreig commented Jul 24, 2022

That is unsound on multi-core systems.

I thought it used the critical-section crate, which requires users to provide a critical section implementation for their system - so on a multi-core system you need to provide something with the appropriate guarantees, and it won't compile if no implementation is provided. It doesn't simply disable all interrupts on the CM0 core to obtain the CAS locks.

Edit: The currently-released critical-section 0.2.7 does by default provide a "disable all interrupts" critical section for thumbv* platforms, which is unsound on multi-core, though users on multi-core systems can still provide their own implementation which is sound. The next release of critical-section removes all the built-in implementations entirely, and it's expected most platform support crates will provide an implementation instead, which should be sound for their respective platform. In any event I don't think it gets in the way of having atomic-polyfill provide replacement Atomic* types that can do load/store on thumbv6 if need be.

@apiraino
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-critical +T-compiler

@rustbot rustbot added P-critical Critical priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 24, 2022
@taiki-e
Member

taiki-e commented Jul 24, 2022

requires users to provide a critical section implementation for their system - so on a multi-core system you need to provide something with the appropriate guarantees, and it won't compile if no implementation is provided

Note that rust-embedded/critical-section@a3cdfd8 has not been released yet. The releases available on crates.io do not work this way.

And even if it were released, atomic-polyfill is still unsound on multi-core systems because it mixes native atomic load/store and critical-section.

@taiki-e
Member

taiki-e commented Jul 24, 2022

FWIW, my portable-atomic crate implements the equivalent of the atomic types that were removed in #99595 (by using inline assembly). It also provides equivalents on other platforms where atomic load/store is not provided (riscv without A-extension, msp430, avr).

@RalfJung
Member

Remove support for atomic load/store on targets that don't support full atomics. Users are expected to switch to using volatile load/store instead.

That seems in conflict with us telling people for years now that volatile operations are not atomic. And they aren't, so this strategy risks subtle miscompilations.

@taiki-e
Member

taiki-e commented Jul 24, 2022

using volatile load/store

If we implement this in the compiler (i.e., the first way that Amanieu mentioned), I think it is safe to use volatile load/store -- if there are tests ensuring that these operations are lowered properly by LLVM, we can guarantee that a compiler in which this UB abuse (using non-atomic volatile load/store as atomic load/store) actually miscompiles will never be shipped.

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 24, 2022
Revert "Mark atomics as unsupported on thumbv6m"

This is a breaking change for the `thumbv6m` target. See rust-lang#99668 for discussion on how we can proceed forward from here.

This reverts commit 7514610.

cc `@nikic`
@RalfJung
Member

RalfJung commented Jul 24, 2022

If we implement this in the compiler (i.e., the first way that Amanieu mentioned), I think it is safe to use volatile load/store

If we do this after LLVM, yes.
But if we tell LLVM that these are (implicitly non-atomic) volatile accesses, it might optimize accordingly. Checking how LLVM lowers them is not enough; it does not take into account the effect of optimizations. So at that point inline assembly would be the safer choice.

@nikic
Contributor

nikic commented Jul 24, 2022

I think the ideal solution would be to add a new target feature to LLVM, which makes the promise that no atomic CAS will be used, and thus allow lowering atomic load/store without libcalls (I've commented to that effect here: https://reviews.llvm.org/D120026#3674333). That would restore the status quo.

We should probably also document the special assumptions made by this target (or any other with "only load/store atomics"), which should boil down to:

  • Linking against non-Rust code using atomics is not supported (and may be silently unsound).
  • Atomic CAS can only be safely emulated on single-core systems.
  • To support atomic CAS on multi-core systems, a custom target spec that disables all atomics should be used and all atomic operations should go through a polyfill (with a shared lock) instead.

@Mark-Simulacrum Mark-Simulacrum added this to the 1.64.0 milestone Jul 24, 2022
@nikic nikic added P-high High priority and removed P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 24, 2022
@nikic
Contributor

nikic commented Jul 24, 2022

Dropping regression label and downgrading priority as the change is reverted now. We still need a solution for the LLVM upgrade of course.

Quoting Eli's response from https://reviews.llvm.org/D120026#3674604:

I see two possible approaches to restore the functionality you want.

One, you could add a target feature that says "I have 32-bit (or 64-bit) atomics". That would override the normal rule for setMaxAtomicSizeInBitsSupported. The target environment would be expected to provide whatever __sync_* functions are necessary. If you don't actually use any atomic operations that require CAS, you won't see any calls, so everything works even if the functions don't actually exist. (Or it's possible to write an operating system that implements lock-free CAS on Cortex-M0, the same way the Linux kernel implements atomics on old ARM cores.)

Two, you could relax the constraint that atomic load/store are required to be atomic with respect to cmpxchg. We can add a value to syncscope to represent this. This would allow mixing code that uses load-store atomics with code that requires "real" atomics.

That said, I'm not sure how you use load-store atomics in practice. I mean, I guess you can use Dekker's algorithm, or some kinds of lock-free queues, but that seems a bit exotic. Or do you use some sort of lock implemented by disabling interrupts?

I think option one is what we want, and it seems like a pretty elegant solution. With a custom target spec one could even set atomic_cas: true, as long as one provides the necessary __sync libcalls.

I am kinda curious about the last question though -- what do people actually use pure load/store atomics for?
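As a sketch, a custom target spec along those lines might carry keys like these (the field names match rustc's target-spec JSON; the exact feature string depends on the architecture, e.g. +atomic-32 on ARM vs +forced-atomics on RISC-V, and the values below are illustrative, not a tested spec):

```json
{
    "atomic-cas": true,
    "max-atomic-width": 32,
    "features": "+atomic-32"
}
```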

@Amanieu
Member Author

Amanieu commented Jul 24, 2022

I believe load/store atomics are mainly used for synchronization with an interrupt handler. Somewhat like volatile sig_atomic_t in C, hence my suggestion to lower it to volatile load/store.

@adamgreig
Member

I think almost all use cases are indeed synchronisation with interrupt contexts, but then again that's pretty much the only concurrency present in many bare-metal thumbv6-m systems. Usually multi-core Cortex-M0 systems (like RP2040) have to use some hardware-specific primitive for cross-core synchronisation, which is outside the scope of atomics.

Many uses are probably in application firmware which doesn't tend to be as easy to find on GitHub, but there are a couple of examples in widely used libraries:

I expect another significant use case is that atomics provide a safe interior-mutable type for a static: if you want to coordinate a bool flag or an int or something with an interrupt, you need to use a static, and an Atomic* type lets you easily load/store it without using unsafe or a mutex or anything else. Since all the word-sized operations are atomic anyway, you could just do this with a regular static u32, but that would need a static mut or an UnsafeCell etc. This is often in a context where it's not incorrect to only use load/store, for example a timer interrupt incrementing a timestamp which many other contexts read from (only one writer, many readers).

I'll see if I can find any other actual application examples too.

@Lokathor
Contributor

If you're looking for examples: on the GBA (ARMv4T / ARM7TDMI) I do exactly what you're describing with UnsafeCell so that the interrupt handler can communicate with the main program (gba_cell.rs). Currently this is technically unsound to mix with non-volatile accesses (though in practice there's never been an issue).

@Lokathor
Contributor

Lokathor commented Sep 3, 2022

load/store appears to still not work with ARMv4T, though it should

See #101300

@pnkfelix
Member

T-compiler reviewed this as part of 2022 Q3 P-high review

What is the current situation here? It seems like there are still unresolved questions about what we want to tell people to do;

but also, I see that @nikic added target changes on the LLVM side for ARM and RISC-V: https://reviews.llvm.org/D130480 and https://reviews.llvm.org/D130621
do we need to be leveraging the +atomic-32 and +forced-atomics features in some fashion within the rustc interface to LLVM in order to make progress here?

@nikic
Contributor

nikic commented Oct 28, 2022

We already use +atomic-32 for relevant ARM/Thumb targets.

Once we update to LLVM 16, we can consider using +forced-atomics on RISC-V (it would allow landing the change from #98333).

I think at this point in time, the only thing that can be done here is adding upstream LLVM support for +forced-atomics to more targets, like MSP430 and AVR mentioned above.

@Lokathor
Contributor

Lokathor commented Oct 28, 2022

#101300 is still open. The TLDR of that issue is that, at least on some targets, the fake atomic access has extremely bad performance, to the point that it might drive people to use inline assembly instead.

@cr1901
Contributor

cr1901 commented Oct 28, 2022

@nikic I have no bandwidth at this time to add LLVM support for +forced-atomics or lowering atomics for MSP430, but I'm not against the idea and neither is the maintainer last I checked.

If this is only a simple thing that just drops some bits and lowers atomic stuff to normal loads/stores, then it could probably be fine

Right now, MSP430 doesn't lower atomic operations correctly in LLVM (long story)

This is still true.

@pkubaj
Contributor

pkubaj commented Nov 9, 2022

powerpc (32-bit) is also affected.

@pnkfelix
Member

Discussed at 2023 Q1 P-high review

My current understanding is that no one objects to adding +forced-atomics support for the targets that need it, but no one is taking up the task of actually doing so for all such targets.

I'm not clear on who would be best to take ownership of that, but maybe that doesn't matter yet; the first order of business is to collectively agree that is our strategy going forward.

@Amanieu @nikic am I right in inferring that you're both aligned on the plan of leveraging +forced-atomics where we need it, and the main work items here are adding support for it upstream with LLVM? (And I guess maybe there are some efficiency issues with the current support, as flagged e.g. in #101300 ?)

@nikic
Contributor

nikic commented Apr 14, 2023

We have +forced-atomics support for RISCV in LLVM 16, so we could give adding that to our target specs (and enabling atomic load/store) a try.

I believe the only real concern is that it makes Rust's atomics ABI-incompatible with C's atomics (unless the C code also uses +forced-atomics), but I think people value having atomics in Rust at all much more than being able to work with atomics across an ABI boundary.

@cr1901
Contributor

cr1901 commented Apr 14, 2023

I believe the only real concern is that it makes Rust's atomics ABI-incompatible with C's atomics (unless the C code also uses +forced-atomics)

I've been thinking about this; does anyone have a concrete example of where calling C code using libatomic from Rust code using Rust atomics is unsound/memory-unsafe (in both the single-core and multi-core cases)? Or is it more a "precaution that interop will subtly break"?

I can't visualize an example after thinking about it, because I already can't think of a way to share Rust atomics and use them as if they were C `_Atomic`s; Rust uses repr(C) on a newtype, and C `_Atomic` types are not guaranteed to be the same size as the underlying type (do gcc and clang accommodate this?).

@asl
Copy link

asl commented Apr 23, 2023

We have +forced-atomics support for RISCV in LLVM 16, so we could give adding that to our target specs (and enabling atomic load/store) a try.

FWIW, I'm planning to find some spare time to address some atomic weirdness on MSP430 in LLVM.

@wesleywiser
Member

Visited during the compiler team's P-high review. Based on the most recent discussion points, we believe this can be relabeled P-medium, as +forced-atomics support has landed in LLVM for all relevant Tier 1 or Tier 2 targets, and interoperability of passing pointers to atomic values between C and Rust in this context does not seem to be commonly used.
