Should atomic loads and failing atomic RMWs be writes for the purpose of the aliasing model and data races? #355

Open
RalfJung opened this issue Aug 5, 2022 · 25 comments

Comments

@RalfJung
Member

RalfJung commented Aug 5, 2022

Specifically, should this code be UB?

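use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let x = &AtomicI32::new(0);
    // Reinterpret the atomic as a plain i32 and create a shared reference to it.
    let y = x as *const AtomicI32 as *const i32;
    let y = unsafe { &*y };
    // The expected value (1) does not match the stored value (0), so this CAS fails.
    x.compare_exchange(1, 2, Ordering::Relaxed, Ordering::Relaxed).unwrap_err();
    // Use `y` again after the failed RMW.
    let _val = y;
}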

Right now Miri accepts this, since the failed RMW is just considered a read by the aliasing model. But maybe it should be considered a write? We almost certainly want to disallow it on read-only memory since that pagefaults on x86, so making it a write also for other concerns seems more consistent.
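For reference, here is a minimal sketch of what a failing compare_exchange observably does in safe code: it reports the current value and leaves the stored value unchanged, which is why treating it as a read looks natural. The helper name `failing_cas` is made up for illustration and is not part of the issue or of Miri.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical helper: performs a CAS that is guaranteed to fail and returns
/// the CAS result together with the value stored afterwards.
fn failing_cas(initial: i32) -> (Result<i32, i32>, i32) {
    let x = AtomicI32::new(initial);
    // `initial + 1` (wrapping) never equals `initial`, so the comparison fails.
    let expected = initial.wrapping_add(1);
    let result = x.compare_exchange(expected, 2, Ordering::Relaxed, Ordering::Relaxed);
    (result, x.load(Ordering::Relaxed))
}

fn main() {
    let (result, after) = failing_cas(0);
    println!("result = {result:?}, value afterwards = {after}");
}
```

This only shows the language-level result; whether the underlying instruction still needs write access to the memory is the open question here.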

OTOH, that could mean that a failing RMW that races with a non-atomic read might be considered a data race, and I am not sure if that is the right semantics.

Thanks to @chorman0773 for bringing up the question.

@RalfJung RalfJung changed the title Should atomic RMWs be write for the purpose of the aliasing model and data races? Should failing atomic RMWs be writes for the purpose of the aliasing model and data races? Aug 5, 2022
@RalfJung
Member Author

RalfJung commented Aug 5, 2022

OTOH, that could mean that a failing RMW that races with a non-atomic read might be considered a data race, and I am not sure if that is the right semantics.

Oh and for the weak memory model I think this would be plain wrong; failed RMWs do not participate in the mo order to my knowledge.

@chorman0773
Contributor

We almost certainly want to disallow it on read-only memory since that pagefaults on x86, so making it a write also for other concerns seems more consistent.

This was my consideration as well - "Read-only memory" doesn't mean much of anything (that I'm aware of) other than allocations that you can only get SRO (SharedReadOnly) access to, so doing something that is invalid on SRO (or equivalent) is what I saw as necessary.

Oh and for the weak memory model I think this would be plain wrong; failed RMWs do not participate in the mo order to my knowledge.

At least llvm seems to think that lowering an identity rmw to a read is acceptable (as long as it preserves the ordering semantics). Other than the ordering, IDK if I can think of too many ways it's observable without a preexisting data race.

@RalfJung
Member Author

RalfJung commented Aug 5, 2022

At least llvm seems to think that lowering an identity rmw to a read is acceptable

That's just a bug: llvm/llvm-project#56450

@RalfJung
Member Author

RalfJung commented Aug 5, 2022

"Read-only memory" doesn't mean much of anything (that I'm aware of)

At least in Miri each allocation has a flag indicating whether it is mutable or not. I don't know yet if we will need this for the spec as well.

@RalfJung
Member Author

RalfJung commented Aug 5, 2022

It was pointed out that even atomic loads can fail on read-only memory, because they might be implemented via compare_exchange loops. This indicates to me we should have a special case checking, for all atomic accesses, that they occur on writeable memory, but otherwise consider read-only atomic operations (loads and failing RMWs) to be read-only for the purpose of the aliasing model.
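A toy model of that proposal might look like the following. This is only a sketch under my own assumptions: the types `AtomicOp`, `AliasingAccess`, `ReadOnlyAllocation` and the function `classify` are invented for illustration and do not correspond to Miri's actual implementation.

```rust
/// How the aliasing model would see an access (toy model, not Miri's real types).
#[derive(Debug, PartialEq)]
enum AliasingAccess {
    Read,
    Write,
}

/// The atomic operations discussed in this thread.
#[derive(Debug, Clone, Copy)]
enum AtomicOp {
    Load,
    Store,
    RmwSucceeded,
    RmwFailed,
}

/// Error for an atomic access to an allocation that is not mutable.
#[derive(Debug, PartialEq)]
struct ReadOnlyAllocation;

/// Hypothetical check: every atomic access needs writeable memory, but loads
/// and failing RMWs still count as reads for the aliasing model.
fn classify(op: AtomicOp, allocation_is_mutable: bool) -> Result<AliasingAccess, ReadOnlyAllocation> {
    if !allocation_is_mutable {
        return Err(ReadOnlyAllocation);
    }
    Ok(match op {
        AtomicOp::Load | AtomicOp::RmwFailed => AliasingAccess::Read,
        AtomicOp::Store | AtomicOp::RmwSucceeded => AliasingAccess::Write,
    })
}

fn main() {
    println!("{:?}", classify(AtomicOp::RmwFailed, true));
    println!("{:?}", classify(AtomicOp::Load, false));
}
```

The point of splitting the two checks is that the mutability requirement and the aliasing classification are decided independently.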

@chorman0773
Contributor

Right yeah, because of AtomicU64 on i386. I don't think there's a way w/o relying on SSE existing to do just an atomic load of 8 bytes on 32-bit x86, but cmpxchg8b exists.
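To make this concrete, here is a rough sketch of how a load can be expressed purely in terms of compare_exchange. The helper `load_via_cas` is hypothetical; it mimics at the source level what a backend might do when only a CAS instruction such as cmpxchg8b is available, and is not taken from any actual lowering.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: reads `a` using only a compare-exchange.
/// The expected and replacement values are identical, so the stored value
/// never changes, yet the operation is still a CAS rather than a plain load.
fn load_via_cas(a: &AtomicU64) -> u64 {
    match a.compare_exchange(0, 0, Ordering::SeqCst, Ordering::SeqCst) {
        // Success means the value was 0 (and 0 was stored back);
        // failure reports whatever value is currently stored.
        Ok(v) | Err(v) => v,
    }
}

fn main() {
    let a = AtomicU64::new(42);
    println!("{}", load_via_cas(&a));
}
```

Because the operation is a CAS at the instruction level, it needs the same write permission on the page as any other CAS, even though no value is ever changed.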

@comex

comex commented Aug 6, 2022

I’ve written C++ code before that performs atomic loads on read-only memory: it’s shared memory which a different process has read-write access to but the current process only has read access to. I know shared memory is a whole topic of its own, with the question of whether ‘atomic volatile’ is needed, but still, I’d argue that atomic loads should only require read access on targets where it’s truly necessary. Similarly to how targets vary in which widths of atomics they support, they can vary in whether they support truly read-only access to atomics.

I can also think of hypothetical use cases for atomic access to read-only memory that don’t involve communication over shared memory. Something like: some objects of a given type are mutable, while others are immutable and come from the program’s read-only data segment. There are a large number of immutable objects, so it’s valuable to keep them in a read-only segment to ensure the kernel can share the memory between multiple instances of the program, saving on RAM. When reading a field from an object of that type, you must use an atomic load in case it’s a mutable object which is being concurrently mutated, but it might also be an immutable object. (On OSes with copy-on-write plus overcommit, you could just put the immutable objects in a read-write segment and rely on the assumption that the pages will never actually be faulted for write - assuming you’re on a target where atomic loads aren’t writes. But other OSes are more pessimistic and either account for memory as if all the copy-on-write pages will be faulted for write (Windows) or don’t support copy-on-write at all (some embedded kernels).)

@RalfJung
Member Author

RalfJung commented Aug 6, 2022

I’ve written C++ code before that performs atomic loads on read-only memory: it’s shared memory which a different process has read-write access to but the current process only has read access to.

Ah, interesting. This would then require us to spell out explicitly for which targets we guarantee read-only atomic loads, I think? Having target-dependent UB is not great but I am not sure if it can be avoided here.

@taiki-e
Member

taiki-e commented Aug 6, 2022

This would then require us to spell out explicitly for which targets we guarantee read-only atomic loads, I think? Having target-dependent UB is not great but I am not sure if it can be avoided here.

AFAIK [1], compare_exchange (CAS or LL/SC) is not used in pointer-width or smaller relaxed atomic loads, so I wonder if we could allow this based on size and ordering rather than specific targets.

Footnotes

  1. ARMv5te, SPARC64, MSP430, AVR, and all targets supported by atomic-maybe-uninit (x86, x86_64, ARM (v6+), AArch64, RISC-V, MIPS32r2, MIPS64r2, PowerPC, and s390x).
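The size-and-ordering rule suggested above could be written down as a simple predicate. This is only an illustration of the proposed rule; `load_may_skip_cas` is a made-up name, and nothing here is a guarantee that Rust or any target actually makes.

```rust
use std::mem::size_of;
use std::sync::atomic::Ordering;

/// Hypothetical predicate for the proposed rule: a load is assumed not to be
/// implemented via CAS or LL/SC if it is relaxed and at most pointer-width.
fn load_may_skip_cas(size_in_bytes: usize, ordering: Ordering) -> bool {
    size_in_bytes <= size_of::<usize>() && ordering == Ordering::Relaxed
}

fn main() {
    println!("{}", load_may_skip_cas(size_of::<u8>(), Ordering::Relaxed));
    println!("{}", load_may_skip_cas(size_of::<u128>(), Ordering::Relaxed));
    println!("{}", load_may_skip_cas(size_of::<usize>(), Ordering::SeqCst));
}
```

Phrasing the rule this way makes it independent of a per-target list, which is the appeal of the suggestion.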

@RalfJung
Member Author

RalfJung commented Aug 6, 2022

But what about this problem you mentioned before?

"emulate 8-bit/16-bit atomic RMWs using 32-bit atomic operations" to be exact. (Some architectures lack instructions for atomic RMWs that are smaller than the word size (32-bit).)

@taiki-e
Member

taiki-e commented Aug 6, 2022

But what about this problem you mentioned before?

"emulate 8-bit/16-bit atomic RMWs using 32-bit atomic operations" to be exact. (Some architectures lack instructions for atomic RMWs that are smaller than the word size (32-bit).)

AFAIK, in those targets, there are no instructions for 8-bit/16-bit RMW, but there are instructions for 8-bit/16-bit load/store. So, only RMWs are emulated by 32-bit CAS or LL/SC. (e.g., riscv)

@RalfJung
Member Author

RalfJung commented Aug 6, 2022

Oh I see, so that is an example of a successful RMW also mutating some neighboring bytes, but it doesn't make any read-only operations mutating.
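A rough source-level sketch of such an emulation, assuming a byte-sized fetch_add built from a 32-bit CAS loop, is shown below. The helper `fetch_add_u8_emulated` and its byte-indexing convention are my own invention for illustration; real backends do this at the instruction level.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: atomically adds `val` to byte `index` of `word`
/// (index 0 is the least significant byte) and returns the previous byte.
fn fetch_add_u8_emulated(word: &AtomicU32, index: u32, val: u8) -> u8 {
    assert!(index < 4);
    let shift = index * 8;
    let mask = 0xFFu32 << shift;
    let mut old = word.load(Ordering::Relaxed);
    loop {
        let old_byte = ((old & mask) >> shift) as u8;
        let new_byte = old_byte.wrapping_add(val);
        // The three neighboring bytes keep their values, but the CAS below
        // still writes the whole 32-bit word, neighbors included.
        let new = (old & !mask) | ((new_byte as u32) << shift);
        match word.compare_exchange_weak(old, new, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => return old_byte,
            Err(current) => old = current,
        }
    }
}

fn main() {
    let word = AtomicU32::new(0x1122_33FF);
    let previous = fetch_add_u8_emulated(&word, 0, 1);
    println!("{previous:#x} {:#x}", word.load(Ordering::Relaxed));
}
```

Only the RMW needs the wide CAS; a plain 8-bit load or store on these targets can still use a native byte instruction.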

@thomcc
Member

thomcc commented Aug 7, 2022

I've definitely done this, in that I've turned an &AtomicFoo into a &Foo (in order to implement Deref) despite there being a small window where other threads may execute (failed) CASes against the atomic (because they "lost the race").

Concretely, this was done in https://crates.io/crates/lazy_id, which I believed to be sound. It's not super widely used (I do have some code that uses this in interesting ways, but I haven't gotten around to pulling it out of my old game engine and open sourcing it), but I expect other code has similar patterns for the same reason -- ensuring that only one thread tries to perform initialization is less efficient than letting them all try, knowing only one will succeed at the CAS.

At the time it seemed slightly dodgy but was hard for me to imagine how it wasn't allowed. Sadly, this kind of thing isn't that rare, and there's usually no real way to prevent this short of taking a lock. Although for this, I suppose I could just not implement Deref.

(Note that for this, it's obviously mutable memory, so only forbidding it on read-only memory is fine)

EDIT: Oof, the load-only case is much worse. As noted, that breaks several cases with cross-process shared memory, and risks encouraging worse UB on them.
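The racing-initialization pattern described here can be sketched roughly as follows. This is not the lazy_id crate's actual code: the type `LazyId`, the counter `NEXT_ID`, and the use of 0 as the "uninitialized" marker are assumptions made for illustration, and the sketch deliberately leaves out the `&AtomicFoo` to `&Foo` conversion that the question is about.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical global counter handing out fresh non-zero ids.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical lazily initialized id; 0 means "not yet initialized".
struct LazyId(AtomicU64);

impl LazyId {
    const fn new() -> Self {
        LazyId(AtomicU64::new(0))
    }

    fn get(&self) -> u64 {
        let current = self.0.load(Ordering::Acquire);
        if current != 0 {
            return current;
        }
        // Every racing thread picks a candidate; only one CAS can succeed.
        let candidate = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        match self.0.compare_exchange(0, candidate, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => candidate,
            // This thread lost the race: its CAS failed and reports the winner's id.
            Err(winner) => winner,
        }
    }
}

fn main() {
    let id = LazyId::new();
    let ids: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = (0..4).map(|_| s.spawn(|| id.get())).collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // All threads agree on the single id that won the race.
    assert!(ids.iter().all(|&v| v == ids[0]));
}
```

The failed CASes of the losing threads are exactly the operations that may overlap with another thread already reading the value through a plain reference.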

@RalfJung
Member Author

RalfJung commented Aug 7, 2022

I've definitely done this, in that I've turned an &AtomicFoo into a &Foo (in order to implement Deref) despite there being a small window where other threads may execute (failed) CASes against the atomic (because they "lost the race").

Thanks for sharing, so this would be a usecase for disallowing failing CASes on read-only memory but still not considering them a write for the purpose of the aliasing model.

@RalfJung
Member Author

RalfJung commented Aug 7, 2022

I’ve written C++ code before that performs atomic loads on read-only memory: it’s shared memory which a different process has read-write access to but the current process only has read access to.

I can also think of hypothetical use cases for atomic access to read-only memory that don’t involve communication over shared memory.

Based on what @taiki-e said here, these schemes only work if all atomic loads are relaxed -- as other, stronger atomic loads might be implemented with a compare-exchange.

@taiki-e
Member

taiki-e commented Aug 7, 2022

(Replying to Ralf's comment in rust-lang/miri#2464 (comment))

We can possibly relax this rule later if there is consensus that all targets supported by Rust will support relaxed atomic loads of pointer size and smaller on read-only memory, but how sure are we no targets will violate this in the future?

When natively executing binaries, I guess it is unlikely that there are architectures that break the rule I said in rust-lang/miri#2464 (I will investigate more targets supported by LLVM later).

However, I noticed that the rule could be broken when using simulators or similar tools. For example, if we emulate a 64-bit architecture's 64-bit atomic load using a 32-bit host's 64-bit atomic load, and the host's 64-bit atomic load uses CAS, the rule will be broken. I wonder if this could also occur in wasm64 with threads and atomics enabled...

@RalfJung RalfJung changed the title Should failing atomic RMWs be writes for the purpose of the aliasing model and data races? Should atomic loads and failing atomic RMWs be writes for the purpose of the aliasing model and data races? Nov 1, 2022
@RalfJung
Member Author

RalfJung commented Nov 1, 2022

We discussed quite a bit whether atomic loads and failing RMWs can work on read-only memory, but I think this question here still remains open:

OTOH, that [making failing RMW a write for data races] could mean that a failing RMW that races with a non-atomic read might be considered a data race, and I am not sure if that is the right semantics.

@chorman0773
Contributor

chorman0773 commented Nov 27, 2022 via email

@RalfJung
Member Author

It would be possible, sure. Whether that makes any sense though is a different question...
