[RFC] AtomicPerByte (aka "atomic memcpy") #3301

m-ou-se · 2022-08-14T17:39:35Z

bjorn3 · 2022-08-14T17:57:35Z

ibraheemdev · 2022-08-14T19:50:27Z

This could mention the atomic-maybe-uninit crate in the alternatives section (cc @taiki-e).

5225225 · 2022-08-14T19:56:24Z

With some way for the language to be able to express "this type is valid for any bit pattern", which project safe transmute presumably will provide (and that exists in the ecosystem as bytemuck and zerocopy and probably others), I'm wondering if it would be better to return an AtomicPerByteRead<T>(MaybeUninit<T>) which we/the ecosystem could provide a safe into_inner (returning a T) if T is valid for any bit pattern.

This would also require removing the safe uninit method. But you could always presumably do an AtomicPerByte<MaybeUninit<T>> with no runtime cost to passing MaybeUninit::uninit() to new.

That's extra complexity, but means that with some help from the ecosystem/future stdlib work, this can be used in 100% safe code, if the data is fine with being torn.

Lokathor · 2022-08-14T20:10:23Z

The "uninit" part of MaybeUninit is essentially not a bit pattern though. That's the problem. Even if a value is valid "for all bit patterns", you can't unwrap uninit memory into that type.

not without the fabled and legendary Freeze Intrinsic anyway.

T-Dark0 · 2022-08-14T20:14:29Z

On the other hand, AnyBitPatternOrPointerFragment isn't a type we have, nor really a type we strictly need for this. Assuming tearing can't deinitialize initialized memory, then MaybeUninit would suffice I think?

programmerjake · 2022-08-15T01:01:21Z

note that LLVM already implements this operation:
llvm.memcpy.element.unordered.atomic Intrinsic
with an additional fence operation for acquire/release.

comex · 2022-08-15T02:26:31Z

The trouble with that intrinsic is that unordered is weaker than monotonic aka Relaxed, and it can't easily be upgraded. There's no "relaxed fence" if the ordering you want is Relaxed; and even if the ordering you want is Acquire or Release, combining unordered atomic accesses with fences doesn't produce quite the same result. Fences provide additional guarantees regarding other memory accessed before/after the atomic access, but they don't do anything to restore the missing "single total order" per address of the atomic accesses themselves.

text/3301-atomic-memcpy.md

taiki-e · 2022-08-15T05:30:32Z

text/3301-atomic-memcpy.md

+- In order for this to be efficient, we need an additional intrinsic hooking into
+  special support in LLVM. (Which LLVM needs to have anyway for C++.)


How do you plan to implement this until LLVM implements this?

I don't think it is necessary to explain the implementation details in the RFC, but if we provide an unsound implementation until the as yet unmerged C++ proposal is implemented in LLVM in the future, that seems to be a problem.

(Also, if the language provides the functionality necessary to implement this soundly in Rust, the ecosystem can implement this soundly as well without inline assembly.)

I haven't looked into the details yet of what's possible today with LLVM. There's a few possible outcomes:

We wait until LLVM supports this. (Or contribute it to LLVM.) This feature is delayed until some point in the future when we can rely on an LLVM version that includes it.

Until LLVM supports it, we use a theoretically unsound but known-to-work-today hack like ptr::{read_volatile, write_volatile} combined with a fence. In the standard library we can more easily rely on implementation details of today's compiler.

We use the existing llvm.memcpy.element.unordered.atomic, after figuring out the consequences of the unordered property.

Until LLVM supports appears, we implement it in the library using a loop of AtomicUsize::load()/store()s and a fence, possibly using an efficient inline assembly alternative for some popular architectures.

I'm not fully sure yet which of these are feasible.

text/3301-atomic-memcpy.md

m-ou-se · 2022-08-15T08:12:50Z

The trouble with that intrinsic is that unordered is weaker than monotonic aka Relaxed, and it can't easily be upgraded. There's no "relaxed fence" if the ordering you want is Relaxed; and even if the ordering you want is Acquire or Release, combining unordered atomic accesses with fences doesn't produce quite the same result. Fences provide additional guarantees regarding other memory accessed before/after the atomic access, but they don't do anything to restore the missing "single total order" per address of the atomic accesses themselves.

I'm very familiar with the standard Rust and C++ memory orderings, but I don't know much about llvm's unordered ordering. Could you give an example of unexpected results we might get if we were to implement AtomicPerByte<T>::{read, write} using llvm's unordered primitive and a fence? Thanks!

(It seems monotonic is behaves identically to unordered for loads and stores?)

text/3301-atomic-memcpy.md

ojeda · 2022-08-15T10:48:45Z

text/3301-atomic-memcpy.md

+  but it's easy to accidentally cause undefined behavior by using `load`
+  to make an extra copy of data that shouldn't be copied.
+
+- Naming: `AtomicPerByte`? `TearableAtomic`? `NoDataRace`? `NotQuiteAtomic`?


Given these options and considering what the C++ paper chose, AtomicPerByte sounds OK and has the advantage of having Atomic as a prefix.

AtomicPerByteMaybeUninit or AtomicPerByteManuallyDrop to also resolve the other concern around dropping? Those are terrible names though...

ojeda · 2022-08-15T10:56:59Z

cc @ojeda

Thanks! Cc'ing @wedsonaf since he will like it :)

thomcc · 2022-08-15T17:11:50Z

Unordered is not monotonic (as in, it has no total order across all accesses), so LLVM is free to reorder loads/stores in ways it would not be allowed to with Relaxed (it behaves a lot more like a non-atomic variable in this sense)

In practical terms, in single-thread scenarios it behaves as expected, but when you load an atomic variable with unordered where the previous writer was another thread, you basically have to be prepared for it to hand you back any value previously written by that thread, due to the reordering allowed.

Concretely, I don't know how we'd implement relaxed ordering by fencing without having that fence have a cost on weakly ordered machines (e.g. without implementing it as an overly-strong acquire/release fence).

That said, I think we could add an intrinsic to LLVM that does what we want here. I just don't think it already exists.

(FWIW, another part of the issue is that this stuff is not that well specified, but it's likely described by the "plain" accesses explained in https://www.cs.tau.ac.il/~orilahav/papers/popl17.pdf)

thomcc · 2022-08-15T19:45:35Z

CC @RalfJung who has stronger opinions on Unordered (and is the one who provided that link in the past).

I think we can easily implement this with relaxed in compiler-builtins though, but it should get a new intrinsic, since many platforms can implement it more efficiently.

bjorn3 · 2022-08-15T20:12:07Z

We already have unordered atomic memcpy intrinsics in compiler-builtins. For 1, 2, 4 and 8 byte access sizes.

thomcc · 2022-08-15T20:26:01Z

I'm not sure we'd want unordered, as mentioned above...

thomcc · 2022-08-16T02:25:12Z

To clarify on the difference between relaxed and unordered (in terms of loads and stores), if you have

static ATOM: AtomicU8 = AtomicU8::new(0);
const O: Ordering = ???;

fn thread1() {
    ATOM.store(1, O);
    ATOM.store(2, O);
}

fn thread2() {
    let a = ATOM.load(O);
    let b = ATOM.load(O);
    assert!(a <= b);
}

thread2 will never assert if O is Relaxed, but it could if O is (the hypothetical) Unordered.

In other words, for unordered, it would be legal for 2 to be stored before 1, or for b to be loaded before a. In terms of fences, there's no fence that "upgrades" unordered to relaxed, although I believe (but am not certain) that stronger fences do apply to it.

programmerjake · 2022-08-16T03:16:12Z

something that could work but not be technically correct is:
compiler acquire fence
unordered atomic memcpy
compiler release fence

those fences are no-ops at runtime, but prevent the compiler from reordering the unordered atomics -- assuming your on any modern cpu (except Alpha iirc) it will behave like relaxed atomics because that's what standard load/store instructions do.

thomcc · 2022-08-16T03:19:51Z

Those fences aren't always no-ops at runtime, they actually emit code on several platforms (rust-lang/rust#62256). It's also unclear what can and can't be reordered across compiler fences (rust-lang/unsafe-code-guidelines#347), certainly plain stores can in some cases (this is easy to show happening in godbolt).

Either way, my point has not been that we can't implement this. We absolutely can and it's probably even straightforward. My point is just that I don't really think those existing intrinsics help us do that.

tschuett · 2022-08-18T20:08:11Z

I like MaybeAtomic, but following C++ with AtomicPerByte sounds reasonable.
The LLVM guys started something similar in 2016:
https://reviews.llvm.org/D27133

text/3301-atomic-memcpy.md

RalfJung · 2022-08-20T16:24:57Z

text/3301-atomic-memcpy.md

+        loop {
+            let s1 = self.seq.load(Acquire);
+            let data = read_data(&self.data, Acquire);
+            let s2 = self.seq.load(Relaxed);


There's something very subtle here that I had not appreciated until a few weeks ago: we have to ensure that the load here cannot return an outdated value that would prevent us from noticing a seqnum bump.

The reason this is the case is that if there is a concurrent write, and if any
part of data reads from that write, then we have a release-acquire pair, so then we are guaranteed to see at least the first fetch_add from write, and thus we will definitely see a version conflict. OTOH if the s1 reads-from some second fetch_add in write, then that forms a release-acquire pair, and we will definitely see the full data.

So, all the release/acquire are necessary here. (I know this is not a seqlock tutorial, and @m-ou-se is certainly aware of this, but it still seemed worth pointing out -- many people reading this will not be aware of this.)

(This is related to this comment by @cbeuw.)

Yeah exactly. This is why people are sometimes asking for a "release-load" operation. This second load operation needs to happen "after" the read_data() part, but the usual (incorrect) read_data implementation doesn't involve atomic operations or a memory ordering, so they attempt to solve this issue with a memory ordering on that final load, which isn't possible. The right solution is a memory ordering on the read_data() operation.

Under a reordering based atomic model (as CPUs use), a release load makes sense and works. Release loads don't really work unless they are also RMWs (fetch_add(0)) under the C11 model.

Yeah, the famous seqlock paper discusses "read dont-modify write" operations.

RalfJung · 2022-08-20T16:31:12Z

text/3301-atomic-memcpy.md

+while the second one is basically a memory fence followed by series of `AtomicU8::store`s.
+Except the implementation can be much more efficient.
+The implementation is allowed to load/store the bytes in any order,
+and doesn't have to operate on individual bytes.


The "load/store bytes in any order" part is quite tricky, and I think means that the specification needs to be more complicated to allow for that.

I was originally thinking this would be specified as a series of AtomicU8 load/store with the respective order, no fence involved. That would still allow merging adjacent writes (I think), but it would not allow reordering bytes. I wonder if we could get away with that, or if implementations actually need the ability to reorder.

For a memcpy (meaning the two regions are exclusive) you generally want to copy using increasing address order ("forward") on all hardware I've ever heard of. Even if a forward copy isn't faster (which it often is), it's still the same speed as a reverse copy.

I suspect the "any order is allowed" is just left in as wiggle room for potentially strange situations where somehow a reverse order copy would improve performance.

The "load/store bytes in any order" part is quite tricky, and I think means that the specification needs to be more complicated to allow for that.

A loop of relaxed load/store operations followed/preceded by an acquire/release fence already effectively allows for the relaxed operations to happen in any order, right?

I was originally thinking this would be specified as a series of AtomicU8 load/store with the respective order, no fence involved.

In the C++ paper they are basically as:

for (size_t i = 0; i < count; ++i) { reinterpret_cast<char*>(dest)[i] = atomic_ref<char>(reinterpret_cast<char*>(source)[i]).load(memory_order::relaxed); } atomic_thread_fence(order);

and

atomic_thread_fence(order); for (size_t i = 0; i < count; ++i) { atomic_ref<char>(reinterpret_cast<char*>(dest)[i]).store( reinterpret_cast<char*>(source)[i], memory_order::relaxed); }

A loop of relaxed load/store operations followed/preceded by an acquire/release fence already effectively allows for the relaxed operations to happen in any order, right?

Yes, relaxed loads/stores to different locations can be reordered, so specifying their order is moot under the as-if rule.

In the C++ paper they are basically as:

Hm... but usually fences and accesses are far from equivalent. If we specify them like this, calling code can rely on the presence of these fences. For example changing a 4-byte atomic acquire memcpy to an AtomicU32 acquire load would not be correct (even if we know everything is initialized and aligned etc).

Fence make all preceding/following relaxed accesses potentially induce synchronization, whereas release/acquire accesses only do that for that particular access.

RalfJung · 2022-08-20T16:39:57Z

CC @RalfJung who has stronger opinions on Unordered (and is the one who provided that link in the past).

Yeah, I don't think we should expose Unordered to users in any way until we are ready and willing to have our own concurrency memory model separate from that of C++ (or until C++ has something like unordered, and it's been shown to also make sense formally). There are some formal memory models with "plain" memory accesses, which are similar to unordered (no total mo order but race conditions allowed), but I have no idea if those are an accurate model of LLVM's unordered accesses. Both serve the same goal though, so there's a high chance they are at least related: both aim to model Java's regular memory accesses.

We already have unordered atomic memcpy intrinsics in compiler-builtins. For 1, 2, 4 and 8 byte access sizes.

Well I sure hope we're not using them in any way that actually becomes observable in program behavior, as that would be unsound.

m-ou-se · 2024-07-03T13:50:51Z

The only thing left I'm still struggling with is the signature of the store method(s):

      pub fn store(&self, value: MaybeUninit<T>, ordering: Ordering);
// or
      pub fn store(&self, value: &MaybeUninit<T>, ordering: Ordering);
// or
      pub fn store(&self, value: T, ordering: Ordering);
// or
      pub fn store(&self, value: T, ordering: Ordering) where T: Copy;
// or
      pub fn store(&self, value: &T, ordering: Ordering);
// or
      pub fn store(&self, value: &T, ordering: Ordering) where T: Copy;

Or a combination of these (store and store_from).

Taking by value fits the most basic use case, but consuming the value can be annoying if you need to attempt a store multiple times. However, taking by reference can get weird for non-Copy/needs-drop types. Wrapping it in a MaybeUninit makes the Drop situation clearer, but taking that by reference can be annoying if you have a &T and need a &MaybeUninit<T> here. :/

Amanieu · 2024-07-12T20:50:42Z

Taking by value fits the most basic use case, but consuming the value can be annoying if you need to attempt a store multiple times.

I don't see a use case where you wouldn't have some sort of external synchronization that guarantees only 1 write is happening at a time, so this is less of a concern.

I think that store operations should only accept T instead of MaybeUninit<T>. In almost all use cases, users will want to store complete values.

Therefore I would recommend:

Removing load_from and store_to: these are redundant with load and store.
Changing store_from_slice to take &[T] instead of &[MaybeUninit<T>]. Most users will want the &[T] version and there are functions to convert between initialized and uninitialized slices.'
Removing the From<MaybeUninit<T>> impl. This is surprising and should be handled through an explicit constructor instead.

joshlf · 2024-09-30T14:03:50Z

@Amanieu pointed out that this could be used to solve the problem of how to express shared memory in a way that is sound. (In particular, memory which is shared with external processes outside of Rust's visibility - separate OS processes, user land (if implementing a kernel), a kernel (if implementing a hypervisor), etc.). I'll refer to this as the "IPC use case" for brevity.

I think it'd be good to mention IPC explicitly in this RFC. It differs a bit from the existing use cases which are described. In particular, the existing use case assumes that you're trying to synchronize with other threads which are known to the Rust abstract machine. As a consequence, it needs to uphold whole-program correctness - the code emitted for a particular thread is tasked with not exhibiting UB on that thread but also not causing UB to be exhibited in other threads of the same process. It can also take advantage of the fact that the other threads are doing the same thing.

By contrast, the IPC use case is only concerned with preventing UB in a single thread, but it must do so in the face of arbitrary memory writes by the other process, possibly malicious ones.

It may be the case, as @Amanieu suggests, that the existing solution in this RFC is already sufficient. But IMO it'd be good to explicitly articulate that since it's not obvious (at least to me) that any solution to the stated problem would also be a solution the IPC problem.

Thanks for this RFC, btw! The shared memory use case is a thorn we've had in Fuchsia for years. It would be awesome to have a solution to it.

GoldsteinE · 2024-09-30T14:09:14Z

Isn’t it overkill for IPC? You just need to synchronize amount of bytes you’re going to read/write and then plain volatile operations should be enough, because you already did the synchronization.

joshlf · 2024-09-30T15:23:09Z

Isn’t it overkill for IPC? You just need to synchronize amount of bytes you’re going to read/write and then plain volatile operations should be enough, because you already did the synchronization.

It's more complicated than that unfortunately. Volatile is both too much and not enough.

It's too much because it forces the compiler to emit read and write instructions for every memory read and write, which prevents it from doing various optimizations such as coalescing reads, eliding writes that are performed multiple times, etc. That's not actually necessary for IPC. All you care about from a correctness standpoint is that all of your reads happen after any memory fence and that all of your writes happen before any memory fence (consistent w/ where those fences appear in source code).

It's not enough because volatile still assumes that no other threads are concurrently reading/writing the same memory. Using volatile to perform concurrent modifications to the same memory is UB (see the write_volatile docs). Now, if your IPC partner is behaving nicely, this isn't a problem since they'll only be writing to the memory before a memory fence, and then not touching it afterwards once you're reading from it. But if you don't trust your IPC partner (it's considered part of another security domain, it's a user space program and you're the kernel, etc), then you have to assume that your IPC partner might be writing to the memory concurrently. Thus, for your code to be sound in the face of arbitrary, possibly malicious behavior, it has to be sound in the face of arbitrary, concurrent memory writes.

DemiMarie · 2024-09-30T18:19:40Z

How will this interact with transmutes? What is the proper way to access the field of a struct? One could automatically generate accessor functions I guess.

RalfJung · 2024-09-30T18:36:28Z

"UB in a single thread" is unfortunately not really a thing -- UB is a property of the entire execution. So it's not entirely clear what exactly we could say here. Furthermore, it is already the case that it is generally okay to use atomics to communicate with non-Rust threads, this RFC "just" makes atomics more powerful, so atomic memcpy itself isn't really IPC-specific at all.

joshlf · 2024-09-30T21:48:21Z

"UB in a single thread" is unfortunately not really a thing -- UB is a property of the entire execution. So it's not entirely clear what exactly we could say here. Furthermore, it is already the case that it is generally okay to use atomics to communicate with non-Rust threads...

How do we square the circle here? I'm imagining it's one of the following two options:

We don't have any way of formally modeling (in the AM) what happens in shared-memory IPC, but we know that the compiler doesn't do anything silly in practice, so we consider it to be fine
We know how to formally model shared-memory IPC, but we don't do it by talking about "UB in a single thread"

The reason I ask is that my naive understanding of your comment is just that this isn't something we could in principle model, but I assume there's more to it than that.

programmerjake · 2024-09-30T22:24:02Z

could we model it as where all processes at the other end of the IPC act as if they're a set of threads where every memory op is a relaxed atomic or a fence? I think this should work since we know on all reasonable ISAs that every load/store op acts as at least a set of relaxed atomics. that way, it's only UB in our process if we do something on our threads that would cause UB (e.g. non-atomic read that races with the other process's writes).

This way it isn't UB in our process even if another process has UB, if we code defensively by using atomics.

Diggsey · 2024-09-30T23:09:00Z

We know how to formally model shared-memory IPC, but we don't do it by talking about "UB in a single thread"

There's nothing special about memory shared between processes - I mean on linux threads are literally processes. Anything that can be used to synchronize between threads can also be used to synchronize between processes.

edit: there is a difference if you need your IPC communication method to be a security boundary - in that case you also need to consider that the other process may do anything to that memory at any time. If all you're doing is reading using AtomicPerByte then that should never cause UB in your process though.

joshlf · 2024-09-30T23:15:10Z

edit: there is a difference if you need your IPC communication method to be a security boundary - in that case you also need to consider that the other process may do anything to that memory at any time. If all you're doing is reading using AtomicPerByte then that should never cause UB in your process though.

Yeah, this is the use case - treating shared-memory IPC as a security boundary. IIUC that's what @RalfJung was responding to by saying that it's not well-defined since UB is a whole-program property. (I'm sure Ralf will push back and clarify that "not well-defined" is not an accurate characterization of what he said, but I've learned not to try to capture the subtleties 😛 )

Diggsey · 2024-09-30T23:53:41Z

When reasoning about security, I would argue there are other ways to prove that the boundary is solid:

If you can prove that for any malicious program M, there is a non-malicious program N that does the exact same shared memory operations at the hardware level, and that your program has no UB when combined with N, then you don't even need to consider M because they are literally the same program. For x86 this seems plausible since relaxed atomics generally compile down to no additional synchronization.

ie. the step from abstract machine to actual hardware is not injective, and if two different programs in the abstract machine translate to the same actual program, then it's clearly indistinguishable which program was used because they are identical.

DemiMarie · 2024-10-01T02:31:10Z

@Diggsey: Rust cannot be fully specified in terms of an abstract machine. Rust is a systems programming language, and that means that it must also make guarantees about how the abstract machine is implemented in terms of the concrete machine that the hardware and OS actually implement.

RalfJung · 2024-10-01T09:12:02Z

Can we please move shared-memory IPC with atomics off of this RFC thread? As I said, this isn't specific to atomic-per-byte memcpy at all, so this is really the wrong place for that discussion.

VorpalBlade · 2024-12-14T13:36:20Z

What is the current state / blocker for progress on this RFC? Today I ran across another case where I needed this. It doesn't seem like much happened since end of summer.

m-ou-se · 2024-12-16T11:13:54Z

@VorpalBlade This is still stuck on the details of the API. See #3301 (comment)

VorpalBlade · 2024-12-16T12:21:16Z

Why not have multiple store methods (perhaps not all 6, but enough to cover the use cases)? They could dispatch to the same underlying intrinsic internally.

It isn't like rust doesn't already do this in the standard library: foo, foo_mut, unchecked_foo etc. Though perhaps coming up with suitable names will be just as difficult.

m-ou-se · 2024-12-16T14:13:32Z

Because that would just result in confusion and unexpected behaviour. E.g. it's unclear what reasonable behaviour would be for types that need to be dropped.

programmerjake · 2024-12-16T18:02:11Z

what if the only option was:

pub fn store(&self, value: &MaybeUninit<T>, ordering: Ordering);

and to make storing a copy more ergonomic, MaybeUninit gains:

impl<T> MaybeUninit<T> { // maybe have ?Sized bound? icr if that works with unions
    pub const fn from_ref(v: &T) -> &Self {
        // Safety: &Self can't be written to, so this works
        unsafe { &*(v as *const T as *const Self) }
    }
}

that way if you want to store a copy of some type, you just use: a.store(MaybeUninit::from_ref(&my_value), Ordering::Relaxed)
and my_value will still be dropped later. or you can just use a reference if that's all you have access to.

and if you want my_value to not be dropped, just write:
a.store(&MaybeUninit::new(my_value), Ordering::Relaxed)

do remember that atomic memcpy is not terribly common so being a bit more verbose is fine.

DemiMarie · 2024-12-16T21:00:36Z

What about only providing the intrinsic as an unsafe raw pointer operation, and letting users write their own higher-level wrappers?

arielb1 · 2024-12-17T16:40:20Z

What about only providing the intrinsic as an unsafe raw pointer operation, and letting users write their own higher-level wrappers?

The intrinsic seems more fundamental to me than the API around it.

Add atomic memcpy RFC.

d5393a7

m-ou-se added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Aug 14, 2022

Add number in atomic memcpy rfc.

e864d8d

m-ou-se mentioned this pull request Aug 14, 2022

What about: seqlocks, load-release/store-acquire? rust-lang/unsafe-code-guidelines#323

Open

taiki-e reviewed Aug 15, 2022

View reviewed changes

Fix typo.

d12abe9

ojeda reviewed Aug 15, 2022

View reviewed changes

m-ou-se added 2 commits August 15, 2022 13:18

Fix types of C++ API.

eb68c3a

Better wording.

e802133

cbeuw reviewed Aug 19, 2022

View reviewed changes

text/3301-atomic-memcpy.md Show resolved Hide resolved

RalfJung reviewed Aug 20, 2022

View reviewed changes

Add note on uninitialized state.

af86156

Update.

520ab88

m-ou-se removed the I-libs-api-nominated Indicates that an issue has been nominated for prioritizing at the next libs-api team meeting. label Jul 16, 2024

Amanieu mentioned this pull request Sep 30, 2024

What about: volatile, concurrency, and interaction with untrusted threads rust-lang/unsafe-code-guidelines#152

Open

RalfJung mentioned this pull request Oct 8, 2024

How can I tell the Rust compiler &mut [u8] has changed after a DMA operation rust-lang/unsafe-code-guidelines#537

Open

RalfJung mentioned this pull request Jan 8, 2025

Can we have VolatileCell rust-lang/unsafe-code-guidelines#411

Open

ojeda mentioned this pull request Mar 4, 2025

Rust wanted features Rust-for-Linux/linux#354

Open

40 tasks

This was referenced Mar 12, 2025

atomic_load_unordered and atomic_store_unordered should not be used rust-lang/compiler-builtins#788

Open

remove element_unordered_atomic intrinsics rust-lang/compiler-builtins#789

Merged

		- In order for this to be efficient, we need an additional intrinsic hooking into
		special support in LLVM. (Which LLVM needs to have anyway for C++.)

[RFC] AtomicPerByte (aka "atomic memcpy") #3301

Are you sure you want to change the base?

[RFC] AtomicPerByte (aka "atomic memcpy") #3301

Conversation

m-ou-se commented Aug 14, 2022 • edited Loading

bjorn3 commented Aug 14, 2022

ibraheemdev commented Aug 14, 2022 • edited Loading

5225225 commented Aug 14, 2022 • edited Loading

Lokathor commented Aug 14, 2022

T-Dark0 commented Aug 14, 2022

programmerjake commented Aug 15, 2022

comex commented Aug 15, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-ou-se commented Aug 15, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ojeda commented Aug 15, 2022

thomcc commented Aug 15, 2022 • edited Loading

thomcc commented Aug 15, 2022

bjorn3 commented Aug 15, 2022

thomcc commented Aug 15, 2022

thomcc commented Aug 16, 2022

programmerjake commented Aug 16, 2022

thomcc commented Aug 16, 2022

tschuett commented Aug 18, 2022

RalfJung Aug 20, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibraheemdev Aug 23, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung commented Aug 20, 2022 • edited Loading

m-ou-se commented Jul 3, 2024 • edited Loading

Amanieu commented Jul 12, 2024

joshlf commented Sep 30, 2024

GoldsteinE commented Sep 30, 2024 • edited Loading

joshlf commented Sep 30, 2024

DemiMarie commented Sep 30, 2024

RalfJung commented Sep 30, 2024

joshlf commented Sep 30, 2024

programmerjake commented Sep 30, 2024 • edited Loading

Diggsey commented Sep 30, 2024 • edited Loading

joshlf commented Sep 30, 2024

Diggsey commented Sep 30, 2024 • edited Loading

DemiMarie commented Oct 1, 2024

RalfJung commented Oct 1, 2024 • edited Loading

VorpalBlade commented Dec 14, 2024

m-ou-se commented Dec 16, 2024

VorpalBlade commented Dec 16, 2024

m-ou-se commented Dec 16, 2024

programmerjake commented Dec 16, 2024

DemiMarie commented Dec 16, 2024

arielb1 commented Dec 17, 2024

m-ou-se commented Aug 14, 2022 •

edited

Loading

ibraheemdev commented Aug 14, 2022 •

edited

Loading

5225225 commented Aug 14, 2022 •

edited

Loading

m-ou-se commented Aug 15, 2022 •

edited

Loading

thomcc commented Aug 15, 2022 •

edited

Loading

RalfJung Aug 20, 2022 •

edited

Loading

ibraheemdev Aug 23, 2022 •

edited

Loading

RalfJung commented Aug 20, 2022 •

edited

Loading

m-ou-se commented Jul 3, 2024 •

edited

Loading

GoldsteinE commented Sep 30, 2024 •

edited

Loading

programmerjake commented Sep 30, 2024 •

edited

Loading

Diggsey commented Sep 30, 2024 •

edited

Loading

Diggsey commented Sep 30, 2024 •

edited

Loading

RalfJung commented Oct 1, 2024 •

edited

Loading