## Pre-Pre-RFC: core::arch::{load, store} and stricter volatile semantics
#321
I like this, except:
This seems like it should be a trait bound.
This part is a breaking change and doesn't seem well motivated to me. For writes: Writing padding bits is potentially a security concern due to the potential to leak memory contents, but it doesn't seem inherently unsound; any undefined bits should just be implicitly frozen to an arbitrary value. As for unspecified layout, if by that you mean things like For reads: Just because volatile is well-suited for dealing with untrusted or potentially-corrupted memory doesn't mean that's the only possible use case. You may happen to know for whatever reason that the load will return a valid value. Perhaps you're reading it from an MMIO register; perhaps you're abusing volatile to implement atomics (bad idea but, in the right circumstances, not unsound); perhaps the load doesn't have to be volatile but is anyway due to some generics madness. All of these cases seem dubious enough to be worth a lint, but I'm skeptical that they should be hard errors even with the new functions, let alone the existing already-stable functions. |
Agreed, lint added.
I generally assume that MMIO devices are not automatically trustworthy, but your point stands. |
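One way to reconcile the "untrusted device" view with typed code is to load a plain integer and validate it afterwards, so that no invalid value of a richer type is ever produced. The sketch below is only an illustration: it uses today's stable `core::ptr::read_volatile` as a stand-in for the proposed `load`, and `DeviceState` and `read_state` are hypothetical names.

```rust
use core::ptr;

/// Hypothetical device status codes; only 0, 1 and 2 are meaningful.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum DeviceState {
    Idle,
    Busy,
    Fault,
}

/// Reads one byte with a volatile load and validates it before turning
/// it into an enum. Every bit pattern is a valid `u8`, so the load
/// itself cannot produce an invalid value.
///
/// # Safety
/// `reg` must be valid for a one-byte volatile read.
pub unsafe fn read_state(reg: *const u8) -> Option<DeviceState> {
    let raw = unsafe { ptr::read_volatile(reg) };
    match raw {
        0 => Some(DeviceState::Idle),
        1 => Some(DeviceState::Busy),
        2 => Some(DeviceState::Fault),
        _ => None, // untrusted input: reject instead of transmuting
    }
}
```

Reading the enum type directly through the pointer would instead require the caller to already know the value is valid, which is the case the lint is meant to flag.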
That can't be done without forcing deoptimization of any program that may call this. To prevent deoptimization it would be better to say that it can access any memory which an opaque function that gets passed the pointer as argument may access. That would for example not include stack variables which don't have their address taken. |
Is this also the semantics of using inline assembly? The goal is that volatile operations will always operate on whatever happens to be at that memory address; the compiler can’t just say “I know this volatile load or store will have undefined behavior if X” and optimize accordingly. The situation you are referring to is supposed to be covered by, “At the same time, a call to load or store does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.” The reason for these seemingly contradictory requirements is that I want volatile memory access to be usable for in-process debuggers and crash dumpers. These programs need to be able to access whatever happens to be at an arbitrary caller-provided memory location and know the compiler will not try to outsmart them. This is also why using these functions to dereference null or dangling pointers is explicitly permitted. Just because you can use these functions to read from a piece of memory does not mean that Rust makes any guarantees whatsoever about what you will find there, or that your attempt won’t cause a hardware fault of some sort. Similarly, just because you can use these functions to write to a piece of memory does not mean that Rust makes any guarantees as to what impact that will have on other code. If you misuse them and something breaks, you get to keep both pieces. |
@bjorn3: do you have suggestions for better phrasing here? The intent is that you can use volatile memory access to peek and poke at whatever memory you want, but the consequences of doing so are entirely your responsibility. For instance, I might need to test that my program’s crash handler triggers successfully when I dereference a null pointer, or that a hardened memory allocator detects modification of freed memory and properly aborts. |
I believe so.
It has to for any optimization to be possible.
Didn't see that sentence. I agree that covers my situation.
Those things are UB either way. Individual compilers just do a best effort at trying to make them work the way a user expects them to work when optimizations are disabled. When optimizations are enabled it is even more in a best effort basis. For example it may not be possible to change function parameters if the compiler found that a function argument is constant and optimized accordingly. |
Good to know!
To elaborate, what I am not okay with is the compiler optimizing out the entire basic block as unreachable code. Compilers have a nasty habit of doing that, so I wanted to be absolutely clear that is not permitted.
Thank you.
That behavior is perfectly acceptable (though ideally it would be reflected in the debug info). I wonder if our definitions of UB are slightly different. To me, an execution with UB has no meaning at all, and the compiler is allowed to prune any basic block that invokes UB. |
If the code is unreachable, then of course the compiler is permitted to optimize it away. The compiler is allowed to assume that no UB occurs within the program when determining what is reachable (and for all other optimizations). This is true for all code: there's no special case for atomics. |
According to the model mentioned above (a volatile memory access is an opaque function call or inline assembler), the compiler does not actually know that |
Right, but whether or not I'd also question whether it makes sense to be quite so lenient with |
That’s fine.
I decided to err on the side of the simplest possible semantics. |
Thanks for writing this up! I was confused what the difference to
This sounds contradictory: a byte array (
"loads via store" seems odd; I assume "loads and stores" is what you meant? I think this is too weak. In particular, I think the compiler should be able to assume that the given address is the only memory (that the compiler knows about) that this operation can access, and that it will not have other side-effects (like synchronization) either. So for example if This also makes these operations quite different from inline assembly, which is allowed to do much more. With inline assembly, the intention is that even swapping out the assembly block by some other code at runtime (that still adheres to the same clobber annotations) is allowed. I don't think we want that for
What is an "interlocked" load/store?
I am very strongly opposed to using
"Otherwise" here sounds like "in case T cannot be safely transmuted as described", but I doubt that is what you mean. "non-atomic" is a bad choice here, since assembly languages don't even distinguish atomic and non-atomic accesses -- that is a surface language thing. And the entire point of your proposal is that these accesses are atomic, in the sense of not introducing data races. But this reminds me, the interactions with the concurrency model need to be specified better I think. If there are overlapping concurrent non-atomic writes to the same location, that is UB. If there are non-atomic reads concurrent to an overlapping
What is wrong with using LLVM volatile atomic accesses? |
Ralf I'm 99% certain that "always valid" is meant to be read as "all initialized bit patterns are valid", not that it's also allowing uninit memory. |
That still leaves provenance as a concern -- transmuting pointers with provenance to integers is subtle at best, making (arrays of) integer types like |
I'm pretty sure that everything you just said also applies if you "raw" read/write a pointer with memory, right? Because provenance isn't a physical concept that's actually stored in a device. So if you raw read a pointer from memory you already don't know the provenance, regardless of the array of u8 part or not. It would have to be treated as an int read with an automatic int2ptr cast, or something like that. |
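The "int read with an automatic int2ptr cast" reading can be written out with today's stable API. This is a sketch only: `core::ptr::read_volatile` stands in for the proposed `load`, and `load_ptr_as_int` is a hypothetical helper name.

```rust
use core::ptr;

/// Loads a pointer-sized slot as a plain integer, then casts the
/// integer to a pointer. The result carries no provenance from the
/// load itself; it can only pick up provenance that was exposed
/// earlier (for example by a pointer-to-integer cast).
///
/// # Safety
/// `slot` must be valid and aligned for a volatile read of `usize`.
pub unsafe fn load_ptr_as_int(slot: *const usize) -> *const i32 {
    let addr: usize = unsafe { ptr::read_volatile(slot) };
    addr as *const i32 // explicit int2ptr cast
}
```

Whether the returned pointer may be dereferenced then depends on how the address was produced, not on the load.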
Sure (except on CHERI ;).

```rust
let mut loc = &mut 0i32;
let ptr = arch::load(&loc as *const _ as *const *mut i32);
let _val = *ptr; // UB, since this ptr does not have the right provenance to access `loc`
```
|
Hm. What are the provenance rules if |
Making |
@andyhhp what are your thoughts? What semantics do you want for volatile loads and stores? |
You’re welcome!
Yes and no. Volatile in C/C++/LLVM does have unclear semantics with respect to concurrency, but it also has unclear semantics in other ways as well, which means that those writing low-level code might find themselves resorting to
In my mind,
Yes
I’m not sure about that. @andyhhp @marmarek @HW42 thoughts?
That is a fair point. The reason I mentioned
That’s valid, though most use-cases I can think of will at very least want a newtype. What matters here is that I need to be able to write:

```rust
// `s` is a string typed at the debug prompt; `core::arch::load` is the proposed function.
fn peek_u32(s: &str) -> Result<u32, std::num::ParseIntError> {
    let s: usize = s.parse()?;
    Ok(core::arch::load(s as *const u32))
}
```

and not have the compiler second-guess me. I know that converting some user-provided string to a usize, casting it to a pointer, and then dereferencing the pointer is almost always wrong (and doesn’t work on CHERI), but kernel debuggers actually need to do it, and the compiler shouldn’t be breaking them.
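A version of the debugger use case that compiles on stable Rust today might look like the following. It is a sketch under assumptions: `core::ptr::read_volatile` stands in for the proposed `core::arch::load`, the address is entered in hexadecimal, and the function is `unsafe` because nothing checks that the address is mapped.

```rust
use core::ptr;
use std::num::ParseIntError;

/// Parses a hexadecimal address typed at a (hypothetical) debug prompt
/// and performs one volatile 32-bit load from it.
///
/// # Safety
/// The address must be mapped, readable and 4-byte aligned; nothing
/// here checks that.
pub unsafe fn peek_u32(input: &str) -> Result<u32, ParseIntError> {
    let digits = input.trim().trim_start_matches("0x");
    let addr = usize::from_str_radix(digits, 16)?;
    Ok(unsafe { ptr::read_volatile(addr as *const u32) })
}
```

Parsing failures are reported as errors; a bad but well-formed address is entirely the caller's problem.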
Yeah, I wrote this when I was quite tired, so there are definitely some thinkos.
Fair point. What I meant is that the compiler does not need to insert any memory barriers.
You can certainly use it as a “release read”; I am not sure about the others. My intuition is that a volatile load should imply a compile-time memory barrier, but I will leave that to the people who actually use them in anger.
Nothing, other than that I don’t know what the semantics of LLVM volatile atomic accesses are 🙂. More generally, the main uses I can think of for volatile operations are in machine-dependent code, which is why my proposal defers to hardware semantics in so many places. If I am manipulating page tables in a kernel or hypervisor, or accessing MMIO registers in a driver, I need to know that
Welcome to the messy world of interacting with hardware 😄.
Agreed.
The problem with LLVM volatile atomic accesses is that their semantics are not known to be what I would want for the kind of low-level applications I am envisioning. In particular, my understanding is that their semantics are architecture-independent, whereas I believe lots of use-cases explicitly want semantics that are architecture-dependent. Some synchronization primitives, for instance, explicitly rely on the fact that modern hardware (DEC Alpha is long dead) does not reorder dependent loads even if there is no hardware memory barrier. |
Which other ways are you referring to here? That seems worth discussing in depth in the RFC.
We have to describe its effect in the Abstract Machine though.
I think "where you get the pointer from" and having volatile (for lack of a better term) semantics on the access are entirely orthogonal concerns. So I don't think this should affect the type signature of these operations. Integers have no provenance, so whether you parsed that integer from a string or got it via a ptr-to-int cast makes no difference, even with regular Rust memory accesses.
"release read"? In C11, release is for writes and acquire for reads, so a "release read" doesn't make a ton of sense to me. Also after what you said earlier about "non-interlocked" semantics, I would strongly expect that these operations do not have any synchronizing side-effect (e.g., the compiler can treat them as (Interestingly, I just noticed LLVM LangRef says
My thinking was that we want to give an architecture-independent description that is enough to tell us what the compiler can and cannot do. The exact effect of course is architecture-dependent, but unlike I fully agree re: accessing NULL and shenanigans like that. I am pretty sure that volatile atomic accesses are guaranteed not to tear (that is AFAIK generally true for atomic accesses). I think people want LLVM volatile (atomic) accesses to be what you describe here, so IMO it would make sense to use them and then work with LLVM in case things don't work out. Either way, the Rust semantics should be described independently of LLVM, so we can change the implementation strategy as needed. But given that these things are so hard to describe, drawing comparisons with things that are at least somewhat established makes sense IMO. |
Your example about synchronization patterns worries me quite a bit; I am not sure we want to encourage using this kind of operation as a replacement for our regular atomics. Relying on "dependent loads" establishing some sort of synchronization seems like a problem to me, and not least because the entire notion of a "dependency" is on very shaky grounds in a surface language; there's a reason For example, it is generally a permissible transformation for a compiler to turn

```rust
let i = make_i();
work_on_i(i);
```

into

```rust
let i = make_i();
if i == 0 {
    work_on_i(0);
} else {
    work_on_i(i);
}
```

However, if In that sense, I don't think your idea of an architecture-dependent semantics works (or maybe I misunderstood what you meant by it). The only way to exploit this kind of "dependent load" pattern is to have both loads in the same |
Thanks. Good to know that we agree on that.
It’s actually more subtle: volatile accesses should be guaranteed not to tear if the argument is properly aligned, but if the argument is not aligned, the behavior is hardware-dependent. On x86, accesses that cross a cache line may tear, while on some platforms unaligned access actually faults. Volatile accesses are a raw interface and should not try to hide this behavior.
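For comparison, the existing `core::ptr::read_volatile` requires a properly aligned pointer, so code that cannot rule out misalignment has to check first. The helper below is a hypothetical sketch of such a check, not part of the proposal.

```rust
use core::ptr;

/// Volatile 32-bit load that refuses misaligned addresses instead of
/// leaving the outcome to the hardware.
///
/// # Safety
/// If `p` is aligned, it must also be valid for a 4-byte read.
pub unsafe fn load_u32_checked(p: *const u32) -> Option<u32> {
    if (p as usize) % core::mem::align_of::<u32>() != 0 {
        return None; // misaligned: may tear or fault, so do not attempt it
    }
    Some(unsafe { ptr::read_volatile(p) })
}
```

A raw interface in the sense described here would skip the check and let the hardware decide.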
That is a good idea, but I would like to make sure that the bugs are worked out before this gets pushed.
The problem is that Linux RCU has a hard dependency on
Release loads are needed for seqlocks, and the alternatives available in Rust have such poor performance that in practice people wind up using an approach that is technically UB. Rust doesn’t have the |
It's a bit off topic but I got curious about what a "release load" is and whether we need one, since like Ralf this does not make any sense to me as a concept. Here's a simplified version of the code from the blog post focusing on order of operations:

```rust
fn read() {
    let seq1 = seq.load(Acquire); // R1
    let result = data.read_volatile(); // R2
    let seq2 = seq.load(Release); // R3
    // do stuff
}
static seq: AtomicUsize = _; // W0
static data: UnsafeCell<T> = _; // W0
fn write() {
    seq.compare_and_swap(_, _, Acquire); // W1
    data.write_volatile(_); // W2
    seq.store(_, Release); // W3
}
```

It is claimed that we want operation R3 to be a release load, because we want to ensure that R2 is not moved after R3. This makes some sense from the perspective of "roach motel" reorderings, but from the "message passing" interpretation a release load doesn't make sense since load operations can't broadcast information to synchronize on. To break it down, the litmus test situation showing why it would be bad to reorder R2 and R3 is where R2 reads the result of W2, and R3 reads the result prior to W1, leading to a disallowed cycle since What is needed to make this work is to add a read-read fence between R2 and R3. This effectively upgrades the prior |
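A fence-based seqlock along these lines can be sketched on stable Rust. Two things here are assumptions that differ from the snippet above: the payload is held in relaxed atomics rather than read with volatile accesses (so racing reads are not data races), and there is a single writer. `SeqLock` and its methods are hypothetical names.

```rust
use std::sync::atomic::{fence, AtomicU64, AtomicUsize, Ordering};

/// Single-writer seqlock protecting two words.
pub struct SeqLock {
    seq: AtomicUsize,
    data: [AtomicU64; 2],
}

impl SeqLock {
    pub const fn new(a: u64, b: u64) -> Self {
        SeqLock {
            seq: AtomicUsize::new(0),
            data: [AtomicU64::new(a), AtomicU64::new(b)],
        }
    }

    /// Must only be called by one writer at a time.
    pub fn write(&self, a: u64, b: u64) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s.wrapping_add(1), Ordering::Relaxed); // odd: write in progress
        // A reader that observes the new data and then runs its acquire
        // fence is guaranteed to also observe the odd sequence number.
        fence(Ordering::Release);
        self.data[0].store(a, Ordering::Relaxed);
        self.data[1].store(b, Ordering::Relaxed);
        self.seq.store(s.wrapping_add(2), Ordering::Release); // even: published
    }

    pub fn read(&self) -> (u64, u64) {
        loop {
            let s1 = self.seq.load(Ordering::Acquire); // R1
            let a = self.data[0].load(Ordering::Relaxed); // R2
            let b = self.data[1].load(Ordering::Relaxed);
            fence(Ordering::Acquire); // the fence between R2 and R3
            let s2 = self.seq.load(Ordering::Relaxed); // R3
            if s1 == s2 && s1 % 2 == 0 {
                return (a, b); // no write overlapped the data reads
            }
            std::hint::spin_loop();
        }
    }
}
```

No "release load" appears anywhere: the acquire fence after the data reads does the job that R3 was supposed to do.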
This should (hopefully!) be enough for seqlocks, but I am not sure if it is enough for RCU. |
I just made an edit that should hopefully clean up some stuff. I should probably make an actual RFC out of this. |
You may be fully aware of the following, but I'll say it anyway, since I've seen suggestions floating around before that RCU and seqlocks are uniquely problematic in Rust. Today, rustc compiles atomics, volatile accesses (the existing ones), and inline assembly to the same LLVM IR as Clang does their C counterparts, at which point LLVM's optimizer applies the same optimizations to it. rustc does additionally have some MIR optimizations which are specific to Rust, but they're pretty minimal and unlikely to cause a problem here. So when it comes to relying on data dependencies being preserved, the Linux kernel's "wing it" approach should work equally well in Rust as in C. Which is to say: it does work most of the time in practice, and most examples where it breaks are somewhat pathological/unrealistic code. But on the other hand, it's hard to rule out that there might be some real code where it breaks, and LLVM and GCC's optimizers will only get smarter in the future. We shouldn't encourage typical Rust users to rely on it, but we also should make sure the Linux project doesn't see it as a reason to avoid adopting Rust. This RFC's choice to use inline assembly rather than LLVM volatile accesses doesn't help: as Ralf showed, the main problematic optimization for data dependencies can happen regardless of what construct is used for the store to memory. I support this RFC's choice to use inline assembly, because LLVM's volatile semantics are unclear and might hypothetically be weaker than desired. But in practice I would be shocked if inline assembly blocked any optimizations that LLVM volatile doesn't, except by accident (e.g. due to a bug in LLVM volatile, or due to LLVM being overly conservative with inline assembly in a way that isn't related to the guarantees needed here). 
Though note: LLVM volatile doesn't have an implicit compiler barrier, so if we decide to add one to this API, the proper comparison would be to LLVM volatile combined with an explicit compiler barrier. |
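The "volatile plus explicit compiler barrier" combination might be spelled as follows with the existing API. This is a sketch: `load_with_compiler_barrier` is a hypothetical name, and whether `compiler_fence` gives exactly the barrier semantics discussed in this thread is an assumption.

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Volatile load bracketed by compiler-only fences.
///
/// # Safety
/// `p` must be valid and aligned for a volatile read of `T`.
pub unsafe fn load_with_compiler_barrier<T: Copy>(p: *const T) -> T {
    compiler_fence(Ordering::SeqCst); // compiler-only: no CPU fence instruction
    let v = unsafe { ptr::read_volatile(p) };
    compiler_fence(Ordering::SeqCst);
    v
}
```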
I agree, and I also think we should work on a solution for both Rust and C that doesn’t have undefined behavior.
👍
@andyhhp @marmarek: would a compiler barrier be desired here? |
That is certainly not the case in general as volatile accesses can be arbitrarily large. I doubt my
Is doing this with inline assembly an option (in the way I described above, where the syntactic dependency is all within the same asm block)? If not, then we'd still be no worse off than C if we tell people to use I literally can't make sense of it -- I think of release/acquire in terms of models like this, and "release load" just doesn't make sense as a concept. 'Release' means that the write event generated by this operation captures the 'view' of the current thread such that when that write event is later observed by an 'acquire' read event, the attached 'view' is integrated into the 'view' of the thread that did the 'acquire'. A load does not generate a write event, so saying it is 'release' is a category error. The blog post thinks of atomic operations as being defined in terms of reorderings, but that is not how they actually work -- defining an operation in terms of its allowable optimization is a sure path to madness. The main issue with seqlocks (as far as I am aware) is the data read, which people want to be non-atomic but that is not allowed under C11 because it's a data race. Using volatile reads is a gross and non-compliant hack to work around that. The proper solution IMO is to take a hint from the LLVM memory model (which is different from the C++ memory model) and say that a racy read returns (After re-reading this old classic): okay I think I see the problem, but that paper also discusses solutions -- and a "release load" is not one of them. ;) It also mentions a fence-based solution that should work in Rust as well? But anyway -- since "release read" doesn't make sense in the memory model, I don't think So I think we should keep "gaps in the concurrency memory model" entirely out of this thread, since those issues fundamentally cannot be solved by just adding a few new opaque functions in IOW, please let's not mix up the following two mostly unrelated concerns:
|
I was thinking “pages marked read-only in the page tables” (which will indeed SIGSEGV), but you are correct regarding actual ROM.
That is correct.
Pretty much! I wanted something that people writing drivers and other very low-level code could count on to work, without having to constantly worry if what they were doing was UB.
Not really. An FFI call could do the same thing via e.g. inline assembler. |
From an Abstract Machine perspective, it's true that there are many cases where any UB caused by a store must be immediate rather than delayed until a later load. After all, a load that, at the assembly level, occurs later than the wild write may only be that way due to reordering: at the Abstract Machine level it may have occurred earlier. That's why popping a protector has to be immediate UB: the compiler, when compiling the function that made the call that pushed the protector, may have reordered loads across that call. And this reasoning applies regardless of whether the store is volatile or has any other special marking, because when the compiler reorders accesses across calls, it can't know whether the function being called contains any 'special' operations. On the other hand, not all memory accesses that exist at a low level exist in the Abstract Machine. When the kernel pages some memory out to disk, it reads the memory as part of writing it to a file somewhere; later on, it reads the file and writes the memory (or rather, it writes some arbitrary newly-allocated piece of memory that's mapped at the same virtual address). But the Abstract Machine doesn't care. The kernel provides an abstraction of memory to userland, satisfying basic properties like "loads return the values written by prior stores". Meanwhile, the Rust compiler takes source code targeting the Abstract Machine and turns it into machine code targeting that lower-level abstraction. Anything the kernel does to implement that abstraction doesn't need to be mapped back up to the Abstract Machine. That logic applies to the kernel. It also applies to something like CRIU that exists in userland – even running in the same process as the Rust code – but performs a job similar to what a kernel traditionally does. The question is: can that logic ever apply to volatile accesses performed by the program itself? 
Can we justify an 'illegal' volatile store to some memory by saying: we'll put the original value back before the program could perform any loads of that memory – and I don't mean Abstract Machine loads, but lower-level loads – therefore the store is unobservable, therefore from an Abstract Machine perspective we can pretend the stores don't exist? To me, this sounds reasonable in theory, but difficult to guarantee in the face of all changes that might hypothetically be made to the implementation. In fact, there are situations where it doesn't work today. Both the kernel and CRIU can suspend the entire program while mucking with its memory, but that's not an option if the program itself mucks with memory. @DemiMarie mentioned "suspend[ing] all other threads in the program" (besides the one mucking with memory), but that can be problematic. If you take a lock, you might deadlock on a lock that's held by one of those suspended threads. So while the other threads are suspended, you must not call any code that might take a lock, including the memory allocator. What if you never touch the standard library? Well, the compiler sometimes generates calls to runtime functions to implement certain operations, and the dynamic linker can interpose its own runtime code into calls within the program (this would normally happen when calling a function implemented in a shared library, as part of lazy symbol binding). That runtime code can take locks. Lazy symbol binding normally takes a lock, and while other runtime calls usually don't, there are probably obscure situations where they can do so (say, when using sanitizers or segmented stacks). What if you manage to get past any potential deadlock issues (or if there are no other threads in the first place)? Well, you're still assuming that the implementation won't insert some access to the target memory into the middle of your sequence of volatile operations. Why would that happen? 
It might be an access from earlier or later that was reordered, but such reordering can be prevented with barriers. However, I can think of more exotic cases. Suppose some stack variable is supposed to be live but unaccessed in a particular range of code and also has a known constant value. The compiler could hypothetically decide to reuse the memory for something else, then put the constant value back later. (It's not clear whether the compiler can legally reuse memory for another stack variable whose address is taken, but it can certainly do it if the address isn't taken, or for data that isn't part of the Abstract Machine.) |
I think that makes sense: we can do a bunch of things at the low level and then "resume" the correspondence to the abstract machine and it doesn't have to know that anything untoward happened. Except... this is rust code doing the low level code itself! The compiler is handling everything in sight around the On the whole, while I can see some use cases for it I worry that this tool is just too dangerous to use in most cases. It is extremely easy to cause UB with it, and Miri can't help because it can't model the hardware to say whether the thing you did makes sense. |
Indeed so. At a minimum one would likely need
Miri should pretend that these are normal stores and loads. |
This means the store is potentially diverging (as in, it might not return), which puts tight limits on reordering code around it.
Definitely. Like all inline assembly, this can only do things to the AM state that could also have been done by regular Rust code. I feel like you essentially just want to piggy-back on the specification of inline assembly (that we have to figure out anyway), and then basically just say "this generates inline assembly for the appropriate load/store instruction on this platform". All inline assembly blocks should ideally come with a description of their AM-visible effects, and a safety comment explaining why those are well-defined to occur here --
I just want unicorns and rainbows, is that too much to ask? ;) What you are asking is impossible (or maybe I am misunderstanding). If we want to have optimizations, all code must play by the aliasing rules, including volatile, FFI, and inline assembly. |
You can already write drivers using volatile and inline asm. I'm essentially with Ralf, and will even go slightly farther: If there's more magical ways to do things that are added to the language that's more ways people have to consider when trying to figure out if they did something right. LLVM already basically shrugs and calls volatile "some sort of hardware specific thing, you break it you bought it", which is about all that these load/store ops seem to be anyway, except that these have specific size requirements. We can just issue a warning on tearing volatile access if that's such a concern. |
Well, I like the idea of simply not using LLVM volatile any more and making direct use of inline assembly -- that would make us independent of the LLVM volatile semantics when it comes to questions like how volatile interacts with atomics, or with uninit memory. |
Yeah, that is my thought too. Volatile has very poorly specified semantics, and as far as I can tell this only makes writing drivers harder than it needs to be. Inline assembly needs to be supported anyway, and implementing volatile in terms of it means one less thing the compiler needs to know about. There might be some magic needed to work around const generics limitations, but even that should be able to be removed eventually. |
I think this phrasing is unfortunate as it seems to rule out some legitimate use cases. Consider the case where the kernel, hypervisor, wasm host, browser parent process or similar that you are writing is dealing with accessing the potentially-hostile client memory by 128-bit SIMD instruction to load/store vectors that logically have some smaller-than-128-bits elements (e.g. 16 8-bit elements) and you may want to access using an address that is aligned to the smaller subelements. On x86_64, there should be some facility that generates It seems to me that it would be better to have four functions than two: The unaligned versions shouldn't purport to provide any guarantees against tearing. Such guarantees are unnecessary when the use case is that a well-behaved client stays away from the shared memory during the privileged access and a badly-behaved client is welcome to UB itself but not to UB the more privileged kernel/hypervisor/wasm host/browser parent. |
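The unaligned half of that four-function split has a non-volatile analogue in today's standard library, `core::ptr::read_unaligned`. The sketch below shows a 16-byte load from an arbitrary byte offset; `load16_unaligned` is a hypothetical name, the access is not volatile, and it reads ordinary Rust memory rather than hostile shared memory.

```rust
use core::ptr;

/// Loads 16 bytes starting at any byte offset of `buf` as a `u128`.
/// `read_unaligned` places no alignment requirement on the address and
/// makes no promise about tearing.
pub fn load16_unaligned(buf: &[u8], offset: usize) -> Option<u128> {
    let end = offset.checked_add(16)?;
    if end > buf.len() {
        return None;
    }
    let p = buf[offset..end].as_ptr() as *const u128;
    // SAFETY: the 16 bytes starting at `p` lie inside `buf`.
    Some(unsafe { ptr::read_unaligned(p) })
}
```

An unaligned variant of the proposed `load` would presumably keep this shape while adding the "exactly one access to exactly this address" guarantee, minus any non-tearing promise.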
```
let x = 5u8;
let p = &x as *const u8 as *mut u8;
core::arch::store(p,4u8);
println!("{}",x);
```
…On Sat, Aug 13, 2022 at 17:36 Ralf Jung ***@***.***> wrote:
Well, I'm talking about a *mut T that originates from a &T.
You are being exceedingly vague. Please give a concrete example.
|
Looks like GH now submitted some emails sent months ago, I had the same happen to me... That is UB, already at the A more interesting option arises around reads: we could declare that aliasing-violating reads return |
I would prefer
In this case, yes, since

```rust
const SOME_MMIO_ADDRESS_THAT_IGNORES_WRITES: usize = 0xdeaddead;
fn main() {
    // A raw pointer cannot be cast to a reference with `as`; reborrow it instead.
    let x = unsafe { &*(SOME_MMIO_ADDRESS_THAT_IGNORES_WRITES as *const u8) };
    core::arch::store(SOME_MMIO_ADDRESS_THAT_IGNORES_WRITES as *mut u8, 4u8);
    println!("{}", x);
}
```

Because the MMIO address ignores anything written to it, the call to Fundamentally, Another case where one might need to use |
If we decide that freeze is something Rust should have (which IMO requires a separate RFC), then I agree volatile loads should freeze.
This relates to other questions we discussed some time ago, about whether the compiler can reorder Whenever they affect regular Rust-visible memory, however, they must fully follow the memory model rules. |
From a driver perspective, I suspect the answer is “no”. I am not aware of any reason why two different MMIO operations should commute. |
I don't think And similar for sound and DMA and so on. |
Ah sorry I meant reordered wrt non-volatile regular accesses. That is allowed for regular volatile. We had that entire discussion about barriers because of this somewhere.
|
Honestly, volatile and non-volatile accesses probably should not be reordered with each other, but LLVM does it, so we might be stuck with that. I guess unless we use inline asm? The discussion you're thinking of is probably the compiler fence topic: #347 |
@DemiMarie I wonder whether there is a particular blocker on this issue/RFC? Do we have a plan to move forward? |
@fbq I am not aware of any. I’m fine with making a full RFC out of this. |
Great to hear! I think |
@fbq One last question: Should there be separate functions for MMIO and non-MMIO? The reason is that MMIO is often trapped and emulated (e.g. in virtualized environments) and so must use very specific instructions the emulator can handle. These instructions may be suboptimal for non-MMIO uses. |
Specifically, for MMIO I want the RFC to guarantee not only the effect of these functions, but also the exact machine instructions these functions use to perform the load or store operation. For non-MMIO, this is overkill. |
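Pinning the exact instruction is something inline assembly can already express. The following is a sketch for x86_64 only, with a `read_volatile` fallback elsewhere; `mmio_load_u32` is a hypothetical name, and the choice of a plain `mov` is an assumption about what an emulator would expect.

```rust
/// 32-bit load that, on x86_64, is emitted as a single `mov` chosen by
/// the programmer rather than by the compiler.
///
/// # Safety
/// `p` must be valid and aligned for a 4-byte read.
#[cfg(target_arch = "x86_64")]
pub unsafe fn mmio_load_u32(p: *const u32) -> u32 {
    let v: u32;
    unsafe {
        core::arch::asm!(
            "mov {v:e}, dword ptr [{p}]", // Intel syntax, 32-bit destination
            p = in(reg) p,
            v = out(reg) v,
            options(nostack, preserves_flags)
        );
    }
    v
}

/// Fallback for other targets: the compiler picks the instruction.
///
/// # Safety
/// `p` must be valid and aligned for a 4-byte read.
#[cfg(not(target_arch = "x86_64"))]
pub unsafe fn mmio_load_u32(p: *const u32) -> u32 {
    unsafe { core::ptr::read_volatile(p) }
}
```

An `asm!` block without the `pure` option is not removed or duplicated by the optimizer, which is the property an MMIO accessor needs.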
I think these APIs should be designed for MMIO. Any other use is incidental, and likely a misuse. The Linux kernel memory model is an extremely special case; all other code should use proper atomics, so we shouldn't have the LKMM affect our volatile APIs. |
I agree with @RalfJung: for the Rust language, these APIs should be designed for MMIO. For a particular Rust project (e.g. the Linux kernel), developers might think of these as an asm library, which can avoid re-inventing the wheel where possible (and it's handy to have some primitives that can access shared memory without causing UB, if the race is intended). Of course, this is probably a temporary case for the Linux kernel; I think we would like to move to a proper memory model/atomics in the future. |
I think a non-MMIO use of these functions is likely either:
|
### `core::arch::{load, store}`

This proposes new `load` and `store` functions (in `core::arch`), for raw hardware loads and stores, and a concept of an always-valid type that can safely be cast to a byte array. It also defines volatile accesses in terms of these functions.

The functions proposed here have the same semantics as raw machine load and store instructions. The compiler is not permitted to assume that the values loaded or stored are initialized, or even that they point to valid memory. However, it is permitted to assume that `load` and `store` do not violate Rust’s mutability rules.

In particular, it is valid to use these functions to manipulate memory that is being concurrently accessed or modified by any means whatsoever. Therefore, they can be used to access memory that is shared with untrusted code. For example, a kernel could use them to access userspace memory, and a user-mode server could use them to access memory shared with a less-privileged user-mode process. It is also safe to use these functions to manipulate memory that is being concurrently accessed via DMA, or that corresponds to a memory-mapped hardware register.
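To illustrate the untrusted shared memory case, here is a minimal sketch of a "fetch once, then validate the local copy" pattern. The names `load_u32` and `checked_len` are hypothetical and not part of the proposal. `load_u32` is modelled with today's `ptr::read_volatile`, which (unlike the proposed `load`) still requires a valid, aligned pointer.

```rust
use core::ptr;

/// Hypothetical stand-in for the proposed `core::arch::load`, specialised to `u32`.
/// Assumption: modelled with `read_volatile`, so `ptr` must be valid and aligned here.
unsafe fn load_u32(ptr: *const u32) -> u32 {
    unsafe { ptr::read_volatile(ptr) }
}

/// Reads a length field from memory that another party may modify at any time.
/// The field is fetched exactly once; only the local copy is validated and used,
/// so a concurrent writer cannot change the value between the check and the use.
unsafe fn checked_len(shared_len: *const u32, max: u32) -> Option<u32> {
    let len = unsafe { load_u32(shared_len) }; // single fetch
    if len <= max {
        Some(len)
    } else {
        None
    }
}
```

The point of the design is that the validated value never touches shared memory again, which is only sound if the compiler cannot turn the single fetch into two.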
The core guarantee that makes `load` and `store` useful is this: A call to `load` or `store` is guaranteed to result in exactly one non-tearing non-interlocked load from or store to the exact address passed to the function, no matter what that address happens to be. To ensure this, `load` and `store` are considered partially opaque to the optimizer. The optimizer must consider them to be calls to functions that may or may not dereference their arguments. It is even possible that the operation triggers a hardware fault that some other code catches and recovers from. Hence, the compiler can never prove that a given call to `core::arch::load` or `core::arch::store` will have undefined behavior. In other ways, a call to `load` or `store` does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.

The actual functions are as follows:
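One plausible shape for these functions is sketched below. The `T: Copy` bound, the parameter names, and the bodies are assumptions for illustration, not text from the proposal. The bodies use today's `read_volatile` and `write_volatile` only so the sketch compiles; those functions carry stricter pointer requirements than the proposed `load` and `store`.

```rust
use core::ptr;

/// Hypothetical shape of `core::arch::load` (an assumption, not the proposal's text).
/// Modelled with `read_volatile`, which requires `ptr` to be valid and aligned.
pub unsafe fn load<T: Copy>(ptr: *const T) -> T {
    unsafe { ptr::read_volatile(ptr) }
}

/// Hypothetical shape of `core::arch::store` (an assumption, not the proposal's text).
/// Modelled with `write_volatile`, which requires `ptr` to be valid and aligned.
pub unsafe fn store<T: Copy>(ptr: *mut T, value: T) {
    unsafe { ptr::write_volatile(ptr, value) }
}
```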
Performs a single memory access (of size `size_of::<T>()`) on `ptr`. The compiler must compile each of these function calls into exactly one machine instruction. If this is not possible, it is a compile-time error. The types `T` for which a compiler can successfully generate code for these calls are dependent on the target architecture. Using a `T` that cannot safely be transmuted to or from a byte array is not forbidden, but is often erroneous, and thus triggers a lint (see below). Provided that `ptr` is properly aligned, these functions are guaranteed to not cause tearing. If `ptr` is not properly aligned, the results are architecture-dependent.

The optimizer is not permitted to assume that `ptr` is dereferenceable or that it is properly aligned. This allows these functions to be used for in-process debuggers, crash dumpers, and other applications that may need to access memory at addresses obtained from some external source, such as a debug console or `/proc/self/maps`. If `load` is used to violate the aliasing rules (by accessing memory the compiler thinks cannot be accessed), the value returned may be non-deterministic and may contain sensitive data. If `store` is used to overwrite memory the compiler can assume will not be modified, subsequent execution (after the call to `store` returns) has undefined behavior.

#### The semantics of `volatile`

A call to `ptr::read_volatile` desugars to one or more calls to `load`, and a call to `ptr::write_volatile` desugars to one or more calls to `store`. The compiler is required to minimize tearing to the extent possible, provided that doing so does not require the use of interlocked or otherwise synchronized instructions. `const fn core::arch::volatile_non_tearing::<T>() -> bool` returns `true` if `T` is such that tearing cannot occur for naturally-aligned accesses. It may still occur for non-aligned accesses (see below).

#### Unaligned volatile access
The compiler is not allowed to assume that the arguments of `core::{ptr::{read_volatile, write_volatile}, arch::{load, store}}` are aligned. However, it is also not required to generate code to handle unaligned access, if doing so would cause a performance penalty for the aligned case. In particular, whether the no-tearing guarantee applies to unaligned access is architecture dependent. On some architectures, it is even possible for unaligned access to cause a hardware trap.

#### New lints
Use of `core::ptr::{read_volatile, write_volatile}` with a type that cannot be safely transmuted to and from a byte slice will trigger a `dubious_type_in_volatile` lint. Use of `core::arch::{load, store}` with such types will trigger a `dubious_type_in_load_or_store` lint. Both are `Warn` by default. Thanks to @comex for the suggestion!

#### Lowering
LLVM volatile semantics are still unclear, and may turn out to be weaker than necessary. It is also possible that LLVM volatile requires `dereferenceable` or otherwise interacts poorly with some of the permitted corner-cases. Therefore, I recommend lowering `core::{arch::{load, store}, ptr::{read_volatile, write_volatile}}` to LLVM inline assembly instead, which is at least guaranteed to work. This may change in the future.
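As a rough sketch of what an inline-assembly lowering could look like from the Rust side, the following hypothetical `load_u32` emits a single 32-bit `mov` on x86-64 via the stable `core::arch::asm!` macro. The function name, the choice of instruction, and the fallback for other targets are assumptions for illustration. A real lowering would happen inside the compiler and would need per-architecture instruction selection.

```rust
/// Hypothetical lowering of a 32-bit load to exactly one `mov` on x86-64.
/// The compiler treats the asm block as opaque, so it cannot split, merge,
/// or elide the access.
#[cfg(target_arch = "x86_64")]
unsafe fn load_u32(ptr: *const u32) -> u32 {
    let value: u32;
    unsafe {
        core::arch::asm!(
            // Intel syntax: load 32 bits from [addr] into the output register.
            "mov {val:e}, dword ptr [{addr}]",
            addr = in(reg) ptr,
            val = out(reg) value,
            options(nostack, preserves_flags),
        );
    }
    value
}

/// Fallback so the sketch compiles on other targets (an assumption, not a
/// faithful lowering): uses today's `read_volatile`.
#[cfg(not(target_arch = "x86_64"))]
unsafe fn load_u32(ptr: *const u32) -> u32 {
    unsafe { core::ptr::read_volatile(ptr) }
}
```

Because the asm block only sees a register holding an address, nothing tells the optimizer that the pointer is dereferenceable, which matches the intent described above.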