Should Rust assume that there are unknown threads of execution? #215
The illusion that Rust has a global view of the system goes out the window the moment it makes an FFI call. This happens long before … Even without these, Rust cannot make any assumptions about what an FFI call which returns a shared memory region might do. For all Rust knows, the other end of the shared memory region could be C code within the same process (add a … ).
On Windows, there's also …
@Amanieu Rust already assumes that FFI calls cannot do certain things. For example, we assume that an FFI function won't randomly read and write our entire address space, as tolerating this behavior would make almost any optimization of Rust memory accesses externally observable, and therefore invalid under an as-if rule of optimization legality. This is in spite of the fact that a function implemented in, say, x86 assembly could perfectly well do such a thing.

I think that assuming a global view of concurrency in the application would be consistent with this prior decision of disallowing certain behavior from FFI entities for the sake of enabling more optimizations on the Rust side.

Conversely, I think that not assuming such a global view of concurrency would greatly harm rustc's future ability to optimize concurrency primitives such as atomics, by effectively making atomics as pessimistic as volatile ("assume that all reads and writes are externally observable") as soon as there is any remote possibility of memory sharing with the outside world. And that possibility exists as soon as a pointer to a memory region has ever been sent to opaque FFI for any purpose, even if that was just a …

As an alternative, we could have both maximal optimization capabilities in Rust code and enable FFI concurrency / interprocess communication by having both non-volatile atomics (for intra-process / pure-Rust communication) and volatile atomics (for FFI threads, MMIO, and interprocess communication). This is the direction which I would personally advocate.

As an aside, there is prior art for differentiating intra-process and inter-process synchronization at the operating system level. For example, this is the main semantic difference between Windows's Mutex and CriticalSection APIs (which in turn allows CriticalSection to be much simpler and faster in the common case than Mutex).
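To make the contrast concrete, here is a small sketch (my own illustration, not code from this thread). Stable Rust has no combined volatile-atomic operation, so the "volatile" access below is a plain `read_volatile` through a raw pointer; the point is only that under a "no unknown threads" assumption the compiler may merge or hoist the two plain atomic loads, while every volatile access must actually be performed, as if it were MMIO:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static FLAG: AtomicU32 = AtomicU32::new(7);

fn main() {
    // Plain atomic loads: if the compiler may assume no unknown threads,
    // these two loads could legally be folded into a single load.
    let a = FLAG.load(Ordering::Relaxed);
    let b = FLAG.load(Ordering::Relaxed);

    // A volatile read must always be emitted: every access is treated as
    // externally observable. Note: `read_volatile` alone is volatile but
    // NOT atomic; stable Rust has no volatile-atomic operation today.
    let c = unsafe { (&FLAG as *const AtomicU32 as *const u32).read_volatile() };

    println!("{a} {b} {c}"); // prints "7 7 7" in this single-threaded run
}
```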
Note that LLVM has an attribute specifically for this, …

Also consider that thread creation itself is an FFI call, where the thread closure is passed in as an argument. For this FFI call, any memory referred to by the thread closure must clearly be externally observable, yet there is no way for rustc to know this just from the signature of the call.

My point is that, unless specifically marked with attributes, any memory whose address is passed to or from an FFI call should be assumed to be externally accessible. And indeed this is how LLVM currently works.
Sorry for not being more specific; your observation that unknown function calls can do many things that could prevent the compiler from applying any optimizations is correct. Trying to be more specific, another potential way of specifying a shared memory region would be to encode a pointer to it in a static variable:

```rust
#![no_core]
#![no_std]
#![no_main]

extern "C" {
    static FOO: Channel<T>;
}

#[no_mangle]
fn main() { ... }
```

Do you think it is legal for a Rust compiler to optimize the atomic accesses of the channel to non-atomic ones if it can prove, from just looking at the Rust program, that it has not been possible for other threads of execution to be created?
In your code example the channel is an …
How can the compiler even prove that no threads are created? The only case where this optimization is valid is on platforms that support neither threading nor signals, e.g. emscripten.
That is only the case if such other threads exist. If they don't, then it does not hold.
You mention that creating a thread requires an unknown function call. If the program contains no unknown function calls, then no threads are created by that program.
All programs without unknown function calls either exit without doing anything but consuming CPU, or they crash. Either way, they are not very useful. Even something as simple as printing to stdout requires an unknown function call into libc. Even if libc is visible to LTO, it still requires inline asm.
```rust
#![no_core]
#![no_std]
#![no_main]
#![no_entry]

// ...

extern "C" {
    static FOO: Channel<i32>;
}

#[no_mangle]
fn main() {
    FOO.write(0);
    loop {
        if FOO.read() == 42 {
            println!("Hello world");
        }
    }
}
```

That program calls unknown functions, but it doesn't do so before interfacing with shared memory, so it could not have spawned a second thread that modifies … Can the compiler optimize that program to: …
No, because external C code may run before main (via static constructors, LD_PRELOAD, etc.), and this code may modify the publicly visible …
So you are saying that this code can spawn threads that remain active after …

How is this code represented in the abstract machine? (e.g. how should … )
It seems natural to assume that any unknown code that runs before the Rust main (via any mechanism) can do anything at all that code can do, including spawn threads.
Compiler abstract machines have somewhat supernatural characteristics though, as that is needed to permit code optimization. |
In this case an unknown function did get called before calling main: the function `_start` from libc is an unknown function which is the actual caller of main.
@bjorn3 I've expanded the …
Then, yes, with a correct linker script and some sort of embedded environment you could attain Rust that immediately executes with no previous code happening.
Ignoring all sorts of problems like "there would be no println" and "your code didn't initialize the channel" and all that. |
Sure, but Rust as a language doesn't know that …
Well, that's the question this issue asks: does it? Can the Rust abstract machine assume that … ?

Or does the Rust abstract machine somehow model that other threads of execution might have been created before main and have spawned unknown code? What can this unknown code then do? How does Miri model this? What can or cannot …

@Amanieu makes the point that, in practice, such code exists, but we already optimize under the assumption that there are many things that such code is not allowed to do. So if we allow such code to exist, are any of those optimizations that we are already doing unsound? (e.g. can that code escape a pointer to the … )
I don't know about the Rust abstract machine, but Miri does not support FFI except for a small whitelisted subset, and therefore avoids this issue (it also doesn't support threads).
The main assumption used by the optimizer (LLVM) is that external code cannot modify an "allocated object" (each local variable is allocated on the stack with a separate … ).
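A sketch of that distinction (my own illustration; the `opaque` function is a hypothetical stand-in for an unknown FFI call): memory whose address escapes to unknown code must be assumed modified after the call, while a local whose address never escapes can be kept in a register the whole time:

```rust
// Stand-in for unknown external code; `#[inline(never)]` plus the raw
// pointer keeps the optimizer from seeing through it.
#[inline(never)]
fn opaque(p: *mut i32) {
    // Pretend this is external code: it may write through `p`.
    unsafe { *p += 1 };
}

fn main() {
    let mut escaped = 10;
    let mut private = 20;

    // `escaped`'s address is passed out, so after the call the compiler
    // must reload it: the callee may have modified it.
    opaque(&mut escaped);

    // `private`'s address never leaves this function, so the compiler may
    // keep it in a register across the call (LLVM models each local as a
    // separate allocated object that unknown code cannot reach).
    private += 1;

    println!("{escaped} {private}"); // prints "11 21"
}
```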
Since you were talking about …

If you are arguing that such code is allowed to spawn threads that modify Rust memory, then either … (*)

(*) EDIT: e.g. erroring on any non-atomic memory access to memory that such "before-main" threads of execution are allowed to access.
IIUC, under your model the threads of execution before main can access the …
Agreed.
They had better not; we want to prove things about them. ;)
I feel we talked about this on Zulip, but it is probably a good idea to write this down somewhere a bit more permanently...

The way I view this discussion, what y'all are talking about is the initial state of the Rust Abstract Machine (R-AM). We could say that the compiler may assume that the program given by the user is started in the R-AM with empty memory and no other thread. That would permit some of the optimizations that were mentioned here. However, it seems more reasonable (and indeed it better matches what compilers do in practice) to say that the code compiled by the compiler must work for any initial state of the R-AM. More precisely, for any such initial state, if that execution of the R-AM does not enter the "UB" error state, then the optimized program must behave the same way. So, there can already be some memory allocated, there can already be other threads running, and so on. Formally, this assumption gives us, I think, what we want: rustc cannot just blindly remove atomicity from the example in the OP just because …

This also means there's not really anything for Miri to model. Miri just picks a particular initial state when running the program. And likewise does the user when running the real program -- normally there is no other thread, so what Miri does is fine. If the user runs the program in another initial state, say with another thread in parallel, they'd have to also run that thread in Miri (assuming Miri supports threads one day). But that's no different from, say, having to test Miri with the actual content of the file system if that's what you want to test.
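As a concrete illustration of why the optimizer cannot fold two loads of a shared atomic into one when other threads may exist, here is a deterministic sketch (my own, not from the thread): a spawned thread stands in for a thread that could already be running in the initial state of the R-AM, and channel handshakes make the interleaving fully synchronized:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::mpsc;
use std::thread;

static FOO: AtomicU32 = AtomicU32::new(0);

// Performs two consecutive atomic loads of FOO, with another thread
// storing 42 in between. The handshakes make the result deterministic.
fn run() -> (u32, u32) {
    let (to_writer, writer_rx) = mpsc::channel();
    let (to_main, main_rx) = mpsc::channel();

    let writer = thread::spawn(move || {
        writer_rx.recv().unwrap(); // wait until the first load happened
        FOO.store(42, Ordering::SeqCst); // mutate the shared static
        to_main.send(()).unwrap();
    });

    let first = FOO.load(Ordering::SeqCst); // observes 0
    to_writer.send(()).unwrap();
    main_rx.recv().unwrap();
    // Folding this load into the first one would wrongly yield 0 here.
    let second = FOO.load(Ordering::SeqCst); // observes 42

    writer.join().unwrap();
    (first, second)
}

fn main() {
    assert_eq!(run(), (0, 42));
    println!("the two loads observed different values");
}
```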
To follow up on what I said earlier -- actually, I realized this morning that the usual way this is handled is by considering "contextual refinement" instead of "refinement" when defining compiler correctness. That's a strict generalization of this "initial state" business.

"Refinement" means "every behavior exhibited by the target program is also a possible behavior of the source program". We have to state it this way because of non-determinism; in a deterministic language we could equivalently say "the source and the target program behave the same way". (Defining "behavior" and its equality here is subtle but orthogonal -- it amounts to something about the traces of externally visible events.)

"Contextual refinement" means "pick some context (a program with a hole), and then the context filled with the compiled program must refine the context filled with the source program". In other words, refinement is a "whole-program notion", one that assumes total knowledge of everything that happens. Contextual refinement is a modular concept, one that assumes we are just compiling part of the whole program here, and no matter with which context this gets put together eventually, the compiled program must behave the same as (or more precisely, refine) the source program. This is what typical compilers have to do anyway; all we need to be careful about is, when compiling …
The dumb example I have in mind is (using `store`/`load` for the atomic accesses):

```rust
fn foo() -> u32 {
    let x = AtomicU32::new(0);
    x.store(42, Ordering::SeqCst);
    x.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(foo(), 42); // Is this always true?
}
```

If when the abstract machine is initialized there are other threads of execution, and those threads have pointers to or own the … What if it is

```rust
fn foo() -> (u32, usize) {
    static x: AtomicU32 = AtomicU32::new(0);
    x.store(42, Ordering::SeqCst);
    (x.load(Ordering::SeqCst), &x as *const _ as usize)
}

fn main() {
    assert_eq!(foo().0, 42); // Is this always true?
}
```

instead? (those other threads before main could call … )
I think we should forbid accessing stack allocations which don't have their address leaked (as is already assumed by LLVM), but assume other threads of execution. That would make the above example guaranteed to not panic, but when the …
@gnzlbg In the first case the address of … In the second case the address is indeed leaked (I'm assuming you mean … ).
Agreed with @Amanieu. This matches the "contextual refinement" definition: …
I want Rust to optimize my Rust programs under the assumption that the code before main doesn't do any of this by default. |
Do you have an even remotely realistic motivating example for that wish? We could certainly add a flag or so to basically let the compiler assume, when compiling a binary, that the context is empty. But that's not usually how compilers work, it seems.
I mostly use two tier-1 targets (x86_64-apple-darwin and x86_64-unknown-linux-gnu), and in neither does the code before …
The question is, at what cost? What's a remotely realistic program (fragment) where useful (read: significant positive impact on some measure of codegen quality) optimization can be performed with this assumption, that could not (with reasonable effort) be performed without this assumption? (It's also not true that the targets you mention can't have code before … )
Yes, I want to exclude that code from my programs.
What do you mean by "reasonable effort"? If I'm writing a single-threaded program, I want the compiler to be able to optimize away atomic memory accesses. I don't want to have to pick a completely different set of dependencies that do not use atomics instead.

Part of Rust's design has been to limit what life before main can do. This proposal, however, appears to suggest introducing complex life before main into the language, which, among other things, would be able to spawn threads that interact with the Rust program, or maybe even something like this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4b63a3b131423e3964816577756ed8c5 . That does not match what, e.g., the current Rust Design FAQ says (sect. "1.4 There is no life before or after main").

If you are writing a program that requires assuming complex life before main for correctness, currently, without such an assumption, you can still write a correct program with a reasonable amount of effort (e.g. using volatile or atomic volatile instead of plain loads/stores or atomics), disabling compiler optimizations only for those memory accesses for which you need the stronger guarantees. OTOH, if we allow complex life before main, it is not clear to me how we could let single-threaded programs state that life before main is not complex, such that this information can be used for optimizations, in a way that makes sure that, e.g., code using …

Maybe this issue could be rephrased as: "How complex is life before main allowed to be?" (until now the goal was that there is absolutely no life before main).
Rust the language does not provide any access to life-before-main features, since it is not considered a good feature. It is, incidentally and at the user's own peril, possible to use it by wielding low-level non-portable facilities such as …

It is especially extreme to suggest that some vague subset of plausible uses of life-before-main (say, starting a logger thread) ought to lead to UB when used in a program where the entry point happens to be written in Rust.
I was talking about reasonable effort on the part of the optimizer, not the programmer. To put it bluntly, you are writing paragraphs upon paragraphs predicated on these hypothetical optimizations mattering, but so far there is not a shred of evidence that they actually matter. Optimizations, and UB reserved for optimization purposes, need to be worthwhile, both to justify the spec and implementation effort and to outweigh the costs of imposing more UB on users. There are several a priori reasons to doubt that "assume an initial R-AM state with no other threads running" would have any measurable benefit in practice: …
The way to solve this, as suggested before, is to showcase a somewhat realistic program, a plausible sequence of compiler optimizations of that program which require assuming that no other threads existed on entry, and hard numbers showing that the program is significantly faster/smaller/etc. after the optimizations. The last part is key. It is easy to conjure up examples where the final assembly looks nicer, but what matters is whether it (for example) actually runs faster.
I don't think this is true. Precisely because Rust does not have life before main, a lot of Rust code needs to initialize statics on first use (e.g. if the initialization logic is not supported by … ). One example of a …

In a single-threaded program, its usage expands to something like this:

```rust
static CACHE: AtomicU32 = ...;

fn main() {
    for _ in 0.. { // some hot computation loop
        let mut cache = CACHE.load(Ordering::Acquire);
        if cache.is_uninitialized() {
            cache.initialize();
            CACHE.store(cache, Ordering::Release);
        }
        // dispatch to avx implementation if available
        if test_bit(cache, "avx") {
            avx_function()
        } else {
            fallback_function()
        }
    }
}
```

Under the model that external threads could have been created before main that modify Rust statics, the target-feature cache needs to be re-loaded at every loop iteration, since its value can change. If no external threads that can concurrently modify the static can be spawned before …
Static initialization is generally performed by the OS, I thought, as part of its general contract with an executable to have a sane setup before the program gets started. Or, in an embedded context, you probably have some sort of asm script that copies some stuff around, or something like that.
That sounds like an extremely fragile optimization. It would only work if: …
We know that in reality, once the cache is initialized it cannot be de-initialized. If only we had a way to communicate those semantics to the compiler, they would justify performing the same optimization in arbitrary functions, regardless of threading, regardless of FFI calls or unknown writes. That would be a much more robust approach. Potential avenues include: …
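One existing library-level way to express such write-once semantics is `std::sync::OnceLock` (stable since Rust 1.70): after the first `get_or_init`, the value can never change, so the fast path is a single acquire load plus a branch. A sketch of the feature-cache pattern discussed above, with a hypothetical `detect_features` standing in for real CPU feature detection:

```rust
use std::sync::OnceLock;

// Write-once cache: after initialization the value is immutable, which
// in principle justifies hoisting the lookup out of hot loops.
static CPU_FEATURES: OnceLock<u32> = OnceLock::new();

// Hypothetical stand-in for a real detection routine (e.g. cpuid).
fn detect_features() -> u32 {
    0b0000_0100 // pretend bit 2 means "avx"
}

fn features() -> u32 {
    *CPU_FEATURES.get_or_init(detect_features)
}

fn main() {
    for _ in 0..3 { // some hot computation loop
        // Dispatch on the cached feature bits.
        if features() & 0b100 != 0 {
            println!("avx path");
        } else {
            println!("fallback path");
        }
    }
}
```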
I don't know how relevant this is here, but GNU libstdc++ has an optimization where its …
That FAQ also says things like "match must be exhaustive". So should we stop supporting FFI with languages that do not make such requirements? That is a FAQ for Rust itself, but we don't get to make the rules for how Rust interacts with the outside world. If I had my way, we'd not support FFI with C or C++ as those are messy and complicated. We'd not have padding in unions (as in, all their bytes get copied when passed to another function) and everything could be nice. Heck, we'd not even have unions, they have basically no use in pure Rust. But alas, we can't just wish the rest of the world away. We support interacting with C code where unions lose some of their bytes when passed by-value, and we likewise support interacting with C code that has life-before-main. I am honestly quite flabbergasted that you out of all people suggest to severely restrict the interoperability of Rust with other languages like that.
I consider that as a "poor man's … ".
Good observation! Even in sequential code this optimization @gnzlbg asks for is far from trivial, as literally any uncontrolled write anywhere in the program could break it. Having a compiler-understood notion of "write-once variables" would be very useful here.
Not super relevant to this thread, but since this is a weirdly persistent misconception: There are significant pure-Rust use cases for unions. As the RFC puts it: "A native union mechanism would also simplify Rust implementations of space-efficient or cache-efficient structures relying on value representation, such as machine-word-sized unions using the least-significant bits of aligned pointers to distinguish cases." (on topic, I basically agree with the (consensus?) view that the status quo of "there is other code out there, but it can only mess with memory that I explicitly give it references to" should remain, and the optimizations suggested so far can and should be enabled in other, more robust ways)
I really have no idea what you are talking about. Feels like you are building up a huge strawman that's kind of off-topic for this thread (What does FFI have to do with what Rust programs are allowed to do before main starts?), and kind of debunking it yourself.
If that is what I did, that makes two of us. But I don't recall talking about this anywhere in this thread.
👍
The target specification can specify that the target doesn't support threads: https://github.com/rust-lang/rust/blob/4d1bd0db7f489b22c6d8aa2385937a95412c015b/compiler/rustc_target/src/spec/mod.rs#L2090 This will cause LLVM to replace all atomics with non-atomic loads and stores and assume that global variables don't get changed from another thread. The eBPF, UEFI and msp430 targets enable this option. And for wasm there is a special case to enable it when the atomics feature is disabled: https://github.com/rust-lang/rust/blob/4d1bd0db7f489b22c6d8aa2385937a95412c015b/compiler/rustc_codegen_llvm/src/back/write.rs#L205-L207
Interesting. But I guess even on those targets we should assume that there might have been "something" that happens before main?
Something happening before main is what needs to happen to give those statics their initial value on many platforms; otherwise they'd just be blank.
That's different, that's just a detail of how we set up the real machine to match the AM state. This is about other stuff happening, things rustc didn't ask the platform to do. |
@bjorn3 I believe that if LLVM's …

BTW, reading my own comments on this issue from five years ago makes me ashamed. As I've expressed on Zulip since, I think the whole "unknown threads of execution beyond main" idea is a reasonable way to reduce the number of use cases for volatile, and it provides a both simpler and more performant programming model for inter-process communication. With this proposal, message passing across processes or across threads of the same process has the same programming model and exhibits in practice the same performance, and this only comes at the cost of preventing "theoretical" optimizations that are impossible in practice for any real-world application. I do not believe it gets any better than this.
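A minimal sketch of the message-passing model being described, using two threads of one process to stand in for the two sides of a cross-process shared-memory channel (the `SLOT`/`READY` statics and the structure are my own illustration, not an API from this thread; in real shared memory, the same release/acquire orderings apply):

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

// One-shot channel: in the scenarios discussed above, SLOT and READY
// would live in memory shared with another process.
static SLOT: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn receive() -> u32 {
    // Acquire pairs with the Release store below: once READY is observed
    // true, the write to SLOT is guaranteed to be visible.
    while !READY.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    SLOT.load(Ordering::Relaxed)
}

fn main() {
    let producer = thread::spawn(|| {
        SLOT.store(42, Ordering::Relaxed);
        READY.store(true, Ordering::Release); // publish the payload
    });

    let got = receive();
    producer.join().unwrap();
    println!("{got}"); // prints "42"
}
```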
@Amanieu provided the following example here, where the `Channel` should be placed in inter-process shared memory: …

If Rust proves that the Rust process is single-threaded, is it a sound optimization to replace the atomic loads/stores of the `AtomicBool` by non-atomic memory accesses? If that optimization is sound, a different process attempting to access the `bool` introduces a data race and the behavior is undefined.

What assumptions does LLVM make?

Note: replacing the atomic loads and stores with volatile atomic loads and stores would ensure that the example above is correct independently of the answer to these questions.