Several functions perform out-of-bounds memory accesses (which is UB) #559
Comments
OOB accesses using Rust memory operations (including atomic and volatile ones) are always UB. The only way to do OOB accesses is using inline assembly. Then the inline asm block has to ensure that the access can never cause a program abort (so it should be on the same page as an address that we know we could access from Rust).
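As a rough illustration of that constraint, here is a minimal sketch (not code from this crate) of what such an inline-asm load could look like on x86_64, assuming the caller has already checked that the whole aligned word lies on the same page as bytes it is allowed to read:

```rust
/// Sketch only: hide a word-sized load from Rust/LLVM by doing it in inline
/// assembly. Safety still depends on the caller guaranteeing that `addr` is
/// word-aligned and that the whole word lies on the same page as memory the
/// program is allowed to read, so the access can never fault.
#[cfg(target_arch = "x86_64")]
unsafe fn load_word_same_page(addr: *const usize) -> usize {
    debug_assert!(addr as usize % core::mem::size_of::<usize>() == 0);
    let value: usize;
    core::arch::asm!(
        "mov {out}, qword ptr [{ptr}]",
        ptr = in(reg) addr,
        out = out(reg) value,
        options(nostack, preserves_flags, readonly),
    );
    value
}
```

The page-level reasoning lives entirely outside the Rust memory model, which is exactly why the access has to be done behind an asm block that the compiler treats as opaque.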
If I understand rust-lang/rust#113923 correctly, compiler-builtins can now participate in LTO, so there's no way to make an argument like "this crate is part of the privileged runtime so it can violate some of the usual rules of UB that we apply to all other code". This is a soundness issue.
AFAIK the calls to compiler-builtins intrinsics are inserted somewhere in the backend, well after optimizations. In fact that PR needed to explicitly mark these functions as used to prevent LLVM from optimizing all these intrinsics away as dead code.
That sounds plausible for calls to the LLVM builtin (making that a libcall happens pretty late), but what if the user directly calls the C function memcpy?
It is possible that LLVM will replace explicit memcpy calls with the llvm.memcpy intrinsic. I don't know whether this is guaranteed in practice, or whether it is LLVM itself that does it rather than clang when compiling C code, though.
There is now a "masked load" intrinsic that can be used to do a large load that is partially OOB. Would that help fix the OOB in this code without regressing performance?
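As a concrete instance on one target, a byte-granular masked load can be expressed with the AVX-512BW/VL intrinsics, assuming a toolchain where these are available in core::arch. This is only an illustration of the idea, not a proposal for this crate's actual code:

```rust
/// Sketch only: load the first `len` bytes starting at `src` without touching
/// anything past `src + len`, using a byte-granular masked load.
/// Requires AVX-512BW + AVX-512VL support at runtime.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn masked_head_load(src: *const u8, len: usize) -> core::arch::x86_64::__m128i {
    use core::arch::x86_64::{_mm_mask_loadu_epi8, _mm_setzero_si128};
    debug_assert!(len <= 16);
    // Only the low `len` mask bits are set; masked-off lanes are documented
    // not to be accessed, so no bytes past `src + len` are read.
    let mask: u16 = if len == 16 { u16::MAX } else { (1u16 << len) - 1 };
    _mm_mask_loadu_epi8(_mm_setzero_si128(), mask, src as *const i8)
}
```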
I think it would take enough time to set up the mask that you could likely just read the initial byte chunk in a loop and have it be just as fast.
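For comparison, a minimal sketch of that strictly in-bounds alternative (illustrative names, not the crate's actual helpers): copy the misaligned head and tail byte by byte and only use word accesses for the fully in-bounds middle:

```rust
/// Sketch only: forward copy that never reads or writes outside `src`/`dest`.
unsafe fn copy_forward_in_bounds(mut dest: *mut u8, mut src: *const u8, mut n: usize) {
    const WORD: usize = core::mem::size_of::<usize>();
    // Head: byte copies until `src` is word-aligned (or we run out of bytes).
    while n > 0 && (src as usize) % WORD != 0 {
        *dest = *src;
        dest = dest.add(1);
        src = src.add(1);
        n -= 1;
    }
    // Body: word copies; `src` is aligned here, `dest` may not be.
    while n >= WORD {
        (dest as *mut usize).write_unaligned(*(src as *const usize));
        dest = dest.add(WORD);
        src = src.add(WORD);
        n -= WORD;
    }
    // Tail: remaining bytes.
    while n > 0 {
        *dest = *src;
        dest = dest.add(1);
        src = src.add(1);
        n -= 1;
    }
}
```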
Masked loads can still be useful for code size reductions if they allow keeping only the vectorized loops without the pre/post scalar ones.
I think the major issue is a discrepancy between the notion of UB as per Rust and as per the CPU. From the CPU's standpoint it is perfectly OK to read a whole word from memory into a register or cache if permissions allow it. And for most (all?) CPUs, permissions can be set at a minimum granularity of word alignment (e.g. 32-bit, 64-bit, etc.) on platforms with physical memory protection, or at page granularity. So the fact that it is OK to read one byte also means it is OK to read the whole word. Maybe a target-architecture flag could indicate the case where strict byte/half-word reads must be used, if there are architectures where this property doesn't hold (WebAssembly? not sure whether it is enforced at byte level). As for UB: yes, it is illegal to use the extra bytes in the register beyond the intended ones, but if they are excluded from further computation, say by masking, it shouldn't matter how they were read from memory. Under the hood the CPU already reads all bytes of the cache line (if present), and if unaligned loads are supported it will issue multiple word reads on the bus and merge the result.

The atomic case above is worse: since the atomic transaction can't be done in a single operation whose atomicity is guaranteed by hardware, some kind of software synchronization (e.g. a mutex) should be used instead, because under real contention it is possible that only half of the word gets updated and observed by another core/thread. Code making a "best effort" at this atomic property can behave incorrectly. Maybe atomics should simply be required to be aligned?
CPU-level reasoning is simply void when you talk about Rust code (unless it's inline assembly). So the CPU standpoint is really entirely irrelevant here.
For Rust code - yes - it is an internal implementation detail of the hardware and reads of … At the same time, for LLVM it starts to make sense at a lower level: what is the best way on specific hardware to read the required data from memory while avoiding bringing in uninitialized data? Techniques like masking and shifts seem fine to me. Yes, it can be more efficient to do two word reads and a couple of shifts than to do 4/8 byte reads and more shifts. But that also means LLVM had better expose this sort of intrinsic and lower it at its own level. At the same time, certain optimizations may be possible if the information is available at the IR level - e.g. when reading unaligned words from an array sequentially, you can reuse the result of reading the previous element instead of reading two words again. With atomic reads it is the same: if the hardware doesn't support an unaligned atomic operation, that operation should probably cause a configurable warning about a suboptimal implementation, and LLVM can try to make a best effort to emulate it, but this should probably be done at the LLVM level, not at the Rust level. Some implementations require support functions - and those support functions can be written in any language. If that is the intent of the code above, it is inherently wrong and should be rewritten to use explicit synchronization, which itself can be OS specific.
I don't know what you are arguing for here. The code here is uncontroversially buggy. All these hardware-specific things you mention are entirely missing the point. This is true for Rust code and also for LLVM. LLVM IR is an abstract language with a subtle (and often not clearly specified) semantics, and out-of-bounds reads are UB. You can't weasel your way out of that by waving your hands and saying something about hardware. Please read the blog post that I already linked above to understand why this point is so important.
Then you are wrong. They are not fine. There is an alternative universe in which Rust and LLVM are specified in a way that allows these techniques, but we do not live in that universe. (And doing so has other downsides, discussing which is entirely off-topic here.) It doesn't matter whether they "seem fine", it matters what the involved languages (Rust, LLVM IR) document in their specification (which reflects what they assume in the analysis and transformation passes). And what they document is unambiguous: out-of-bounds reads and writes are UB.
So, switching the discussion a bit to finding a path forward: should we simply rewrite the current code to be correct at the possible expense of performance, and then let people win that performance back in later PRs if they can show that each change preserves correctness, or what? I guess this is a T-libs discussion to have. Can we nominate the issue to come up at a meeting?
I don't see a point of disagreement. I totally agree with the UB treatment of uninitialized memory at the IR level and higher. And Rust code should not be OK with any out-of-bounds reads.
They are performed at the hardware level all the time. While it is totally fine for LLVM to declare axioms about UB and uninitialized memory for the higher levels, these operations are still performed under the hood. What I'm saying is that at the level where IR is translated to machine code these operations may appear, and sometimes what is called by the generated code might be written in a language higher than assembler. E.g. imagine a platform where only word reads exist, but you need to deal with bytes. You'd have to emulate byte operations somehow. Now add uninitialized padding to the equation. What is the proper solution here? LLVM might be better off providing intrinsics and hooks to implement it where support from the OS is needed - very similar to how certain operations are handled by LLVM now. There should be some low level at which to implement a safe way to get unaligned data suitable for a particular platform and …
For the code above, I'd think the correct implementation will use conditional compilation to have different paths: one for platforms that don't care about alignment, e.g. x86-64, and one for platforms that need special implementations, potentially relying on the OS for synchronization/a mutex, to be correct.
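A hypothetical sketch of that kind of cfg split (names and structure are illustrative only, not what this crate does): targets with native 2-byte atomics use them directly, while others go through a lock. A real fallback would use an OS- or kernel-provided primitive rather than the crude spinlock shown here, and the fallback shown assumes the target at least has byte atomics:

```rust
/// Native path: the hardware has 2-byte atomics, so no wider access is needed.
#[cfg(target_has_atomic = "16")]
unsafe fn atomic_load_u16(ptr: *const u16) -> u16 {
    use core::sync::atomic::{AtomicU16, Ordering};
    (*(ptr as *const AtomicU16)).load(Ordering::SeqCst)
}

/// Fallback path: serialize through a lock and touch only the two in-bounds
/// bytes. Illustrative only: a real fallback would use an OS/kernel primitive,
/// and the crude spinlock here assumes the target at least has byte atomics.
#[cfg(not(target_has_atomic = "16"))]
unsafe fn atomic_load_u16(ptr: *const u16) -> u16 {
    use core::sync::atomic::{AtomicBool, Ordering};
    static LOCK: AtomicBool = AtomicBool::new(false);
    while LOCK.swap(true, Ordering::Acquire) {}
    let value = ptr.read();
    LOCK.store(false, Ordering::Release);
    value
}
```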
I don't think we can do that in this repo; we need to file an issue in the main Rust repo for that.
I found this code which openly admits to doing OOB accesses:
compiler-builtins/src/mem/impls.rs, lines 64 to 70 at 2a67ad7
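The embedded snippet is not reproduced here; the flagged pattern is roughly of the following shape (an illustrative paraphrase, not the crate's exact code): part of a copy is handled by loading a whole aligned word even when only some of its bytes are in bounds.

```rust
/// Illustration of the problematic pattern only (not the crate's code):
/// an aligned, word-sized volatile read whose trailing bytes may lie past the
/// end of the source buffer. "Safe" at page granularity on typical hardware,
/// but out of bounds, and therefore UB, at the Rust/LLVM level.
unsafe fn load_word_containing(addr: *const u8) -> usize {
    let aligned = (addr as usize) & !(core::mem::size_of::<usize>() - 1);
    core::ptr::read_volatile(aligned as *const usize)
}
```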
This code also looks suspicious, e.g. if this is used to implement 2-byte atomic accesses then there is OOB here:
compiler-builtins/src/arm_linux.rs, lines 57 to 89 at 56172fc
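Again as an illustrative paraphrase rather than the file's exact code: a 2-byte atomic emulated by rounding the address down to a 4-byte boundary and operating on the containing u32 reads two bytes that are out of bounds for the 2-byte object whenever it sits at the very end of its allocation.

```rust
/// Illustration of the suspect pattern only (little-endian, not the actual
/// arm_linux.rs code): emulate a 16-bit atomic load via the containing
/// 4-byte-aligned u32. The u32 access can extend past the u16's allocation.
unsafe fn emulated_atomic_load_u16(ptr: *const u16) -> u16 {
    let aligned = ((ptr as usize) & !3) as *const u32;
    let shift = ((ptr as usize) & 3) * 8;
    let word = core::ptr::read_volatile(aligned); // may span out-of-bounds bytes
    (word >> shift) as u16
}
```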
I haven't done a thorough audit so there might be more. I found these by grepping for atomic_load_unordered.