-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clarify rules for ZST Boxes #77844
clarify rules for ZST Boxes #77844
Conversation
r? @sfackler (rust_highfive has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, that is a pity. Good catch anyways!
library/alloc/src/boxed.rs
Outdated
//! produces a valid pointer, but a pointer pointing into previously allocated memory that since got | ||
//! freed is not valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! produces a valid pointer, but a pointer pointing into previously allocated memory that since got | |
//! freed is not valid. | |
//! produces a valid pointer, [unless it is a pointer pointing into previously allocated memory that since got | |
//! freed][`::core::ptr`]. |
Otherwise it may not be 100% clear which of the two rules "win"s in case of, for instance:
dealloc(ptr : *mut ()); // deallocated
Box::<()>::from_raw(ptr as usize as *mut ())
// ^^^^^^^^^^^^
// aligned non-zero integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should probably change the text to say "integer literal", would that also solve your concern?
So for clarity, it is memory safe to magic up a boxed ZST if the pointer does not come via a "standard allocator"? For example: use core::ptr;
use core::mem;
// Assumes it's safe to bypass the ZST's constructor(s)
unsafe fn zst_box<T>() -> Box<T> {
assert_eq!(mem::size_of::<T>(), 0);
// `NonNull::dangling` ensures that the pointer is well-aligned
// and does not come from an allocator.
let ptr = ptr::NonNull::dangling().as_ptr();
Box::from_raw(ptr)
} |
Yes. That code does not violate |
I am tempted to just r+ this, but it seems like it at least warrants a cc @rust-lang/lang. Going to r? @nikomatsakis as T-lang lead. |
Nominating to mention it at our next meeting but I expect to r+ and land after that. |
library/alloc/src/boxed.rs
Outdated
//! For zero-sized values, the `Box` pointer still has to be [valid] for reads and writes and | ||
//! sufficiently aligned. In particular, casting any aligned non-zero integer literal to a raw | ||
//! pointer produces a valid pointer, but a pointer pointing into previously allocated memory that | ||
//! since got freed is not valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any limitations here -- i.e., it's only a problem if the integer value is somehow derived from a freed value via some trackable provenance -- or is it UB even if you just pull an integer out of thin air that happens to be the same as a freed value?
If the latter, can we give some concrete advice about what value people should use for zero-sized values? I at least am feeling a bit confused about what kind of value is valid to use, given that it can't be null but must be aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are good questions. Unfortunately, LLVM is not specified precisely enough to properly answer them. We are building on quicksand here and can just take a guess about how it will behave. ;)
My personal interpretation is that it is okay to do this if the pointer has no provenance. That should always be the case for pointers created from an integer literal, or any other computations where there are no pointers involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about adding some text to that effect, even if it is quite tentative? What does the compiler do if we create a Box::new(T)
where struct T;
has some kind of high alignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you are saying I should use the term "provenance"? I had avoided that, because I don't think we are defining it anywhere outside the UCG glossary.
What does the compiler do if we create a Box::new(T) where struct T; has some kind of high alignment?
This is implemented in the stdlib actually, not the compiler. It uses the alignment as ptr value:
rust/library/alloc/src/alloc.rs
Line 163 in c4f836a
0 => Ok(NonNull::slice_from_raw_parts(layout.dangling(), 0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I am not suggesting you should use the word provenance necessarily. What I was suggesting is saying something like "The recommended way to build a Box
to a ZST is to use an integer equal to the alignment as the pointer value."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And/or we could mention ptr::NonNull::dangling()
as a canonically-valid-and-always-up-to-date implementation 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be even better, good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment along these lines.
To clarify the state of this PR: I am waiting for an addition to the comment that suggests the use of |
library/core/src/ptr/mod.rs
Outdated
//! valid for zero-sized accesses. This corresponds to writing your own allocator; allocating | ||
//! zero-sized objects is not very hard. In contrast, when you use the standard allocator, after | ||
//! memory got deallocated, even zero-sized accesses to that memory are invalid. | ||
//! The canonical way to obtain a pointer that is valid for zero-sized accesses is [`NonNull::dangling`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rather unfortunate naming: the text says "the pointer must not be dangling" and then proceeds to recommend calling NonNull::dangling
...
Is there another term we could use instead of the first "dangling"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the pointer must not be "dangling" in the sense of pointing to deallocated memory
+ the pointer must not be pointing to deallocated memory
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried something like this, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great!👌
@bors r+ rollup |
📌 Commit a7677f7 has been approved by |
clarify rules for ZST Boxes LLVM's rules around `getelementptr inbounds` with offset 0 are a bit annoying, and as a consequence we have no choice but say that a `Box<()>` pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this. This is based on conversations on the LLVM mailing list. * Here's my initial mail: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130452.html * The first email of the March part of that thread: https://lists.llvm.org/pipermail/llvm-dev/2019-March/130831.html * First email of the April part: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131693.html The conclusion for me at least was that `getelementptr inbounds` with offset 0 is *not* the identity function, but can sometimes return `poison` even when the input is a regular pointer -- specifically, it returns `poison` when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers. Note that there [are formal proposals](https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf) for LLVM semantics where `getelementptr inbounds` with offset 0 isn't quite the identity function but never returns `poison` (it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.
Rollup of 8 pull requests Successful merges: - rust-lang#77844 (clarify rules for ZST Boxes) - rust-lang#79067 (Refactor the abi handling code a bit) - rust-lang#79182 (Fix links to extern types in rustdoc (fixes rust-lang#78777)) - rust-lang#79231 (Exhaustively match in variant count instrinsic) - rust-lang#79238 (Direct RUSTC_LOG (tracing/log) output to stderr instead of stdout.) - rust-lang#79256 (Fix typos) - rust-lang#79264 (Get rid of some doctree items) - rust-lang#79272 (Support building clone shims for arrays with generic size) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
LLVM's rules around
getelementptr inbounds
with offset 0 are a bit annoying, and as a consequence we have no choice but say that aBox<()>
pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this.This is based on conversations on the LLVM mailing list.
The conclusion for me at least was that
getelementptr inbounds
with offset 0 is not the identity function, but can sometimes returnpoison
even when the input is a regular pointer -- specifically, it returnspoison
when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers.Note that there are formal proposals for LLVM semantics where
getelementptr inbounds
with offset 0 isn't quite the identity function but never returnspoison
(it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.