Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clarify rules for ZST Boxes #77844

Merged
merged 3 commits into from
Nov 21, 2020
Merged

clarify rules for ZST Boxes #77844

merged 3 commits into from
Nov 21, 2020

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Oct 12, 2020

LLVM's rules around getelementptr inbounds with offset 0 are a bit annoying, and as a consequence we have no choice but say that a Box<()> pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this.

This is based on conversations on the LLVM mailing list.

The conclusion for me at least was that getelementptr inbounds with offset 0 is not the identity function, but can sometimes return poison even when the input is a regular pointer -- specifically, it returns poison when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers.

Note that there are formal proposals for LLVM semantics where getelementptr inbounds with offset 0 isn't quite the identity function but never returns poison (it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.

@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 12, 2020
@jyn514 jyn514 added A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools T-lang Relevant to the language team, which will review and decide on the PR/issue. A-zst Area: Zero-sized types (ZST). labels Oct 12, 2020
Copy link
Contributor

@danielhenrymantilla danielhenrymantilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, that is a pity. Good catch anyways!

Comment on lines 67 to 68
//! produces a valid pointer, but a pointer pointing into previously allocated memory that since got
//! freed is not valid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! produces a valid pointer, but a pointer pointing into previously allocated memory that since got
//! freed is not valid.
//! produces a valid pointer, [unless it is a pointer pointing into previously allocated memory that since got
//! freed][`::core::ptr`].

Otherwise it may not be 100% clear which of the two rules "win"s in case of, for instance:

dealloc(ptr : *mut ()); // deallocated
Box::<()>::from_raw(ptr as usize as *mut ())
//                  ^^^^^^^^^^^^
//                  aligned non-zero integer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should probably change the text to say "integer literal", would that also solve your concern?

@ChrisDenton
Copy link
Member

So for clarity, it is memory safe to magic up a boxed ZST if the pointer does not come via a "standard allocator"? For example:

use core::ptr;
use core::mem;

// Assumes it's safe to bypass the ZST's constructor(s)
unsafe fn zst_box<T>() -> Box<T> {
    assert_eq!(mem::size_of::<T>(), 0);
    // `NonNull::dangling` ensures that the pointer is well-aligned
    // and does not come from an allocator.
    let ptr = ptr::NonNull::dangling().as_ptr();
    Box::from_raw(ptr)
}

@RalfJung
Copy link
Member Author

So for clarity, it is memory safe to magic up a boxed ZST if the pointer does not come via a "standard allocator"? For example:

Yes. That code does not violate Box memory layout. (It could be UB for other reasons as your comment indicates, e.g. T := ! makes this UB.)

@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 30, 2020
@Dylan-DPC-zz
Copy link

r? @Mark-Simulacrum

@Mark-Simulacrum
Copy link
Member

I am tempted to just r+ this, but it seems like it at least warrants a cc @rust-lang/lang. Going to r? @nikomatsakis as T-lang lead.

@nikomatsakis
Copy link
Contributor

Nominating to mention it at our next meeting but I expect to r+ and land after that.

@nikomatsakis nikomatsakis added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 9, 2020
//! For zero-sized values, the `Box` pointer still has to be [valid] for reads and writes and
//! sufficiently aligned. In particular, casting any aligned non-zero integer literal to a raw
//! pointer produces a valid pointer, but a pointer pointing into previously allocated memory that
//! since got freed is not valid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any limitations here -- i.e., it's only a problem if the integer value is somehow derived from a freed value via some trackable provenance -- or is it UB even if you just pull an integer out of thin air that happens to be the same as a freed value?

If the latter, can we give some concrete advice about what value people should use for zero-sized values? I at least am feeling a bit confused about what kind of value is valid to use, given that it can't be null but must be aligned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good questions. Unfortunately, LLVM is not specified precisely enough to properly answer them. We are building on quicksand here and can just take a guess about how it will behave. ;)

My personal interpretation is that it is okay to do this if the pointer has no provenance. That should always be the case for pointers created from an integer literal, or any other computations where there are no pointers involved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about adding some text to that effect, even if it is quite tentative? What does the compiler do if we create a Box::new(T) where struct T; has some kind of high alignment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you are saying I should use the term "provenance"? I had avoided that, because I don't think we are defining it anywhere outside the UCG glossary.

What does the compiler do if we create a Box::new(T) where struct T; has some kind of high alignment?

This is implemented in the stdlib actually, not the compiler. It uses the alignment as ptr value:

0 => Ok(NonNull::slice_from_raw_parts(layout.dangling(), 0)),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I am not suggesting you should use the word provenance necessarily. What I was suggesting is saying something like "The recommended way to build a Box to a ZST is to use an integer equal to the alignment as the pointer value."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And/or we could mention ptr::NonNull::dangling() as a canonically-valid-and-always-up-to-date implementation 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be even better, good point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment along these lines.

@nikomatsakis nikomatsakis added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Nov 19, 2020
@nikomatsakis
Copy link
Contributor

To clarify the state of this PR: I am waiting for an addition to the comment that suggests the use of dangling, as suggested here.

//! valid for zero-sized accesses. This corresponds to writing your own allocator; allocating
//! zero-sized objects is not very hard. In contrast, when you use the standard allocator, after
//! memory got deallocated, even zero-sized accesses to that memory are invalid.
//! The canonical way to obtain a pointer that is valid for zero-sized accesses is [`NonNull::dangling`].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather unfortunate naming: the text says "the pointer must not be dangling" and then proceeds to recommend calling NonNull::dangling...

Is there another term we could use instead of the first "dangling"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- the pointer must not be "dangling" in the sense of pointing to deallocated memory
+ the pointer must not be pointing to deallocated memory

maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried something like this, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great!👌

@RalfJung RalfJung removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 20, 2020
@RalfJung RalfJung added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 20, 2020
@nikomatsakis
Copy link
Contributor

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Nov 20, 2020

📌 Commit a7677f7 has been approved by nikomatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Nov 21, 2020
clarify rules for ZST Boxes

LLVM's rules around `getelementptr inbounds` with offset 0 are a bit annoying, and as a consequence we have no choice but say that a `Box<()>` pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this.

This is based on conversations on the LLVM mailing list.
* Here's my initial mail: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130452.html
* The first email of the March part of that thread: https://lists.llvm.org/pipermail/llvm-dev/2019-March/130831.html
* First email of the April part: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131693.html

The conclusion for me at least was that `getelementptr inbounds` with offset 0 is *not* the identity function, but can sometimes return `poison` even when the input is a regular pointer -- specifically, it returns `poison` when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers.

Note that there [are formal proposals](https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf) for LLVM semantics where `getelementptr inbounds` with offset 0 isn't quite the identity function but never returns `poison` (it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2020
Rollup of 8 pull requests

Successful merges:

 - rust-lang#77844 (clarify rules for ZST Boxes)
 - rust-lang#79067 (Refactor the abi handling code a bit)
 - rust-lang#79182 (Fix links to extern types in rustdoc (fixes rust-lang#78777))
 - rust-lang#79231 (Exhaustively match in variant count instrinsic)
 - rust-lang#79238 (Direct RUSTC_LOG (tracing/log) output to stderr instead of stdout.)
 - rust-lang#79256 (Fix typos)
 - rust-lang#79264 (Get rid of some doctree items)
 - rust-lang#79272 (Support building clone shims for arrays with generic size)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 6cd02a8 into rust-lang:master Nov 21, 2020
@rustbot rustbot added this to the 1.50.0 milestone Nov 21, 2020
@RalfJung RalfJung deleted the zst-box branch November 22, 2020 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-zst Area: Zero-sized types (ZST). S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.