Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for layout information behind pointers #69835

Open
2 tasks
CAD97 opened this issue Mar 8, 2020 · 24 comments
Open
2 tasks

Tracking Issue for layout information behind pointers #69835

CAD97 opened this issue Mar 8, 2020 · 24 comments
Labels
A-allocators Area: Custom and system allocators A-raw-pointers Area: raw pointers, MaybeUninit, NonNull B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. S-tracking-design-concerns Status: There are blocking design concerns. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@CAD97
Copy link
Contributor

CAD97 commented Mar 8, 2020

The feature gate for the issue is #![feature(layout_for_ptr)].

This tracks three functions:

  • core::mem::size_of_val_raw<T: ?Sized>(val: *const T) -> usize
  • core::mem::align_of_val_raw<T: ?Sized>(val: *const T) -> usize
  • core::alloc::Layout::for_value_raw<T: ?Sized>(t: *const T) -> Layout

These provide raw-pointer variants of the existing mem::size_of_val, mem::align_of_val, and Layout::for_value.

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also uses as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Unresolved Questions

  • What should the exact safety requirements of these functions be? It is currently possible to create raw pointers that have metadata that would make size wrap via ptr::slice_from_raw_parts. Trait vtable pointers are currently required to always be valid, but this is not guaranteed and an open question whether this is required of invalid pointers.
  • How should this interact with extern types? As this is a new API surface, we could potentially handle them properly whereas the _of_val cannot. Additionally, extern types may want to access the value to determine the size (e.g. a null terminated cstr).

rust-lang/lang-team#166 is tangentially related, as it serves to document what requirements currently exist on pointee types and getting a known layout from them.

Implementation history

@CAD97 CAD97 added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Mar 8, 2020
@jonas-schievink jonas-schievink added A-allocators Area: Custom and system allocators B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 8, 2020
@dtolnay dtolnay added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jul 18, 2020
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 25, 2020
…ruppe

re-add Layout::for_value_raw

Tracking issue: rust-lang#69835

This was accidentally removed in rust-lang#70362 56cbf2f.
Originally added in rust-lang#69079.
@KodrAus KodrAus added Libs-Tracked Libs issues that are tracked on the team's project board. A-raw-pointers Area: raw pointers, MaybeUninit, NonNull labels Jul 29, 2020
@clarfonthey
Copy link
Contributor

Potential bikeshed: maybe these methods should be moved to the ptr module instead of mem, where they can drop the _raw suffix. There already is some precedent for methods in these modules sharing names (e.g. ptr::swap versus mem:;swap) and I think it feels more natural.

@LegionMammal978
Copy link
Contributor

The safety documentation on these functions is somewhat inaccurate. It states:

This function is only safe to call if the following conditions hold: [...]

  • If the unsized tail of T is:
    • a slice, then the length of the slice tail must be an initialized integer, and the size of the entire value (dynamic tail length + statically sized prefix) must fit in isize.

But the size of a custom slice DST is not necessarily the sum of the size of its prefix and the size of its slice tail. If the alignment of the prefix is greater than the alignment of the slice type, the compiler will insert additional padding following the slice, which is counted in the full DST size.

I stumbled on this issue when investigating custom repr(Rust) slice DSTs. Except when the slice type is a generic parameter subject to an unsizing coercion, such a DST cannot be constructed at all, since there is no sound way for users to query its layout. These functions nearly allow users to determine the size of such DSTs, but the safety requirements prevent this. We cannot directly get the layout of a null pointer with the desired length, since we have no way to determine the size of the prefix. We cannot even extrapolate from the layout of a 0-length DST pointer, since the compiler could add arbitrary padding once the length is increased. It would be nice if there were a fallible way to query the layout information, since this would trivially allow such DSTs to be allocated and initialized.

@CAD97
Copy link
Contributor Author

CAD97 commented Apr 23, 2022

But the size of a custom slice DST is not necessarily the sum of the size of its prefix and the size of its slice tail. If the alignment of the prefix is greater than the alignment of the slice type, the compiler will insert additional padding following the slice, which is counted in the full DST size.

Well, the padding is statically sized, so in that sense, it's part of the statically sized prefix.

I also agree that there should be a fallible way to query layout information from pointer metadata, and have an experimental PR #95832 open to determine the cost of making size_of sound to call for arbitrarily sized slice tails (by saturating).

@LegionMammal978
Copy link
Contributor

Well, the padding is statically sized, so in that sense, it's part of the statically sized prefix.

In what sense is it statically sized? Consider this test program (playground):

#![feature(layout_for_ptr)]

use std::{mem, ptr};

const PREFIX: usize = 4;

#[repr(C)]
struct DST {
    align: [u64; 0],
    prefix: [u8; PREFIX],
    slice: [u8],
}

fn main() {
    for len in 0..32 {
        let ptr = ptr::slice_from_raw_parts(ptr::null::<()>(), len);
        let ptr = ptr as *const DST;
        let size = unsafe { mem::size_of_val_raw(ptr) };
        print!("{} ", size - PREFIX - len);
    }
    println!();
}

You can adjust the size of the prefix. With a 4-byte prefix, this is the output:

4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 

These are the differences between the PREFIX + len and the total size for each len. Clearly, this padding is dependent on the length of the slice.

@CAD97
Copy link
Contributor Author

CAD97 commented Apr 24, 2022

Ah, right, there is trailing padding to the alignment. I just read your comment wrong the first time.

And also, if it's `#[repr(Rust)], it's good to reiterate that there are formally no layout guarantees anyway, so the compiler is within its rights to do wackier things if it wanted to.

The trailing slice is always at the same offset.
#![feature(layout_for_ptr)]

use std::{mem, ptr};

const PREFIX: usize = 4;

#[repr(C)]
struct DST {
    align: [u64; 0],
    prefix: [u8; PREFIX],
    slice: [u8],
}

fn main() {
    for len in 0..32 {
        let ptr = ptr::slice_from_raw_parts(ptr::null::<()>(), len);
        let ptr = ptr as *const DST;
        let offset = unsafe {
            ptr::addr_of!((*ptr).slice)
                .cast::<u8>()
                .offset_from(ptr.cast())
        };
        print!("{offset} ");
    }
    println!();
}

prints all 4s.

You could argue that "dynamic tail length" includes the dynamic padding to alignment, or we could just add a "+ alignment padding" clause.

One way around the issue of #[repr(Rust)] with nongeneric slice tails not being usable is to always go through #[repr(C)] SliceTail<T, U> { prefix: T, tail: [U] } so you can rely on being able to determine/precalculate the layout, but this is obviously not ideal.

@LegionMammal978
Copy link
Contributor

That makes sense wrt. repr(Rust); I was mainly surprised to learn that this trailing padding can exist also in repr(C). The safety comments on these functions are the only real mentions of an "unsized tail", so I'd assumed that the tail is placed fully after and independently of the prefix. Even in repr(Rust), every field in the prefix must have a fixed offset, so padding is the only thing that can occur after the tail.

@CAD97
Copy link
Contributor Author

CAD97 commented Apr 26, 2022

Potential bikeshed: maybe these methods should be moved to the ptr module instead of mem, where they can drop the _raw suffix. [@clarfonthey]

What about Layout::for_value_raw? The feature is layout_for_ptr, and my original pre-PR draft used Layout::for_ptr, but that has a high chance of being misread as the layout to store the pointer, rather than for the pointee. Otherwise, I think I fully agree, but without a good name for the Layout function, keeping the parallel between of_val[_raw] and for_value[_raw] seems beneficial.

@zopsicle
Copy link
Contributor

Perhaps it should be called Layout::for_value_raw_unchecked, so that a fallible version Layout::for_value_raw can be added later. (Although it is confusing to have an unchecked version without a checked version.)

@joshtriplett joshtriplett added the S-tracking-design-concerns Status: There are blocking design concerns. label Jul 13, 2022
@joshtriplett
Copy link
Member

joshtriplett commented Jul 13, 2022

We reviewed this in today's @rust-lang/lang meeting.

It seems like these fit into the general story of pointee metadata, and we should consider them together with that. They may potentially want to make use of that metadata rather than operating directly on the pointers. cc #81513, which tracks the more general question.

@CAD97
Copy link
Contributor Author

CAD97 commented Jul 13, 2022

cc also rust-lang/lang-team#166 which includes an argument that it may make sense to restrict custom pointee layout to be calculable from the pointer metadata and not allow it to be address/pointee-sensitive.

@Jules-Bertholet
Copy link
Contributor

Shouldn't these functions guarantee that they are safe to call at least when casting the pointer to a reference and calling size_of_val/align_of_val on that would be safe?

@CAD97
Copy link
Contributor Author

CAD97 commented Oct 21, 2022

It's not explicitly stated, but it the case that if the pointer is valid to reborrow as a reference that these functions are safe to call. The listed conditions should be a proper subset of "pointer valid to reborrow."

@Jules-Bertholet
Copy link
Contributor

@CAD97 they aren't really a proper subset, as they don't account for new unsized types that might be added to Rust in the future.

@Jules-Bertholet
Copy link
Contributor

I made a PR to fix this: #103372

@joshlf
Copy link
Contributor

joshlf commented Oct 25, 2023

Unresolved Questions

In the interest of stabilizing something, would it be possible to choose the most conservative interpretations for these questions, leaving the door open to future work to loosen the requirements? Ie:

  • What should the exact safety requirements of these functions be? It is currently possible to create raw pointers that have metadata that would make size wrap via ptr::slice_from_raw_parts. Trait vtable pointers are currently required to always be valid, but this is not guaranteed and an open question whether this is required of invalid pointers.

For this, add a safety precondition that the pointer not result in size wrapping. Continue to require as a safety precondition that vtable pointers are always valid.

IIUC, inside the stdlib it's okay to rely on behavior that is not promised publicly, right?

  • How should this interact with extern types? As this is a new API surface, we could potentially handle them properly whereas the _of_val cannot. Additionally, extern types may want to access the value to determine the size (e.g. a null terminated cstr).

For this, document that size_of_val_raw may panic on any input, and that the caller must handle this possibility. Probably worth calling out explicitly that the conditions under which it panics may be narrowed in the future, and potentially eliminated entirely.

@jswrenn
Copy link
Member

jswrenn commented Oct 25, 2023

I've filed a PR with the above recommendations: #117185

This only required documenting the overflow preconditions, since the possibility of panics was already documented and the implementation details of Rc should expressly not be publicly documented.

@CAD97
Copy link
Contributor Author

CAD97 commented Oct 26, 2023

For this, add a safety precondition that the pointer not result in size wrapping. Continue to require as a safety precondition that vtable pointers are always valid.

It's already a requirement for size_of_val_raw and friends that "the entire size must fit in isize". Unless you mean to add the requirement to ptr::slice_from_raw_parts; that's a breaking change and not happening. That function's not even unsafe.

Those conditions are listed there as the reasons the function needs to be unsafe, and the functionally minimum precondition for this functionality.

IIUC, inside the stdlib it's okay to rely on behavior that is not promised publicly, right?

Yes, but I think you somehow have something backwards. Only permitting size_of_val_raw on pointers to a valid allocation (but which may contain an invalid value) is the most conservative precondition, and is significantly more conservative than the currently documented one. Functionally the only more conservative precondition possible is that it would be sound to do size_of_val(&*ptr) instead, at which point these functions aren't even necessary. Plus, the minimal precondition of "points to a valid allocation" is the one T-lang has signed off that they're comfortable committing to always being possible; guaranteeing more would require T-lang signoff again. (Or maybe T-opsem now.)

For this, document that size_of_val_raw may panic on any input, and that the caller must handle this possibility.

If you're going for the most conservative, it would be to just forbid usage with extern types initially. It's quite common in fact for "raw" unchecked versions of functions to be UB in cases where the safe function would panic or return a safe, if potentially misleading, default result.

In the interest of stabilizing something, would it be possible to choose the most conservative [answers to] these questions

I will agree there. To do so, my proposed replacement for the safety section:

If T is Sized, this function is safe to call, and is equivalent to calling size_of::<T>(). The pointer provided is irrelevant, and may even be null.

If the unsized tail of T is a [slice] or [trait object], then the pointer must be validly dereferencable to an [allocated object] capable of containing an entire value as described by the fat pointer (statically sized prefix plus dynamically sized tail). However, no actual reads are done, meaning that the validity of other pointers/references are not impacted. The most common way to fulfill this requirement is by pointing to a T value that did exist at some point, although it may have since been dropped, overwritten, or otherwise invalidated since.

Other unsized tail kinds can exist, such as (unstable) [extern type]s. It is not permitted to call this function with unsized T with these tail kinds. This means that having &T is not a sufficient precondition to call this function. In that case, you should use [size_of_val] instead.

[allocated object]: super::ptr#allocated-object

This is very slightly more permissive than "valid pointer to T or dropped T", but in a way consistent with the preexisting reality of untyped memory and pointer reinterpret casting being valid.

I'm not able to make a PR for this right now, but I should be able to sometime tomorrow.

@RalfJung
Copy link
Member

What is the motivation for requiring the pointer to point to an actual allocation (or for ptr.offset_bytes(size) to not wrap)? I don't see a good reason why we would add that precondition.

@CAD97
Copy link
Contributor Author

CAD97 commented Oct 26, 2023

The motivation is merely in being conservative and only stabilizing the minimal capability which the language team has already signed off on. It's also a condition marginally simpler to explain than a requirement closer to the actual operational requirement. No other API exists (on stable yet) which has a requirement on pointers weaker than "dereferencable" but stronger than the implicit safety invariant of pointers (whatever that is1).

I would much prefer the condition as it's currently (and I originally) specified it, though I might rephrase it a bit given post-2019 understanding of safety/validity requirements. But the underlying presupposition is that a stricter requirement would be easier to stabilize.

I'm not quite sure which teams stabilizing the functions as currently specified would fall under. If it's exclusively the domain of libs-api, I could honestly open a stabilization PR a year ago with justifications that the current choices are the correct ones. It's the presumed need to get lang signoff on extending what's possible in stable Rust (and having hacked together partially functional workarounds for code where I've wanted to use this) that's precluded me actively trying to force this through to stable.

Since this safety requirement fundamentally needs to talk about the pointer metadata in order to be weaker than "dereferencable," it's thus also entangled with feature(ptr_metadata). And that's a much bigger gate to try to stabilize (and I still need to revive a change request for it). Removing that implicit dependency is a benefit to specifying the requirement here in terms of dereferencability.

Footnotes

  1. Relevant: feature(trait_upcasting) makes it necessarily a safety invariant of pointers to dyn Trait that the vtable pointer references a valid vtable. I believe this got lang signoff as a requirement at some point, and as such could be removed from the requirements for size_of_val_raw. But given the nature of being a low level pointer function which shouldn't be doing such an upcast, it's probably still good form to mention it.

@RalfJung
Copy link
Member

Stabilization should definitively involve t-lang, since this extends what can be expressed in Rust.

Since this safety requirement fundamentally needs to talk about the pointer metadata in order to be weaker than "dereferencable," it's thus also entangled with feature(ptr_metadata).

The current spec doesn't require ptr_metadata I think? You need to recurse over the type to find the tail field, but I don't think it's needed to talk about ptr_metadata.

Dealing with dangling pointers is kind of the point of raw pointers, so I would find it somewhat odd that we would require dereferenceability here.

@joshlf
Copy link
Contributor

joshlf commented Oct 26, 2023

Without getting into the weeds of which other features are implicated, I will say that from a user's perspective, this feature wouldn't really be useful if the memory needs to be dereferenceable. The reason I'm advocating for stabilization is to support this design, which synthesizes a raw pointer to a slice DST type in order to determine its minimum size (ie, the size of the value with 0 trailing slice elements). If dereferenceability is a safety precondition, then that use case isn't supported. Speaking more generally, most (maybe all?) cases in which a user has a raw pointer which satisfies the dereferenceability condition, it is possible to call size_of_val(&*ptr) or something similar, and get the same effect as size_of_val_raw.

jswrenn added a commit to jswrenn/rust that referenced this issue Oct 26, 2023
This commit implements the recommendation of [1] to make the
safety preconditions of the raw pointer layout utilities more
conservative, to ease the path towards stabilization. In the
future, we may (if we choose) remove some of these restrictions
without breaking forwards compatibility.

[1]: rust-lang#69835 (comment)
@LegionMammal978
Copy link
Contributor

The reason I'm advocating for stabilization is to support google/zerocopy#29, which synthesizes a raw pointer to a slice DST type in order to determine its minimum size (ie, the size of the value with 0 trailing slice elements). If dereferenceability is a safety precondition, then that use case isn't supported.

One problem, as I was saying above, is that the use case still isn't supported without either a) the guarantee that a repr(Rust) slice DST with 0 elements fits within isize::MAX, or b) a fallible version of size_of_val_raw() that can indicate that the layout wouldn't fit within isize::MAX. As far as I can tell, a) is not guaranteed, since such repr(Rust) slice DSTs don't necessarily fit within isize::MAX in the general case, so they wouldn't necessarily fit in the 0-length case either.

@joshlf
Copy link
Contributor

joshlf commented Oct 27, 2023

The reason I'm advocating for stabilization is to support google/zerocopy#29, which synthesizes a raw pointer to a slice DST type in order to determine its minimum size (ie, the size of the value with 0 trailing slice elements). If dereferenceability is a safety precondition, then that use case isn't supported.

One problem, as I was saying above, is that the use case still isn't supported without either a) the guarantee that a repr(Rust) slice DST with 0 elements fits within isize::MAX, or b) a fallible version of size_of_val_raw() that can indicate that the layout wouldn't fit within isize::MAX. As far as I can tell, a) is not guaranteed, since such repr(Rust) slice DSTs don't necessarily fit within isize::MAX in the general case, so they wouldn't necessarily fit in the 0-length case either.

Ah good point. I've asked about (a) here: rust-lang/unsafe-code-guidelines#465 (comment)

@CAD97
Copy link
Contributor Author

CAD97 commented Nov 16, 2023

Recording here: the fact that valid vtable pointer is considered a safety precondition of *const dyn Trait means that it would be possible to provide a safe fn checked_size_of_pointee<T: ?Sized>(val: *const T) -> Option<usize> for all current unsized tail kinds which returns None for isize overflow. I would very much like to have this, and its viability has an impact on these "unchecked" versions. The main question then is what we think we might want to permit future/custom unsized kinds to do (e.g. the theoretical MetaSized bound).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators A-raw-pointers Area: raw pointers, MaybeUninit, NonNull B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. S-tracking-design-concerns Status: There are blocking design concerns. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests