Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elaborate on the invariants for references-to-slices #121965

Closed
wants to merge 1 commit into from

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Mar 4, 2024

The length limit on slices is clearly a safety invariant, and I'd like it to also be a validity invariant. With function parameter metadata making progress in LLVM, I'd really like to be able to use it when &[_] is passed as a scalar pair, in particular.

The documentation for references is cagey about what exactly is a validity invariant, so for now just elaborate on the consequences of the existing safety rules on slices -- the length restriction follows from the size_of_val restriction -- as a way to help discourage people from trying to violate them.

I also made the existing warning stronger, since I'm fairly sure it's already UB to violate at least the "references must be non-null" rule, rather than it just being that it "might be UB in the future".

cc @joshlf @RalfJung

@rustbot
Copy link
Collaborator

rustbot commented Mar 4, 2024

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 4, 2024
@joboet
Copy link
Member

joboet commented Mar 4, 2024

Given that slice::from_raw_parts already states that "the total size len * mem::size_of::<T>() of the slice must be no larger than isize::MAX" and that its behaviour is undefined otherwise, I'd say that this is entirely uncontroversial. Still, I'd appreciate some team sign-off on this, I think this concerns lang?

@rustbot label +I-lang-nominated
(feel free to remove/adjust so that this reaches the right people)

@rustbot rustbot added the I-lang-nominated Nominated for discussion during a lang team meeting. label Mar 4, 2024
@RalfJung RalfJung added T-opsem Relevant to the opsem team T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 4, 2024
@RalfJung
Copy link
Member

RalfJung commented Mar 4, 2024

The length limit on slices is clearly a safety invariant, and I'd like it to also be a validity invariant.

That sounds like we should go through t-opsem FCP, maybe joint with t-lang.

@scottmcm could you spell out the motivation for why it should be a validity invariant?
This does pose some extra complications for the opsem, since so far the validity invariants of metadata was always the same no matter whether it was in a raw pointer or a reference.

The authoritative location for validity invariants is the Reference ("behavior considered undefined"), so could you prepare an accompanying reference PR?

/// `isize::MAX / size_of::<E>()`. (Raw pointers may have longer lengths, but
/// references must not. For example, compare the documentation of
/// [`ptr::slice_from_raw_parts`](ptr/fn.slice_from_raw_parts.html) and
/// [`slice::from_raw_parts`](slice/fn.from_raw_parts.html).)
Copy link
Member

@RalfJung RalfJung Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to apply to all types with slice tails, or deliberately restricted to only slices and str?

In &(i32, [i32]), is the max length reduced by one to make sure that the entire type always fits into isize?

@Jules-Bertholet
Copy link
Contributor

Does this affect the design of the unstable Pointee trait? (As in, the validity invariant of reference metadata no longer matches that of Pointee::Metadata)

@joshlf
Copy link
Contributor

joshlf commented Mar 5, 2024

See also: #117474

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2024
Add `assume`s to slice length calls

Since `.len()` on slices is safe, let's see how this impacts things vs what we could do with rust-lang#121965 and LLVM 19
@joboet
Copy link
Member

joboet commented Mar 23, 2024

Does this affect the design of the unstable Pointee trait? (As in, the validity invariant of reference metadata no longer matches that of Pointee::Metadata)

I don't think so, as this is only an invariant for references, so ptr::slice_from_raw_parts and ptr::from_raw_parts wouldn't be affected.

@scottmcm
Copy link
Member Author

could you spell out the motivation for why it should be a validity invariant?
This does pose some extra complications for the opsem, since so far the validity invariants of metadata was always the same no matter whether it was in a raw pointer or a reference.

Can you elaborate on the complications it would cause? I don't know how to judge what is or isn't hard in there. I'd assumed that it wouldn't be substantially harder to deal with in opsem than how references have different validity invariants on the address+provenance part from pointers already.

(Talking about the pointee I think I understand how it's harder to enforce/check something, from the other conversations about that, but since the metadata isn't behind the pointer and thus it's right there to see when doing a typed copy, I didn't imagine it being substantially harder than checking things like the alignment of the pointer that's also reference-only, not applicable for pointers.)

I think the only reason I have to make it strictly a validity invariant would be for niches. That would be a better version of what RawVec is currently doing by hand, for example, with

#[repr(transparent)]
#[cfg_attr(target_pointer_width = "16", rustc_layout_scalar_valid_range_end(0x7fff))]
#[cfg_attr(target_pointer_width = "32", rustc_layout_scalar_valid_range_end(0x7fff_ffff))]
#[cfg_attr(target_pointer_width = "64", rustc_layout_scalar_valid_range_end(0x7fff_ffff_ffff_ffff))]
struct Cap(usize);

Today the very-common &[u8] has only the null available, but we could give it a whole isize::MAX more niches with a validity rule here. And having the niche rule for slice metadata would make it apply to Box<[T]> too, not just Vec<T>. (Plus I'd happily lose the ZST niche in Vec in exchange for everything else having a smarter one.)

Otherwise it's mostly convenience. Telling LLVM about the range restrictions is actually helpful, even just putting it on length checks, but also very expensive today -- both demonstrated in #122926

I'd love to just make it a validity invariant we can put on all loads, like we do with enums. (And soon we'll be able to do that for enum parameters too, which I'd love to do for slice lengths too.)

Is this meant to apply to all types with slice tails, or deliberately restricted to only slices and str?

Hmm, that's a very good question. I guess everything with a slice tail would be the most consistent, and thus really does need to be phrased in terms of the implied size of the object. (With the element count rule being the simple consequent case for plain slices.)

@RalfJung
Copy link
Member

RalfJung commented Mar 24, 2024

Can you elaborate on the complications it would cause? I don't know how to judge what is or isn't hard in there. I'd assumed that it wouldn't be substantially harder to deal with in opsem than how references have different validity invariants on the address+provenance part from pointers already.

It means we can't view metadata as "just a type" and check the metadata field as if it were a field of some type. Instead we have to view it as an inherent part of the pointer that cannot be described separately. (Or there are two separate types, one for 'metadata of reference' and one for 'metadata of raw pointer'.) We also get more degrees of freedom as we have to separately define the invariant for references and raw pointers.

It's not a big deal, so if there are clear benefits I don't think this should stop us. But we should clearly motivate breaking this symmetry.

In other words, it would have been nice to say that the validity of the metadata field of a pointer/reference to T is exactly that of <T as Pointee>::Metadata, but doesn't have to be like that as long as we are aware that we are breaking this property, we are doing so with sufficient motivation, and we are documenting this properly.

@saethlin
Copy link
Member

saethlin commented Apr 4, 2024

if there are clear benefits

Someone should diff the optimized IR for the compiler or some other large project before and after the PR that adds the assumes to see what optimizations are derived. I didn't notice this demonstrated in the PR, but I suspect with the assumes, LLVM optimizes code like &slice[idx..][..4] to one bounds check.

@joshlf
Copy link
Contributor

joshlf commented May 11, 2024

Is this PR obsoleted by rust-lang/reference#1482? IIUC, the text in this PR is all logically deducible from the text in rust-lang/reference#1482.

@saethlin
Copy link
Member

Is this PR obsoleted

Please no. We should have documentation about these things in as many places as is reasonable.

@joshlf
Copy link
Contributor

joshlf commented May 11, 2024

Is this PR obsoleted

Please no. We should have documentation about these things in as many places as is reasonable.

No objection to that.

@RalfJung , you seem to have concerns with this PR that you don't have with rust-lang/reference#1482. I don't personally have a dog in this fight as long as rust-lang/reference#1482 lands, but I'm curious just for curiosity's sake where the discrepancy is.

@RalfJung
Copy link
Member

RalfJung commented May 11, 2024 via email

@@ -1387,9 +1387,19 @@ mod prim_usize {}
/// returning values from safe functions; such violations may result in undefined behavior. Where
/// exceptions to this latter requirement exist, they will be called out explicitly in documentation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the section you are editing here is talking about the safety invariant, not the validity invariant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... the PR diff seems fine to me, except that it doesn't match the PR description (I don't see a validity invariant being defined here), and I am not sure if spelling out this consequence of the previous definition in so many words is all that useful?

If this intends to talk about the validity invariant, then IMO there should be separate subsections for safety and validity invariant, so that is is clear that we are talking about two different invariants.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 22, 2024
…, r=scottmcm

reference type safety invariant docs: clarification

The old text could have been read as saying that you can call a function if these requirements are upheld, which is definitely not true as they are an underapproximation of the actual safety invariant.

I removed the part about functions relaxing the requirements via their documentation... this seems incoherent with saying that it may actually be unsound to ever temporarily violate the requirement. Furthermore, a function *cannot* just relax this for its return value, that would in general be unsound. And the part about "unsafe code in a safe function may assume these invariants are ensured of arguments passed by the caller" also interacts with relaxing things: clearly, if the invariant has been relaxed, unsafe code cannot rely on it any more. There may be a place to give general guidance on what kinds of function contracts can exist, but the reference type is definitely not the right place to write that down.

I also took a clarification from rust-lang#121965 that is orthogonal to the rest of that PR.

Cc `@joshlf` `@scottmcm`
fmease added a commit to fmease/rust that referenced this pull request May 22, 2024
…, r=scottmcm

reference type safety invariant docs: clarification

The old text could have been read as saying that you can call a function if these requirements are upheld, which is definitely not true as they are an underapproximation of the actual safety invariant.

I removed the part about functions relaxing the requirements via their documentation... this seems incoherent with saying that it may actually be unsound to ever temporarily violate the requirement. Furthermore, a function *cannot* just relax this for its return value, that would in general be unsound. And the part about "unsafe code in a safe function may assume these invariants are ensured of arguments passed by the caller" also interacts with relaxing things: clearly, if the invariant has been relaxed, unsafe code cannot rely on it any more. There may be a place to give general guidance on what kinds of function contracts can exist, but the reference type is definitely not the right place to write that down.

I also took a clarification from rust-lang#121965 that is orthogonal to the rest of that PR.

Cc ``@joshlf`` ``@scottmcm``
fmease added a commit to fmease/rust that referenced this pull request May 22, 2024
…, r=scottmcm

reference type safety invariant docs: clarification

The old text could have been read as saying that you can call a function if these requirements are upheld, which is definitely not true as they are an underapproximation of the actual safety invariant.

I removed the part about functions relaxing the requirements via their documentation... this seems incoherent with saying that it may actually be unsound to ever temporarily violate the requirement. Furthermore, a function *cannot* just relax this for its return value, that would in general be unsound. And the part about "unsafe code in a safe function may assume these invariants are ensured of arguments passed by the caller" also interacts with relaxing things: clearly, if the invariant has been relaxed, unsafe code cannot rely on it any more. There may be a place to give general guidance on what kinds of function contracts can exist, but the reference type is definitely not the right place to write that down.

I also took a clarification from rust-lang#121965 that is orthogonal to the rest of that PR.

Cc ```@joshlf``` ```@scottmcm```
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 22, 2024
Rollup merge of rust-lang#125043 - RalfJung:ref-type-safety-invariant, r=scottmcm

reference type safety invariant docs: clarification

The old text could have been read as saying that you can call a function if these requirements are upheld, which is definitely not true as they are an underapproximation of the actual safety invariant.

I removed the part about functions relaxing the requirements via their documentation... this seems incoherent with saying that it may actually be unsound to ever temporarily violate the requirement. Furthermore, a function *cannot* just relax this for its return value, that would in general be unsound. And the part about "unsafe code in a safe function may assume these invariants are ensured of arguments passed by the caller" also interacts with relaxing things: clearly, if the invariant has been relaxed, unsafe code cannot rely on it any more. There may be a place to give general guidance on what kinds of function contracts can exist, but the reference type is definitely not the right place to write that down.

I also took a clarification from rust-lang#121965 that is orthogonal to the rest of that PR.

Cc ```@joshlf``` ```@scottmcm```
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request May 23, 2024
reference type safety invariant docs: clarification

The old text could have been read as saying that you can call a function if these requirements are upheld, which is definitely not true as they are an underapproximation of the actual safety invariant.

I removed the part about functions relaxing the requirements via their documentation... this seems incoherent with saying that it may actually be unsound to ever temporarily violate the requirement. Furthermore, a function *cannot* just relax this for its return value, that would in general be unsound. And the part about "unsafe code in a safe function may assume these invariants are ensured of arguments passed by the caller" also interacts with relaxing things: clearly, if the invariant has been relaxed, unsafe code cannot rely on it any more. There may be a place to give general guidance on what kinds of function contracts can exist, but the reference type is definitely not the right place to write that down.

I also took a clarification from rust-lang/rust#121965 that is orthogonal to the rest of that PR.

Cc ```@joshlf``` ```@scottmcm```
@RalfJung
Copy link
Member

t-opsem approved the desired validity invariant in rust-lang/unsafe-code-guidelines#510. The PR still needs adjustments though as noted above.

@joboet
Copy link
Member

joboet commented Jun 2, 2024

@rustbot author
r? @RalfJung

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 2, 2024
@rustbot rustbot assigned RalfJung and unassigned joboet Jun 2, 2024
@traviscross
Copy link
Contributor

@rustbot labels -I-lang-nominated

We discussed this in the lang call today. We believe that what's in this PR is implied by the change to the Reference here:

We decided to hold the FCP over on that Reference PR.

Once FCP on that PR completes, this PR will be OK to move forward in terms of lang.

@rustbot rustbot removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Jul 3, 2024
@JohnCSimon
Copy link
Member

@scottmcm
ping from triage - can you post your status on this PR? This PR has not received an update in a few months.

@scottmcm scottmcm closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants