Elaborate on the invariants for references-to-slices #121965

scottmcm · 2024-03-04T07:08:56Z

The length limit on slices is clearly a safety invariant, and I'd like it to also be a validity invariant. With function parameter metadata making progress in LLVM, I'd really like to be able to use it when &[_] is passed as a scalar pair, in particular.

The documentation for references is cagey about what exactly is a validity invariant, so for now just elaborate on the consequences of the existing safety rules on slices -- the length restriction follows from the size_of_val restriction -- as a way to help discourage people from trying to violate them.

I also made the existing warning stronger, since I'm fairly sure it's already UB to violate at least the "references must be non-null" rule, rather than it just being that it "might be UB in the future".

cc @joshlf @RalfJung

rustbot · 2024-03-04T07:09:04Z

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

joboet · 2024-03-04T10:37:58Z

Given that slice::from_raw_parts already states that "the total size len * mem::size_of::<T>() of the slice must be no larger than isize::MAX" and that its behaviour is undefined otherwise, I'd say that this is entirely uncontroversial. Still, I'd appreciate some team sign-off on this; I think this concerns lang?

@rustbot label +I-lang-nominated
(feel free to remove/adjust so that this reaches the right people)
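The `slice::from_raw_parts` size limit quoted above can be restated as a maximum element count per element type. The following is a minimal sketch of that arithmetic; `max_slice_len` and `len_fits` are hypothetical helper names, not standard library APIs.

```rust
use std::mem::size_of;

/// Hypothetical helper: the largest `len` for which
/// `len * size_of::<T>()` does not exceed `isize::MAX`.
/// Zero-sized types never exceed the byte limit, so any `usize` length fits.
pub fn max_slice_len<T>() -> usize {
    match size_of::<T>() {
        0 => usize::MAX,
        size => isize::MAX as usize / size,
    }
}

/// Hypothetical helper: does a slice of `len` elements of `T`
/// satisfy the documented size limit?
pub fn len_fits<T>(len: usize) -> bool {
    len <= max_slice_len::<T>()
}
```

For example, `max_slice_len::<u8>()` is `isize::MAX as usize`, while for `u32` the bound is a quarter of that.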

RalfJung · 2024-03-04T10:58:52Z

> The length limit on slices is clearly a safety invariant, and I'd like it to also be a validity invariant.

That sounds like we should go through t-opsem FCP, maybe joint with t-lang.

@scottmcm could you spell out the motivation for why it should be a validity invariant?
This does pose some extra complications for the opsem, since so far the validity invariant of the metadata has always been the same, no matter whether it was in a raw pointer or a reference.

The authoritative location for validity invariants is the Reference ("behavior considered undefined"), so could you prepare an accompanying reference PR?

RalfJung · 2024-03-04T11:00:30Z

library/core/src/primitive_docs.rs

+/// `isize::MAX / size_of::<E>()`. (Raw pointers may have longer lengths, but
+/// references must not.  For example, compare the documentation of
+/// [`ptr::slice_from_raw_parts`](ptr/fn.slice_from_raw_parts.html) and
+/// [`slice::from_raw_parts`](slice/fn.from_raw_parts.html).)


Is this meant to apply to all types with slice tails, or deliberately restricted to only slices and str?

In &(i32, [i32]), is the max length reduced by one to make sure that the entire type always fits into isize?
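To make the question concrete, here is a sketch of the size arithmetic for a type with an `i32` header and an `[i32]` tail, using `std::alloc::Layout`. It assumes a `#[repr(C)]`-style layout (header first, then the tail); the layout of the tuple `(i32, [i32])` itself is not specified, and `header_plus_tail_size` is a hypothetical helper.

```rust
use std::alloc::{Layout, LayoutError};

/// Hypothetical helper: size in bytes of a `#[repr(C)]`-style value made of
/// one `i32` header followed by `len` `i32` tail elements.
/// Returns an error if the total size would not fit the `isize::MAX` limit
/// that `Layout` enforces.
pub fn header_plus_tail_size(len: usize) -> Result<usize, LayoutError> {
    let head = Layout::new::<i32>();
    let tail = Layout::array::<i32>(len)?;
    // `extend` places the tail after the header, adding padding if needed.
    let (combined, _tail_offset) = head.extend(tail)?;
    Ok(combined.pad_to_align().size())
}
```

Under these assumptions, the largest length that is fine for a plain `[i32]` (`isize::MAX as usize / 4`) is rejected once the header is added, and the maximum tail length is one element smaller.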

Jules-Bertholet · 2024-03-04T13:50:05Z

Does this affect the design of the unstable Pointee trait? (As in, the validity invariant of reference metadata no longer matches that of Pointee::Metadata)

joshlf · 2024-03-05T18:15:09Z

See also: #117474

Add `assume`s to slice length calls

Since `.len()` on slices is safe, let's see how this impacts things vs. what we could do with rust-lang#121965 and LLVM 19.

joboet · 2024-03-23T12:19:01Z

> Does this affect the design of the unstable Pointee trait? (As in, the validity invariant of reference metadata no longer matches that of Pointee::Metadata)

I don't think so, as this is only an invariant for references, so ptr::slice_from_raw_parts and ptr::from_raw_parts wouldn't be affected.
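The difference between raw pointers and references can be sketched as follows: building an oversized raw slice pointer is a safe operation, while producing a reference requires the documented preconditions to hold. `checked_from_raw_parts` and `oversized_raw_slice` are hypothetical names, and the checks shown cover only part of the `slice::from_raw_parts` contract (non-null, aligned, size limit); the rest remains the caller's obligation.

```rust
use std::mem::{align_of, size_of};
use std::{ptr, slice};

/// Safe: a raw slice pointer may carry any length, even one no reference could have.
pub fn oversized_raw_slice() -> *const [u16] {
    ptr::slice_from_raw_parts(ptr::null::<u16>(), usize::MAX)
}

/// Hypothetical helper: refuses to build a reference when the pointer is null or
/// misaligned, or when `len * size_of::<T>()` would exceed `isize::MAX`.
///
/// # Safety
/// If the checks pass, `data` must point to `len` initialized values of `T` that
/// stay valid and are not mutated for the lifetime `'a`.
pub unsafe fn checked_from_raw_parts<'a, T>(data: *const T, len: usize) -> Option<&'a [T]> {
    let size_ok = len
        .checked_mul(size_of::<T>())
        .map_or(false, |bytes| bytes <= isize::MAX as usize);
    let ptr_ok = !data.is_null() && (data as usize) % align_of::<T>() == 0;
    if size_ok && ptr_ok {
        // SAFETY: the checks above plus the caller's obligations in the doc comment.
        Some(unsafe { slice::from_raw_parts(data, len) })
    } else {
        None
    }
}
```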

scottmcm · 2024-03-24T02:39:32Z

> could you spell out the motivation for why it should be a validity invariant?
> This does pose some extra complications for the opsem, since so far the validity invariant of the metadata has always been the same, no matter whether it was in a raw pointer or a reference.

Can you elaborate on the complications it would cause? I don't know how to judge what is or isn't hard in there. I'd assumed that it wouldn't be substantially harder to deal with in opsem than how references have different validity invariants on the address+provenance part from pointers already.

(Talking about the pointee, I think I understand from the other conversations why it's harder to enforce or check something there. But since the metadata isn't behind the pointer, and is thus right there to see when doing a typed copy, I didn't imagine it being substantially harder than checking things like the alignment of the pointer, which is also reference-only and not applicable to raw pointers.)

I think the only reason I have to make it strictly a validity invariant would be for niches. That would be a better version of what RawVec is currently doing by hand, for example, with

rust/library/alloc/src/raw_vec.rs, lines 36 to 40 in 9b8d12c:

    #[repr(transparent)]
    #[cfg_attr(target_pointer_width = "16", rustc_layout_scalar_valid_range_end(0x7fff))]
    #[cfg_attr(target_pointer_width = "32", rustc_layout_scalar_valid_range_end(0x7fff_ffff))]
    #[cfg_attr(target_pointer_width = "64", rustc_layout_scalar_valid_range_end(0x7fff_ffff_ffff_ffff))]
    struct Cap(usize);

Today the very common &[u8] has only the null niche available, but we could give it a whole isize::MAX more niches with a validity rule here. And having the niche rule for slice metadata would make it apply to Box<[T]> too, not just Vec<T>. (Plus I'd happily lose the ZST niche in Vec in exchange for everything else having a smarter one.)
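One way to observe how many niches a type offers is to compare its size with the sizes of nested `Option`s around it. The sketch below only reports what the compiler in use does; the exact sizes are an implementation detail of current rustc rather than a language guarantee, and `slice_ref_sizes` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Hypothetical helper: sizes in bytes of `&[u8]`, `Option<&[u8]>` and
/// `Option<Option<&[u8]>>`. When a layer of `Option` does not grow the type,
/// the compiler found a niche (an invalid bit pattern) to encode `None`.
pub fn slice_ref_sizes() -> (usize, usize, usize) {
    (
        size_of::<&[u8]>(),
        size_of::<Option<&[u8]>>(),
        size_of::<Option<Option<&[u8]>>>(),
    )
}
```

With only the null niche available, one would expect the first `Option` to be free and the second to need extra space; a range restriction on the length field would leave room for further layers.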

Otherwise it's mostly convenience. Telling LLVM about the range restrictions is actually helpful, even just putting it on length checks, but also very expensive today -- both demonstrated in #122926

I'd love to just make it a validity invariant we can put on all loads, like we do with enums. (And soon we'll be able to do that for enum parameters too, which I'd love to do for slice lengths too.)
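As a rough illustration of "telling LLVM about the range restrictions" at a length check, the sketch below attaches the bound by hand. It uses `std::hint::unreachable_unchecked` as a stand-in for the `assume` calls discussed in #122926, relies on the documented safety invariant of `&[T]`, and `len_with_range_hint` is a hypothetical name.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns `slice.len()` while telling the optimizer that
/// the length respects the `isize::MAX` byte limit for non-zero-sized `T`.
pub fn len_with_range_hint<T>(slice: &[T]) -> usize {
    let len = slice.len();
    let elem_size = size_of::<T>();
    if elem_size != 0 && len > isize::MAX as usize / elem_size {
        // SAFETY: a `&[T]` handed to safe code must satisfy
        // `size_of_val(slice) <= isize::MAX`, so this branch cannot be taken.
        unsafe { std::hint::unreachable_unchecked() }
    }
    len
}
```

A validity invariant would let the compiler attach this range information to every load of the length instead of requiring a check like this at each use.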

> Is this meant to apply to all types with slice tails, or deliberately restricted to only slices and str?

Hmm, that's a very good question. I guess everything with a slice tail would be the most consistent, and thus really does need to be phrased in terms of the implied size of the object. (With the element count rule being the simple consequent case for plain slices.)

RalfJung · 2024-03-24T18:44:53Z

> Can you elaborate on the complications it would cause? I don't know how to judge what is or isn't hard in there. I'd assumed that it wouldn't be substantially harder to deal with in opsem than how references have different validity invariants on the address+provenance part from pointers already.

It means we can't view metadata as "just a type" and check the metadata field as if it were a field of some type. Instead we have to view it as an inherent part of the pointer that cannot be described separately. (Or there are two separate types, one for 'metadata of reference' and one for 'metadata of raw pointer'.) We also get more degrees of freedom as we have to separately define the invariant for references and raw pointers.

It's not a big deal, so if there are clear benefits I don't think this should stop us. But we should clearly motivate breaking this symmetry.

In other words, it would have been nice to say that the validity of the metadata field of a pointer/reference to T is exactly that of <T as Pointee>::Metadata, but it doesn't have to be like that as long as we are aware that we are breaking this property, we are doing so with sufficient motivation, and we are documenting this properly.

saethlin · 2024-04-04T06:09:14Z

> if there are clear benefits

Someone should diff the optimized IR for the compiler or some other large project before and after the PR that adds the assumes to see what optimizations are derived. I didn't notice this demonstrated in the PR, but I suspect with the assumes, LLVM optimizes code like &slice[idx..][..4] to one bounds check.
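For reference, the pattern in question looks like the sketch below: two chained slicing operations, each of which carries its own bounds check in the source. Whether the optimizer merges them depends on what it knows about the length, which is the open question here. The function names are hypothetical.

```rust
/// Hypothetical example: reads four bytes starting at `idx` as a little-endian `u32`.
/// Panics if fewer than four bytes are available from `idx`.
pub fn read_u32_le(bytes: &[u8], idx: usize) -> u32 {
    // First check: `idx <= bytes.len()`. Second check: `4 <= bytes.len() - idx`.
    let window = &bytes[idx..][..4];
    u32::from_le_bytes([window[0], window[1], window[2], window[3]])
}

/// Non-panicking variant of the same access pattern.
pub fn try_read_u32_le(bytes: &[u8], idx: usize) -> Option<u32> {
    let window = bytes.get(idx..)?.get(..4)?;
    Some(u32::from_le_bytes([window[0], window[1], window[2], window[3]]))
}
```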

joshlf · 2024-05-11T19:04:29Z

Is this PR obsoleted by rust-lang/reference#1482? IIUC, the text in this PR is all logically deducible from the text in rust-lang/reference#1482.

saethlin · 2024-05-11T19:38:09Z

> Is this PR obsoleted

Please no. We should have documentation about these things in as many places as is reasonable.

joshlf · 2024-05-11T19:46:22Z

> > Is this PR obsoleted
>
> Please no. We should have documentation about these things in as many places as is reasonable.

No objection to that.

@RalfJung, you seem to have concerns with this PR that you don't have with rust-lang/reference#1482. I don't personally have a dog in this fight as long as rust-lang/reference#1482 lands, but I'm curious just for curiosity's sake where the discrepancy is.

RalfJung · 2024-05-11T21:01:00Z

I don't see much of a relation between the two PRs. The one in the reference documents a property of syntactic type well-formedness, this here is about a (runtime) validity invariant. I agree we should decide on the validity invariants of wide references and raw pointers. It may be better to consider all 4 at once though so that we have the larger picture in mind. (4 invariants: slice and dyn Trait tails; references and raw pointers.) I don't even know which of them we have consensus on in t-opsem.

RalfJung · 2024-05-12T07:13:38Z

library/core/src/primitive_docs.rs

@@ -1387,9 +1387,19 @@ mod prim_usize {}
 /// returning values from safe functions; such violations may result in undefined behavior. Where
 /// exceptions to this latter requirement exist, they will be called out explicitly in documentation.


Note that the section you are editing here is talking about the safety invariant, not the validity invariant.

So... the PR diff seems fine to me, except that it doesn't match the PR description (I don't see a validity invariant being defined here), and I am not sure if spelling out this consequence of the previous definition in so many words is all that useful?

If this intends to talk about the validity invariant, then IMO there should be separate subsections for safety and validity invariant, so that it is clear that we are talking about two different invariants.

Rollup merge of rust-lang#125043 - RalfJung:ref-type-safety-invariant, r=scottmcm

reference type safety invariant docs: clarification

The old text could have been read as saying that you can call a function if these requirements are upheld, which is definitely not true as they are an underapproximation of the actual safety invariant.

I removed the part about functions relaxing the requirements via their documentation... this seems incoherent with saying that it may actually be unsound to ever temporarily violate the requirement. Furthermore, a function *cannot* just relax this for its return value, that would in general be unsound.

And the part about "unsafe code in a safe function may assume these invariants are ensured of arguments passed by the caller" also interacts with relaxing things: clearly, if the invariant has been relaxed, unsafe code cannot rely on it any more.

There may be a place to give general guidance on what kinds of function contracts can exist, but the reference type is definitely not the right place to write that down.

I also took a clarification from rust-lang#121965 that is orthogonal to the rest of that PR.

Cc @joshlf @scottmcm

RalfJung · 2024-05-23T05:32:12Z

t-opsem approved the desired validity invariant in rust-lang/unsafe-code-guidelines#510. The PR still needs adjustments though as noted above.

joboet · 2024-06-02T13:49:04Z

@rustbot author
r? @RalfJung

traviscross · 2024-07-03T20:18:50Z

@rustbot labels -I-lang-nominated

We discussed this in the lang call today. We believe that what's in this PR is implied by the change to the Reference here:

elaborate on slice wide pointer metadata reference#1499

We decided to hold the FCP over on that Reference PR.

Once FCP on that PR completes, this PR will be OK to move forward in terms of lang.

JohnCSimon · 2024-09-03T23:00:06Z

@scottmcm
ping from triage - can you post your status on this PR? This PR has not received an update in a few months.

Elaborate on the invariants for references-to-slices

f62ad99

rustbot assigned joboet Mar 4, 2024

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Mar 4, 2024

rustbot added the I-lang-nominated (Nominated for discussion during a lang team meeting.) label Mar 4, 2024

RalfJung added the T-opsem (Relevant to the opsem team) and T-lang (Relevant to the language team) labels and removed the T-libs (Relevant to the library team, which will review and decide on the PR/issue.) label Mar 4, 2024

RalfJung reviewed Mar 4, 2024

joshlf mentioned this pull request Mar 5, 2024

[slice] Document slice DSTs, including size guarantees #117474

Closed

scottmcm mentioned this pull request Mar 23, 2024

Add assumes to slice length calls #122926

Closed

RalfJung reviewed May 12, 2024

This was referenced May 12, 2024

Decide on validity for metadata of wide pointer/reference with slice tail rust-lang/unsafe-code-guidelines#510

Closed

reference type safety invariant docs: clarification #125043

Merged

rustbot added the S-waiting-on-author (Status: This is awaiting some action, such as code changes or more information, from the author.) label and removed the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label Jun 2, 2024

rustbot assigned RalfJung and unassigned joboet Jun 2, 2024

rustbot removed the I-lang-nominated (Nominated for discussion during a lang team meeting.) label Jul 3, 2024

scottmcm closed this Oct 3, 2024
