Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clarify requirements of byte_{offset,add,sub} for zero-sized referents #133576

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jswrenn
Copy link
Member

@jswrenn jswrenn commented Nov 28, 2024

The safety documentation on core::ptr presents this rule:

For operations of size zero, every pointer is valid, including the null pointer.

However, due to the implementation details of byte_{offset,add,sub}, which involve intermediate operations on non-zero-sized referents, this rule does not apply to these methods.

This commit clarifies extends the over-arching rule with an "unless otherwise noted" caveat, and clarifies the documentation of byte_{offset,add,sub} to note that the only valid count for zero-sized referents is presently 0 (though I wonder if this requirement might be relaxed).

cc @rust-lang/opsem

@rustbot
Copy link
Collaborator

rustbot commented Nov 28, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 28, 2024
@saethlin
Copy link
Member

though I wonder if this requirement might be relaxed

I don't think it should be, I think of byte_ operations as doing arithmetic on the address while ignoring the pointee type. The whole point is ignoring the pointee type.

So I'm not sure what other behavior you would want for these methods on ZST pointees. Making them do nothing would be a breaking change.

@jswrenn
Copy link
Member Author

jswrenn commented Nov 28, 2024

I think something like this would both behave equivalently for non-zero-sized referents, and also soundly admit any count for zero-sized referents:

pub unsafe fn byte_offset(self, count: isize) -> *const T {
    let (addr, meta) = self.to_raw_parts();
    let addr = addr.map_addr(|addr| addr.wrapping_add_signed(count));
    super::from_raw_parts(addr as *const (), meta)
}

...but presumably optimize worse, since map_addr ultimately relies on intrinsics::arith_offset as opposed to intrinsics::offset.

@Amanieu
Copy link
Member

Amanieu commented Nov 28, 2024

The validity of offset doesn't care about the type of the pointer but rather the allocation it is derived from. For example, a byte_add on a ZST pointer is perfectly valid if that ZST is embedded within a larger non-ZST allocation.

@jswrenn
Copy link
Member Author

jswrenn commented Nov 28, 2024

Oh, good point! What do you think about this clarification instead:

Note that the usual guidance on operations on zero-sized referents does not apply here — for referents of all sizes, if count is non-zero, the address plus count must reside within the same allocation as self.

@RalfJung
Copy link
Member

RalfJung commented Nov 28, 2024

However, due to the implementation details of byte_{offset,add,sub}, which involve intermediate operations on non-zero-sized referents, this rule does not apply to these methods.

I am confused by this statement. The rule does apply there as well. The rule applies whenever the operation has size 0. For offset, the size is count * size_of::<T>(); for byte_offset, the size is count.

It seems to me that this PR is trying to fix a non-problem. So please explain the problem. :)

Comment on lines +449 to +451
/// safety requirements. Note that the usual guidance on operations on
/// zero-sized referents does not apply here — the only valid `count` for
/// zero-sized referents is `0`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for pointer::offset don't even have any guidance about zero-sized references? In fact, neither operation involves any reference at all! Only raw pointers are involved.

I don't understand which problem you see with the current docs, and I think the new sentence is quite confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be clear, please undo all these changes to the bytewise offset methods in your PR, and only keep the change in library/core/src/ptr/mod.rs.

@RalfJung
Copy link
Member

RalfJung commented Nov 28, 2024

I think something like this would both behave equivalently for non-zero-sized referents,

No, this is not equivalent at all. It seems you misunderstood how offset works. This point is very important (emphasis mine):

"If the computed offset is non-zero, then self must be derived from a pointer to some allocated object, and the entire memory range between self and the result must be in bounds of that allocated object."

(For two operations to be equivalent, they must produce the same results, and they must also be UB under the same conditions.)

Your operation is equivalent to ptr.wrapping_byte_offset. But that operation is very different from ptr.byte_offset/ptr.offset!

and also soundly admit any count for zero-sized referents:

That's not even a goal for ptr.offset. I have no idea what you are trying to achieve.

Please write a problem statement explaining your reasoning. Why do you think (quoting from the PR description) "this rule does not apply to these methods"?

@jswrenn
Copy link
Member Author

jswrenn commented Nov 28, 2024

Re. goal: The background for this PR is I was reviewing a contract on byte_offset, when this over-arching guidance on pointers came to mind:

For operations of size zero, every pointer is valid, including the null pointer.

At a glance, this might be read as running counter to the guidance on ptr::offset, which does impose requirements on ZST pointers — they must be backed by a valid allocation. So, I would like to add an extra clarification to the documentation of pointer and these methods that "for ZSTs, every pointer is valid" doesn't mean that "every operation on a ZST pointer is valid".

Perhaps a more apt clarification would be:

For operations of size zero, every pointer is valid, including the null pointer. Note, this does not mean that every operation of size zero is valid.


Re. semantics:

Your operation is equivalent to ptr.wrapping_byte_offset. But that operation is very different from ptr.byte_offset/ptr.offset!

I understand the difference in pre-conditions between offset and wrapping_offset. Is there also a difference in their post conditions?

@RalfJung
Copy link
Member

RalfJung commented Nov 28, 2024

At a glance, this might be read as running counter to the guidance on ptr::offset, which does impose requirements on ZST pointers — they must be backed by a valid allocation

No, ptr::offset does not require ZST pointers to be backed by a valid allocation.

Maybe you mean ptr::byte_offset. If the offset is non-zero, the pointer must be backed by a valid allocation, even if the pointee type is a ZST. But that is already stated in the rule you quoted:

"For operations of size zero, every pointer is valid, including the null pointer."

The operation (0xab00 as *const()).byte_offset(100) does not have size zero. It has size 100. The pointee type has size zero, but that's not relevant.

So I don't think there is anything here that needs to be fixed.

@RalfJung
Copy link
Member

RalfJung commented Nov 28, 2024

Also note that

"For operations of size zero, every pointer is valid, including the null pointer."

does not even apply to offset. It is part of the definition of when a pointer is valid for an access. offset is not an access. offset does not link to that definition, so it is not clear to me why you think that this clause would be relevant for offset.

So IOW, (a) the rule you quoted is indeed correct even for offsets, under a reasonable interpretation of the idea of the "size of the operation", and (b) the docs do not even say that the rule applies to byte_offset or offset, so even if the rule were incorrect for offsets, there wouldn't by a problem.

@RalfJung
Copy link
Member

I understand the difference in pre-conditions between offset and wrapping_offset. Is there also a difference in their post conditions?

If the preconditions are met, the return values are the same. However, the fact that the preconditions are different already makes them fundamentally not equivalent, so it is not correct to state that your map_addr based version is equivalent to byte_offset. It would be correct to state that it is equivalent to wrapping_byte_offset.

@jswrenn
Copy link
Member Author

jswrenn commented Nov 29, 2024

The operation (0xab00 as *const()).byte_offset(100) does not have size zero. It has size 100. The pointee type has size zero, but that's not relevant.

Where the pointer docs say "For operations of size zero, every pointer is valid", "size zero" is actually a hyperlink to the reference section on zero sized types. This, to me, (incorrectly) suggested that the size of the referent was relevant in this case.

@Amanieu
Copy link
Member

Amanieu commented Nov 29, 2024

r? @RalfJung

@rustbot rustbot assigned RalfJung and unassigned Amanieu Nov 29, 2024
@RalfJung
Copy link
Member

RalfJung commented Nov 29, 2024

Where the pointer docs say "For operations of size zero, every pointer is valid"

Note the context of that statement. The paragraph just above says:

"Many functions in this module take raw pointers as arguments and read from or write to them. For this to be safe, these pointers must be valid for the given access. Whether a pointer is valid depends on the operation it is used for (read or write), and the extent of the memory that is accessed (i.e., how many bytes are read/written) – it makes no sense to ask “is this pointer valid”; one has to ask “is this pointer valid for a given access”. Most functions use *mut T and *const T to access only a single value, in which case the documentation omits the size and implicitly assumes it to be size_of::() bytes."

Why do you think this statement has anything to do with offset? It does not.

That paragraph also explains where the size comes from: some other parts of the docs will say something like "must be valid for reads of size N", and that is the size. Only if the size is not mentioned does it use the size of the type.

"size zero" is actually a hyperlink to the reference section on zero sized types.

Good point, we should probably remove that link.

@RalfJung RalfJung added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 29, 2024
@jswrenn
Copy link
Member Author

jswrenn commented Nov 29, 2024

Good point, we should probably remove that link.

Will do! Done!

@RalfJung I have a question about the preconditions of pointer::offset:

Safety

If any of the following conditions are violated, the result is Undefined Behavior:

  • The offset in bytes, count * size_of::<T>(), computed on mathematical integers (without
    "wrapping around"), must fit in an isize.

  • If the computed offset is non-zero, then self must be derived from a pointer to some
    [allocated object], and the entire memory range between self and the result must be in
    bounds of that allocated object. In particular, this range must not "wrap around" the edge
    of the address space.

Allocated objects can never be larger than isize::MAX bytes, so if the computed offset
stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement.
This implies, for instance, that vec.as_ptr().add(vec.len()) (for vec: Vec<T>) is always
safe.

If satisfying the second precondition implies that the first precondition is satisfied, is the first precondition entirely redundant?

The safety documentation on `core::ptr` presents this rule:

> For operations of size zero, every pointer is valid, including the null pointer.

However, due to the implementation details of `byte_{offset,add,sub}`,
which involve operations on non-zero-sized referents, this rule does
not apply to these methods.

This commit clarifies extends the over-arching rule with an "unless
otherwise noted" caveat, and clarifies the documentation of
`byte_{offset,add,sub}` to note that the only valid `count` for
zero-sized referents is presently `0`.
@jswrenn jswrenn force-pushed the pointer-byte-ops-zs-doc branch from c6d48d0 to e9c0f21 Compare November 29, 2024 18:10
@RalfJung
Copy link
Member

RalfJung commented Nov 29, 2024 via email

@jswrenn
Copy link
Member Author

jswrenn commented Nov 29, 2024

I think it would be reasonable to remove the redundancy. A SAFETY proof for an invocation of pointer::offset must concern itself with the real size of the allocation that self is derived from — not its theoretical maximum size. While the real allocation size will, necessarily, be less than the theoretical maximum, this is true of all allocations and "allocated object" is already a hyperlink for those who need a reminder of this (or any of the other qualities of allocations).

I'd like to reduce the safety preconditions of pointer::offset to this:

Safety

If the computed offset is non-zero, then self must be derived from a pointer to some
allocated object, and the entire memory range between self and the result must be in
bounds of that allocated object. In particular, this range must not "wrap around" the edge
of the address space.

@@ -15,7 +15,7 @@
//! The precise rules for validity are not determined yet. The guarantees that are
//! provided at this point are very minimal:
//!
//! * For operations of [size zero][zst], *every* pointer is valid, including the [null] pointer.
//! * For operations of size zero, *every* pointer is valid, including the [null] pointer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves a dangling [zst] markdown link further down this file, please also remove that.

@RalfJung
Copy link
Member

I'd like to reduce the safety preconditions of pointer::offset to this:

That won't do as it leaves it unclear how the offset is computed in terms of overflow. The current wording is the result of quite a bit of discussion to ensure it is understandable and unambiguous. I'm not super excited about the idea of re-litigating that discussion... but I guess you insist.

@rust-lang/opsem @rust-lang/libs-api @nikic what is your opinion on removing the mention of the isize limit from the list of requirements for offset? It is technically redundant since we require the offset to stay within an allocation, and no allocation can be bigger than isize::MAX anyway.

@nikic
Copy link
Contributor

nikic commented Nov 30, 2024

@RalfJung I don't think we can remove that part. There are two parts to the operation, one is the multiplication offset = count * size and the other part is addition of ptr + offset. We do need to explicitly specify that both count * size does not overflow and that ptr + offset is in bounds.

You could specify this as instead performing count operations of ptr += size where each addition must remain in bounds, but I think the current specification is clearer.

@jswrenn
Copy link
Member Author

jswrenn commented Nov 30, 2024

The second bullet point in the current safety specification includes:

In particular, this range must not "wrap around" the edge of the address space.

Is it possible for count * size to wrap around the address space without overflowing?

@jswrenn
Copy link
Member Author

jswrenn commented Nov 30, 2024

Or, put another way, the next sentence of the specification is:

Allocated objects can never be larger than isize::MAX bytes, so if the computed offset stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement.

I read this as saying "if you prove the second requirement, the first requirement holds by implication". Is that incorrect?

@RalfJung
Copy link
Member

There are two parts to the operation, one is the multiplication offset = count * size and the other part is addition of ptr + offset. We do need to explicitly specify that both count * size does not overflow and that ptr + offset is in bounds.

We can do the offset = count * size on unbounded mathematical integers, and then specify that ptr + offset must be in-bounds based on that.

@nikic
Copy link
Contributor

nikic commented Nov 30, 2024

There are two parts to the operation, one is the multiplication offset = count * size and the other part is addition of ptr + offset. We do need to explicitly specify that both count * size does not overflow and that ptr + offset is in bounds.

We can do the offset = count * size on unbounded mathematical integers, and then specify that ptr + offset must be in-bounds based on that.

Yes, if we specify that the whole operation works on unbounded integers, that works. I don't think that formulation is better than the current one though.

If we really want to change the way this is currently specified, I think the correct way to do it is to make byte_offset the fundamental operation (which only needs the second bullet point) and then offset is just a byte_offset of unchecked_mul.

(Incidentally, this is also how it ought to be implemented, but LLVM may not be ready for that yet in terms of codegen quality.)

@bors
Copy link
Contributor

bors commented Dec 31, 2024

☔ The latest upstream changes (presumably #134952) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants