clarify requirements of `byte_{offset,add,sub}` for zero-sized referents #133576

jswrenn · 2024-11-28T15:27:20Z

The safety documentation on core::ptr presents this rule:

For operations of size zero, every pointer is valid, including the null pointer.

However, due to the implementation details of byte_{offset,add,sub}, which involve intermediate operations on non-zero-sized referents, this rule does not apply to these methods.

This commit clarifies extends the over-arching rule with an "unless otherwise noted" caveat, and clarifies the documentation of byte_{offset,add,sub} to note that the only valid count for zero-sized referents is presently 0 (though I wonder if this requirement might be relaxed).

cc @rust-lang/opsem

rustbot · 2024-11-28T15:27:29Z

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

saethlin · 2024-11-28T19:35:28Z

though I wonder if this requirement might be relaxed

I don't think it should be, I think of byte_ operations as doing arithmetic on the address while ignoring the pointee type. The whole point is ignoring the pointee type.

So I'm not sure what other behavior you would want for these methods on ZST pointees. Making them do nothing would be a breaking change.

jswrenn · 2024-11-28T20:20:19Z

I think something like this would both behave equivalently for non-zero-sized referents, and also soundly admit any count for zero-sized referents:

pub unsafe fn byte_offset(self, count: isize) -> *const T {
    let (addr, meta) = self.to_raw_parts();
    let addr = addr.map_addr(|addr| addr.wrapping_add_signed(count));
    super::from_raw_parts(addr as *const (), meta)
}

...but presumably optimize worse, since map_addr ultimately relies on intrinsics::arith_offset as opposed to intrinsics::offset.

Amanieu · 2024-11-28T20:30:13Z

The validity of offset doesn't care about the type of the pointer but rather the allocation it is derived from. For example, a byte_add on a ZST pointer is perfectly valid if that ZST is embedded within a larger non-ZST allocation.

jswrenn · 2024-11-28T20:44:55Z

Oh, good point! What do you think about this clarification instead:

Note that the usual guidance on operations on zero-sized referents does not apply here — for referents of all sizes, if count is non-zero, the address plus count must reside within the same allocation as self.

RalfJung · 2024-11-28T21:35:30Z

However, due to the implementation details of byte_{offset,add,sub}, which involve intermediate operations on non-zero-sized referents, this rule does not apply to these methods.

I am confused by this statement. The rule does apply there as well. The rule applies whenever the operation has size 0. For offset, the size is count * size_of::<T>(); for byte_offset, the size is count.

It seems to me that this PR is trying to fix a non-problem. So please explain the problem. :)

RalfJung · 2024-11-28T21:37:56Z

library/core/src/ptr/const_ptr.rs

+    /// safety requirements. Note that the usual guidance on operations on
+    /// zero-sized referents does not apply here — the only valid `count` for
+    /// zero-sized referents is `0`.


The docs for pointer::offset don't even have any guidance about zero-sized references? In fact, neither operation involves any reference at all! Only raw pointers are involved.

I don't understand which problem you see with the current docs, and I think the new sentence is quite confusing.

So to be clear, please undo all these changes to the bytewise offset methods in your PR, and only keep the change in library/core/src/ptr/mod.rs.

RalfJung · 2024-11-28T21:40:18Z

I think something like this would both behave equivalently for non-zero-sized referents,

No, this is not equivalent at all. It seems you misunderstood how offset works. This point is very important (emphasis mine):

"If the computed offset is non-zero, then self must be derived from a pointer to some allocated object, and the entire memory range between self and the result must be in bounds of that allocated object."

(For two operations to be equivalent, they must produce the same results, and they must also be UB under the same conditions.)

Your operation is equivalent to ptr.wrapping_byte_offset. But that operation is very different from ptr.byte_offset/ptr.offset!

and also soundly admit any count for zero-sized referents:

That's not even a goal for ptr.offset. I have no idea what you are trying to achieve.

Please write a problem statement explaining your reasoning. Why do you think (quoting from the PR description) "this rule does not apply to these methods"?

jswrenn · 2024-11-28T22:10:59Z

Re. goal: The background for this PR is I was reviewing a contract on byte_offset, when this over-arching guidance on pointers came to mind:

For operations of size zero, every pointer is valid, including the null pointer.

At a glance, this might be read as running counter to the guidance on ptr::offset, which does impose requirements on ZST pointers — they must be backed by a valid allocation. So, I would like to add an extra clarification to the documentation of pointer and these methods that "for ZSTs, every pointer is valid" doesn't mean that "every operation on a ZST pointer is valid".

Perhaps a more apt clarification would be:

For operations of size zero, every pointer is valid, including the null pointer. Note, this does not mean that every operation of size zero is valid.

Re. semantics:

Your operation is equivalent to ptr.wrapping_byte_offset. But that operation is very different from ptr.byte_offset/ptr.offset!

I understand the difference in pre-conditions between offset and wrapping_offset. Is there also a difference in their post conditions?

RalfJung · 2024-11-28T22:21:02Z

At a glance, this might be read as running counter to the guidance on ptr::offset, which does impose requirements on ZST pointers — they must be backed by a valid allocation

No, ptr::offset does not require ZST pointers to be backed by a valid allocation.

Maybe you mean ptr::byte_offset. If the offset is non-zero, the pointer must be backed by a valid allocation, even if the pointee type is a ZST. But that is already stated in the rule you quoted:

"For operations of size zero, every pointer is valid, including the null pointer."

The operation (0xab00 as *const()).byte_offset(100) does not have size zero. It has size 100. The pointee type has size zero, but that's not relevant.

So I don't think there is anything here that needs to be fixed.

RalfJung · 2024-11-28T22:24:04Z

Also note that

"For operations of size zero, every pointer is valid, including the null pointer."

does not even apply to offset. It is part of the definition of when a pointer is valid for an access. offset is not an access. offset does not link to that definition, so it is not clear to me why you think that this clause would be relevant for offset.

So IOW, (a) the rule you quoted is indeed correct even for offsets, under a reasonable interpretation of the idea of the "size of the operation", and (b) the docs do not even say that the rule applies to byte_offset or offset, so even if the rule were incorrect for offsets, there wouldn't by a problem.

RalfJung · 2024-11-28T22:26:16Z

I understand the difference in pre-conditions between offset and wrapping_offset. Is there also a difference in their post conditions?

If the preconditions are met, the return values are the same. However, the fact that the preconditions are different already makes them fundamentally not equivalent, so it is not correct to state that your map_addr based version is equivalent to byte_offset. It would be correct to state that it is equivalent to wrapping_byte_offset.

jswrenn · 2024-11-29T00:06:01Z

The operation (0xab00 as *const()).byte_offset(100) does not have size zero. It has size 100. The pointee type has size zero, but that's not relevant.

Where the pointer docs say "For operations of size zero, every pointer is valid", "size zero" is actually a hyperlink to the reference section on zero sized types. This, to me, (incorrectly) suggested that the size of the referent was relevant in this case.

Amanieu · 2024-11-29T00:19:52Z

r? @RalfJung

RalfJung · 2024-11-29T06:30:43Z

Where the pointer docs say "For operations of size zero, every pointer is valid"

Note the context of that statement. The paragraph just above says:

"Many functions in this module take raw pointers as arguments and read from or write to them. For this to be safe, these pointers must be valid for the given access. Whether a pointer is valid depends on the operation it is used for (read or write), and the extent of the memory that is accessed (i.e., how many bytes are read/written) – it makes no sense to ask “is this pointer valid”; one has to ask “is this pointer valid for a given access”. Most functions use *mut T and *const T to access only a single value, in which case the documentation omits the size and implicitly assumes it to be size_of::() bytes."

Why do you think this statement has anything to do with offset? It does not.

That paragraph also explains where the size comes from: some other parts of the docs will say something like "must be valid for reads of size N", and that is the size. Only if the size is not mentioned does it use the size of the type.

"size zero" is actually a hyperlink to the reference section on zero sized types.

Good point, we should probably remove that link.

jswrenn · 2024-11-29T17:50:53Z

Good point, we should probably remove that link.

~~Will do!~~ Done!

@RalfJung I have a question about the preconditions of pointer::offset:

Safety

If any of the following conditions are violated, the result is Undefined Behavior:

The offset in bytes, count * size_of::<T>(), computed on mathematical integers (without
"wrapping around"), must fit in an isize.

If the computed offset is non-zero, then self must be derived from a pointer to some
[allocated object], and the entire memory range between self and the result must be in
bounds of that allocated object. In particular, this range must not "wrap around" the edge
of the address space.

Allocated objects can never be larger than isize::MAX bytes, so if the computed offset
stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement.
This implies, for instance, that vec.as_ptr().add(vec.len()) (for vec: Vec<T>) is always
safe.

If satisfying the second precondition implies that the first precondition is satisfied, is the first precondition entirely redundant?

The safety documentation on `core::ptr` presents this rule: > For operations of size zero, every pointer is valid, including the null pointer. However, due to the implementation details of `byte_{offset,add,sub}`, which involve operations on non-zero-sized referents, this rule does not apply to these methods. This commit clarifies extends the over-arching rule with an "unless otherwise noted" caveat, and clarifies the documentation of `byte_{offset,add,sub}` to note that the only valid `count` for zero-sized referents is presently `0`.

RalfJung · 2024-11-29T18:11:44Z

It is technically redundant, yes. It is explicitly mentioned for clarity, since we don't expect people to know the maximal size of a Rust allocation by heart.

jswrenn · 2024-11-29T19:06:33Z

I think it would be reasonable to remove the redundancy. A SAFETY proof for an invocation of pointer::offset must concern itself with the real size of the allocation that self is derived from — not its theoretical maximum size. While the real allocation size will, necessarily, be less than the theoretical maximum, this is true of all allocations and "allocated object" is already a hyperlink for those who need a reminder of this (or any of the other qualities of allocations).

I'd like to reduce the safety preconditions of pointer::offset to this:

Safety

If the computed offset is non-zero, then self must be derived from a pointer to some
allocated object, and the entire memory range between self and the result must be in
bounds of that allocated object. In particular, this range must not "wrap around" the edge
of the address space.

RalfJung · 2024-11-30T10:25:43Z

library/core/src/ptr/mod.rs

@@ -15,7 +15,7 @@
 //! The precise rules for validity are not determined yet. The guarantees that are
 //! provided at this point are very minimal:
 //!
-//! * For operations of [size zero][zst], *every* pointer is valid, including the [null] pointer.
+//! * For operations of size zero, *every* pointer is valid, including the [null] pointer.


This leaves a dangling [zst] markdown link further down this file, please also remove that.

RalfJung · 2024-11-30T10:28:47Z

I'd like to reduce the safety preconditions of pointer::offset to this:

That won't do as it leaves it unclear how the offset is computed in terms of overflow. The current wording is the result of quite a bit of discussion to ensure it is understandable and unambiguous. I'm not super excited about the idea of re-litigating that discussion... but I guess you insist.

@rust-lang/opsem @rust-lang/libs-api @nikic what is your opinion on removing the mention of the isize limit from the list of requirements for offset? It is technically redundant since we require the offset to stay within an allocation, and no allocation can be bigger than isize::MAX anyway.

nikic · 2024-11-30T12:16:02Z

@RalfJung I don't think we can remove that part. There are two parts to the operation, one is the multiplication offset = count * size and the other part is addition of ptr + offset. We do need to explicitly specify that both count * size does not overflow and that ptr + offset is in bounds.

You could specify this as instead performing count operations of ptr += size where each addition must remain in bounds, but I think the current specification is clearer.

jswrenn · 2024-11-30T12:25:22Z

The second bullet point in the current safety specification includes:

In particular, this range must not "wrap around" the edge of the address space.

Is it possible for count * size to wrap around the address space without overflowing?

jswrenn · 2024-11-30T12:36:27Z

Or, put another way, the next sentence of the specification is:

Allocated objects can never be larger than isize::MAX bytes, so if the computed offset stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement.

I read this as saying "if you prove the second requirement, the first requirement holds by implication". Is that incorrect?

RalfJung · 2024-11-30T12:38:29Z

There are two parts to the operation, one is the multiplication offset = count * size and the other part is addition of ptr + offset. We do need to explicitly specify that both count * size does not overflow and that ptr + offset is in bounds.

We can do the offset = count * size on unbounded mathematical integers, and then specify that ptr + offset must be in-bounds based on that.

nikic · 2024-11-30T14:33:14Z

There are two parts to the operation, one is the multiplication offset = count * size and the other part is addition of ptr + offset. We do need to explicitly specify that both count * size does not overflow and that ptr + offset is in bounds.

We can do the offset = count * size on unbounded mathematical integers, and then specify that ptr + offset must be in-bounds based on that.

Yes, if we specify that the whole operation works on unbounded integers, that works. I don't think that formulation is better than the current one though.

If we really want to change the way this is currently specified, I think the correct way to do it is to make byte_offset the fundamental operation (which only needs the second bullet point) and then offset is just a byte_offset of unchecked_mul.

(Incidentally, this is also how it ought to be implemented, but LLVM may not be ready for that yet in terms of codegen quality.)

bors · 2024-12-31T09:15:30Z

☔ The latest upstream changes (presumably #134952) made this pull request unmergeable. Please resolve the merge conflicts.

RalfJung · 2025-02-21T11:42:50Z

@jswrenn do you still plan to get back to this PR? If we want to give some guidance on ZST, it belongs in the pointer::offset docs, not in each reference to those docs. But IMO those docs already make that case sufficiently clear.

jswrenn · 2025-02-21T14:01:50Z

I'll close this PR since it's far afield from its original scope. But I do still think there's some improvements that could be made to offset. @nikic's comment:

Yes, if we specify that the whole operation works on unbounded integers, that works. I don't think that formulation is better than the current one though.

interests me, because the current formulation already includes the words "computed on mathematical integers". My sense is that the two bullet points of the current formulation could collapse into one, like so:

If the offset in bytes,count * size_of::<T>(), computed on mathematical integers (without "wrapping around"), is non-zero, then self must be derived from a pointer to some allocated object, and the entire memory range between self and the result must be in bounds of that allocated object.

RalfJung · 2025-02-21T14:16:29Z

Yes it can be collapsed, and in fact in MiniRust we do collapse this into a single check. It's non-trivial though, since when you used the term "result" in your version, that has to refer to "the result of adding the offset to the pointer, still using unbounded arithmetic", i.e. we need a notion of "pointer whose address part is an unbounded integer", which is a somewhat odd notion that doesn't otherwise exist. I therefore found it likely to be less confusing to first ensure we stay in the bounds of the underlying mathematical domain, and then separately deal with the in-bounds-of-allocation part.

rustbot assigned Amanieu Nov 28, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 28, 2024

RalfJung reviewed Nov 28, 2024

View reviewed changes

rustbot assigned RalfJung and unassigned Amanieu Nov 29, 2024

RalfJung added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 29, 2024

jswrenn force-pushed the pointer-byte-ops-zs-doc branch from c6d48d0 to e9c0f21 Compare November 29, 2024 18:10

RalfJung reviewed Nov 30, 2024

View reviewed changes

jswrenn closed this Feb 21, 2025

clarify requirements of byte_{offset,add,sub} for zero-sized referents #133576

clarify requirements of byte_{offset,add,sub} for zero-sized referents #133576

Uh oh!

Conversation

jswrenn commented Nov 28, 2024

Uh oh!

rustbot commented Nov 28, 2024

Uh oh!

saethlin commented Nov 28, 2024

Uh oh!

jswrenn commented Nov 28, 2024

Uh oh!

Amanieu commented Nov 28, 2024

Uh oh!

jswrenn commented Nov 28, 2024

Uh oh!

RalfJung commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RalfJung Nov 28, 2024

Choose a reason for hiding this comment

Uh oh!

RalfJung Nov 30, 2024

Choose a reason for hiding this comment

Uh oh!

RalfJung commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jswrenn commented Nov 28, 2024

Uh oh!

RalfJung commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RalfJung commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RalfJung commented Nov 28, 2024

Uh oh!

jswrenn commented Nov 29, 2024

Uh oh!

Amanieu commented Nov 29, 2024

Uh oh!

RalfJung commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jswrenn commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Safety

Uh oh!

RalfJung commented Nov 29, 2024 via email

Uh oh!

jswrenn commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Safety

Uh oh!

RalfJung Nov 30, 2024

Choose a reason for hiding this comment

Uh oh!

RalfJung commented Nov 30, 2024

Uh oh!

nikic commented Nov 30, 2024

Uh oh!

jswrenn commented Nov 30, 2024

Uh oh!

jswrenn commented Nov 30, 2024

Uh oh!

RalfJung commented Nov 30, 2024

Uh oh!

nikic commented Nov 30, 2024

Uh oh!

bors commented Dec 31, 2024

Uh oh!

RalfJung commented Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jswrenn commented Feb 21, 2025

Uh oh!

RalfJung commented Feb 21, 2025

Uh oh!

Uh oh!

clarify requirements of `byte_{offset,add,sub}` for zero-sized referents #133576

clarify requirements of `byte_{offset,add,sub}` for zero-sized referents #133576

RalfJung commented Nov 28, 2024 •

edited

Loading

RalfJung commented Nov 28, 2024 •

edited

Loading

RalfJung commented Nov 28, 2024 •

edited

Loading

RalfJung commented Nov 28, 2024 •

edited

Loading

RalfJung commented Nov 29, 2024 •

edited

Loading

jswrenn commented Nov 29, 2024 •

edited

Loading

jswrenn commented Nov 29, 2024 •

edited

Loading

RalfJung commented Feb 21, 2025 •

edited

Loading