@RalfJung
Member

@RalfJung RalfJung commented Sep 11, 2024

In #110837, the offset intrinsic was changed to also accept a usize offset parameter. The intention is that this performs an unsigned multiplication with the size, and we have UB if that overflows -- and we also have UB if the result is larger than isize::MAX, i.e., if a subsequent cast to isize would wrap. The LLVM backend sets some attributes accordingly.

This updates the docs for add/sub to match that intent, in preparation for adjusting codegen to exploit this UB. We use this opportunity to clarify what the exact requirements are: we compute the offset using mathematical multiplication (so it's no problem to have an isize * usize multiplication, we just multiply integers), and the result must fit in an isize.
Cc @rust-lang/opsem @nikic
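
A minimal sketch of the "mathematical multiplication" requirement described above, assuming a target where usize and isize are at most 64 bits wide so that both widen losslessly into i128. The helper name `byte_offset_fits_isize` is hypothetical and is not part of the standard library or of this PR:

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of std): computes `count * size_of::<T>()`
/// on widened integers, so the product cannot wrap at `usize`/`isize` width,
/// and then checks whether the result fits in an `isize`.
///
/// `count` is passed as `i128` so the same check covers both the signed
/// count of `offset` and the unsigned count of `add`/`sub`.
fn byte_offset_fits_isize<T>(count: i128) -> bool {
    // Assumption: `usize` is at most 64 bits, so this widening is lossless.
    let size = size_of::<T>() as i128;
    match count.checked_mul(size) {
        Some(bytes) => isize::try_from(bytes).is_ok(),
        None => false,
    }
}
```

With this reading, `byte_offset_fits_isize::<u8>(usize::MAX as i128)` is false even though `usize::MAX as isize` is a perfectly ordinary `-1`.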

#130239 updates Miri to detect this UB.

sub still has some cases of UB that are not reflected in the underlying intrinsic semantics (and that Miri does not catch): when we subtract usize::MAX, then after casting to isize that's just -1, so we end up adding one unit without noticing any UB, even though the offset we gave does not fit in an isize. Miri will currently still not complain in such cases:

fn main() {
    let x = &[0i32; 2];
    let x = x.as_ptr();
    // This should be UB, we are subtracting way too much.
    unsafe { x.sub(usize::MAX).read() };
}

However, the LLVM IR we generate here is also UB-free. This is "just" library UB, not language UB.
Cc @saethlin; might be worth adding precondition checks against overflow on offset/add/sub?
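
To illustrate why the intrinsic never sees a problem in the example above, here is a small sketch that only uses safe wrapping arithmetic. It assumes sub is lowered as "cast the count to isize, then negate"; the function names are hypothetical and exist only for this illustration:

```rust
/// Hypothetical model of the element offset that `sub(count)` ends up handing
/// to the offset intrinsic, assuming a "cast to isize, then negate" lowering.
fn effective_sub_offset(count: usize) -> isize {
    (count as isize).wrapping_neg()
}

/// Safe check using wrapping pointer arithmetic: subtracting `usize::MAX`
/// elements lands on the same address as adding one element.
fn wrapping_sub_max_equals_add_one() -> bool {
    let x = [0i32; 2];
    let p = x.as_ptr();
    p.wrapping_sub(usize::MAX) == p.wrapping_add(1)
}
```

Under that assumption, `effective_sub_offset(usize::MAX)` is `1`, which is an in-bounds step for a two-element array, so nothing at the intrinsic level can flag the call.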

Fixes #130211

@rustbot
Collaborator

rustbot commented Sep 11, 2024

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 11, 2024
@rustbot
Collaborator

rustbot commented Sep 11, 2024

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

@RalfJung RalfJung changed the title from "Ptr offset unsigned" to "ptr::add/sub: fix docs (do not claim equivalence with offset), and fix gap in Miri UB checks" Sep 11, 2024
 ///
-/// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`.
+/// * The computed offset, `count * size_of::<T>()` bytes (using unbounded arithmetic),
+/// must fit in an `isize`.
Member Author

What is the most clear and concise way to say "convert count and size_of::<T>() from whatever their types are into unbounded mathematical integers, multiply those, and check if the result fits in the value range of isize"?

Member

Maybe flip it around? Could say the same thing as

If sizeof(T) > 0, then count <= isize::MAX / sizeof(T).

Alternatively, could say it as something like

usize::saturating_mul(count, size_of::<T>()) fits in an isize
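
The two formulations suggested in this comment can be sketched as plain predicates on a usize count. The function names are hypothetical; they only illustrate that both phrasings accept and reject the same counts:

```rust
use std::mem::size_of;

/// Division form: if `size_of::<T>() > 0`, then
/// `count <= isize::MAX / size_of::<T>()`. Zero-sized types always pass.
fn fits_by_division<T>(count: usize) -> bool {
    let size = size_of::<T>();
    size == 0 || count <= isize::MAX as usize / size
}

/// Saturating form: the saturated product must fit in an `isize`.
/// On overflow the product saturates to `usize::MAX`, which never fits.
fn fits_by_saturating_mul<T>(count: usize) -> bool {
    usize::saturating_mul(count, size_of::<T>()) <= isize::MAX as usize
}
```

Both forms avoid any wrapping multiplication, which is the property the doc wording is trying to express.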

Member

FWIW, the docs on offset_from say

the absolute distance between the pointers, in bytes, computed on mathematical integers (without "wrapping around"), cannot overflow an isize

so maybe something like this could work

The offset in bytes (count * size_of::<T>()), computed on mathematical integers (without "wrapping around"), must fit in an isize.

Member Author

I think "mathematical multiplication" is the most intuitive version here -- good catch with the offset_from docs, I will follow that.

@lukas-code
Member

To me this feels like we're changing the docs here to introduce new UB where there was none before. FWIW I'm in favor of this change and see how the new definition is much more useful, but I still feel that we should at least do a crater run (with an assert for the overflow) or something to see how widespread this misuse of add is in practice.

Member

@lukas-code lukas-code left a comment

The byte_add/byte_sub methods also need to be updated, especially that byte_add also cannot go "backwards".

@RalfJung
Member Author

To me this feels like we're changing the docs here to introduce new UB where there was none before. FWIW I'm in favor of this change and see how the new definition is much more useful, but I still feel that we should at least do a crater run (with an assert for the overflow) or something to see how widespread this misuse of add is in practice.

This is skirting the edge of clarifying docs vs introducing new UB. "must not overflow isize" arguably already made this UB before, but "convenience for .offset(count as isize)" contradicts that. Paging in @rust-lang/lang for awareness.

I'm not sure such a crater run would be meaningful though, it would also catch all the cases that already were UB before...

@RalfJung
Member Author

RalfJung commented Sep 11, 2024

I have moved the Miri changes into a separate PR (#130239) so it is not held up by discussions around how to deal with the "kind of a breaking change" aspect of this.

I also incorporated the feedback. The wrapping_ methods also still have the wording around "convenience for ... as isize ...". There it is technically correct due to the wrapping semantics, but it might make sense to keep the docs consistent with the non-wrapping versions?

Contributor

@nikic nikic left a comment

Thanks, the new wording is clearer.

@saethlin
Member

might be worth adding precondition checks against overflow on offset/add/sub?

It is definitely worth adding. I'm working on it.

@WaffleLapkin
Member

I also incorporated the feedback. The wrapping_ methods also still have the wording around "convenience for ... as isize ...". There it is technically correct due to the wrapping semantics, but it might make sense to keep the docs consistent with the non-wrapping versions?

I think this is a good idea. Looking at the docs now, I think this "convenience for ... as isize" wording only adds confusion.

@RalfJung
Member Author

All right so what's the process here? This is a library API so by default I would assume t-libs-api is responsible. This is fairly directly exposing a language primitive though, which is why I pinged t-lang above.

But still, let's nominate t-libs-api -- @rust-lang/libs-api are you okay with this change to the docs for our inbounds pointer arithmetic methods? This can be seen as a breaking change since we previously documented e.g. ptr.add(count) as convenience for ptr.offset(count as isize). This hasn't been true for a while, there are cases where the latter is fine but the former has UB: codegen assumes you're not able to move "backwards" even if count as isize would be negative. That's a useful assumption for codegen to have.

OTOH, the docs do say:

The computed offset, count * size_of::<T>() bytes, must not overflow isize.

I would say if count * size_of::<T>() is usize::MAX then this overflows isize. So the docs kind of already say that ptr.add(usize::MAX) is UB, but they also say it is equivalent to ptr.offset(-1) which is not UB...
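
The tension between the two readings of the old docs can be seen at the integer level. The following is only a sketch for the case where size_of::<T>() is 1, and both function names are hypothetical:

```rust
/// The value the old "convenience for `.offset(count as isize)`" wording
/// would pass to `offset`: a wrapping `as` cast.
fn as_cast_offset(count: usize) -> isize {
    count as isize
}

/// The offset under the "must not overflow isize" reading: the count is
/// treated as a mathematical integer and `None` means it does not fit.
fn mathematical_offset(count: usize) -> Option<isize> {
    isize::try_from(count).ok()
}
```

For `usize::MAX`, the first function yields `-1` while the second yields `None`, which is exactly the disagreement between "equivalent to ptr.offset(-1)" and "overflows isize".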

It seems unlikely that someone would rely on add with really big usize values to move the pointer backwards (why would they not use offset) -- unless of course they took our docs too literally. Therefore I hope this is acceptable breakage. Miri has already been adjusted to detect this UB and @saethlin is looking into the possibility of adding checks against this UB for debug builds.

@RalfJung RalfJung added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 12, 2024
@RalfJung
Member Author

RalfJung commented Sep 18, 2024 via email

@saethlin
Member

The draft PR is #130251

@dtolnay dtolnay removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 19, 2024
@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Sep 21, 2024
@rfcbot

rfcbot commented Sep 21, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@Amanieu Amanieu added S-waiting-on-fcp Status: PR is in FCP and is awaiting for FCP to complete. and removed S-waiting-on-team DEPRECATED: Use the team-based variants `S-waiting-on-t-lang`, `S-waiting-on-t-compiler`, ... labels Sep 24, 2024
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 1, 2024
@rfcbot

rfcbot commented Oct 1, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@scottmcm
Member

scottmcm commented Oct 1, 2024

With the FCP complete and no concerns,

@bors r+

(This looks fine to rollup as it's docs changes; #130251 is the one to which people might want to bisect.)

@bors
Collaborator

bors commented Oct 1, 2024

📌 Commit bc3d072 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 1, 2024
@bors bors merged commit 97cdc8e into rust-lang:master Oct 1, 2024
@rustbot rustbot added this to the 1.83.0 milestone Oct 1, 2024
@RalfJung RalfJung deleted the ptr-offset-unsigned branch October 2, 2024 05:55
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Apr 30, 2025

Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. S-waiting-on-fcp Status: PR is in FCP and is awaiting for FCP to complete. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Development

Successfully merging this pull request may close these issues.

pointer addition semantics are unclear