Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the assumptions we make about the C standard library, that go beyond what C requires #426

Closed
RalfJung opened this issue Jul 4, 2023 · 12 comments · Fixed by rust-lang/rust#114412

Comments

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2023

See https://reviews.llvm.org/D86993: LLVM, and therefore Rust, assume that memcpy, memmove, memset and possibly other C standard library functions satisfy properties which are not required by the C standard. The least we can do is document this properly. However, I don't know where.

It's not just LLVM though, Rust itself also makes extra assumptions. We explicitly allow zero-sized accesses on pointers such as 42 as *const u8, and this includes zero-sized copy_nonoverlapping, copy and write_bytes. So basically we require zero-sized memcpy, memmove, memset to be a NOP. (Technically it could still be UB for NULL, OOB, or UAF pointers, but we might want to change this on the Rust side and aside from NULL it's not really possibly for implementations to exploit that.) This is justified by us emitting LLVM operations that explicitly say size 0 is a NOP -- but I am not sure what other backends are doing.

@chorman0773
Copy link
Contributor

chorman0773 commented Jul 4, 2023

I think there's a separate question of whether these are assumptions of Rust, or of rustc.

In theory, this is just rustc's implementation of these stdlib functions/intrinsics/operations, and another stdlib/compiler, e.g. lccc, could be more correct in its assumptions in terms of what C promises. So the semantic question is whether unsafe code users can rely on memcpy(a,b,n*size_of_val_raw(a)) and core::ptr::copy_nonoverlapping(b,a,n) being exactly identical.

If these are indeed assumptions made by rustc, then they should be documented by rustc (and they can theoretically be anything rustc wants them to be).

@chorman0773
Copy link
Contributor

chorman0773 commented Jul 4, 2023

(To be clear, I'm not necessarily saying that lccc would limit its assumptions and generate more conservative code for copy_nonoverlapping et. al, but it could - the assumptions rustc makes here are not necessarily fundamental to Rust)

@RalfJung
Copy link
Member Author

RalfJung commented Jul 4, 2023

Ah fair, I was thinking of rustc. I don't think we want to say anything about what happens when Rust user code calls the C functions.

@chorman0773
Copy link
Contributor

Yeah - if it's just a rustc thing, then I don't think it should be T-opsem's job to instruct rustc (or any other compiler, for that matter) what assumptions it may assume about its environment, nor to document that what assumptions it should make - the documentation should probably exist, but IMO that's a job for T-compiler, not T-opsem.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 4, 2023

The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document). If people have major concerns with making this assumption, maybe we have to reconsider some of these choices?

But yeah mostly this is not t-opsem, but still fits the UCG I think.

@digama0
Copy link

digama0 commented Jul 4, 2023

Yeah this question confuses me, in what way does any of this impact opsem? These are C functions, not accessible from rust unless you use the libc crate. Rust has copy_nonoverlapping and this has fairly clear and obvious preconditions; I don't see how the C spec or the LLVM spec for memcpy impact the spec of rust at all.

@chorman0773
Copy link
Contributor

chorman0773 commented Jul 4, 2023

The relation to t-opsem is that being able to make this assumption is a prerequisite for the operational semantics we want to use (even the one we already stably document).

Not really, it just means that if rustc doesn't make this assumption then it's codegen+stdlib impl of those semantics are wrong, as it was when it was assuming it could generate an empty infinite loop in llvm back when infinite loops weren't treated well by llvm.

But yeah mostly this is not t-opsem, but still fits the UCG I think.

I'm not really sure of this either, since we aren't directly exposing this to unsafe code, or user rust code in any way.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 4, 2023

Hm I guess this is really mostly an LLVM implementation detail then. rustc doesn't even use memcpy/memmove/memset, it uses LLVM intrinsics; it is up to LLVM to implement them properly. I still feel we should document this assumption somewhere but still have no idea where...

If that is the consensus for this situation, there are some action items though:

@RalfJung
Copy link
Member Author

RalfJung commented Jul 5, 2023

Point 2 now has a PR at rust-lang/rust#113347, Point 1 is tracked in the Miri repo, so this can probably be closed.

@RalfJung RalfJung closed this as completed Jul 5, 2023
@RalfJung
Copy link
Member Author

Based on the reply I got in rust-lang/rust#113435, the Rust standard library is actually making such assumptions itself directly, not just via LLVM. This concerns the memcmp function, not just a language intrinsic. I don't think we want to special-case the standard library here -- either Miri will accept such pointers in memcmp or it won't.

So we need to either fix the standard library or find a suitable place to document this assumption.

@RalfJung RalfJung reopened this Jul 13, 2023
@chorman0773
Copy link
Contributor

As I mentioned, the rust standard library is privileged even if it isn't directly privileged, becaue it has the exhaustive list of targets it knows are supported, so this isn't necessarily an assumption that can be made by portable rust code.

In either case, the suitable place to document this, imo, is in a current implementation section on the stdlib's (core) documentation.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 5, 2023
document our assumptions about symbols provided by the libc

LLVM makes assumptions about `memcmp`, `memmove`, and `memset` that go beyond what the C standard guarantees -- see https://reviews.llvm.org/D86993. Since we use LLVM, we are inheriting these assumptions.

With rust-lang#114382 we are also making a similar assumption about `memcmp`, so I added that to the list.

Fixes rust-lang/unsafe-code-guidelines#426.
@RalfJung
Copy link
Member Author

RalfJung commented Sep 6, 2023

Fixed by rust-lang/rust#114412

@RalfJung RalfJung closed this as completed Sep 6, 2023
RalfJung pushed a commit to RalfJung/miri that referenced this issue Sep 6, 2023
document our assumptions about symbols provided by the libc

LLVM makes assumptions about `memcmp`, `memmove`, and `memset` that go beyond what the C standard guarantees -- see https://reviews.llvm.org/D86993. Since we use LLVM, we are inheriting these assumptions.

With rust-lang/rust#114382 we are also making a similar assumption about `memcmp`, so I added that to the list.

Fixes rust-lang/unsafe-code-guidelines#426.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants