Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What about: Pointer-to-integer transmutes? #286

Open
RalfJung opened this issue Jun 7, 2021 · 130 comments
Open

What about: Pointer-to-integer transmutes? #286

RalfJung opened this issue Jun 7, 2021 · 130 comments
Labels
A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) C-open-question Category: An open question that we should revisit

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 7, 2021

Transmuting pointers to integers (i.e., not going through the regular cast) is a problem. This is demonstrated by the following silly example:

fn example(ptr: *const i32, cmp: usize) -> usize { unsafe {
  let mut storage: usize = 0;
  *(&mut storage as *mut _ as *mut *const i32) = ptr; // write at ptr type
  let val = storage; // read at int type (0)
  storage = val; // redundant write back (1)
  external_function(&storage); // just making sure the value in `storage` can be observed
  if val == cmp {
    return cmp; // could exploit integer equivalence (2)
  }
  return 0;
} }

Imagine executing this code on the Abstract Machine, taking into account that pointers have provenance, i.e., a ptr-to-int conversion loses information. Now what happens at point (0)? Here we read the data stored in storage at type usize. That data however is the ptr ptr, i.e., it has provenance. What should happen with that provenance at (0)?

  1. We could drop the provenance. That would basically mean that the load of storage acts like an implicit ptr-to-int cast. The problem with this approach is that we cannot remove the redundant write at (1): the value in val is different from what is stored in storage, since val has no provenance but the ptr stored in storage does! This is basically another version of https://bugs.llvm.org/show_bug.cgi?id=34548: ptr-to-int casts are not NOPs, and a ptr-int-ptr roundtrip cannot be optimized away. If a load, like at (0), can perform a ptr-to-int cast, now the same concerns apply here.
  2. We could preserve the provenance. Then, however, we end up with val having type usize and also having provenance, which is a big problem: the compiler might decide, at program point (2), to return val instead of return cmp (based on the fact that val == cmp), but if val could have provenance then this transformation is wrong! This is basically the isue at the heart of my blog post on provenance: == ignores provenance, so just because two values are equal according to == does not mean they can be used interchangeably in all circumstances.
  3. What other option is there? Well, we might make the load return poison -- effectively declaring ptr-to-int transmutes as UB.

The last option is what is being proposed to LLVM, along with a new "byte" type such that loading at type bN would preserve provenance, but loading at type iN would turn bytes with provenance into poison. On the flipside, no arithmetic or logical operations are possible on bN; that type represents "opaque bytes" with the only possible operations being load and store (and explicit casts to remove any provenance that might exist). This leads to a consistent model in which both redundant store elimination and GVN substitution on integer types (the optimizations mentioned above) are possible. I don't know any other way to resolve the contradiction that otherwise arises from doing both of these optimizations. However, the LLVM discussion is still in its early stages, and there were already a lot of responses that I have not read in detail yet. If this ends up being accepted, we on the Rust side will have to figure out if and how we can make use of the new "byte" type and its explicit casts (to pointers or integers).

This thread is about discussing how we need to restrict ptr-to-int transmutes when pointers have provenance but integers do not. See #287 for a discussion with the goal of avoiding provenance in the first place.

@RalfJung RalfJung added C-open-question Category: An open question that we should revisit A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) labels Jun 7, 2021
@RalfJung
Copy link
Member Author

RalfJung commented Jun 7, 2021

Currently, Miri will perform these implicit ptr-to-int casts in many situations, so just because code is fine under Miri does not mean it is fine under the proposed LLVM semantics. I intend to add a flag that will treat ptr-to-int transmutes as UB.

@Lokathor
Copy link
Contributor

Lokathor commented Jun 8, 2021

So, clarification question: Is this currently UB or will it eventually be UB under a future LLVM?

@Diggsey
Copy link

Diggsey commented Jun 8, 2021

The last option is what is being proposed to LLVM, along with a new "byte" type such that loading at type bN would preserve provenance, but loading at type iN would turn provenance into poison.

I'm not sure that's right - those are the proposed semantics for bitcast, but a new cast "bytecast" is also introduced that does not produce poison.

The frontend will always produce a bytecast, which can then be optimized into a more specific cast if necessary.

AFAICT, they don't intend to change what is UB with this change, just fix bugs and reduce the number of pointers that are considered to escape:

In our semantics, byte type carries provenance and copying bytes does not escape pointers, thereby benefiting alias analysis.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2021

So, clarification question: Is this currently UB or will it eventually be UB under a future LLVM?

That depends on whether you consider LLVM semantics to be defined by the docs or the implementation. ;)
The docs do not mention such UB, but the implementation is most likely buggy unless this is UB, as demonstrated by this series of examples. So in a sense this already is UB in the implementation.

I'm not sure that's right - those are the proposed semantics for bitcast, but a new cast "bytecast" is also introduced that does not produce poison.

Note that in the problematic line (0) in the code. there is no cast of any kind. So the question here is not the semantics of bitcast, or the semantics of bytecast, but the semantics of an i64-typed load that accesses memory where a pointer value (with provenance) has been stored.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 13, 2021

I came up with an example that, I think, demonstrates that ptr-to-int transmutes are truly broken. I write these as Rust code for better readability, but you should interpret these as LLVM IR programs.

Let's start with this program:

// Prepare some pointers to distinct allocated objects.
let mut p = [0];
let paddr = p.as_mut_ptr();
let mut q = [0];
let qaddr = q.as_mut_ptr();
// Set up a bit of storage with 2 ptrs to it: one usize-typed, one ptr-typed.
let mut storage = 0usize;
let storage_usize = &mut storage as *mut usize;
let storage_ptr = storage_usize as *mut *mut i32;

// Now comes the tricky bit.
*storage_ptr = paddr.wrapping_offset(1);
if *storage_usize == qaddr as usize {
  let val = qaddr as usize;
  *storage_usize = val;
  **storage_ptr = 1;
  println!("{}", q[0]);
}

The one weird bit that is happening here is that we store qaddr as usize somewhere and then transmute that integer to a pointer, and write to it (**storage_ptr = 1). So, I am assuming here that int-to-ptr transmutes are okay. Possibly there is a way that we could salvage ptr-to-int transmutes if we give up on int-to-ptr transmutes, but (a) that feels even less natural, (b) I have no idea how to do this, and (c) I am not sure if that would actually help anyone; the people transmuting between ptrs and ints probably assume that going both ways is fine. We have to lose one way, and I think losing ptr-to-int transmutes is "less weird".

The program also does a ptr-to-int transmute, namely when it loads *storage_usize to compare that with qaddr as usize. This is the heart of the example: I will assume that this is fine to do, and will then derive a contradiction -- I will give a series of optimizations that change the behavior of this program, even though all these optimizations arguably should be correct. Making ptr-to-int transmute return poison means that the if in the example compares poison with an integer, which is UB, resolving the contradiction.

If this program does not have UB, it has two possible behaviors:

  • Print "1" (if q immediately follows p in memory)
  • Print nothing (otherwise)

The first optimization is to exploit the == inside the if, replacing qaddr as usize by *storage_usize.

// Prepare some pointers to distinct allocated objects.
let mut p = [0];
let paddr = p.as_mut_ptr();
let mut q = [0];
let qaddr = q.as_mut_ptr();
// Set up a bit of storage with 2 ptrs to it: one usize-typed, one ptr-typed.
let mut storage = 0usize;
let storage_usize = &mut storage as *mut usize;
let storage_ptr = storage_usize as *mut *mut i32;

// Now comes the tricky bit.
*storage_ptr = paddr.wrapping_offset(1);
if *storage_usize == qaddr as usize {
  let val = *storage_usize; // this changed
  *storage_usize = val;
  **storage_ptr = 1;
  println!("{}", q[0]);
}

The second optimization removes the redundant store to *storage_usize, since we are only storing back what we just loaded. We then also remove the let val = ... since val is unused now.

// Prepare some pointers to distinct allocated objects.
let mut p = [0];
let paddr = p.as_mut_ptr();
let mut q = [0];
let qaddr = q.as_mut_ptr();
// Set up a bit of storage with 2 ptrs to it: one usize-typed, one ptr-typed.
let mut storage = 0usize;
let storage_usize = &mut storage as *mut usize;
let storage_ptr = storage_usize as *mut *mut i32;

// Now comes the tricky bit.
*storage_ptr = paddr.wrapping_offset(1);
if *storage_usize == qaddr as usize {
  // 2 lines got removed here.
  **storage_ptr = 1;
  println!("{}", q[0]);
}

Next, we replace the *storage_ptr by the value that has just been written there:

// Prepare some pointers to distinct allocated objects.
let mut p = [0];
let paddr = p.as_mut_ptr();
let mut q = [0];
let qaddr = q.as_mut_ptr();
// Set up a bit of storage with 2 ptrs to it: one usize-typed, one ptr-typed.
let mut storage = 0usize;
let storage_usize = &mut storage as *mut usize;
let storage_ptr = storage_usize as *mut *mut i32;

// Now comes the tricky bit.
*storage_ptr = paddr.wrapping_offset(1);
if *storage_usize == qaddr as usize {
  *paddr.wrapping_offset(1) = 1; // this changed
  println!("{}", q[0]);
}

And finally, we exploit that the only 2 writes that happen definitely do not have the provenance of q, so q remains unchanged, so we can replace the final q[0] by a constant 0:

// Prepare some pointers to distinct allocated objects.
let mut p = [0];
let paddr = p.as_mut_ptr();
let mut q = [0];
let qaddr = q.as_mut_ptr();
// Set up a bit of storage with 2 ptrs to it: one usize-typed, one ptr-typed.
let mut storage = 0usize;
let storage_usize = &mut storage as *mut usize;
let storage_ptr = storage_usize as *mut *mut i32;

// Now comes the tricky bit.
*storage_ptr = paddr.wrapping_offset(1);
if *storage_usize == qaddr as usize {
  *paddr.wrapping_offset(1) = 1;
  println!("{}", 0); // this changed
}

This final program now can print "0" if q immediately follows p in memory. That is not a possible behavior of the original program, so one of the optimizations was wrong -- or the source program had UB.

@digama0
Copy link

digama0 commented Jun 13, 2021

And finally, we exploit that the only 2 writes that happen definitely do not have the provenance of q, so q remains unchanged, so we can replace the final q[0] by a constant 0:

Correct me if I'm wrong, but I think that this step is not validated by the current stacked borrows model. Once paddr and qaddr are created, both pieces of memory are valid for writes by pointers from any source, so in particular it is valid to use a pointer pointer derived from paddr but numerically equal to qaddr to write to q.

I hesitate to bring this up again, because it has already been discussed at some length after your provenance blog post, but I think there is a reasonably easy to understand model for pointers here that basically amounts to "pointers don't have provenance". We already have to support "wild" pointers for the case of manufacturing numeric addresses and dereferencing them, and the only thing that is needed to recover the majority of optimizations is to make memory only writable if there is a SharedRW(_|_) on the stack. The pointer itself is an untagged integral value.

This problem has been presented as a sort of Gordian knot with no good solution, but it's not like there aren't consistent (and comprehensible!) models for this stuff. I don't think losing this store forwarding optimization is a big deal, partly because it is very contrived, but also because it uses lots of pointer arithmetic and I think that in that case we should encourage people to use references when possible and otherwise do exactly what they wrote and expect the user to take the performance into their own hands.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 13, 2021

Correct me if I'm wrong, but I think that this step is not validated by the current stacked borrows model. Once paddr and qaddr are created, both pieces of memory are valid for writes by pointers from any source, so in particular it is valid to use a pointer pointer derived from paddr but numerically equal to qaddr to write to q.

No, that is not true. Once qaddr is created, it is legal to cast an int that is equal to qaddr to a ptr and use that to access q. However, that does not happen in the second-to-last program. Writes using a specific provenance still have to adhere to that provenance. The only writes that happen in the second-to-last program have known provenance (of storage and p, respectively); it would be UB for them to affect or depend on any other allocation, making this transformation legal. (This is basically the same as the last step in my provenance blog post.)

The pointer itself is an untagged integral value.

This is incompatible with many optimizations performed by LLVM. I don't think LLVM will switch to a provenance-free model any time soon, and so by extension Rust will not switch to a provenance-free model. In fact I think it is not possible to write a reasonably optimizing compiler for a model where pointers have no provenance. To my knowledge, every single optimizing compiler for languages such as C and C++ has provenance in its memory model. So, the burden of proof here is IMO on folks that don't like provenance: construct at least a reasonable prototype of a compiler that is correct for a provenance-free model. Even register allocation will be hard for such a compiler (since, in first approximation, any write through a pointer might affect any variable that ever had its address taken).

If you want to continue this discussion, please open a new thread (or maybe there already is one, I forgot). Questioning the existence of provenance itself is certainly off-topic in this issue.

This problem has been presented as a sort of Gordian knot with no good solution, but it's not like there aren't consistent (and comprehensible!) models for this stuff.

There aren't -- not if you also want to support a reasonable set of optimizations that modern compilers support. (Of course, there are easy models without provenance, but those are entirely unrealistic as basis for a compiler that wants to produce good assembly.) This is not just my opinion, it is the consensus of all the researchers that I know that work in this field.

Again, this is off-topic here. Please don't reply to this part inside this thread; create a new issue and link to it instead.

I don't think losing this store forwarding optimization is a big deal, partly because it is very contrived, but also because it uses lots of pointer arithmetic

EDIT: removed my previous reply since I think I misunderstood you. I think you mean the last optimization here, not the one that removes a redundant store. I'm not well-versed in compiler optimization pass jargon. ;)

That optimization is a representative example of something compilers do a lot. I have strong doubts that losing it is a realistic option, unless we want to give up on competing with C/C++.

@digama0

This comment has been minimized.

@Diggsey
Copy link

Diggsey commented Jun 13, 2021

One other possible resolution:

if *storage_usize == qaddr as usize {
  let val = *storage_usize; // this changed
  *storage_usize = val;
  **storage_ptr = 1;
  println!("{}", q[0]);
}

->

if *storage_usize == qaddr as usize {
  // 2 lines got replaced with this
  erase_provinence(*storage_usize);
  **storage_ptr = 1;
  println!("{}", q[0]);
}

Further transformations are prevented because the compiler can no longer prove that storage_ptr == paddr.wrapping_offset(1) due to the intervening erase_provinence.

Here erase_provinence behaves very similarly to the compiler_fence intrinsic, in that does not generate code, it just influences other optimizations passes.

(AIUI, this is basically saying that integer variables do not have provenance, but memory locations that happen to store integers may have provenance)

@RalfJung

This comment has been minimized.

@RalfJung
Copy link
Member Author

@Diggsey thanks for continuing with the on-topic discussion!

Here erase_provinence behaves very similarly to the compiler_fence intrinsic, in that does not generate code, it just influences other optimizations passes.

Yes, this is an alternative proposal. However, it means that as far as the IR is concened, there is a write to *storage_usize that all the other optimizations need to treat as a proper write. So this negates most of the benefit of redundant store elimination.

@Diggsey
Copy link

Diggsey commented Jun 13, 2021

So this negates most of the benefit of redundant store elimination.

I think "most" is a little strong, at least without more evidence. It still eliminates the store from the resulting program, so the cost is only potentially missed optimizations from later passes.

Even then, it's not clear to me that erase_provenance prohibits any valid optimizations, since the only example we have is where the later optimization is actually wrong.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 13, 2021

To my knowledge, whether or not a piece of code performs a write and whether or not some pointer is written to is very useful information that can have large consequences in the optimizer.

Also note that in Rust we have quite a few language or library concepts that make no difference to the machine code (offset vs wrapping_offset, release/acquire vs sequentially consistent ordering on an x86 machine ...), and yet these concepts are still quite important. For example, to my knowledge the reason that bounds checks cost so much performance is not that the checks are slow on the CPU (the branch predictor can be instructed to assume they will all succeed); the actual cost of a bounds check is that it introduces tons of new control dependencies and thus grinds the optimizer and its ability to analyze the program and reorder instructions to a halt. The effect of some instruction on the optimizer can be more important than its effect in the final assembly.

I guess what I am saying in a round-about way is: this would need serious benchmarking, and it seems unlikely that we can do it with LLVM as our backend.

Even then, it's not clear to me that erase_provenance prohibits any valid optimizations, since the only example we have is where the later optimization is actually wrong.

The later optimization is perfectly fine, it is a standard alias analysis result. The original source program is wrong, IMO. Obviously my example is contrived, because real-world code doing things like this is just way too big to be considered in such detail.^^

For the IR, erase_provenance is a write no different from a "true" store, so every optimization that is prohibited by a true store is also prohibited by erase_provenance.

@Diggsey
Copy link

Diggsey commented Jun 13, 2021

One thing I'd like to clarify: "transmute" is not a thing in C++. The closest equivalent might be reinterpret_cast? However, C++ explicitly says that it is valid to reinterpret_cast a pointer to an intptr_t and back again.

So what you actually mean here is that type-punning via a store to memory of a pointer type, followed by a load at an integer type (or vice versa) should be considered UB, and that just happens to be how we define transmute right now?

@RalfJung
Copy link
Member Author

However, C++ explicitly says that it is valid to reinterpret_cast a pointer to an intptr_t and back again.

Oh, fun. I'll leave that to the C++/clang people to figure out. ;)

So what you actually mean here is that type-punning via a store to memory of a pointer type, followed by a load at an integer type (or vice versa) should be considered UB, and that just happens to be how we define transmute right now?

Yes. Type-punning raw ptr loads, union field accesses, and transmute are all equivalent operations in Rust. I don't see any benefit from making them different, since the first two of them cannot actually do anything to the data in a meaningful way as they have no idea what the "source type" of the data should be. So we have to solve this problem anyway, therefore making transmute behave differently than the others just increases complexity without solving the fundamental issue.

@thomcc
Copy link
Member

thomcc commented Jun 13, 2021

One thing I'd like to clarify: "transmute" is not a thing in C++. The closest equivalent might be reinterpret_cast?

reinterpret_cast between pointer and int is the same as the C-style cast, e.g. (intptr_t)ptr (The latter is defined in terms of the former. This is notably distinct from (and weaker than) transmute in many, many ways.

The closest thing to transmute is C++20's bit_cast, which is more-or-less identical to transmute. It's very new though, and probably is (also, it's a library function and not a builtin operator like reinterpret_cast, although this makes little difference in practice).

So, I don't think this will cause problems for C++ really, since IIUC we're not saying ptr as usize is invalid (the as-style cast remains fine), just that the transmute is invalid.

(Also, as you correctly mention, performing the transmute the way we do in these examples, where the underlying memory is interpreted as a different type, is UB in C++ for other reasons anyway)

@Diggsey
Copy link

Diggsey commented Jun 13, 2021

(Also, as you correctly mention, performing the transmute the way we do in these examples, where the underlying memory is interpreted as a different type, is UB in C++ for other reasons anyway)

Not always. All memory can be read and written via chars regardless of type - and if I've understood correctly, this is the reason for the attempt to introduce a "bytes" type to LLVM, so that Clang can use "bytes" when translating from a C++ char.

I don't really understand how user-implemented memcpy works in this model though, assuming it copies with granularity greater than char...

@thomcc
Copy link
Member

thomcc commented Jun 13, 2021

Yes, I'm aware of the exceptions around char (the "memcpy exception" and such). They don't apply here, since nothing like them is being used in the source in question (the rule around char doesn't apply indirectly, e.g. you can't use the memcpy exception to turn a T* into a U* which you then use as a U, unless that would already be allowed).

And user-implemented memcpy isn't allowed to copy with greater granularity than char for this reason, which is silly but true, and a good example of a place where Rust does a great deal better than C++ at reflecting semantics that real programs need to be efficient.

Anyway, we are well into the weeds at this point and probably off-topic (in a thread that's already had issues with staying on-topic).

@Diggsey
Copy link

Diggsey commented Jun 13, 2021

They don't apply here, since nothing like them is being used in the source in question

Maybe I'm misunderstanding, but couldn't you simply replace usize with [u8; size_of::<usize>()] in the example and have the same issue? And if you then translated that example into C++ it would not be allowed to be UB.

@thomcc
Copy link
Member

thomcc commented Jun 13, 2021

Oh, hm, probably. I guess I agree with Ralf then that that's, uh, gonna be a tricky one for the clang folks to work out.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 14, 2021

I don't really understand how user-implemented memcpy works in this model though, assuming it copies with granularity greater than char...

Whether and how memcpy can be implemented inside C at all is an interesting question -- and a "byte" type is probably part of the answer.

And if you then translated that example into C++ it would not be allowed to be UB.

There would be explicit casts between byte[8] and uintptr_t. Those casts are subject to similar restrictions as int-to-ptr casts: roundtrips cannot be optimized away. That makes at least one of the optimizations wrong, solving the problem.

@RalfJung
Copy link
Member Author

The latest/next C++ actually has an explicit std::byte type that would map very nicely to the LLVM type. We don't really have anything comparable in Rust, we use MaybeUninit<T> instead... so we should probably ensure that MaybeUninit<usize> becomes b64 (on a 64bit platform).

@thomcc
Copy link
Member

thomcc commented Jun 14, 2021

I'm pretty sure std::byte is semantically identical to all the other character types (like char, and unsigned char) except that it requires an explicit cast to convert to/from it (as it's defined as a enum class, which don't have implicit conversions).

The only places I can find where https://github.com/cplusplus/draft mentions it that aren't as part of a list of the other char types are where it describes which header it's found in and such.

So I think it's not really the same as the proposed llvm bytes type in any meaningful way, unless char/unsigned char also are.

@RalfJung
Copy link
Member Author

So I think it's not really the same as the proposed llvm bytes type in any meaningful way, unless char/unsigned char also are.

Yes, those are also LLVM byte under the current proposal.

@RalfJung
Copy link
Member Author

My main problem with stripping provenance implicitly is that it breaks the entire framework I have set up here: in that framework, if a list of bytes bytes can be "decoded" to a value val (like, a list of bytes with provenance can be decoded to an integer value), then during "encoding" val, it is legal for the machine to pick bytes as the encoding. There is a perfect symmetry here. I quite like that.

If we don't have that symmetry, we double the size of that file by having to define "encode" and "decode" separately for each type. That also means more work if we have to ensure any properties that relate these operations.

Note that having the symmetry while doing implicit provenance stripping is not an option: that means encoding an integer could synthesize arbitrary provenance, which violates the most fundamental property of provenance -- that it is "private" and can be "owned" by a function. This property is the reason why provenance exists in the first place (from a program logic / reasoning perspective), it is not negotiable.

making pointer-integer transmutes UB violates the "monotonicity property" that adding more provenance to a datum (in the sense of provenance that can access more memory) can only make your program "more defined": by adding provenance to an integer, we can introduce UB!

I also just now wondered if we can even have that property. Specifically, the property I was thinking of is: given an Abstract Machine state, if we alter that state by changing a single byte in memory to "increase" its provenance, then all executions that are possible from the altered machine state were also possible from the original machine state. This property certainly holds (has to hold!) when replacing an "uninit" byte by an arbitrary byte; do we also have it for increasing provenance?

I think the answer is no. The key question is how we define the "decode" operation for pointers when not all bytes have the same provenance:

  • We could say that is invalid. (That's what MiniRust does for now.) However, this means if we start with 8 bytes that have no provenance, and then add some provenance to one of those bytes, we turned a valid encoding of a ptr into an invalid encoding, thus violating the above property.
  • We could say that if there exists a "joint" provenance that merges them, then that is the provenance we use. As a special case, let's just say if some bytes have provenance and others do not, but for the bytes that do have provenance it's all the same provenance, then we use that provenance for the decoded ptr. But now we again violate the property: start with 8 bytes where the first 4 have no provenance and the last 4 have some provenance p1. Now change the machine state to give the first byte some provenance p2 (that is incompatible with p1, i.e., there exists no "joint" provenance). This turns a valid encoding of a ptr into an invalid one, thus violating the property.

If we truly don't have this property, then I am feeling strongly that we can not implicitly strip provenance when decoding integers. The combination of not having the property and implicit provenance stripping means that we cannot remove x=x; redundant assignments, and that sounds bad.

@RalfJung
Copy link
Member Author

RalfJung commented May 25, 2022

@digama0 proposed a different "decode" operation that uses the meet of all provenances (the "least common provenance") when decoding a pointer and the bytes do not all have the same provenance. (Basically: if any byte's provenance is None, the resulting pointer has None provenance.) That "decode" I think indeed satisfies the monotonicity property: if we increase the provenance in a byte, then either the meet with the other involved provenances will stay the same, or it will increase itself, which is fine.

Algebraically this means we have to require a "meet" to exist on Option<Provenance>, but since one can always use None that does not seem like a strong restriction

So maybe we have to choose between "encode and decode form an adjoint" and "provenance monotonicity".
I cannot make up my mind wrt which property I consider more natural / more important.

@RalfJung
Copy link
Member Author

RalfJung commented May 27, 2022

So maybe we have to choose between "encode and decode form an adjoint" and "provenance monotonicity".
I cannot make up my mind wrt which property I consider more natural / more important.

Turns out there is a way to also get an adjoint property with implicit provenance stripping. So the algebraic property is not at risk.

What remains is the question whether having provenance stripped implicitly is a good idea. (Also see this Zulip discussion.)
Both @JakobDegen and @digama0 are in favor of that. What is less clear to me is the motivation for this: is it "just" to make fewer programs UB? That's a good reason of course, but in this case I am not sure if those programs are actually very relevant. Ptr-int-ptr transmutation roundtrips are UB as there is no avoiding that, so the programs we are saving here are those that do such transmutes but never deref the resulting pointer.
Specifically, the counterexample against roundtrips here has to be UB, the question is just:

  • are the ptr-to-int transmutes UB (when computing left_int, right_int)
  • or is dereferencing left_ptr UB (because the pointer has None provenance)

I am slightly worried we will be confusing people if we tell them these transmutes are okay but the resulting pointer is garbage. Though maybe I am overestimating the difference between that, and telling them the transmutes are UB -- many will also find that confusing.
On the other hand, I do like the appeal of implementing addr via transmute, which "transmutation strips provenance" would let us do. It is nicely symmetric with implementing ptr::invalid via transmute as well. Basically this would mean the strict provenance operations can all be implemented with the base features of the language, no new intrinsic required. For me personally that is so far the best argument in favor of implicit provenance stripping.

@Lokathor
Copy link
Contributor

If I had to pick, "the transmute is UB" is probably easier to teach.

@thomcc
Copy link
Member

thomcc commented May 27, 2022

This could be a good case to apply https://gankra.github.io/blah/tower-of-weakenings/ though, since it sounds a lot like the difference between "a painfully strict and simple model that you can teach and check" and "messier to allow for Useful Crimes".

@RalfJung
Copy link
Member Author

Yeah, I had the same thought -- we can tell people ptr2int transmutes are "tricky, don't do them", and then if people care about the fine-print we can explain the details of what exactly is and is not allowed. In this case we don't know the Useful Crimes yet, but who knows what people will come up with.

@RalfJung
Copy link
Member Author

After all these discussions, I think I now agree with those who said that ptr2int transmutation should strip provenance, i.e., it should be equivalent to ptr.addr(). I have updated MiniRust accordingly.

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 6, 2022
implement ptr.addr() via transmute

As per the discussion in rust-lang/unsafe-code-guidelines#286, the semantics for ptr-to-int transmutes that we are going with for now is to make them strip provenance without exposing it. That's exactly what `ptr.addr()` does! So we can implement `ptr.addr()` via `transmute`. This also means that once rust-lang#97684 lands, Miri can distinguish `ptr.addr()` from `ptr.expose_addr()`, and the following code will correctly be called out as having UB (if permissive provenance mode is enabled, which will become the default once the [implementation is complete](rust-lang/miri#2133)):

```rust
fn main() {
    let x: i32 = 3;
    let x_ptr = &x as *const i32;

    let x_usize: usize = x_ptr.addr();
    // Cast back an address that did *not* get exposed.
    let ptr = std::ptr::from_exposed_addr::<i32>(x_usize);
    assert_eq!(unsafe { *ptr }, 3); //~ ERROR Undefined Behavior: dereferencing pointer failed
}
```

This completes the Miri implementation of the new distinctions introduced by strict provenance. :)

Cc `@Gankra` -- for now I left in your `FIXME(strict_provenance_magic)` saying these should be intrinsics, but I do not necessarily agree that they should be. Or if we have an intrinsic, I think it should behave exactly like the `transmute` does, which makes one wonder why the intrinsic should be needed.
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jun 6, 2022
… r=oli-obk

interpret: better control over whether we read data with provenance

The resolution in rust-lang/unsafe-code-guidelines#286 seems to be that when we load data at integer type, we implicitly strip provenance. So let's implement that in Miri at least for scalar loads. This makes use of the fact that `Scalar` layouts distinguish pointer-sized integers and pointers -- so I was expecting some wild bugs where layouts set this incorrectly, but so far that does not seem to happen.

This does not entirely implement the solution to rust-lang/unsafe-code-guidelines#286; we still do the wrong thing for integers in larger types: we will `copy_op` them and then do validation, and validation will complain about the provenance. To fix that we need mutating validation; validation needs to strip the provenance rather than complaining about it. This is a larger undertaking (but will also help resolve rust-lang/miri#845 since we can reset padding to `Uninit`).

The reason this is useful is that we can now implement `addr` as a `transmute` from a pointer to an integer, and actually get the desired behavior of stripping provenance without exposing it!
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 6, 2022
…=oli-obk

interpret: better control over whether we read data with provenance

The resolution in rust-lang/unsafe-code-guidelines#286 seems to be that when we load data at integer type, we implicitly strip provenance. So let's implement that in Miri at least for scalar loads. This makes use of the fact that `Scalar` layouts distinguish pointer-sized integers and pointers -- so I was expecting some wild bugs where layouts set this incorrectly, but so far that does not seem to happen.

This does not entirely implement the solution to rust-lang/unsafe-code-guidelines#286; we still do the wrong thing for integers in larger types: we will `copy_op` them and then do validation, and validation will complain about the provenance. To fix that we need mutating validation; validation needs to strip the provenance rather than complaining about it. This is a larger undertaking (but will also help resolve rust-lang/miri#845 since we can reset padding to `Uninit`).

The reason this is useful is that we can now implement `addr` as a `transmute` from a pointer to an integer, and actually get the desired behavior of stripping provenance without exposing it!
@thomcc
Copy link
Member

thomcc commented Jul 2, 2022

So, is this pretty much settled as "pointers->integers is not expected to be UB, but you should assume it strips provenance", or is it more up in the air than that?

@RalfJung
Copy link
Member Author

RalfJung commented Jul 2, 2022

I have personally pretty much settled on that, yes. It's also what Miri implements now, modulo rust-lang/miri#2182.

I'd love to make this "official", but I am not sure what the right process is for that. Probably an RFC? Or just FCP'ing the strict provenance APIs into stabilization?

workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
caution against ptr-to-int transmutes

I don't know how strong of a statement we want to make here, but I am very concerned that the current docs could be interpreted as saying that ptr-to-int transmutes are just as okay as transmuting `*mut T` into an `&mut T`.

Examples [like this](rust-lang/unsafe-code-guidelines#286 (comment)) show that ptr-to-int transmutes are deeply suspicious -- they are either UB, or they don't round-trip properly, or we have to basically say that `transmute` will actively look for pointers and do all the things a ptr-to-int cast does (which includes a global side-effect of marking the pointed-to allocation as 'exposed').

Another alternative might be to simply not talk about them... but we *do* want people to use casts rather than transmutes for this.

Cc `@rust-lang/lang`
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
implement ptr.addr() via transmute

As per the discussion in rust-lang/unsafe-code-guidelines#286, the semantics for ptr-to-int transmutes that we are going with for now is to make them strip provenance without exposing it. That's exactly what `ptr.addr()` does! So we can implement `ptr.addr()` via `transmute`. This also means that once rust-lang/rust#97684 lands, Miri can distinguish `ptr.addr()` from `ptr.expose_addr()`, and the following code will correctly be called out as having UB (if permissive provenance mode is enabled, which will become the default once the [implementation is complete](rust-lang/miri#2133)):

```rust
fn main() {
    let x: i32 = 3;
    let x_ptr = &x as *const i32;

    let x_usize: usize = x_ptr.addr();
    // Cast back an address that did *not* get exposed.
    let ptr = std::ptr::from_exposed_addr::<i32>(x_usize);
    assert_eq!(unsafe { *ptr }, 3); //~ ERROR Undefined Behavior: dereferencing pointer failed
}
```

This completes the Miri implementation of the new distinctions introduced by strict provenance. :)

Cc `@Gankra` -- for now I left in your `FIXME(strict_provenance_magic)` saying these should be intrinsics, but I do not necessarily agree that they should be. Or if we have an intrinsic, I think it should behave exactly like the `transmute` does, which makes one wonder why the intrinsic should be needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) C-open-question Category: An open question that we should revisit
Projects
None yet
Development

No branches or pull requests