This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

Unitialized memory #42

Closed
SimonSapin opened this issue Jun 28, 2017 · 11 comments

@SimonSapin

Under what conditions is it valid to use any of these?

  • let x: T = std::mem::uninitialized(); on the stack
  • Box::new(std::mem::uninitialized())
  • The part of Vec between .len() and .capacity()
  • The memory pointed to by alloc::heap::allocate() without first writing to it
  • (Possibly many other similar cases…)

Let’s assume !Drop (or that we use std::mem::forget and are being very careful about panic-safety), and types (like u32) for which all bit patterns are valid.

Reading uninitialized memory is Bad and should be avoided, but what’s the worst that could happen? Undefined values might be fine in many cases. Or is it Undefined Behavior of the “the optimizer is allowed to eat your lunch and elide half your program” kind?

As a concrete example, consider reallocate, which reads from the source pointer when it copies. That data is not necessarily entirely initialized, for example in Vec::with_capacity(10).reserve(100)
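For the Vec case in the list above, here is a minimal sketch of filling the region between .len() and .capacity() without ever reading it. It assumes a current standard library, where Vec::spare_capacity_mut and MaybeUninit exist (neither did when this issue was opened); push_n is a hypothetical helper name, not an existing API.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: appends `n` copies of `value` by writing into the
/// spare capacity first and only then raising the length.
fn push_n(v: &mut Vec<u32>, value: u32, n: usize) {
    // After this call, capacity() >= len() + n.
    v.reserve(n);

    // The region between len() and capacity() is exposed as
    // MaybeUninit<u32>, so it can be written without being read.
    let spare: &mut [MaybeUninit<u32>] = v.spare_capacity_mut();
    for slot in &mut spare[..n] {
        slot.write(value);
    }

    // SAFETY: the first `n` spare slots were initialized just above.
    unsafe { v.set_len(v.len() + n) };
}
```

Calling push_n(&mut v, 7, 3) on vec![1, 2] leaves v holding 1, 2, 7, 7, 7. The question of what reallocate may do with the still-uninitialized tail is not answered by this sketch; it only shows how user code can avoid reading that tail.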

@Amanieu
Member

Amanieu commented Jun 28, 2017

I would add this one to your list:

  • Padding bytes between struct members

Ideally, it would be nice to ensure that simply moving/copying an uninitialized value is fine, as long as you don't "use" it. This includes passing it as a parameter to another function, again as long as that function doesn't "use" the value.

@SimonSapin
Author

What counts as using, then? Writing to a TCP socket for example is fundamentally a copy.

@nagisa
Member

nagisa commented Jun 28, 2017

Yeah, that would still be a copy, provided the socket is operating in cooked packet mode. In raw mode the OS will inspect at least the contents of the TCP header.

@RalfJung
Member

So here's what miri implements; IMHO that's a good starting point and it should be mostly compatible with LLVM...

Every byte (we ignore bit-wise accesses for now; anyway Rust doesn't have bitfields) is either some value (0 <= x < 256) or "undefined" (or call it uninitialized or whatever you like). Loading four undef bytes into a u32 in unsafe code is fine, that just makes the u32 itself "undefined". Same for storing them to memory. However, addition and any other operation is UB if any of the operands is undefined.
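The model described in this paragraph can be sketched as a toy abstract machine. This is not miri's code: Byte, AbstractU32, UndefinedBehavior, load_u32, store_u32 and add are hypothetical names made up for illustration, and treating a load with only some undef bytes as fully undefined is a simplifying assumption of the sketch.

```rust
/// One byte of abstract memory: a concrete value or "undefined".
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Init(u8),
    Undef,
}

/// An abstract u32 as seen by the toy machine.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractU32 {
    Defined(u32),
    Undef,
}

/// Marker for "the abstract machine reports UB here".
#[derive(Debug, PartialEq)]
struct UndefinedBehavior;

/// Loading is always allowed; undef bytes make the result undef.
/// (Assumption: any undef byte makes the whole value undef.)
fn load_u32(bytes: [Byte; 4]) -> AbstractU32 {
    let mut raw = [0u8; 4];
    for (dst, src) in raw.iter_mut().zip(bytes.iter()) {
        match src {
            Byte::Init(b) => *dst = *b,
            Byte::Undef => return AbstractU32::Undef,
        }
    }
    AbstractU32::Defined(u32::from_le_bytes(raw))
}

/// Storing is always allowed; an undef value stores four undef bytes.
fn store_u32(value: AbstractU32) -> [Byte; 4] {
    match value {
        AbstractU32::Defined(x) => x.to_le_bytes().map(Byte::Init),
        AbstractU32::Undef => [Byte::Undef; 4],
    }
}

/// Arithmetic is UB as soon as one operand is undef.
fn add(a: AbstractU32, b: AbstractU32) -> Result<AbstractU32, UndefinedBehavior> {
    match (a, b) {
        (AbstractU32::Defined(x), AbstractU32::Defined(y)) => {
            Ok(AbstractU32::Defined(x.wrapping_add(y)))
        }
        _ => Err(UndefinedBehavior),
    }
}
```

In this sketch a load followed by a store of an undef value goes through without complaint, while add returns Err(UndefinedBehavior) if either operand is undef.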

I invite you to play around with miri and run your toy examples through it; if you run into missing functionality, just report a bug. :) My goal is for miri to explicitly be a tool to test such questions; of course, there's still a long way to go.


Now, when we are talking about safe code, I think this is related to #12 (comment). I would also like to propose that passing some safe external function that expects a u32 some "undef" value is UB; "undef" is not a valid inhabitant of u32. This is comparable to bool: In my proposal, storing an "invalid" value (say, 3) in a bool variable in unsafe code is NOT insta-UB as long as nobody uses that thing; however, a conditional branch on an invalid bool is UB. Still, a safe function can expect the bool it got as an argument to be valid, so the "contract" described by the type says that bool must be 0 or 1 and that u32 must be defined.
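The "contract" for bool can be made concrete with a small sketch: code that holds a raw byte checks it before handing a bool to a safe function, so the safe side may rely on the value being 0 or 1. bool_from_byte is a hypothetical helper for illustration, not a standard library function.

```rust
/// Hypothetical helper: turns a raw byte into a bool only if the byte
/// satisfies the contract of the type (0 or 1).
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Any other bit pattern (say, 3) is not a valid bool.
        _ => None,
    }
}
```

A caller that gets None must not manufacture a bool from that byte and pass it on to safe code.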

@eddyb
Member

eddyb commented Jul 11, 2017

However, addition and any other operation is UB if any of the operands is undefined.

AFAIK that's poison or stronger, not undef. Addition on undef soundly produces undef - you need to feed it into an operation that has any conditions of validity, or a conditional branch.
Unless you mean nsw (signed integers in C) addition, because with unknown inputs it could produce UB, so maybe it "always" does? But I'm not sure that's the stance LLVM takes.
See also https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html for the future of LLVM.

@RalfJung
Member

I was referring to miri's undef, which indeed in LLVM is closest to poison.

@SimonSapin
Author

So it sounds like branching is key to triggering UB, but arithmetic "merely" propagates poisoned values?

in unsafe code […] passing some safe external function

I’m worried about this distinction. How is it defined? In an implementation of Vec for example there is plenty of code that is not directly in an unsafe {…} block or in an unsafe fn function or method, but is "unsafe" in the sense that it is responsible for maintaining some invariants.

@RalfJung
Member

So it sounds like branching is key to triggering UB, but arithmetic "merely" propagates poisoned values?

That's a difference between LLVM poison and miri undef -- the latter is UB on arithmetic.
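The two behaviours being contrasted here can be sketched side by side in a toy model. Val, Ub, add_propagating, add_strict and branch are hypothetical names; the sketch only mirrors what is said in this thread (arithmetic propagates under the poison-like rule, arithmetic is UB under the miri rule as described above, and branching is UB under both) and is not a faithful model of either LLVM or miri.

```rust
/// A value that is either defined or "bad" (poison / undef).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Val {
    Def(u32),
    Poison,
}

/// Marker for "this operation is undefined behavior".
#[derive(Debug, PartialEq)]
struct Ub;

/// Poison-like rule: arithmetic never errors, it just propagates.
fn add_propagating(a: Val, b: Val) -> Val {
    match (a, b) {
        (Val::Def(x), Val::Def(y)) => Val::Def(x.wrapping_add(y)),
        _ => Val::Poison,
    }
}

/// Stricter rule, as described for miri in this thread:
/// arithmetic on a bad operand is already UB.
fn add_strict(a: Val, b: Val) -> Result<Val, Ub> {
    match (a, b) {
        (Val::Def(x), Val::Def(y)) => Ok(Val::Def(x.wrapping_add(y))),
        _ => Err(Ub),
    }
}

/// Under both rules, branching on a bad value is UB.
fn branch(cond: Val) -> Result<bool, Ub> {
    match cond {
        Val::Def(x) => Ok(x != 0),
        Val::Poison => Err(Ub),
    }
}
```

Under add_propagating the program only goes wrong once the result reaches branch; under add_strict it goes wrong at the addition itself.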

I’m worried about this distinction. How is it defined? In an implementation of Vec for example there is plenty of code that is not directly in an unsafe {…} block or in an unsafe fn function or method, but is "unsafe" in the sense that it is responsible for maintaining some invariants.

Good question. We haven't figured out all the details yet. Notice however that most of the time, these functions assume additional invariants on top of what the type says, which would be fine with a model that checks if at least the normal type interpretation holds.

@SimonSapin
Author

Paraphrasing http://shape-of-code.coding-guidelines.com/2017/06/18/how-indeterminate-is-an-indeterminate-value/ for brevity:

memcpy could not be implemented in conforming C90 because copying structs with uninitialized padding was undefined behavior. C99 added wording so that uninitialized unsigned char is still indeterminate but could not have a "trap representation": reading it is not UB. Still, the value of uninitialized bytes is allowed to change with each access, as if it were volatile: unsigned char x; return x ^ x; is not guaranteed to return zero. (XOR returns zero when its two arguments are equal.)
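A rough Rust counterpart of the memcpy case, written against today's standard library rather than anything available when this was posted: the struct is copied byte by byte as MaybeUninit<u8>, which never claims that a given byte is initialized, so the padding is moved without being "used". Padded and bytewise_copy are hypothetical names for illustration.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// With repr(C), `a` is followed by padding bytes on typical targets
/// so that `b` is aligned; those padding bytes are never initialized.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    a: u8,
    b: u32,
}

/// Hypothetical memcpy-style copy that also moves the padding bytes.
fn bytewise_copy(src: &Padded) -> Padded {
    let mut dst = MaybeUninit::<Padded>::uninit();
    unsafe {
        // SAFETY: both pointers are valid for size_of::<Padded>() bytes
        // and do not overlap. The bytes are typed as MaybeUninit<u8>,
        // so no byte is asserted to be initialized.
        ptr::copy_nonoverlapping(
            (src as *const Padded).cast::<MaybeUninit<u8>>(),
            dst.as_mut_ptr().cast::<MaybeUninit<u8>>(),
            size_of::<Padded>(),
        );
        // SAFETY: every field byte was copied from a valid `Padded`.
        dst.assume_init()
    }
}
```

Comparing the copy with the original through the derived PartialEq looks only at the fields a and b, never at the padding.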

@eddyb
Member

eddyb commented Jul 15, 2017

@SimonSapin Instead of XOR one could also use x != x or x == x.

@RalfJung
Member

The story of mem::uninitialized has progressed quite a lot in recent years (that function has been deprecated and declared to be basically impossible to use correctly), so I will close this.
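For readers landing here later, a minimal sketch of the pattern that std::mem::MaybeUninit allows in place of the deprecated function: the storage stays wrapped until every element has been written. make_array is a hypothetical example function, and the values it writes are arbitrary.

```rust
use std::mem::MaybeUninit;

/// Hypothetical example: builds a [u32; 4] element by element without
/// ever holding an uninitialized `[u32; 4]` value.
fn make_array() -> [u32; 4] {
    let mut buf = MaybeUninit::<[u32; 4]>::uninit();
    let base = buf.as_mut_ptr().cast::<u32>();
    for i in 0..4 {
        // SAFETY: `i < 4`, so the pointer stays inside the array;
        // `write` does not read or drop the old contents.
        unsafe { base.add(i).write(i as u32 * 10) };
    }
    // SAFETY: all four elements were written in the loop above.
    unsafe { buf.assume_init() }
}
```

The only point where the array is treated as a real [u32; 4] is assume_init, after initialization is complete.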
