
Define: "padding bytes" #183

Closed
gnzlbg opened this issue Jul 31, 2019 · 14 comments · Fixed by #195
Labels
A-padding Topic: Related to padding C-terminology Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary

Comments

@gnzlbg
Contributor

gnzlbg commented Jul 31, 2019

We need a technical term for a byte that is not part of the "value representation" of a type, and we need to at least mention how these bytes currently behave, and provide some guarantees.

For example, a definition that would work today could be:

A "foo" byte is a byte that is not part of the value representation of a type; "foo" bytes can change without any external program action.

For example, the compiler inserts "foo" bytes in structs to pad struct fields to their alignment, or in unions to make the size of the union a multiple of its alignment.
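As a concrete sketch of this (the type `S` is my own illustration, not from the thread): a `u32` field must be 4-aligned, so the compiler inserts such bytes between the fields.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example: `b` must be 4-aligned, so the compiler
// inserts 3 "foo"/padding bytes after `a`.
#[repr(C)]
struct S {
    a: u8,  // offset 0
    // 3 padding bytes at offsets 1..4
    b: u32, // offset 4
}

fn main() {
    assert_eq!(align_of::<S>(), 4);
    assert_eq!(size_of::<S>(), 8); // 5 value bytes + 3 padding bytes
}
```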

The semantics that "foo" bytes can change without any external program action is what we have today. We could guarantee more, but AFAICT right now we do not. I'd prefer to first have some basic definition that reflects what happens today, and punt any discussion about guaranteeing more about these bytes to a follow-up.

Question: can these "foo" bytes occur due to something that is not padding? If not, then we can call them padding bytes. I can imagine a general mechanism to tell the compiler that some byte of a type is a "foo" byte, similar to what we do for NonNull and similar types today. That could be useful, for example, for mapping a [u8; N] to some C layout type, where you know the offsets of the bytes in the array that are padding bytes in C.

@RalfJung
Member

RalfJung commented Jul 31, 2019

A "foo" byte is a byte that is not part of the value representation of a type

Let me try to formalize this further: A byte at index i is a "foo" byte for type T if, for all values v and lists of bytes b such that v and b are related at T (let's write this Vrel_T(v, b)), changing b at index i to any other byte yields a b' such that v and b' are related (Vrel_T(v, b')). IOW, the byte at index i is entirely ignored by Vrel_T (the value relation for T), and two lists of bytes that differ only in "foo" bytes relate to the same value(s), if any.
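A hedged sketch of this definition for one concrete type (`S`, `decode`, and the byte lists are my own illustration, not from the thread): two byte lists that differ only at the "foo"/padding indices decode to equal values.

```rust
// For a repr(C) struct { u8, u32 }, indices 1..4 are padding ("foo" bytes).
#[repr(C)]
#[derive(Debug, PartialEq)]
struct S { a: u8, b: u32 }

// transmute plays the role of "pick the value related to these bytes";
// all 8 bytes are concrete here, so this is well-defined.
fn decode(bytes: [u8; 8]) -> S {
    unsafe { std::mem::transmute(bytes) }
}

fn main() {
    let b1 = [1, 0x00, 0x00, 0x00, 2, 0, 0, 0];
    let b2 = [1, 0xAA, 0xBB, 0xCC, 2, 0, 0, 0]; // differs only at indices 1..4
    assert_eq!(decode(b1), decode(b2)); // Vrel_S ignores the "foo" bytes
}
```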

"foo" bytes can change without any external program action.

I really very strongly dislike magical clauses like this -- primarily because they are quite the opposite of operational. We should give an operational specification for an Abstract Machine that concretely says what happens on every program step. "can change" is not a term that can be used in this context.

And I also think it is unnecessary. In fact, my definition of a "typed copy" already takes care of padding bytes:

We can implement TypedMemory for any Memory as follows:

  • For typed_read, read ty.size() many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB.
  • For typed_write, pick any representation of val for ty, and call Memory::write. If no representation exists, we have UB.

In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type T), or when an argument of type T is passed to a function or a return value of type T returned from a function, this is called a "typed copy".
Its semantics are basically those of a typed_read at T followed by a typed_write at T.

One consequence of this definition is that if the byte at index i is a "foo" byte, it can have an arbitrary content after the copy because we just defined "foo" bytes as being irrelevant for the value representation relation. The typed_write can thus pick anything for the "foo" byte when turning a value into a list of bytes. In particular, that byte can become 0xUU ("uninitialized").
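A hedged sketch of that consequence (the type `S` and `roundtrip` are my own names): passing and returning by value are typed copies, so the value bytes survive while the model above says nothing about the padding bytes afterwards.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct S { a: u8, b: u32 } // indices 1..4 are padding

// passing `s` in and returning it are both typed copies at S
fn roundtrip(s: S) -> S {
    s
}

fn main() {
    let t = roundtrip(S { a: 1, b: 2 });
    // the value representation is preserved by the typed copies...
    assert_eq!((t.a, t.b), (1, 2));
    // ...but t's padding bytes may be anything (0xUU in the model above),
    // so we deliberately make no assertion about them.
}
```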

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

Let me try to formalize this further: A byte at index i is a "foo" byte for type T if, for all values v and lists of bytes b such that v and b are related at T (let's write this Vrel_T(v, b)), changing b at index i to any other byte yields a b' such that v and b' are related (Vrel_T(v, b')). IOW, the byte at index i is entirely ignored by Vrel_T (the value relation for T), and two lists of bytes that differ only in "foo" bytes relate to the same value(s), if any.

This LGTM.

We should give an operational specification for an Abstract Machine that concretely says what happens on every program step.

Do we have a definition of what a program step is somewhere?

I'm not sure how these typed reads and writes interact with reads/writes via u8. Can the following code:

fn foo(x: &mut T) {
    // writes 42 to a padding byte:
    unsafe { (x as *mut T as *mut u8).add(padding_byte_offset).write(42) };
    let p0 = unsafe { (x as *const T as *const u8).add(padding_byte_offset).read() };
    assert_eq!(p0, 42); // CAN FAIL?
}

be optimized to:

fn foo(_x: &mut T) { // x is unused

    // write omitted, since it is writing to a padding byte:
    // (x as *mut T as *mut u8).add(padding_byte_offset).write(42);

    // read omitted, since it is reading from a padding byte:
    // let p0 = (x as *const T as *const u8).add(padding_byte_offset).read();

    // assert_eq!(p0, 42); can be optimized to assert_eq!(undef, 42); and to nothing from there
}

?

These reads and writes are typed at u8, but I do think that these are optimizations worth considering.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

These reads and writes are typed at u8, but I do think that these are optimizations worth considering.

To expand on this, even if we were to optimize foo to:

fn foo(x: &mut T) {
    // actually perform the write
    unsafe { (x as *mut T as *mut u8).add(padding_byte_offset).write(42) };

    // actually perform the read
    let p0 = unsafe { (x as *const T as *const u8).add(padding_byte_offset).read() };

    assert_eq!(p0, 42); // PASSES
}

I think the following is definitely a valid transformation:

fn foo(x: &mut T) {
    // actually perform the write
    unsafe { (x as *mut T as *mut u8).add(padding_byte_offset).write(42) };

    // The compiler can read from x into the stack, and write back to x again,
    // changing the value of the padding byte even though no user code does this:
    let tmp = unsafe { (x as *mut T).read() };
    unsafe { (x as *mut T).write(tmp) }; // typed copy: may reset padding

    // actually perform the read -> does not necessarily read 42
    let p0 = unsafe { (x as *const T as *const u8).add(padding_byte_offset).read() };

    assert_eq!(p0, 42); // PROBABLY FAILS
}

@hanna-kruppe

@gnzlbg "this u8 access reads/writes to a padding byte" is nonsensical, not just by the definition Ralf proposed but also by the more general principle for Rust to not have "typed memory". No memory location inherently "is of" a specific type T, so you can't say some byte is always padding, you can only talk about padding when accessing the memory as a specific type. But in the moments you're concerned with here, all the accesses are with type u8, which has no padding. So deleting the load and assertion is fine, not because of padding, but just because there's no intervening store that could change that byte to a different value than 42. Deleting the store is (probably) not fine, because e.g. the caller could then read the same memory at type u8 again and expect 42 to be written there.

I think the following is definitely a valid transformation:

Why should it be? The optimizer should be allowed to insert extra loads and idempotent stores where desired, but by the proposed semantics this is not an idempotent store. The purpose of allowing extra loads and stores to be inserted (enabling useful optimizations) is served perfectly well by inserting loads and stores that don't reset part of the memory to undef (and don't assert validity or do any of the other things that Rust-level typed accesses imply), which is indeed what vanilla LLVM loads/stores will do.
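A hedged sketch of that distinction (the buffer and `byte_roundtrip` are my own illustration): an optimizer-inserted byte-level round trip writes back exactly the bytes it read, so it is idempotent and cannot clobber anything, unlike a typed copy at T.

```rust
// The kind of round trip an optimizer may insert: untyped loads followed
// by stores of the very same bytes. This is idempotent by construction.
fn byte_roundtrip(buf: &mut [u8; 4]) {
    let p = buf.as_mut_ptr();
    unsafe {
        let tmp = [p.read(), p.add(1).read(), p.add(2).read(), p.add(3).read()];
        for i in 0..4 {
            p.add(i).write(tmp[i]);
        }
    }
}

fn main() {
    let mut buf = [1u8, 42, 3, 4];
    byte_roundtrip(&mut buf);
    // every byte, including one playing the role of "padding", is unchanged
    assert_eq!(buf, [1, 42, 3, 4]);
}
```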

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

No memory location inherently "is of" a specific type T, so you can't say some byte is always padding,

I'd expected that every time a &T or &mut T is used in any way we would have an assume(padding bytes of T are undef).

That would mean:

fn foo(x: &mut T) {
    // writes 42 to a padding byte that was undef:
    unsafe { (x /*assume_padding_undef*/ as *mut T as *mut u8).add(padding_byte_offset).write(42) };
    // ^^^^ because the following read makes the write unobservable, the write can be removed
    let p0 = unsafe { (x /*assume_padding_undef*/ as *const T as *const u8).add(padding_byte_offset).read() };
    // ^^^^ p0 is undef here
    assert_eq!(p0, 42); // => assert_eq!(undef, 42);
}

so foo can be optimized to fn foo(x: &mut T) {}.

@hanna-kruppe

I see. That seems scary though (a read of a reference would write to the referenced memory!?!) and even more aggressive than "reference validity asserts validity of the referenced memory", which already seems undesirable. I also don't see how it helps with any useful optimizations. Code like you've shown above is rather artificial. It wouldn't even extend e.g. to an implemented-in-Rust memcpy because there you'd cast the reference to a *u8 at the start and never even handle a typed reference again.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

That seems scary though (a read of a reference would write to the referenced memory!?!)

I would prefer to frame this as part of the validity of T. In the same way that &T can be assumed to be nonnull, the padding bytes of T can always be assumed to be undef. I don't know what a good way to frame this would be. We require the bytes of &T to not be null, but requiring the padding bytes of T to be undef is too strong and not what I mean.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

It wouldn't even extend e.g. to an implemented-in-Rust memcpy because there you'd cast the reference to a *u8 at the start and never even handle a typed reference again.

If memcpy is inlined, I think it could make a difference. AFAIK it is not possible to inline memcpy right now, but that's probably an orthogonal problem.

@hanna-kruppe

hanna-kruppe commented Jul 31, 2019

I would prefer to frame this as part of the validity of T. In the same way that &T can be assumed to be nonnull, the padding bytes of T can always be assumed to be undef. I don't know what a good way to frame this would be. We require the bytes of &T to not be null, but requiring the padding bytes of T to be undef is too strong and not what I mean.

I don't see a way to make it work either (validity is a proposition, and nothing can be said about padding -- it's for pruning execution paths, not for introducing more paths by introducing more non-determinism). Furthermore, even if you could shoehorn it into validity, it's far from certain that

  1. validity of the reference would entail validity of the referenced memory (Validity of references: Memory-related properties #77)
  2. the (eventual) idiomatic way to turn a reference into a raw pointer would assert validity of the reference at all (I believe with rust-lang/rfcs#2582, the RFC for an operator to take a raw reference, &raw mut *x and &raw const *x wouldn't assert validity of x)

So if this is your justification for the optimization you want, it's on incredibly shaky ground from the get-go.

I also still don't understand the motivation for wanting this. What's an actually useful optimization enabled by this? Practically all copies and memory accesses are typed in idiomatic Rust, which should justify practically all optimizations. The ones that aren't justified are those where the code explicitly fiddles with bytes, which means some systems shenanigans are afoot where spraying undef around liberally would usually be antagonistic rather than useful to programmers.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

it's for pruning execution paths, not for introducing more paths by introducing more non-determinism)

I actually see this as something that removes non-determinism, e.g., it makes (&T as *const T).add(padding_byte_offset).read() consistently return "uninitialized". Without this, that read might return "uninitialized" or might return something else, depending on what the memory contains (can this change depending on the optimization level?).

The ones that aren't justified are those where the code explicitly fiddles with bytes, which means some systems shenanigans are afoot where spraying undef around liberally would usually be antagonistic rather than useful to programmers.

If people write code with undefined behavior, we might mis-optimize it. I don't know how careful (if at all) people would need to become to avoid undefined behavior with such a model, and if this makes it harder to write code without UB then it might not be worth it. I do see a lot of generic code doing raw byte manipulations out there, and right now programmers have no tools to manually perform these optimizations. It is unclear to me how feasible it would be to provide such tools, but it does appear quite hard (e.g. a reliable or best-effort way to determine whether a byte was last accessed, or will next be accessed, as a padding byte).

What's an actually useful optimization enabled by this? The ones that aren't justified are those where the code explicitly fiddles with bytes, which means some systems shenanigans are afoot where spraying undef around liberally would usually be antagonistic rather than useful to programmers.

Optimizations enabled by this only apply to code that's manipulating raw bytes. This:

fn foo(x: *mut u8, y: *const u8, sz: usize) {
    for i in 0..sz { unsafe { x.add(i).write(y.add(i).read()) }; }
}
fn bar<T>(x: &mut T, y: &T) {
    let x = x as *mut T as *mut u8;
    let y = y as *const T as *const u8;
    let sz = size_of::<T>();
    foo(x, y, sz);
}
#[repr(align(128))] struct A(u8);
pub fn baz(a: &A) -> A {
    let mut b: A = uninitialized();
    bar(&mut b, a);
    b
}

can be optimized to:

pub fn baz(a: &A) -> A {
    let mut b: A = uninitialized();
    unsafe { (&mut b as *mut A as *mut u8).write((a as *const A as *const u8).read()) };
    b
}

only if you can assume that the trailing padding of a is undef. If you can't assume that, I don't think this optimization is sound, and the loop will copy 128 bytes. But if you can assume that, you only have to copy 1 byte.

@hanna-kruppe

I actually see this as something that removes non-determinism,

Uh, yeah, nevermind that part, I had the LLVM undef semantics in mind for that, which Rust luckily won't have.

If people write code with undefined behavior, we might miss optimize it. I don't know how much careful (if at all) people would need to become to avoid undefined behavior with such a model, and if this makes it harder to write code without UB then it might not be worth it. I do see a lot of generic code doing raw byte manipulations out there, and right now programmers have no tools to manually perform these optimizations. It is unclear to me how feasible would it be to provide such tools, but it does appear quite hard (e.g. a reliable or best effort way to obtain whether a byte was last accessed or will be next accessed as a padding byte).

I don't really understand what you're trying to say here. Can you please rephrase?

In part you seem concerned about whether tools like miri or sanitizers can check for UB arising from e.g. writing to padding and expecting it to be preserved. But the approach Ralf laid out also says precisely when padding is clobbered (whenever a typed copy into a location is performed, the padding bytes of that location are clobbered), and this allows an interpreter like Miri to set those bytes to 0xUU, which will flag later uses of that padding by the program. AFAICT the direction you're pursuing is only different in that it's much more aggressive in when this happens, not fundamentally more mechanizable.

Optimizations enabled by this only apply to code that's manipulating raw bytes. This:

Sure, I was also saying only this kind of code is affected. My questions are:

  • Why do we care about optimizing that kind of code? It seems rare and/or unidiomatic to me if we leave aside copy_nonoverlapping.
  • How sure are we those kinds of optimizations are desirable in such code anyway? It seems just as likely to break a clever low level bit fiddling trick as to actually accelerate the intended operation.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

AFAICT the direction you're pursuing is only different in that it's much more aggressive in when this happens, not fundamentally more mechanizable.

This is correct.

As you phrased it before, every operation on a reference (e.g. a read, a cast, etc.) would be like a write of undef to the padding bytes of T. Writing undef to those bytes is an operation in the abstract machine. If we can make those writes avoid data races, that would be fine by me. A tool like Miri would need to perform those writes on a read, but that would allow more optimizations, and on real hardware those writes wouldn't happen.

Sure, I was also saying only this kind of code is affected.

I agree with this.

Why do we care about optimizing that kind of code? It seems rare and/or unidiomatic to me if we leave aside copy_nonoverlapping.

I think that creating *u8 pointers to repr(C) structs in Rust is not that rare; what's rare is wanting to modify the padding bytes of such a type in any meaningful way. AFAICT that can only happen when writing generic code where the code cannot know better. Some operations are not necessary, and will be "invalidated" the first time the memory is accessed at type T, but the code cannot tell. Code explicitly wanting to modify the contents of padding bytes of a T is AFAICT always broken. There is no way to do that correctly, since every copy/move/access of the type might invalidate the contents. That code needs to use a union with a [u8; mem::size_of::<T>()] field and work on that instead, and yet, even there, accessing the T-typed union field can set the padding bytes to undef.

So I can only think of copy_nonoverlapping-like code that makes sense here. All other code doing this is probably incorrect anyway. I don't know if miri could diagnose that code earlier if every use would write undef to padding bytes. It would probably be able to diagnose that code at some point, and it is unclear to me whether the cost of those writes is worth a potentially earlier diagnostic.

How sure are we those kinds of optimizations are desirable in such code anyway? It seems just as likely to break a clever low level bit fiddling trick as to actually accelerate the intended operation.

I don't know if the above answers this question. But I couldn't think of any clever low level code that can correctly do something useful with padding bytes without putting it in a union, which would block the optimization, and that isn't copy_nonoverlapping-like.
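A hedged sketch of the union approach mentioned above (`T2` and `Overlay` are my own names, not from the thread): overlaying a struct with its raw bytes means accesses through the byte view are untyped, so their contents stay stable as long as nothing performs a typed copy through the struct view.

```rust
use std::mem::size_of;

#[repr(C)]
#[derive(Clone, Copy)]
struct T2 { a: u8, b: u32 } // indices 1..4 would be padding when viewed as T2

// Overlay T2 with its raw bytes; writes via `raw` never go through
// T2's representation relation.
#[repr(C)]
union Overlay {
    value: T2,
    raw: [u8; size_of::<T2>()],
}

fn main() {
    let mut o = Overlay { raw: [0; size_of::<T2>()] };
    unsafe {
        o.raw[1] = 42;            // a "padding" index of T2, but written via the raw view
        assert_eq!(o.raw[1], 42); // stable: no typed copy at T2 happened in between
    }
}
```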

@RalfJung
Member

Do we have a definition of what a program step is somewhere?

No, defining the program steps would be basically defining the Abstract Machine so that's years off.
Or do you mean the general term "step"? I just mean that in the sense of operational semantics.

I don't see a way to make it work either (valdity is a proposition, and nothing can be said about padding -- it's for pruning execution paths, not for introducing more paths by introducing more non-determinism).

Agreed. Magically resetting memory behind a reference could in principle be done as part of Stacked Borrows Retag, but, uh, that seems very extreme.

I actually see this as something that removes non-determinism, e.g., it makes (&T as *const T).add(padding_byte_offset).read() consistently returns "uninitialized".

It does indeed not add non-determinism, but it does add UB. It adds UB in very non-trivial raw-byte-level memory-manipulating code. I am already extremely worried about the kind of rules Stacked Borrows will impose on such code, but there at least we have a very strong motivation. But what you are proposing here goes way beyond that, it IMO makes it a lot harder to think about what happens when working with raw pointers in Rust, and the gain is nowhere near what our aliasing rules are giving us.

I think Rust basically exhausted its "strange UB" budget with the aliasing rules. Similar to how the type system was stripped down partially because the borrow checker exhausted that complexity budget (e.g. we don't have an effect system any more), I think we should strive to simplify UB wherever we can. In particular, we should consider C/C++ as the upper bound for how much UB is reasonable. Otherwise we risk becoming the "new C" in terms of "wtf are these people thinking making all this UB" -- and with rules like what is being proposed here, I could not disagree.

The validity invariant / value representation relation can be seen as an extension of the "trap representation" story of C, and under my proposals that also incorporates C's handling of padding (if we pick the right representation relation for unions). In particular I feel like #180 is very C-like, and the "value domain" proposal could also be used as the basis for a more formal version of the C value/object model. True, C does not have a "validity invariant" the way Rust does for enums or bool, but the basic concept is not entirely foreign to the language either.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

No, defining the program steps would be basically defining the Abstract Machine so that's years off.
Or do you mean the general term "step"?

I think I meant something more rudimentary, e.g., like C sequence points.

I don't disagree with the rest that you mention. It's a trade-off. It is unclear to me whether it would be worth it as well.

@RalfJung RalfJung added C-terminology Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary A-padding Topic: Related to padding labels Aug 14, 2019