[MaybeUninit::uninit(); N] memsets memory #96274

Closed
SUPERCILEX opened this issue Apr 21, 2022 · 24 comments · Fixed by #98304
@SUPERCILEX (Contributor):

I tried this code (from #93668 (comment)):

let mut p = [MaybeUninit::<u8>::uninit(); 571];
// VS
let mut p = MaybeUninit::<[u8; 571]>::uninit();

Godbolts: https://godbolt.org/z/341fEjqMd, https://godbolt.org/z/K9dMx8Whr

I expected to see this happen: no memset.

Instead, this happened: the repeat-expression version emits a memset, so the supposedly uninitialized memory gets written anyway.

The docs explicitly say that these two should be equivalent: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.uninit_array

@SUPERCILEX added the C-bug (Category: This is a bug.) label Apr 21, 2022
@nikic added the A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) and I-slow (Issue: Problems and improvements with respect to performance of generated code.) labels Apr 21, 2022
@nikic (Contributor) commented Apr 21, 2022:

Looks like this literally does a memset(undef): https://godbolt.org/z/9fq7W3q3v

Surprised this is not optimized ... but I'm afraid that adding this optimization now would not be accepted, because it is technically illegal, due to the poison/undef distinction (https://alive2.llvm.org/ce/z/-Uryfw).

@RalfJung (Member) commented Apr 21, 2022:

> I've been looking at the assembly of [MaybeUninit::uninit(); N] and I'm pretty sure that ends up memsetting the array whereas MaybeUninit::<[u8; N]>::uninit() doesn't. That seems like a serious bug? Based on these docs, the two should be equivalent: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.uninit_array.

It's not what we want, yeah. But LLVM is way outside my purview, all I can say is that LLVM is allowed to not memset them -- in that sense, they are equivalent. "Equivalent" doesn't mean "generates the same assembly code" (that would be basically impossible to guarantee), it means "has the same observable behavior and the same UB".

@nikic this is new memory though, not memory that could even contain poison. So it would be technically legal to do the optimization we want.

@RalfJung (Member) commented Apr 21, 2022:

Oh wait, that's different code than in the OP? I was talking about [MaybeUninit::<u8>::uninit(); 571].

@RalfJung (Member) commented:

Though not being able to optimize away "overwriting with undef" is also exceedingly silly. That's an LLVM design flaw. I hope they still plan to fix this by killing undef entirely?

Could we work around this by overwriting with poison instead? Though with poison not being per-byte but per-value that might also not be what we want.

@nikic (Contributor) commented Apr 21, 2022:

FWIW I put up https://reviews.llvm.org/D124173 to implement the optimization, but I don't know if it will be accepted. It would be silly not to accept it given that we do the same thing for plain stores, but sometimes people get pedantic in weird places :)

> Could we work around this by overwriting with poison instead? Though with poison not being per-byte but per-value that might also not be what we want.

I believe the undef here comes from a load from uninitialized memory -- there are plans to switch that to produce poison (llvm/llvm-project#52930), but those are long term plans that will involve a good bit of work.

@Kobzol (Contributor) commented Apr 21, 2022:

> Oh wait, that's different code than in the OP? I was talking about [MaybeUninit::<u8>::uninit(); 571].

Currently it behaves like this:

let mut p = [MaybeUninit::<u8>::uninit(); 571]; // memset
let mut p: [MaybeUninit<u8>; 571] = MaybeUninit::uninit_array(); // no memset
let mut p = MaybeUninit::<[u8; 571]>::uninit(); // no memset
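
For anyone who wants to reproduce that table, here is a minimal sketch (assuming the unstable maybe_uninit_uninit_array feature as it existed at the time): wrap each variant in a #[no_mangle] function, compile with optimizations, and look for a memset call in the emitted assembly or LLVM IR.

#![feature(maybe_uninit_uninit_array)]
use std::mem::MaybeUninit;

#[no_mangle]
pub fn repeat_syntax() -> [MaybeUninit<u8>; 571] {
  [MaybeUninit::<u8>::uninit(); 571] // memset at the time this issue was filed
}

#[no_mangle]
pub fn via_uninit_array() -> [MaybeUninit<u8>; 571] {
  MaybeUninit::uninit_array() // no memset
}

#[no_mangle]
pub fn one_big_uninit() -> MaybeUninit<[u8; 571]> {
  MaybeUninit::<[u8; 571]>::uninit() // no memset
}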

@SUPERCILEX (Contributor, Author) commented:

Maybe this is a stupid question, but why is rust emitting a memset(undef) in the first place? That seems like a rust bug rather than an llvm bug.

Question about LLVM: @nikic isn't your PR actually invalid? That would depend on undef propagation, i.e. does passing an undef through a parameter require it to materialize or does the undef propagate through allowing every use to take on a different value? If it can take on a different value, then your change is fine. Otherwise, that optimization is invalid because every byte under the memset must be identical (even if it's some random value).

Going back to this being a rust bug, here's what I mean. Let's compare this to an unconditional branch optimization. Presumably rust emits IR for an if false and lets llvm do the dead code elimination. That approach is valid because emitting dead code is just slow, not wrong. On the other hand, emitting any kind of memset violates the contract of uninit which states that the memory will be uninitialized. memset(undef) initializes that memory, even if it's initialized to garbage. That means fixing the bug on the llvm side is asking llvm to uphold a rust API contract which seems like a no bueno. Unless I'm misunderstanding something?

@nikic (Contributor) commented Apr 21, 2022:

@SUPERCILEX In [MaybeUninit::<u8>::uninit(); 571] the MaybeUninit::<u8>::uninit() bit is producing uninitialized memory, but then you're taking that and copying it into 571 array elements. That's where the memset and undef come from. I don't think there is any API contract being violated here, though certainly a reasonable optimization expectation is violated. Generally, MaybeUninit in rust is highly dependent on LLVM optimizations to produce decent code.
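
To make that concrete, here is a rough conceptual desugaring (a sketch only, not the compiler's actual lowering; repeat_desugared is a made-up name) showing where the per-element copies, and hence the memset of undef, come from:

use std::mem::MaybeUninit;

fn repeat_desugared() -> [MaybeUninit<u8>; 571] {
  // The single uninitialized element produced by MaybeUninit::<u8>::uninit()...
  let elem = MaybeUninit::<u8>::uninit();
  let mut out = MaybeUninit::<[MaybeUninit<u8>; 571]>::uninit();
  let slots = out.as_mut_ptr() as *mut MaybeUninit<u8>;
  for i in 0..571 {
    // ...is copied into each of the 571 slots; LLVM folds these byte-sized
    // stores into a single memset of undef.
    unsafe { slots.add(i).write(elem) };
  }
  // Sound: the elements are themselves MaybeUninit and need no initialization.
  unsafe { out.assume_init() }
}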

> That would depend on undef propagation, i.e. does passing an undef through a parameter require it to materialize or does the undef propagate through allowing every use to take on a different value? If it can take on a different value, then your change is fine. Otherwise, that optimization is invalid because every byte under the memset must be identical (even if it's some random value).

I don't believe we specify the behavior of memset with undef in that much detail, but generally speaking undef does not need to be materialized at call boundaries, and can take on different values for each use in the called function. Of course, things are a bit tricky when it comes to intrinsics, as we don't have a function implementation to consult in that case.

@SUPERCILEX (Contributor, Author) commented:

Thanks for all these details! I think I still don't understand uninitialized memory then. 😅 To me, "copies uninitialized memory" is a nonsensical statement — how can you copy something that doesn't exist? Conceptually, I've been thinking about uninitialized memory from the abstract machine perspective as the presence of a box in which to put stuff. You can put undefined stuff in there no problem, but you have to put something in there for it to not be a void (aka uninitialized). Is that just totally wrong? I guess I'm coming at this from the C angle in which uninitialized memory is just a fat pointer.

@RalfJung (Member) commented Apr 22, 2022:

Think of uninit memory like this: every byte of memory either holds an initialized u8, or is not initialized.

enum AbstractByte {
  Init(u8),
  Uninit,
}

We can just copy a list of AbstractByte from one part of memory to another, that is perfectly meaningful -- and it also copies over the "init" state of each byte. (This is also basically the right model for C, btw. Also, it's actually a bit more complicated due to pointer provenance, but let's ignore that for now.)
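
A toy model of that copy, just to make the point concrete (an illustration only, not how rustc or Miri actually represent memory):

#[derive(Clone, Copy)]
enum AbstractByte {
  Init(u8),
  Uninit,
}

// Copying a slice of abstract bytes is perfectly meaningful: every destination
// byte takes on the source byte's value *and* its init state, and no concrete
// value is ever invented for an Uninit byte.
fn copy_bytes(src: &[AbstractByte], dst: &mut [AbstractByte]) {
  dst.copy_from_slice(src);
}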

See this blog post for some more details.

@SUPERCILEX (Contributor, Author) commented:

Mmmm, I like that enum, thanks. So then the uninit variant is being mapped to llvm's undef?

In that case, [MaybeUninit::<u8>::uninit(); N]; and MaybeUninit::<[u8; N]>::uninit(); have very different semantics that the docs are straight up wrong about. Here's what I think the next steps should be:

  1. Fix the docs to clarify the difference between the above code snippets.
  2. Stabilize maybe_uninit_uninit_array.
  3. Add a clippy lint that tells people to migrate to uninit_array or an inline const expression (the latter is sketched at the end of this comment). It would be great for the memset(undef) to be optimized out, but I don't think most people have understood what [MaybeUninit::<u8>::uninit(); N]; actually means and wouldn't have used it if they did.

This issue should stay open to keep an eye on https://reviews.llvm.org/D124173 while the rest of the discussion can be moved to the uninit_array tracking issue. Does that sound like a plan?
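
For reference, a hedged sketch of the inline-const spelling mentioned in step 3 (on older compilers this form needed the unstable inline_const feature; it is stable in recent ones):

use std::mem::MaybeUninit;

fn via_inline_const() -> [MaybeUninit<u8>; 571] {
  // The element comes from a const block, so this compiles even for
  // non-Copy element types.
  [const { MaybeUninit::<u8>::uninit() }; 571]
}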

@RalfJung (Member) commented:

> Mmmm, I like that enum, thanks. So then the uninit variant is being mapped to llvm's undef?

Yeah, basically.

> In that case, [MaybeUninit::<u8>::uninit(); N]; and MaybeUninit::<[u8; N]>::uninit(); have very different semantics that the docs are straight up wrong about.

No they don't? Both create N bytes of memory filled with AbstractByte::Uninit.

@SUPERCILEX (Contributor, Author) commented:

Except isn't the first one copying a bunch of small types filled with AbstractByte::Uninit into the slots of the initialized array whereas the latter is just one big type (that happens to be an array) filled with AbstractByte::Uninit? Or at least without that copy-each-element distinction I'm not seeing why one version or the other would involve a memset. If they're the same, why doesn't creating a union memset the padding bytes to undef (which would mean MaybeUninit::uninit always memsets)?

@RalfJung (Member) commented:

> Except isn't the first one copying a bunch of small types filled with AbstractByte::Uninit into the slots of the initialized array whereas the latter is just one big type (that happens to be an array) filled with AbstractByte::Uninit?

There's no difference between how those two things are represented as a sequence of AbstractByte.

The memset probably comes from how [_; N] is compiled: by making N copies of the _.

undef is the default value that new values have in LLVM so we don't need to set anything to undef when creating a MaybeUninit via uninit.
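
To illustrate that last point, here is a stripped-down stand-in for MaybeUninit (a sketch, not the real std definition, and the names are made up): constructing the union through its zero-sized uninit field writes none of the payload bytes, so the backing memory just keeps whatever it already held (undef at the LLVM level).

pub union MyMaybeUninit<T: Copy> {
  pub value: T,
  pub uninit: (),
}

impl<T: Copy> MyMaybeUninit<T> {
  pub const fn uninit() -> Self {
    // "Writing" the zero-sized `uninit` field touches no bytes of `value`,
    // so no store (and in particular no memset) is needed.
    MyMaybeUninit { uninit: () }
  }
}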

@SUPERCILEX (Contributor, Author) commented:

> undef is the default value that new values have in LLVM so we don't need to set anything to undef when creating a MaybeUninit via uninit.

Good to know, thanks.

> The memset probably comes from how [_; N] is compiled: by making N copies of the _.

That's the critical distinction I'm referencing. I agree with you that there's no difference from the sequence-of-AbstractBytes perspective, but one version involves a copy whereas the other does not which makes a huge difference from the practical what-I-want-to-happen perspective. The whole point of having a buffer of MaybeUninits is to avoid unnecessary writes to memory. Whether or not the copies from [_; N] get optimized out is somewhat irrelevant since the programmer's intention is being misunderstood (they don't want wasted memory writes but [_; N] means copy _ N times). I suggested creating the clippy lint and stabilizing maybe_uninit_uninit_array because uninit_array matches the programmer's expectations whereas [_; N] does not.

Does that make sense? I feel like I'm not getting through. :)

@RalfJung (Member) commented Apr 26, 2022:

I think I see what you mean, but I disagree. The following two functions are contextually equivalent, which means they are as equivalent as things can get in a programming language: it is impossible to tell the difference between them in terms of program behavior; one can be replaced by the other without program behavior changing even in the slightest.

fn mk_uninit1<const N: usize>() -> [MaybeUninit<u8>; N] {
  [MaybeUninit::uninit(); N]
}
fn mk_uninit2<const N: usize>() -> [MaybeUninit<u8>; N] {
  MaybeUninit::uninit_array()
}

But program equivalence is undecidable (think: halting problem), so it can easily happen that the optimizer treats one differently from the other. When that happens, we should fix the optimizer, not design our libs APIs around it.

To try an analogy: you should write x*7, and not (x << 3) - x, because the former is more clear and it is the responsibility of the optimizer to generate the best assembly code for your intention. These two expressions are equivalent except that with a naive compiler, one generates worse assembly code than the other -- but that is a compiler problem and shouldn't be a programmer problem. The same argument applies to [MaybeUninit::uninit(); N].

@nikic self-assigned this Apr 30, 2022
@nikic (Contributor) commented Apr 30, 2022:

Upstream fix landed in llvm/llvm-project@1881711.

@SUPERCILEX (Contributor, Author) commented:

> Upstream fix landed in llvm/llvm-project@1881711.

Woohoo!


@RalfJung I've been mulling this one over and I think the fundamental disagreement centers around MaybeUninit being about creating a bunch of uninit bytes vs. MaybeUninit being a performance tool to avoid wasted instructions.

> it is impossible to tell the difference between them in terms of program behavior; one can be replaced by the other without program behavior changing even in the slightest.

From the performance perspective, this is wrong. [MaybeUninit::uninit(); N] is (non-obviously) the intent to copy uninit bytes N times whereas uninit_array is explicitly asking for the array without copies. Again, as a practical programmer this distinction matters — it's the difference between being in control of performance and leaving it up to "the optimizer" which can do whatever it wants. Since array initialization syntax is orthogonal to MaybeUninit, an API needs to exist to allow the programmer to explicitly request uninitialized bytes without having to rely on the optimizer.

To touch on your analogy, the goal of x*7 is to return the product — how it happens is irrelevant. On the other hand, the goal of MaybeUninit is to give systems programmers control over how the language initializes memory so what happens under-the-hood does matter. Otherwise, by your reasoning it's totally fine (for example) for [_; N] to allocate heap memory because that doesn't change program behavior... I don't think people would find that acceptable.

@RalfJung (Member) commented May 1, 2022:

In Rust we never guarantee the performance of anything, at least not in the same sense that we guarantee that 1+5 evaluates to 6. Performance is always best-effort. This is like in C. You write a program, and it has a certain set of allowed behaviors; the compiler then does its best to implement one (or several) of those behaviors with the best performance it can achieve.

Sometimes it is too hard for the compiler to achieve the desired performance all by itself, so we design new APIs that are harder to use but easier to turn into high-performance implementations. MaybeUninit is an example of such an API. But even then the performance promises are much more tenuous than guarantees about observable behavior of the function. It is certainly a bug if such an API is not as performant as expected, but such a bug is not nearly as critical as when the observable behavior is wrong.

But I don't think I am even fundamentally disagreeing with you. ;) I just think that new APIs for better performance should be a last resort if we can't get the compiler to do the right thing by itself for all relevant cases -- our opinions only really differ in this threshold.
The intent of [MaybeUninit::uninit(); N] is very obvious IMO, and the compiler ought to be able to work with that. (And it seems like soon, it will, thanks to @nikic. :D ) If this turns out to be wrong, we can always add more APIs easily (and they can be user-defined, so experimentation does not have to happen in the standard library).
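
As a concrete example of "user-defined", here is a hedged sketch of an uninit_array equivalent living outside the standard library (my_uninit_array is a made-up name). It builds on MaybeUninit::<[_; N]>::uninit(), the spelling this thread already established emits no memset:

use std::mem::MaybeUninit;

pub fn my_uninit_array<T, const N: usize>() -> [MaybeUninit<T>; N] {
  // assume_init only peels off the outer MaybeUninit wrapper; it is sound
  // because the elements are themselves MaybeUninit and therefore require
  // no initialization.
  unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() }
}

Usage would then look like let buf: [MaybeUninit<u8>; 571] = my_uninit_array();.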

@SUPERCILEX (Contributor, Author) commented:

Ok, I think I'm finally convinced. 😁

My one remaining worry now would be around regression testing. Is there some way we can know if the LLVM side of things breaks?

And then should uninit_array be removed? I'd prefer to only have one "correct" way of doing things so we can focus on making that way the best it can be.

@RalfJung (Member) commented May 4, 2022:

> My one remaining worry now would be around regression testing. Is there some way we can know if the LLVM side of things breaks?

We have LLVM codegen tests for this, yes (src/test/codegen).
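
For readers unfamiliar with that suite: a codegen test is a Rust source file whose comments carry FileCheck assertions that are matched against the emitted LLVM IR. A hedged sketch of the general shape (this is not the actual test added in #98304):

// compile-flags: -O
#![crate_type = "lib"]

use std::mem::MaybeUninit;

// CHECK-LABEL: @repeat_uninit
#[no_mangle]
pub fn repeat_uninit() -> [MaybeUninit<u8>; 571] {
  // CHECK-NOT: memset
  [MaybeUninit::uninit(); 571]
}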

@SUPERCILEX (Contributor, Author) commented:

I created #98304 to ensure there's no memset, but it still fails. How can one track the progress of an LLVM patch into Rust's bundled LLVM? Basically, when can I expect that test to pass?

@nikic (Contributor) commented Jun 20, 2022:

We'll update to LLVM 15 once it branches, I'd expect it to land in August.

@SUPERCILEX (Contributor, Author) commented:

Sounds good, I'll keep an eye out for LLVM 15.
