Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Box<dyn FnOnce> doesn't respect self alignment #68304

Closed
programmerjake opened this issue Jan 17, 2020 · 53 comments · Fixed by #71170
Closed

Box<dyn FnOnce> doesn't respect self alignment #68304

programmerjake opened this issue Jan 17, 2020 · 53 comments · Fixed by #71170
Assignees
Labels
A-cranelift Things relevant to the [future] cranelift backend A-DSTs Area: Dynamically-sized types (DSTs) C-bug Category: This is a bug. F-unsized_locals `#![feature(unsized_locals)]` I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@programmerjake
Copy link
Member

The following code fails on stable:
When compiled in release mode, the "optimized out" assert is optimized out by LLVM even though it checks the exact same condition as the other assert, as can be verified by testing in debug mode.

Note that it's theoretically possible for the stack to be aligned correctly such that the bug is suppressed, but that's not likely. The 256's can be replaced with larger alignments if that happens.

The bug is caused by the impl of Box<dyn FnOnce> copying self to the stack with the default alignment of 16.

#![allow(dead_code)]
#[repr(align(256))]
struct A {
    v: u8
}

impl A {
    fn f(&self) -> *const A {
        assert_eq!(self as *const A as usize % 256, 0, "optimized out");
        self
    }
}

fn f2(v: u8) -> Box<dyn FnOnce() -> *const A> {
    let a = A { v };
    Box::new(move || {
        a.f()
    })
}

#[test]
fn test() {
    let addr = f2(0)();
    assert_eq!(addr as usize % 256, 0, "addr: {:?}", addr);
}

(Playground)

@programmerjake
Copy link
Member Author

#61042 and #48055 seem related

@programmerjake
Copy link
Member Author

This is a soundness hole.

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. I-nominated I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 17, 2020
@Centril Centril added the P-high High priority label Jan 17, 2020
@RalfJung
Copy link
Member

So... I guess the problem is that the alloca that rustc emits for the unsized local doesn't use align_of_val? That's... kind of bad, because it doesn't look like alloca even supports dynamically setting the alignment: https://llvm.org/docs/LangRef.html#alloca-instruction

If this is correct, the issue should be renamed to something like "unsized locals do not respect alignment".

@programmerjake
Copy link
Member Author

A way to fix this would be to change the ABI for dynamic trait calls that take self by value to always pass self by address. When the trait object is created is where the trampoline function would be generated, so there is no additional cost for static calls.

@jonas-schievink jonas-schievink added A-DSTs Area: Dynamically-sized types (DSTs) F-unsized_locals `#![feature(unsized_locals)]` labels Jan 17, 2020
@programmerjake
Copy link
Member Author

This can also be fixed by not allowing dynamically-sized local variables with an alignment more than some maximum value and having all dynamically-sized locals use that maximum alignment unless optimizations can prove that a smaller alignment suffices.

@hanna-kruppe
Copy link
Contributor

A way to fix this would be to change the ABI for dynamic trait calls that take self by value to always pass self by address. When the trait object is created is where the trampoline function would be generated, so there is no additional cost for static calls.

Messing with the ABI wouldn't help with unsized locals in general (assuming @RalfJung is correct), since e.g. let f: dyn FnOnce() = ...; will also need an alloca with dynamic alignment in general.

This can also be fixed by not allowing dynamically-sized local variables with an alignment more than some maximum value and having all dynamically-sized locals use that maximum alignment unless optimizations can prove that a smaller alignment suffices.

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

@oli-obk
Copy link
Contributor

oli-obk commented Jan 17, 2020

Can we call alloca(align + size) and then align the pointer for the local ourselves as a workaround until we come up with a good solution?

@oli-obk oli-obk closed this as completed Jan 17, 2020
@oli-obk oli-obk reopened this Jan 17, 2020
@crlf0710
Copy link
Member

In clang there's a function __builtin_alloca_with_align() which might or might not be useful in this case.

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Jan 17, 2020

@crlf0710

In clang there's a function __builtin_alloca_with_align() which might or might not be useful in this case.

__builtin_alloca_with_align requires a constant alignment. Non constant alignments are rejected by Clang. This matches semantics of alloca LLVM instruction, so it's not helping, as alignment of value in dyn FnOnce is unknown at compile time when moving from Box<dyn FnOnce>.

int main(void) {
    int alignment = 16;
    __builtin_alloca_with_align(16, alignment);
}

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Jan 17, 2020

I think that overallocating for unsized locals of unknown alignment may make sense. You can alloca(sizeof + alignof - 1) and then make use of <*>::align_offset to compute the offset.

Sure, it increases stack requirements, but it seems acceptable for Box<dyn FnOnce>, and in most cases you end up with extra 7 bytes on stack or less. 256 alignment seems like an edge case where wasting 255 bytes doesn't matter.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 17, 2020

if we keep setting 16 as the default alignment we can skip the overallocating and offset dance for types with an alignment below that.

@programmerjake
Copy link
Member Author

programmerjake commented Jan 17, 2020

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

My idea was that rustc would prohibit coercing overaligned types to a dyn trait that takes self by value. So:

#[repr(align(1048576))] // 1MiB
struct Overaligned(u8);

#[repr(align(16))]
struct Normal(u8);

trait TakeSelfByValue {
    fn f(self) {
        todo!()
    }
}

impl TakeSelfByValue for Overaligned {}
impl TakeSelfByValue for Normal {}

fn make_overaligned() -> Box<dyn TakeSelfByValue> {
    Box::new(Overaligned(0)) // errors due to too strict alignment
}

fn make_normal() -> Box<dyn TakeSelfByValue> {
    Box::new(Normal(0)) // doesn't error -- alignment is under limit
}

@RalfJung
Copy link
Member

if we keep setting 16 as the default alignment we can skip the overallocating and offset dance for types with an alignment below that.

The alignment is dynamically determined though, not sure if saving 15 bytes of stack space is worth the conditional branch?

@oli-obk
Copy link
Contributor

oli-obk commented Jan 17, 2020

Oh right. Yea that's probably not helpful.

@hanna-kruppe
Copy link
Contributor

Given only a Box<dyn Trait> value, there is no static way to say anything about its alignment, so how would you distinguish whether calling a self method on it is OK?

My idea was that rustc would prohibit coercing overaligned types to a dyn trait that takes self by value. So:

Since Fn : FnMut : FnOnce, this would be backwards incompatible (at latest) once trait object upcasting is finally implemented: dyn Fn and dyn FnMut trait objects would have to have the same alignment restriction (because you can upcast to dyn FnOnce), but today you can freely create such objects with any alignment.

@eddyb
Copy link
Member

eddyb commented Mar 4, 2020

I was under the impression that calling Box<dyn FnOnce()> didn't require dynamic alloca, is that related to missing MIR optimizations, meaning we have unnecessary moves which end up duplicating the data on the stack using a dynamic alloca?

@RalfJung
Copy link
Member

RalfJung commented Mar 4, 2020

Correctness shouldn't rely on optimizations to fire...

@eddyb
Copy link
Member

eddyb commented Mar 5, 2020

@RalfJung Sure, I was just hoping we weren't relying on "unsized locals", only unsized parameters (i.e. self), for what we're exposing as stable. And I was hoping an optimization wouldn't need to be involved to get there.

I wonder if it would even be possible to restrict "unsized locals" to the subset where we can handle it entirely through ABI, and not actually have dynamic allocas anywhere, since that would make me far more comfortable with Box<dyn FnOnce()>'s implementation or even stabilization.

Ironically, if that works, it's also the subset we could've implemented years earlier than we did, since most of the complexity, unknowns (and apparently bugs) are around dynamic allocas, not calling trait object methods on a dereference of Box.

@bjorn3
Copy link
Member

bjorn3 commented Mar 5, 2020

I wonder if it would even be possible to restrict "unsized locals" to the subset where we can handle it entirely through ABI, and not actually have dynamic allocas anywhere

That should be possible. I think everything that is necessary is performing the deref of the box as argument (call FnOnce::call_once(move *_1, _2)) instead of moving the box contents to a local first (_3 = move _1; call FnOnce::call_once(move _3, _2)) That would also make it possible to call Box<FnOnce> in cg_clif. Cranelift doesn't have alloca support, so the current MIR is untranslatable.

@eddyb
Copy link
Member

eddyb commented Mar 5, 2020

@bjorn3 Hmm, but that's not trivially sound because it could lead to MIR calls where argument and return places overlap, which is UB.

We'd potentially have to special-case it to Box derefs, but even then what's to stop someone from doing foo = (*foo.boxed_closure)();?

Maybe we could introduce a temporary for the return value instead, for these kinds of virtual calls? That should remove any potential overlap while allowing the call.

(Unrelatedly, I just realize we use a copying-out shim for vtable entries that take self by value, even when Self would be passed as *Self at the call ABI level. We would probably have to make computing the call ABI a query for this microopt to not be expensive)

@bjorn3
Copy link
Member

bjorn3 commented Mar 5, 2020

We'd potentially have to special-case it to Box derefs, but even then what's to stop someone from doing foo = (*foo.boxed_closure)();?

That would result in foo.boxed_closure: Box<FnOnce> being moved to a temp first before dropping foo and calling the boxed closure. If it wasn't moved first before the drop of foo, then borrowck would give an error.

(I meant for this change to be done at mir construction level, not as an optimization pass, so borrowck should catch any soundness issues)

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Apr 3, 2020

Can you not reinitialize *(x.y) today?

Hmm, that's a good point, I think you can do that. Sigh. I knew that seemed too easy. I remember we used to try and prevent that, but we never did it fully, and in the move to NLL we loosened those rules back up, so this compiles:

fn foo() {
    let mut x = Box::new(vec![22]);
    let y = take(*x);
    *x = vec![44];
}

fn take<T>(x: T) -> T { x }

Still, you could do my proposal for unsized types, though it would introduce an incongruity.

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

@nikomatsakis
Copy link
Contributor

@eddyb

At the very least it seems like it would be easier to implement a change only to MIR building, and it should fix problems with Box<dyn FnOnce()> for the time being, even without a separate feature-gate.

To be clear, I wasn't proposing changing anything apart from MIR building either, I don't think.

IIUC, what you proposed is that foo(*x, ...) would be compiled to:

let tmp1 = ..;
..
let tmpN = ..;
foo(*x, tmp1 .. tmpN)

whereas under my proposal (limited to apply to unsized parameters, instead of moved parameters) would yield:

let tmp0 = x;
...
let tmpN = ...;
foo(*tmp0, tmp1..tmpN)

I think I mis-stated the problem earlier. It's not so much that your proposal will introduce compilation errors, it's that I think it changes the semantics of code like this in what I see as a surprising way:

x.foo({ x = Box::new(...); ... })

I believe that this code would compile to the following under your proposal:

// evaluate argument1 into tmp1:
x = Box::new(...);
tmp1 = ...;

// invoke:
foo(*x, tmp1)

which means that foo would be invoked with the new value of x, right?

Under my proposal, the code would still compile, but it would behave in what I see as the expected fashion. Specifically, it would generate:

tmp0 = x;

x = Box::new(...);
tmp1 = ...;

foo(*tmp0, tmp1)

@eddyb
Copy link
Member

eddyb commented Apr 3, 2020

which means that foo would be invoked with the new value of x, right?

Isn't there a way to "lock" that value through the borrow-checker, so further accesses would be disallowed, until the call returns? I'd rather be stricter here.


UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

My point was that if your proposal affects sized moves, you're breaking stable reinitialization.
If it doesn't affect sized moves, I don't see how this applies anymore:

I think what I would like is that our overall behavior is as consistent as possible between Sized and Unsized parameters.

@RalfJung
Copy link
Member

RalfJung commented Apr 4, 2020

UPDATE: Oh, wait, I remember now. For unsized types, reinitialization is not possible, because you can't do *x = ... with an unsized rvalue (we wouldn't know how many bytes to move). So in fact there isn't really the same incongruity.

Let me just chime in here and say that this is a pain in Miri. Unsized locals are pretty different from sized locals, I had to introduce a bunch of special hacks to make them work. MIR locals even needed a new state just for them:

/// This local is alive but not yet initialized. It can be written to
/// but not read from or its address taken. Locals get initialized on
/// first write because for unsized locals, we do not know their size
/// before that.
Uninitialized,

Lazily allocating locals seems odd to me, but at least we can consistently do that for all locals. Elsewhere however we needed some special treatment to make this lazy initialization actually work out and catch mistakes where an unsized local is "overwritten" with a value of different size:

// This interprets `src.meta` with the `dest` local's layout, if an unsized local
// is being initialized!
let (dest, size) = self.force_allocation_maybe_sized(dest, src.meta)?;
let size = size.unwrap_or_else(|| {
assert!(
!dest.layout.is_unsized(),
"Cannot copy into already initialized unsized place"
);
dest.layout.size
});
assert_eq!(src.meta, dest.meta, "Can only copy between equally-sized instances");

This all feels really unprincipled, and it would be great if at some point we could come up with a coherent semantics for unsized locals in MIR.
The worst part is that the size of a local is implicitly determined on the first write to it. It would be much better to be explicit about that, e.g. at StorageLive time.

@nikomatsakis
Copy link
Contributor

@eddyb

Isn't there a way to "lock" that value through the borrow-checker, so further accesses would be disallowed, until the call returns? I'd rather be stricter here.

I don't think we have anything like this today, but presumably we could add such a thing, but I'm not quite sure what you have in mind. When does this prohibition on further accesses take effect?

In particular, the IR for foo(*x, ...) under your proposal, if I understood, would be

... // evaluate remaining arguments
foo(*x, ...) // invoke `foo` with `*x` directly, no temporary

But the access to x that (I believe) we wish to prevent takes place in the "evaluate remaining arguments" part of things. So I think we'd have to add some sort of "pseudo-move" that tells borrowck that x should not be assignable from that point forward, right?

My point was that if your proposal affects sized moves, you're breaking stable reinitialization. If it doesn't affect sized moves, I don't see how this applies anymore: [unsized should be congruent with sized]

Well, it depends on your perspective. It's true that, given the need to permit reinitialization, the check is now specific to unsized values -- i.e., MIR construction will be different depending on whether we know the type to be sized or not. That's unfortunate.

However, I think we still maintain a degree of congruence, from an end-user perspective. In particular, users could not have reinitialized a Box<T: ?Sized> anyhow.

However, under my proposal, users can write things like

x.foo({x = ...; ...})

and the code works as expected (versus either having an unexpected result or getting an error).

In any case, it seems like we're sort of "narrowing in" on a proposal here:

  • when we see a function call with a parameter that is ?Sized, and the value being passed is a deref, we want to modify the IR to not introduce a temporary and instead pass that parameter directly from its location,
  • either we do some sort of "no more assignments" lock,
  • or we move the pointer and introduce a temporary (and then pass in a deref of that temporary)

Does that sound right so far?

@eddyb
Copy link
Member

eddyb commented Apr 6, 2020

If it's easier to do the more flexible approach (by moving the pointer) than the "locking" approach (which is a bit like borrowing the pointer until all arguments are evaluated, then releasing the borrow and doing a move instead), I don't mind it, I was only arguing with the "uniformity" aspect.

However, under my proposal, users can write things like

x.foo({x = ...; ...})

and the code works as expected (versus either having an unexpected result or getting an error).

Something important here: your approach is stabilizable, while mine is meant more for that one impl<A> FnOnce<A> for Box<dyn FnOnce<A>> in the standard library, and not much else.

@nikomatsakis
Copy link
Contributor

Yes, I was kind of shooting for something that we could conceivably stabilize.

@spastorino
Copy link
Member

Removing nomination, this was already discussed in triage meeting and it's under control.

@nikomatsakis
Copy link
Contributor

So should we maybe try out the approach I proposed and see how well it works?

@spastorino
Copy link
Member

@nikomatsakis yes, going to try that out.

@nikomatsakis
Copy link
Contributor

For the record, the "new plan" is this.

Change to MIR construction

When generating the arguments to a call, we ordinarily will make a temporary tmp containing the value of each argument:

tmp0 = <arg expr>
...
tmpN = <arg expr>
call(tmp0, .., tmpN)

But we will modify this such that, for arguments that meet the following conditions:

  • The argument is moved (not copied),
  • and its type is not known to be Sized,
  • and the "place" has outer derefs (e.g., we are passing *P where P is some place),

then we assign the place P (instead of *P) to a temporary tmp and we make the argument to the function be *tmp (instead of tmp).

This means that given x(...) where x: Box<dyn FnOnce(...)>, we used to generate

let tmp0 = *x; // tmp0: dyn FnOnce(), requires an alloca
...
FnOnce::call(tmp0, ...)

but now we generate

let tmp0 = x; // tmp0: Box<dyn FnOnce()>, no alloca
...
FnOnce::call(*tmp0, ...)

Rationale

We are now moving the box itself, which is slightly different than the semantics for values known to be sized. In particular, it does not permit "re-initialization" of the box after having moved its contents. But that is not supported for unsized types anyway.

The change does permit modifications to the callee x during argument evaluation and they will work in an analogous way to how they would work with sized values. So given something like:

x.foo({ x = Box::new(...); ... })

we will find that we (a) save the old value of x to a temporary and then (b) overwrite x while evaluating the arguments and then (c) invoke the FnOnce with the old value of x.

@nikomatsakis
Copy link
Contributor

Update: There is also some discussion of implementation details on Zulip

@RalfJung
Copy link
Member

FnOnce::call(*tmp0, ...)

This needs MIR datastructure changes, right? It is not currently representable I think as fn arguments are Operand. (Cc #71117 (comment) where @eddyb argues that these "operands" should already have a weird semantics in case of move.)

@eddyb
Copy link
Member

eddyb commented Apr 14, 2020

move *tmp0 is a perfectly normal operand, I don't know what you mean.
It's the same kind of Operand as we currently have in @nikomatsakis' let tmp0 = *x; (on the RHS)

@RalfJung
Copy link
Member

Oh sorry, somehow I thought operands could only refer directly to locals, not also dereference them.

I think I am beginning to see the pieces here -- together with @eddyb-style Operand::Move function argument handling, this entirely avoids having to dynamically determine the size of a local. I have to agree that is quite elegant, even though baking "pass-by-reference" into MIR semantics still seems like a heavy hammer to me.

On the other hand this places a severe restriction on now unsized locals may be used, basically taking back large parts of rust-lang/rfcs#1909. Is there some long-term plan to bring that back (probably requires alloca with dynamic alignment in LLVM)? Or should we amend the RFC or at least the tracking issue with the extra restrictions?

@eddyb
Copy link
Member

eddyb commented Apr 15, 2020

I would keep the more general feature but use two different feature-gates (if we can).

And just like with a few other feature-gates (#![feature(const_generics)]) we could warn that #![feature(unsized_locals)] (the general one) is incomplete and could lead to misaligned variables (or something along those lines).

We could also add runtime asserts that the (dynamic) alignment is respected, once we know Box<dyn FnOnce()> could never trigger them (e.g. from stable code).

@nikomatsakis
Copy link
Contributor

Is there some long-term plan to bring that back (probably requires alloca with dynamic alignment in LLVM)?

I was wondering about that too. It seems like it'd be good to review the more general feature in any case.

This is partly why I brought up the idea of wanting a guaranteed "no alloca" path -- this was something I remember us discussing long ago, and so in some sense I feel like the work we're doing here isn't just working around LLVM's lack of dynamic alignment support (though it is doing that, too) but also working to complete the feature as originally envisioned.

@nikomatsakis
Copy link
Contributor

I added two notes to the tracking issue #48055 so we don't overlook this in the future.

@RalfJung
Copy link
Member

This is probably expected, but the fix failed to actually make unsized_locals sound: #71416.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cranelift Things relevant to the [future] cranelift backend A-DSTs Area: Dynamically-sized types (DSTs) C-bug Category: This is a bug. F-unsized_locals `#![feature(unsized_locals)]` I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.