Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code. #47412

Closed
nagisa opened this issue Jan 13, 2018 · 39 comments
Assignees
Labels
I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nagisa
Copy link
Member

nagisa commented Jan 13, 2018

With the following code

fn main() {
    #[derive(Copy, Clone)]
    enum Void {}
    union A { a: (), v: Void }
    let a = A { a: () };
    match a.v {
    }
}

it is possible to invoke undefined behaviour in safe code without using unstable features.

@nagisa nagisa added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jan 13, 2018
@nikomatsakis
Copy link
Contributor

Seems like unions elements ought to be inhabited.

cc @petrochenkov @joshtriplett

@nikomatsakis nikomatsakis added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 13, 2018
@nikomatsakis nikomatsakis changed the title Safe accesses to copy union fields allow invoking UB in safe code Safe accesses to uninhabited (but Copy) union fields allow invoking UB in safe code Jan 13, 2018
@nikomatsakis
Copy link
Contributor

@nagisa points out that we can't tell if generic types are inhabited, so maybe such a fix is not viable.

@nikomatsakis
Copy link
Contributor

cc @rust-lang/lang -- a bit of a tricky thing to decide what we should disallow here, though I'm leaning towards "safe access to union fields", or at least restricting the cases further (e.g., to those cases where know more than copy, but also inhabited)

@nikomatsakis
Copy link
Contributor

Also: @eternaleye points out uninitialized values (e.g., on itanium) may trap if you read from them, and this union RFC would seem to allow access to them from safe code. Leaning more and more towards "union fields should never be safe to access". =)

@eddyb
Copy link
Member

eddyb commented Jan 13, 2018

When did the decision for safe accesses to union fields happen? I do not recall any such proposal or discussion and I would think that I had to sign off on it.
EDIT: turns out "safe copy out of union fields" doesn't seem to exist, see #47412 (comment).

We've known for a long time that a bitcast between two types (with the same number of bits) is safe iff all possible bitpatterns of each of the two types are valid and they correspond to distinct values.
(Note that reading uninitialized memory, including inter-field padding, is unsafe in Rust)
Sometimes this was called "POD" (but it's much stricter than the C/C++ concept with the same name).

@joshtriplett
Copy link
Member

joshtriplett commented Jan 13, 2018

Is this actually UB, and should it be? enum Void should have size zero, and "accessing" it should not actually do a memory read, nor should that match generate any code.

EDIT: Ah, nevermind, I understand.

I tried this on the playground, and the entire function compiles down to a single undefined instruction (ud2); in LLVM, it compiles to an unreachable.

Also, I don't recall us ever making union field reads safe. Does this somehow work outside an unsafe block because the match doesn't actually do any pattern-matching? It seems like the reference to a.v alone should have required an unsafe block.

(Also, good catch, @nagisa!)

@eternaleye
Copy link
Contributor

@joshtriplett: It's not that "accessing" it is UB; it's that accessing it produces a value whose mere existence is UB, because enum Void {} has no values.

@joshtriplett
Copy link
Member

@eternaleye Thanks; yeah, took staring at it for a while to realize that was the problem here.

@eternaleye
Copy link
Contributor

eternaleye commented Jan 13, 2018

Note that you can also get UB with

fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}

as this reads one of the u16's bytes from uninitialized memory.

@joshtriplett
Copy link
Member

joshtriplett commented Jan 13, 2018

@eternaleye That code won't compile; it'll generate error[E0133]: access to union field requires unsafe function or block. (Among a couple of other things.) Try it in the playground.

After experimenting with this a fair bit, it looks like match a.v isn't actually treated as something that requires an unsafe block itself, only the pattern within the match. Can we fix that? The original union RFC says that matching on a union field requires an unsafe block; I think that should be the case even without patterns, or with the only pattern as _. Wouldn't that fix this issue?

Attempting to compile @nagisa's original example should produce the same error E0133 at compile time.

@eddyb
Copy link
Member

eddyb commented Jan 13, 2018

Wait, I'm getting errors for reading, but @nagisa's example doesn't do this (pattern-matching isn't by-value unless there is a pattern that needs it to be).
This gets the correct error: match *&a.v {} - I think that's what match a.v {} should be checked as.

@eternaleye
Copy link
Contributor

@joshtriplett Yeah, I just noticed that - I'd been taking at face value that @nagisa's example was reading out.

@nikomatsakis
Copy link
Contributor

OK, so perhaps the problem is more narrow.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 13, 2018

In particular, matches with empty arms are somewhat special -- they act as an "assertion" of sorts that the path in question is valid. This probably means we forgot to account for that as a kind of read.

UPDATE: Some discussion on IRC where I spell out a bit more of the background

@joshtriplett
Copy link
Member

joshtriplett commented Jan 13, 2018

I think this is just a corner case that we didn't catch (or have a test for) in the union implementation, namely, that a match on a union field with no patterns wasn't treated as unsafe.

(That said, on the off chance someone was relying on this, such as via some kind of generic code and macros, when we fix it we should probably do a crater run.)

Here's a test case that should not compile:

fn main() {
    union A {
        a: i8,
        b: u8,
    }
    let a = A { a: -2 };
    match a.b {
        _ => { println!("should not be allowed") }
    }
}

It currently compiles and prints "should not be allowed"; it should not compile at all.

@eddyb
Copy link
Member

eddyb commented Jan 13, 2018

The Copy part is only relevant to stability, but not to the actual pattern-checking bug:

#![feature(untagged_unions)]
fn main() {
    enum Void {}
    union A { a: (), v: Void }
    let a = A { a: () };
    match a.v {
    }
}

@matthewjasper
Copy link
Contributor

It looks like this was changed between nightly-2017-09-23 and nightly-2017-09-29. Maybe caused by #44700?

@eternaleye
Copy link
Contributor

eternaleye commented Jan 13, 2018

@joshtriplett Not just "no arms", but "no by-value arms" - this should also require unsafe somewhere, IMO, even if it never actually hits UB (compiles and runs just fine):

fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 13, 2018

@eternaleye I'm fine with making that unsafe temporarily, however, it's not clear that it will eventually have to be. In particular, I think that match a.v { _ => could be read as a complete no-op. _ patterns ignore the value that they are matched on and do not require it to be valid. For example, this compiles:

fn main() {
    let x = Box::new(22);
    match x { 
        _ => { }
    }
}

As I wrote on IRC, I believe matches with no arms have to be considered somewhat special here.

Admittedly this needs to be written up more formally and documented.

@nikomatsakis
Copy link
Contributor

Another way to look at it: with a match with no arms, there is nowhere to branch to! So if that code is ever reached, that is UB. But with a single _ arm, this is not the case.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 13, 2018

Oh, wait, my example is bogus =) and actually the example I thought would compile didn't:

fn main() {
    let x = Box::new(22);
    drop(x);
    match x { 
        _ => { }
    }
}

Nonetheless, I think this can come up. I'll play around some more. =)

@nikomatsakis
Copy link
Contributor

OK, so, in the MIR-based borrowck, these examples do work as I expected:

#![feature(nll)]

fn main() {
    let pair = (Box::new(22), Box::new(22));
    drop(pair);
    match pair { 
        _ => { }
    }
}):

I will try to write up a more thorough "proposal" of some kind regarding this validity predicates. I've tried in the past but each time I get stuck trying to figure out how much background to give.

@joshtriplett
Copy link
Member

Reflecting some discussion from IRC back here: my proposal to address this is that naming a union field in a match should always require an unsafe block, even if the match doesn't name the field value or apply any patterns to the field value. That includes only having a _ pattern, or having no patterns at all.

@nikomatsakis
Copy link
Contributor

So, to clarify something that @joshtriplett alluded to but didn't make explicit:

There are two interesting questions to clarify. At what point do we have UB, and when is unsafety required? Clearly, unsafety must be required for any case that could cause UB, but it may also be required more broadly.

I think it's reasonable to require unsafe more broadly, especially to start. But I think we should also write up and nail down the cases where UB could occur.

And I think we may find value in helping the user identify the intersection and calling special attention to those cases where UB could actually occur.

@joshtriplett
Copy link
Member

@petrochenkov Does regression-from-stable-to-nightly apply here? The bug now exists in current stable.

@petrochenkov petrochenkov added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jan 13, 2018
@petrochenkov
Copy link
Contributor

Oops, wrong label.

@eddyb
Copy link
Member

eddyb commented Jan 13, 2018

MIR-based unsafety checking probably works after eliminating some dead code thus introducing the bug

There is no MIR, even dead code, to generate for match a.b {} or match a.b { _ => {} }, to access any part of a, only discriminant/length checks and bindings can do that.
There would be if deriving a.b from a would be a MIR instruction of some form - I keep wanting something like that, but it's not clear how to approach introducing it.

@pnkfelix
Copy link
Member

@eddyb wrote:

There is no MIR, even dead code, to generate for match a.b {} or match a.b { _ => {} }, ...

As part of fixing #27282 I am currently experimenting with adding MIR constructs that represent "start a (pseudo-)borrow of the discriminant for a match on place" and "end the (pseudo-)borrow for that match"

Its possible we might leverage that work to represent the accesses in question here.

@nikomatsakis
Copy link
Contributor

triage: P-high

Well, this is a regression. We ought to fix it. Assigning to pnkfelix and myself to figure out how to get this fixed.

@eddyb
Copy link
Member

eddyb commented Feb 8, 2018

@nox just demonstrated this by using match *(&() as *const _ as *const Void) {} in safe code.
Pattern-matching appears to have serious trouble keeping uninhabited places around to be checked.

EDIT: can't we just always add just a dummy temp = Discriminant(place) for the place we're matching on? That should work regardless of type and not cause any borrowck interactions.
This else will have to be changed to produce tcx.types.u64 (or anything else):

if let ty::TyAdt(adt_def, _) = ty.sty {
adt_def.repr.discr_type().to_ty(tcx)
} else {
// Undefined behaviour, bug for now; may want to return something for
// the `discriminant` intrinsic later.
bug!("Rvalue::Discriminant on Place of type {:?}", ty);
}

And the new dummy_temp = Discriminant(discriminant_place) would be created after this line:

let discriminant_place = unpack!(block = self.as_place(block, discriminant));

@glaebhoerl
Copy link
Contributor

match *(&() as *const _ as *const Void) {}

(Wait, why would that raw pointer dereference be allowed in safe code? Or is this an analogy for the effect that was achieved, and not itself the code that was used to to do it?)

@eddyb
Copy link
Member

eddyb commented Feb 8, 2018

@glaebhoerl Because the check is done on MIR, and this entire issue is about the dereference/union field access not ending up in MIR because it's never read from/written to by match.

@glaebhoerl
Copy link
Contributor

(Ah I see, I was looking for the union field access in there and wasn't following the details closely enough to see the analogy.)

@nikomatsakis
Copy link
Contributor

@eddyb

can't we just always add just a dummy temp = Discriminant(place) for the place we're matching on?

I think that's basically what we need to add, yes. At least, it'd be good to do this for now, and maybe revisit later if we want to think about a more "elegant" fix.

@nikomatsakis nikomatsakis assigned eddyb and unassigned nikomatsakis and pnkfelix Feb 8, 2018
@nikomatsakis
Copy link
Contributor

Assigning to @eddyb to do something for now to close the gaping hole.

@eddyb
Copy link
Member

eddyb commented Feb 9, 2018

@nikomatsakis Small problem, that approach also doubles up some MIR borrowck errors, e.g.:

let mut x = 0;
let r = &mut x;
match *x { ... }
*r += 1;

Because of the added dummy access, *x is checked twice and the conflict with the mutable borrow is reported twice - do we want to keep this behavior and somehow clean it up in the future?
It'd sure be fine for beta, because MIR borrowck is inaccessible there, but I wanted to check.

@eddyb eddyb changed the title Safe accesses to uninhabited (but Copy) union fields allow invoking UB in safe code Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code. Feb 9, 2018
@nikomatsakis
Copy link
Contributor

@eddyb seems ok for now

bors added a commit that referenced this issue Feb 11, 2018
 rustc_mir: insert a dummy access to places being matched on, when building MIR.

Fixes #47412 by adding a `_dummy = Discriminant(place)` before each `match place {...}`.

r? @nikomatsakis
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants