rustc_mir: add a pass for fragmenting locals into their fields (aka SROA). #48300

eddyb · 2018-02-17T15:49:06Z

In order to make other MIR optimizations more effective and/or easier to implement, this pass breaks up ("fragments") aggregate locals into smaller locals, ideally leaf fields of primitive or generic types.

This roughly corresponds to LLVM's SROA ("Scalar Replacement of Aggregates"), although "scalar" is less applicable here, as MIR doesn't distinguish between register-like SSA and memory.

Locals are fragmented only when all accesses are directly through field/downcast projections, so that the number of statements is unchanged (ignoring Storage{Live,Dead} ones).
For example, x = y; isn't transformed into x.f = y.f; for each field. Instead, such a whole-local assignment completely prevents both x and y from being split up in any way.
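To make the eligibility rule concrete, here is a minimal sketch of it over a toy model. The `Access` enum and `fragmentable_locals` function are hypothetical names for illustration only; they are not rustc types and do not mirror how the pass is actually implemented (downcast projections and Storage{Live,Dead} are ignored here).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a use of a MIR local (hypothetical, not a rustc type).
#[derive(Clone, Copy, Debug)]
enum Access {
    /// A use through a field projection, e.g. `_1.0`.
    Field { local: usize, field: usize },
    /// A use of the whole local, e.g. either side of `x = y;`.
    Whole { local: usize },
}

/// Returns, for every local that is only ever accessed through field
/// projections, the set of fields that were accessed.
fn fragmentable_locals(accesses: &[Access]) -> BTreeMap<usize, BTreeSet<usize>> {
    let mut fields: BTreeMap<usize, BTreeSet<usize>> = BTreeMap::new();
    let mut blocked: BTreeSet<usize> = BTreeSet::new();
    for access in accesses {
        match *access {
            Access::Field { local, field } => {
                fields.entry(local).or_default().insert(field);
            }
            Access::Whole { local } => {
                blocked.insert(local);
            }
        }
    }
    // A single whole-local use disqualifies the local entirely.
    fields.retain(|local, _| !blocked.contains(local));
    fields
}

fn main() {
    // Models: _1.0 = ..; _1.1 = ..; _2 = _3; _3.0 = ..;
    let accesses = [
        Access::Field { local: 1, field: 0 },
        Access::Field { local: 1, field: 1 },
        Access::Whole { local: 2 },
        Access::Whole { local: 3 },
        Access::Field { local: 3, field: 0 },
    ];
    let result = fragmentable_locals(&accesses);
    assert!(result.contains_key(&1));
    assert!(!result.contains_key(&2));
    assert!(!result.contains_key(&3));
    println!("{:?}", result);
}
```

In this model, `_3` is rejected even though one of its uses is a field access, because the whole-local copy `_2 = _3` blocks both sides.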

Variable debuginfo is always preserved by this pass, by transforming it into a "composite" form, which maps pieces of a user variable to independent Places.
That corresponds to DWARF's DW_OP_piece "composition operator", which LLVM exposes as a more "random-access" DW_OP_LLVM_fragment (as each llvm.dbg.declare can only point to one alloca), indicating what byte range of the debugger-facing variable is being declared.

However, enums can't be broken up into their discriminant and variant fields if any debuginfo refers to them: there is more than one possible memory location for the enum bytes after the discriminant, and the debugger would have to inspect the discriminant to use the right variant's fragmented fields.
And LLVM doesn't currently (and couldn't easily AFAICT) support the more advanced DWARF features which would let us check the discriminant and use different locations based on it.
(There is a workaround, namely faking type debuginfo for such cases so that the variants appear laid out like in a tuple instead of overlapping, but it's complex enough that I don't want to tackle it in this PR.)
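As a rough illustration of the enum problem, the sketch below models what a debugger would have to do if an enum local were fragmented. `Shape`, `FragmentedShape`, and `reconstruct` are hypothetical names, and the `u8` discriminant encoding is an assumption made only for this example.

```rust
/// Source-level enum: both payloads occupy the bytes after the discriminant.
#[derive(Debug, PartialEq)]
enum Shape {
    Circle(u32),
    Square(u64),
}

/// Hypothetical result of fragmenting a `Shape` local: one independent
/// local per piece, with no shared memory between the variants.
struct FragmentedShape {
    discr: u8,
    circle_0: u32,
    square_0: u64,
}

/// What the debugger would need to do: inspect the discriminant first,
/// then pick the fragment that holds the active variant's payload.
fn reconstruct(f: &FragmentedShape) -> Shape {
    match f.discr {
        0 => Shape::Circle(f.circle_0),
        _ => Shape::Square(f.square_0),
    }
}

fn main() {
    // `square_0` holds a stale value here; only the discriminant says so.
    let f = FragmentedShape { discr: 0, circle_0: 7, square_0: 99 };
    assert_eq!(reconstruct(&f), Shape::Circle(7));

    let g = FragmentedShape { discr: 1, circle_0: 7, square_0: 99 };
    assert_eq!(reconstruct(&g), Shape::Square(99));
}
```

A fixed byte-range mapping cannot express the `match` in `reconstruct`, which is why enums with debuginfo are left alone.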

As an example, this small snippet:

let mut pair = (1, 2);
pair = (pair.1, pair.0);

currently produces this MIR (after deaggregation, and some details omitted):

scope 1 {
    debug pair => _1;
}

bb0: {
    (_1.0: i32) = const 1i32;
    (_1.1: i32) = const 2i32;
    _2 = (_1.1: i32);
    _3 = (_1.0: i32);
    (_1.0: i32) = _2;
    (_1.1: i32) = _3;
}

but after this PR, the pair is replaced with two separate locals:

scope 1 {
    debug pair: (i32, i32) {
        .0 => _3,
        .1 => _4,
    };
}

bb0: {
    _3 = const 1i32;
    _4 = const 2i32;
    _1 = _4;
    _2 = _3;
    _3 = move _1;
    _4 = move _2;
}
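Read as ordinary Rust, the two MIR bodies above correspond roughly to the two functions below. This is only a hand-written analogy (the names `before`, `after`, and `l1` through `l4` are made up here), meant to show that the fragmented form computes the same values without ever materializing the tuple.

```rust
/// Mirrors the original MIR: every access goes through `pair.0` / `pair.1`.
fn before() -> (i32, i32) {
    let mut pair = (1, 2);
    pair = (pair.1, pair.0);
    pair
}

/// Mirrors the fragmented MIR: `l3` and `l4` play the roles of `_3` and `_4`
/// (the two halves of `pair`), `l1` and `l2` are the temporaries.
fn after() -> (i32, i32) {
    let mut l3 = 1;
    let mut l4 = 2;
    let l1 = l4;
    let l2 = l3;
    l3 = l1;
    l4 = l2;
    // The tuple is only rebuilt here so the two versions can be compared.
    (l3, l4)
}

fn main() {
    assert_eq!(before(), after());
    assert_eq!(after(), (2, 1));
}
```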

The only thing left of the original pair is the debuginfo, which describes how a debugger can reconstruct the pair user variable, by combining the contents of _3 and _4.

More concretely, the debugger sees a single pair: (i32, i32) variable with contents drawn from:

_3, for bytes 0..4
_4, for bytes 4..8

But outside of a debugger, those two halves are completely independent.
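The byte-range mapping can be sketched as a small program that plays the role of the debugger. `Fragment`, `reassemble`, and `read_i32` are hypothetical helpers, not part of rustc, LLVM, or any debugger API, and native byte order is assumed.

```rust
use std::ops::Range;

/// Hypothetical description of one piece of a user variable.
struct Fragment {
    /// Byte range within the debugger-facing variable.
    bytes: Range<usize>,
    /// Raw contents of the local backing this piece.
    contents: Vec<u8>,
}

/// Rebuilds the variable's bytes by copying each fragment into its range.
fn reassemble(size: usize, fragments: &[Fragment]) -> Vec<u8> {
    let mut out = vec![0u8; size];
    for f in fragments {
        out[f.bytes.clone()].copy_from_slice(&f.contents);
    }
    out
}

/// Reads a native-endian `i32` starting at `offset`.
fn read_i32(raw: &[u8], offset: usize) -> i32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&raw[offset..offset + 4]);
    i32::from_ne_bytes(buf)
}

fn main() {
    // State at the end of bb0 in the example: _3 == 2 and _4 == 1.
    let (local_3, local_4) = (2i32, 1i32);
    let fragments = [
        Fragment { bytes: 0..4, contents: local_3.to_ne_bytes().to_vec() },
        Fragment { bytes: 4..8, contents: local_4.to_ne_bytes().to_vec() },
    ];
    let raw = reassemble(8, &fragments);
    // The debugger-facing `pair` is (2, 1), the swapped tuple.
    assert_eq!((read_i32(&raw, 0), read_i32(&raw, 4)), (2, 1));
}
```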

r? @nikomatsakis cc @rust-lang/wg-mir-opt @michaelwoerister

TODO: address review comments (about missing code comments, and some notes to self)

eddyb · 2018-02-17T15:50:42Z

r? @nikomatsakis

eddyb · 2018-02-17T15:52:59Z

@bors try

bors · 2018-02-17T15:53:09Z

⌛ Trying commit 9a18264 with merge b8d8ea6...

rustc_mir: add a pass for splitting locals into their fields (aka SROA). **DO NOT MERGE**: based on #48052.

bors · 2018-02-17T16:38:37Z

💔 Test failed - status-travis

eddyb · 2018-02-19T00:49:30Z

@bors try

bors · 2018-02-19T00:49:42Z

⌛ Trying commit 49f1ddaec0f5c745909c5797cc030eb5448473db with merge d2592cc1fd6fd56b975206a2d1c3e4f4978be3a6...

bors · 2018-02-19T03:05:56Z

☀️ Test successful - status-travis

Mark-Simulacrum · 2018-02-20T20:05:10Z

http://perf.rust-lang.org/compare.html?start=27a046e9338fb0455c33b13e8fe28da78212dedc&end=d2592cc1fd6fd56b975206a2d1c3e4f4978be3a6&stat=instructions%3Au

eddyb · 2018-02-20T20:07:22Z

This comparison is a bit more accurate (this PR is rebased on top of #48052).

nikomatsakis

This is pretty nifty. Left a bunch of requests, mostly for comments. Will take another pass after that.