Prevent query cycles in the MIR inliner #68828

oli-obk · 2020-02-04T17:46:19Z

cc @rust-lang/wg-mir-opt

The general design is that we have a new query that is run on the validated_mir instead of on the optimized_mir. That query is forced before going into the optimization pipeline, so as to not try to read from a stolen MIR.

The query should not be cached cross crate, as you should never call it for items from other crates. By its very design calls into other crates can never cause query cycles.

This is a pessimistic approach to inlining, since we strictly have more calls in the validated_mir than we have in optimized_mir, but that's not a problem imo.

matthewjasper

This is a pessimistic approach to inlining, since we strictly have more calls in the validated_mir than we have in optimized_mir

Except for box_free calls, which drop elaboration adds.

src/librustc_mir/transform/mod.rs

src/librustc_mir/transform/inline.rs

eddyb · 2020-02-05T04:30:11Z

I think I get it now, this is similar to computing SCCs (of the callgraph) ahead of time, but both better (incremental) and worse (not cached nor using an SCC algorithm) at the same time.

We discussed something like this ages ago, maybe @nikomatsakis can remember where.

I think your query helps both with caching and incremental red-green, but it's not necessary to get the correctness, you could "lookahead" by using optimized_mir(callee) in the inliner directly (unlike your query, there would be no stealing issue there AFAICT).

One of the problems with precomputing SCCs in an on-demand/incremental fashion is deciding on which (callgraph) node "defines" the SCCs, as it needs to be the same no matter which node in the SCC the information is queried for.

And... I think we've had an answer for a while now! We sort several things by stable hashes.

So we could roughly have this query (where mir_inliner_callees is your query):

struct Cycle {
    root: LocalDefId,
    members: FxHashSet<LocalDefId>,
}
fn mir_inliner_cycle(tcx: TyCtxt<'tcx>, root: LocalDefId) -> Option<&'tcx Cycle> {
    let root_hash = tcx.def_path_hash(root);

    let mut cycle = None;

    let mut queue = VecDeque::new();
    queue.push_back(root);

    while let Some(caller) = queue.pop_front() {
        for callee in tcx.mir_inliner_callees(caller) {
            if callee == def_id {
                cycle = Some(Cycle {
                    root,
                    // TODO: accumulate cycle members.
                    // Can probably use a stack instead of a queue, emulating
                    // the runtime callstack, but this example is large as is.
                    members: FxHashSet::new(),
                });
                continue;
            }

            if tcx.def_path_hash(callee) < root_hash {
                // We may share a cycle with `callee`, in which case
                // it's "closer" to the cycle's real root (lower hash).
                if let Some(callee_cycle) = tcx.mir_inliner_cycle(callee) {
                    if callee_cycle.members.contains(&root) {
                        return Some(callee_cycle);
                    }
                }
            } else {
                // Keep exploring areas of the callgraph that could
                // contain a cycle that we're the root of
                // (i.e. all other nodes have higher hash).
                queue.push_back(callee);
            }
        }
    }

    cycle
}

Alright, I got carried away sketching that, but I think we've made progress here!

EDIT: to be clear, I don't think we can't use a real whole-graph SCC algorithm, everything has to be one SCC at a time (i.e. "find SCC this node is in") for it to work with incremental queries.

eddyb · 2020-02-05T04:37:14Z

This is a pessimistic approach to inlining, since we strictly have more calls in the validated_mir than we have in optimized_mir, but that's not a problem imo.

Is this true? Couldn't optimizations expose calls which were previously e.g. indirect?
Also, what about static dispatch being resolved by (partial) monomorphization during inlining?

src/librustc/query/mod.rs

oli-obk · 2020-02-05T08:14:55Z

I think your query helps both with caching and incremental red-green, but it's not necessary to get the correctness, you could "lookahead" by using optimized_mir(callee) in the inliner directly (unlike your query, there would be no stealing issue there AFAICT).

I'm not sure how that could work, since optimized_mir will trigger the inliner on callee, which then will call optimized_mir causing cycles

oli-obk · 2020-02-05T09:23:00Z

Couldn't optimizations expose calls which were previously e.g. indirect?

Yea, I guess const prop could turn (foo as fn())() into foo(), this could be solved by splitting the optimization pipeline into two queries. One "pre-inlining" and one with the inliner and everything after it.

Also, what about static dispatch being resolved by (partial) monomorphization during inlining?

I had a more complex approach before the one showed here. That approach also returned the substs of the function calls from the query so we can adjust them to the call site's substs. That's basically what inlining does right now. I scrapped it because I couldn't wrap my head around all the edge cases and wanted to publish something. I plan to deduplicate that logic with the inliner's logic so we have a single source of truth instead of having two schemes that cause cycles if they diverge.

nikomatsakis · 2020-02-05T17:43:16Z

I believe my plan was to abandon this constraint from @eddyb

EDIT: to be clear, I don't think we can't use a real whole-graph SCC algorithm, everything has to be one SCC at a time (i.e. "find SCC this node is in") for it to work with incremental queries.

Basically, we would have a single "call graph" query that walks the crate and constructs the call graph, presumably based on reading the "validated MIR". This query would be wrapped with other queries that extract the individual SCCs and return their contents -- with a bit of careful work on the keys, we could make it work so that if the contents of an SCC don't change, the query yields up green, and thus avoid the need to re-optimize things that contain inlining. (I am imagining something like: the key that identifies an SCC is a hash of its contents, perhaps? Or it is named by some def-id within which has the lowest hash. Something like that, so that it is not dependent on the order in which we traversed the call graph.)

Assumptions:

In an IDE, you would never do this; it's only done as part of batch compilation (i.e., when optimizing).
In an incremental context, most of the "Validated MIR" that you are asking for are going to be cached. (Is this true? Not sure, maybe we don't catch the intermediate MIR, I guess that could be a problem -- but we could address it by introducing an intermediate query that reads the validated MIR and then produces the list of callees, and caching just that.)
Similarly, the content of most SCCs will not change. Therefore, although all of the "optimized MIR" will be dependent on this "compute scc graph" query, there should mostly be green queries that lie in between, allowing us to avoid the work of recomputing it.

This strikes me as a fairly simple design that would likely work reasonably well? But I'm open to other alternatives for sure. I'm not sure how far off that is from the pseudo-code that @eddyb sketched.

eddyb · 2020-02-05T20:13:53Z

@nikomatsakis I thought we decided long ago that it's a no-starter, otherwise... we would've done if already, wouldn't we? We even had the SCC computation code and everything, I think we just removed it.

The problem with a monolith query is that it depends on every function in the crate.
So any time the MIR changes (or more realistically, a call is added/removed), anywhere in the crate, the whole query would have to re-run.

Sure, individual optimized_mir executions would be firewalled by the intermediary per-SCC query, but a single monolithic query still has to run... do we expect it to be cheap?

My alternative would be more expensive the first time, probably, but it should explore smaller parts of the callgraph later.

@oli-obk Can you make a crate-wide query that just queries your local_callees query for all DefIds with MIR, and make the inliner call it, even when it's not enabled by the mir-opt-level? Just to see if the global SCC computation is feasible in terms of dependencies, even with no actual data being produced.

nikomatsakis · 2020-02-05T20:20:31Z

I thought we decided long ago that it's a no-starter, otherwise... we would've done if already, wouldn't we? We even had the SCC computation code and everything, I think we just removed it.

I don't recall any such decision, but I could believe that there was a consensus reached, either without me, or that I've forgotten about.

So any time the MIR changes (or more realistically, a call is added/removed), anywhere in the crate, the whole query would have to re-run.

Yes. I am not convinced this would be a problem in practice, but it is definitely a downside. My hope was that this specific query would be "small enough" in scope not to be a big deal.

My alternative would be more expensive the first time, probably, but it should explore smaller parts of the callgraph later.

I guess I need to look more closely. It seems to me that one challenge here is that, if the query is "figure out which SCC this def-id is a part of", then we'd probably want to extend the query system, because otherwise you'll have to recompute that computation for each member of the SCC, right? (i.e., you could start from many different entry points)

bjorn3 · 2020-02-05T21:05:58Z

Would cycle error recovery be sound by discarding everything up to the oldest try_query call on a cycle, even when there is a newer try_query call. That way the query will be recomputed for every possible cycle entry point.

eg optimized_mir(a) -> try optimized_mir(b) -> try optimized_mir(a) would return the cycle error from try optimized_mir(b), not try optimized_mir(a) and the result of optimized_mir(b) will not be stored, thus not causing any difference depending on the cycle root.

The returning back to the oldest try_query could be implemented using:

unwinding
returning cycle error from newest try_query and not caching result for any queries returning until the oldest try_query returns.
same as previous, but returning a special kind of cycle error to request short circuiting of the query evaluation by eg returning a dummy value.

nikomatsakis · 2020-02-05T22:46:09Z

I do think we could extend the query system to permit cycles, potentially, and avoid caching for intermediate nodes, but it raises some interesting questions. There are various models for this. It also interacts with parallel rustc. I guess I just need to read this PR and @eddyb's comment in more detail so I actually understand what's being proposed here and stop talking about hypotheticals.

Zoxc · 2020-02-06T07:19:17Z

src/librustc_mir/transform/inline.rs

-                    // and is already optimized.
-                    self.tcx.optimized_mir(callsite.callee)
-                };
+                    let mut seen: FxHashSet<_> = Default::default();


I think this cycle check should be extracted to a function.

src/librustc_mir/transform/inline.rs

eddyb · 2020-02-06T08:37:25Z

because otherwise you'll have to recompute that computation for each member of the SCC, right? (i.e., you could start from many different entry points)

It bails out as soon as it finds another cycle member with a lower hash (i.e. "closer to the root"), but you're right that it might be too expensive (I think we should benchmark both your and my version).

Would cycle error recovery be sound by discarding everything up to the oldest try_query call on a cycle, even when there is a newer try_query call. That way the query will be recomputed for every possible cycle entry point.

Heh, that's what cyclotron's bruteforce implementation does. It's good enough for parsing with arbitrary context-free grammars (it handles left recursion, you just add LL and backtracking on top).

I don't think we can do that for optimized_mir (we don't really want to optimize more than once, although that changes if it's more fixpoint-like, but that requires properties which we can't guarantee of optimizations), but it might be interesting to rely on something like that for computing SCCs.

nikomatsakis · 2020-02-07T11:07:04Z

If we opt to have cycles discard caching for the contents of the cycle, it is reasonable, but it does mean that foo.query(bar) may have "inconsistent" results -- i.e., it can return distinct things depending on whether invoked from within a cycle or not. It's certainly not the right behavior for every scenario, but I think that for many scenarios it is what you want (in particular, for cases where you're attempting to do a search of a graph of possibilities).

EDIT: Just to clarify -- it's exactly the possibility of 'inconsistent' results that causes us problems using it for mir-optimize, since it means that the final results depend on query order, which we do not want.

EDIT: The prolog "tabling" solution would be to remember the result you got at the end and then re-run the query, but this time you use that result for any recursive results, and you require that it reaches a fixed point.

I had hoped we could get away with the approach that I was advocating for earlier in these sorts of scenarios, but I am open to the idea that it won't work. I feel though that we should at least try it before we reach that conclusion. As @eddyb said, building a few options and benchmarking would be ideal.

bjorn3 · 2020-02-07T11:18:16Z

EDIT: Just to clarify -- it's exactly the possibility of 'inconsistent' results that causes us problems using it for mir-optimize, since it means that the final results depend on query order, which we do not want.

What I meant was that at a cycle error everything after the first try_query is discarded, so for every possible entry point of the cycle a cycle error will be reported. on the next query of the cycle.

optimized_mir(a) <-> optimized_mir(b) will return a cycle error for the call of try optimized_mir(b) when calling optimized_mir(a) and vice versa. This way the query order doesn't matter.

eddyb · 2020-02-07T12:27:08Z

EDIT: The prolog "tabling" solution would be to remember the result you got at the end and then re-run the query, but this time you use that result for any recursive results, and you require that it reaches a fixed point.

Haha, I've been meaning to look into tabling for a while now.
That's exactly what I implemented in cyclotron and I got the idea specifically thinking about left-recursion during parsing (because CFG parsing is monotonic, so you make progress up to a fixpoint).

bors · 2020-03-01T04:49:00Z

☔ The latest upstream changes (presumably #69592) made this pull request unmergeable. Please resolve the merge conflicts.

camelid · 2021-01-23T20:19:24Z

failures:

---- [mir-opt] mir-opt/inline/cycle.rs stdout ----

13           StorageLive(_3);                 // scope 0 at $DIR/cycle.rs:4:5: 4:6
14           _3 = &_1;                        // scope 0 at $DIR/cycle.rs:4:5: 4:6
15           StorageLive(_4);                 // scope 0 at $DIR/cycle.rs:4:5: 4:8
-           _2 = <impl Fn() as Fn<()>>::call(move _3, move _4) -> [return: bb1, unwind: bb3]; // scope 0 at $DIR/cycle.rs:4:5: 4:8
+           _2 = <impl Fn() as Fn<()>>::call(move _3, move _4) -> bb1; // scope 0 at $DIR/cycle.rs:4:5: 4:8
17                                            // mir::Constant
18                                            // + span: $DIR/cycle.rs:4:5: 4:6
19                                            // + literal: Const { ty: for<'r> extern "rust-call" fn(&'r impl Fn(), ()) -> <impl Fn() as std::ops::FnOnce<()>>::Output {<impl Fn() as std::ops::Fn<()>>::call}, val: Value(Scalar(<ZST>)) }

24           StorageDead(_3);                 // scope 0 at $DIR/cycle.rs:4:7: 4:8
25           StorageDead(_2);                 // scope 0 at $DIR/cycle.rs:4:8: 4:9
26           _0 = const ();                   // scope 0 at $DIR/cycle.rs:3:20: 5:2
-           drop(_1) -> [return: bb2, unwind: bb4]; // scope 0 at $DIR/cycle.rs:5:1: 5:2
+           drop(_1) -> bb2;                 // scope 0 at $DIR/cycle.rs:5:1: 5:2
29   
30       bb2: {


31           return;                          // scope 0 at $DIR/cycle.rs:5:2: 5:2
-   
-   
-       bb3 (cleanup): {
-           drop(_1) -> bb4;                 // scope 0 at $DIR/cycle.rs:5:1: 5:2
-   
-   
-       bb4 (cleanup): {
-           resume;                          // scope 0 at $DIR/cycle.rs:3:1: 5:2
41   }
42

thread '[mir-opt] mir-opt/inline/cycle.rs' panicked at 'Actual MIR output differs from expected MIR output /checkout/src/test/mir-opt/inline/cycle.f.Inline.diff', src/tools/compiletest/src/runtest.rs:3452:25

oli-obk · 2021-01-25T09:35:07Z

@bors r=wesleywiser

wasm generates different MIR

bors · 2021-01-25T09:35:09Z

📌 Commit d38553c has been approved by wesleywiser

bors · 2021-01-25T09:35:46Z

⌛ Testing commit d38553c with merge 12c5b6d144fd893cecdab3539aa2723d0c860909...

rust-log-analyzer · 2021-01-25T09:58:16Z

The job dist-x86_64-linux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

    Checking crates-io v0.31.1 (/tmp/cargo/crates/crates-io)
error[E0283]: type annotations needed
   --> src/cargo/util/config/de.rs:530:63
    |
530 |                 seed.deserialize(Tuple2Deserializer(1i32, env.as_ref()))
    |                                                           |   |
    |                                                           |   |
    |                                                           |   cannot infer type for type parameter `T` declared on the trait `AsRef`
    |                                                           this method call resolves to `&T`
    |
    = note: cannot satisfy `std::string::String: AsRef<_>`
help: use the fully qualified path for the potential candidates
    |
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<OsStr>>::as_ref(env)))
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<std::path::Path>>::as_ref(env)))
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<str>>::as_ref(env)))
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<[u8]>>::as_ref(env)))

error: aborting due to previous error

For more information about this error, try `rustc --explain E0283`.

bors · 2021-01-25T10:00:46Z

💔 Test failed - checks-actions

oli-obk · 2021-01-25T12:51:35Z

@bors retry

bors · 2021-01-25T19:03:43Z

⌛ Testing commit d38553c with merge f4eb5d9...

bors · 2021-01-25T22:11:25Z

☀️ Test successful - checks-actions
Approved by: wesleywiser
Pushing f4eb5d9 to master...

Mark-Simulacrum · 2021-01-26T22:35:12Z

It looks like this was a moderate regression in instruction counts on several benchmarks (https://perf.rust-lang.org/compare.html?start=7fba12bb1d3877870758a7a53e2fe766bb19bd60&end=f4eb5d9f719cd3c849befc8914ad8ce0ddcf34ed&stat=instructions:u), though there was also some improvement. @oli-obk - was this expected? I guess there were some earlier perf runs that were much worse, so we've improved since then at least...

oli-obk · 2021-01-27T11:08:02Z

the previous runs here were with inlining activated. I should have re-run without inlining activated, but I forgot.

The results are really odd. When looking at the detailed perf (e.g. for keccak), it seems like typeck has gotten slower, which makes absolutely no sense considering this PR doesn't actually change anything without mir-opt-level=2... I'll start looking into this.

rust-highfive assigned eddyb Feb 4, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 4, 2020

matthewjasper reviewed Feb 4, 2020

View reviewed changes

src/librustc_mir/transform/mod.rs Outdated Show resolved Hide resolved

wesleywiser reviewed Feb 4, 2020

View reviewed changes

src/librustc_mir/transform/inline.rs Outdated Show resolved Hide resolved

src/librustc_mir/transform/inline.rs Outdated Show resolved Hide resolved

eddyb reviewed Feb 5, 2020

View reviewed changes

src/librustc_mir/transform/inline.rs Outdated Show resolved Hide resolved

eddyb reviewed Feb 5, 2020

View reviewed changes

src/librustc_mir/transform/inline.rs Outdated Show resolved Hide resolved

Zoxc reviewed Feb 5, 2020

View reviewed changes

src/librustc/query/mod.rs Outdated Show resolved Hide resolved

oli-obk force-pushed the inline_cycle branch from e4e3172 to 6e8588a Compare February 5, 2020 12:41

Zoxc reviewed Feb 6, 2020

View reviewed changes

src/librustc_mir/transform/inline.rs Outdated Show resolved Hide resolved

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 11, 2020

joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 23, 2020

joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 2, 2020

oli-obk added 2 commits January 23, 2021 16:51

Leave some notes for future changes to the MIR opt level of mir inlining

209889d

Move test to mir-opt so we actually see that no inlining is happening

03c6364

Ignore a test on wasm, because that changes landing pads

d38553c

oli-obk force-pushed the inline_cycle branch from 5104117 to d38553c Compare January 25, 2021 09:34

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2021

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 25, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 25, 2021

bors merged commit f4eb5d9 into rust-lang:master Jan 25, 2021

rustbot added this to the 1.51.0 milestone Jan 25, 2021

bors mentioned this pull request Jan 25, 2021

Box generator-related Body fields #81114

Merged

matthiaskrgr mentioned this pull request Jan 26, 2021

ICE: Encountered error .. selecting .. #81403

Closed

oli-obk deleted the inline_cycle branch January 27, 2021 11:02

oli-obk mentioned this pull request Jan 30, 2021

Tracking Issue for mir-inlining #81567

Open

5 tasks

pnkfelix mentioned this pull request Apr 22, 2022

change how MIR inlining handles cycles #43542

Closed

Prevent query cycles in the MIR inliner #68828

Prevent query cycles in the MIR inliner #68828

Uh oh!

Conversation

oli-obk commented Feb 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthewjasper left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eddyb commented Feb 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eddyb commented Feb 5, 2020

Uh oh!

Uh oh!

oli-obk commented Feb 5, 2020

Uh oh!

oli-obk commented Feb 5, 2020

Uh oh!

nikomatsakis commented Feb 5, 2020

Uh oh!

eddyb commented Feb 5, 2020

Uh oh!

nikomatsakis commented Feb 5, 2020

Uh oh!

bjorn3 commented Feb 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikomatsakis commented Feb 5, 2020

Uh oh!

Zoxc Feb 6, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eddyb commented Feb 6, 2020

Uh oh!

nikomatsakis commented Feb 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjorn3 commented Feb 7, 2020

Uh oh!

eddyb commented Feb 7, 2020

Uh oh!

bors commented Mar 1, 2020

Uh oh!

camelid commented Jan 23, 2021

Uh oh!

oli-obk commented Jan 25, 2021

Uh oh!

bors commented Jan 25, 2021

Uh oh!

bors commented Jan 25, 2021

Uh oh!

rust-log-analyzer commented Jan 25, 2021

Uh oh!

bors commented Jan 25, 2021

Uh oh!

oli-obk commented Jan 25, 2021

Uh oh!

bors commented Jan 25, 2021

Uh oh!

bors commented Jan 25, 2021

Uh oh!

Mark-Simulacrum commented Jan 26, 2021

Uh oh!

oli-obk commented Jan 27, 2021

Uh oh!

Uh oh!

oli-obk commented Feb 4, 2020 •

edited

Loading

eddyb commented Feb 5, 2020 •

edited

Loading

bjorn3 commented Feb 5, 2020 •

edited

Loading

nikomatsakis commented Feb 7, 2020 •

edited

Loading