Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM/MIR optimizations to simplify the new RFC3058 ? desugaring #85133

Closed
scottmcm opened this issue May 10, 2021 · 18 comments
Closed

LLVM/MIR optimizations to simplify the new RFC3058 ? desugaring #85133

scottmcm opened this issue May 10, 2021 · 18 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. F-try_trait_v2 Tracking issue for RFC#3058 I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@scottmcm
Copy link
Member

With #84767 implementing the new ? desugaring, there's no mir-opt that can simplify the double-match it emits, as the arm identity pass only does Result->Result, and thus cannot handle Result->ControlFlow->Result.

That's not a blocker for checkin, as the arm identity pass is disabled by default currently because of things like #78628 and it depends on the MIR inliner being enabled anyway, but it sure would be nice to do better here.

Zulip conversation with some more context: https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/Suggestions.20for.20a.20beginner/near/237739360

@scottmcm scottmcm added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-mir-opt Area: MIR optimizations F-try_trait_v2 Tracking issue for RFC#3058 labels May 10, 2021
@scottmcm scottmcm added the E-help-wanted Call for participation: Help is requested to fix this issue. label May 10, 2021
@Moxinilian
Copy link
Contributor

I am interested in taking this one as a first contribution. I will probably have time to work on it this week.

@Moxinilian
Copy link
Contributor

Moxinilian commented May 21, 2021

@michaelmaitland
Copy link

I see this issue marked as open. Is it being worked on or available for me to pickup?

@Moxinilian
Copy link
Contributor

Moxinilian commented Jul 13, 2021

It is being worked on, see #85646. Any suggestion/help is welcome, though!

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 25, 2021
…jgillot

MIR opt: separate constant predecessors of a switch

For each block S ending with a switch, this pass copies S for each of S's predecessors that seem to assign the value being switched over as a const. This is done using a somewhat simple heuristic to determine what seems to be a const transitively.

More precisely, this is what the pass does:
- find a block that ends in a switch
- track if there is an unique place set before the current basic block that determines the result of the switch (this is the part that resolves switching over discriminants)
- if there is, iterate over the parents that have a reasonable terminator and find if the found determining place is likely to be (transitively) set from a const within that parent block
- if so, add the corresponding edge to a vector of edges to duplicate
- once this is done, iterate over the found edges: copy the target block and replace the reference to the target block in the origin block with the new block

This pass is not optimal and could probably duplicate in more cases, but the intention was mostly to address cases like in rust-lang#85133 or rust-lang#85365, to avoid creating new enums that get destroyed immediately afterwards (notably making the new try v2 `?` desugar zero-cost).

A benefit of this pass working the way it does is that it is easy to ensure its correctness: the worst that can happen is for it to needlessly copy a basic block, which is likely to be destroyed by cleanup passes afterwards. The complex parts where aliasing matters are only heuristics and the hard work is left to further passes like ConstProp.

# LLVM blocker

Unfortunately, I believe it would be unwise to enable this optimization by default for now. Indeed, currently switch lowering passes like SimplifyCFG in LLVM lose the information on the set of possible variant values, which means it tends to actually generate worse code with this optimization enabled. A fix would have to be done in LLVM itself. This is something I also want to look into. I have opened [a bug report at the LLVM bug tracker](https://bugs.llvm.org/show_bug.cgi?id=50455).

When this is done, I hope we can enable this pass by default. It should be fairly fast and I think it is beneficial in many cases. Notably, it should be a sound alternative to simplify-arm-identity. By the way, ConstProp only seems to pick up the optimization in functions that are not generic. This is however most likely an issue in ConstProp that I will look into afterwards.

This is my first contribution to rustc, and I would like to thank everyone on the Zulip mir-opt chat for the help and support, and especially `@scottmcm` for the guidance.
@BGR360
Copy link
Contributor

BGR360 commented Aug 23, 2021

Is this issue now resolved after @Moxinilian 's work?

@Moxinilian
Copy link
Contributor

Moxinilian commented Aug 23, 2021

Right now it is not. Right now this is blocked by LLVM doing a sub-optimal transformation that loses the optimization at a very early stage, preventing the duplicated tails to be merged back into one. I've tried investigating if we could not merge them back directly in MIR. While possible in theory, my understanding was that MIR still abstracts away too much details (notably on the discriminant aspect) for the algorithm to be reasonably complex. So right now we either need a breakthrough on that front or work in the LLVM codebase. I want to start investigating more in the LLVM side of things, but I have not successfully allocated enough time for it as of now.

@Moxinilian
Copy link
Contributor

Moxinilian commented Aug 23, 2021

More on simplifying in MIR:

What LLVM is struggling with is keeping the fact that variant switches are exhaustive, so it can merge the blocks that do the same thing except for different constants (representing the variant IDs). What we could do is do that merging directly in MIR, in a pass close to the end of the pipeline. A naive approach would be to take a switch-terminated block bb that ends in (semantically)

if x == 1: goto bb1
else if x == 2: goto bb2
else: unreachable

Then, replace any instance of the constant 1 in bb1 with x before x is reassigned and every instance of 2 in bb2 with x before x is reassigned. Now, if the new versions of bb1 and bb2 are equal, the blocks can safely be merged. That seems to be sound to me but it would not be very useful here because:

  • Maybe other optimizations transformed the blocks asymmetrically in such a way that they are semantically equivalent but not syntactically equal (for example if an useless affectation got inserted for some reason). One would need to make a canonicalization pass to fix this, which probably could resolve some of those issues. However, the second point is the most important:
  • Discriminants must be constant in MIR form, so we can't replace them with an use of x, even though later down the pipeline it would be lowered into that. This kills pretty much all of our use cases for this issue.

Note that if MIR was in SSA form, this would probably be more effective in practice because x would never be re-defined.

@Moxinilian
Copy link
Contributor

More on fixing LLVM:

Right now, rustc calls into LLVM with a pass pipeline that starts with the SimplifyCFG LLVM pass that uses the LowerSwitch utility to turn switches into lower level constructs. An issue with that pass is that it is opinionated in what is worth simplifying away, which includes unreachable switch targets. So for example, the following LLVM IR:

define i64 @demo(i64 %x) {
entry:
    switch i64 %x, label %bb3 [
        i64 0, label %bb1
        i64 1, label %bb2
    ]
bb1:
    ret i64 0
bb2:
    %0 = icmp eq i64 %x, 100 ; this will necessarily be false because %x == 1
    br i1 %0, label %bb4, label %bb5
bb3:
    unreachable
bb4:
    ret i64 200
bb5:
    ret i64 10
}

is turned into

define i64 @demo(i64 %x) {
entry:
    %switch = icmp ult i64 %x, 1
    %0 = icmp eq i64 %x, 100
    %. = select i1 %0, i64 200, i64 10
    %merge = select i1 %switch, i64 0, i64 %.
    ret i64 %merge
}

which throws away the exhaustive nature of the switch, making in this example further passes not able to optimize the useless comparison away.

The code of LowerSwitch seems to contain an explicit part where it removes unreachable branches in switches. Forcing this off might change this behavior and make it work as intended. I have not gotten to try that yet, but plan to.

@BGR360
Copy link
Contributor

BGR360 commented Aug 24, 2021

Oh, that's annoying. But very intriguing. Thanks for the explanation!

LebedevRI added a commit to llvm/llvm-project that referenced this issue Feb 17, 2022
…nge comparison and branch until after at least the IPSCCP

That transformation is lossy, as discussed in
#53853
and rust-lang/rust#85133 (comment)

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes #53853
Refs. rust-lang/rust#85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
@LebedevRI
Copy link

FWIW this has been fixed in llvm trunk in llvm/llvm-project@371fcb7.

LebedevRI added a commit to LebedevRI/llvm-project that referenced this issue Feb 23, 2022
…nge comparison and branch until after at least the IPSCCP

That transformation is lossy, as discussed in
llvm#53853
and rust-lang/rust#85133 (comment)

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes llvm#53853
Refs. rust-lang/rust#85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
@nikic
Copy link
Contributor

nikic commented Feb 27, 2022

So ... what is the actual Rust code we're supposed to be optimizing here? Is the goal to get rid of the icmp + zext in https://rust.godbolt.org/z/nd9fEx3ed or is it something else? (The referenced LLVM change does not get rid of those.)

@scottmcm
Copy link
Member Author

@nikic That's right. try { x? } not being a nop, as in your example, is the easiest demonstration.

As I understand it, the icmp shows up because there's a trunc in there originally, but LLVM has lost track of the fact that the truncation doesn't ever wrap.

(My uninformed thought: It's odd to me that there's no nsw/nuw flags for trunc, as a way to say "I promise this came from a [sz]ext". If there were, then we could emit those flags in the discriminant loading, and avoid the icmp.)

@scottmcm
Copy link
Member Author

Actually, here's a better cut-down version of the example. This is (essentially) the first part of ?, but with no trait calls involved to muddle things: https://rust.godbolt.org/z/MWxdTcGn6

use std::ops::ControlFlow;
pub fn demo(x: Result<u32, i32>) -> ControlFlow<i32, u32> {
    match x {
        Ok(v) => ControlFlow::Continue(v),
        Err(e) => ControlFlow::Break(e),
    }
}

Ok and Continue are both discriminant 0; Err and Break are both discriminant 1.

It looks like the core of what we emit there is

  %8 = load i32, i32* %7, align 4, !dbg !10, !range !11
  %_2 = zext i32 %8 to i64, !dbg !10
  switch i64 %_2, label %bb2 [
    i64 0, label %bb3
    i64 1, label %bb1
  ], !dbg !12

where that range metadata is

!11 = !{i32 0, i32 2}

So AFAIK that PR above was about trying to avoid turning the switch into an icmp+br, which lost that range info.

@nikic
Copy link
Contributor

nikic commented Feb 27, 2022

Okay, so I think the PhaseOrdering change is necessary to avoid folding the switch before SROA (of course, the alternative solution is to make SROA emit range assumptions). But it's not sufficient:

*** IR Dump After InstCombinePass on _ZN4test4test17h60e5e3f4ec590b13E ***
; Function Attrs: nonlazybind uwtable
define { i64, i64 } @_ZN4test4test17h60e5e3f4ec590b13E(i64 %x.0, i64 %x.1) unnamed_addr #0 {
start:
  %0 = call fastcc { i64, i64 } @"_ZN79_$LT$core..result..Result$LT$T$C$E$GT$$u20$as$u20$core..ops..try_trait..Try$GT$6branch17hed04f6fc65ae78f0E"(i64 %x.0, i64 %x.1)
  %.fca.0.extract1 = extractvalue { i64, i64 } %0, 0
  %.fca.1.extract2 = extractvalue { i64, i64 } %0, 1
  switch i64 %.fca.0.extract1, label %bb3 [
    i64 0, label %bb2
    i64 1, label %bb4
  ]

bb3:                                              ; preds = %start
  unreachable

bb2:                                              ; preds = %start
  br label %bb6

bb4:                                              ; preds = %start
  %1 = call fastcc i64 @"_ZN153_$LT$core..result..Result$LT$T$C$F$GT$$u20$as$u20$core..ops..try_trait..FromResidual$LT$core..result..Result$LT$core..convert..Infallible$C$E$GT$$GT$$GT$13from_residual17h7fc30301c010dde1E"(i64 %.fca.1.extract2)
  br label %bb6

bb6:                                              ; preds = %bb2, %bb4
  %.sroa.3.0 = phi i64 [ %1, %bb4 ], [ %.fca.1.extract2, %bb2 ]
  %.sroa.0.0 = phi i64 [ 1, %bb4 ], [ 0, %bb2 ]
  %2 = insertvalue { i64, i64 } undef, i64 %.sroa.0.0, 0
  %3 = insertvalue { i64, i64 } %2, i64 %.sroa.3.0, 1
  ret { i64, i64 } %3
}
*** IR Dump After SimplifyCFGPass on _ZN4test4test17h60e5e3f4ec590b13E ***
; Function Attrs: nonlazybind uwtable
define { i64, i64 } @_ZN4test4test17h60e5e3f4ec590b13E(i64 %x.0, i64 %x.1) unnamed_addr #0 {
start:
  %0 = call fastcc { i64, i64 } @"_ZN79_$LT$core..result..Result$LT$T$C$E$GT$$u20$as$u20$core..ops..try_trait..Try$GT$6branch17hed04f6fc65ae78f0E"(i64 %x.0, i64 %x.1)
  %.fca.0.extract1 = extractvalue { i64, i64 } %0, 0
  %.fca.1.extract2 = extractvalue { i64, i64 } %0, 1
  %switch = icmp ult i64 %.fca.0.extract1, 1
  br i1 %switch, label %bb6, label %bb4

bb4:                                              ; preds = %start
  %1 = call fastcc i64 @"_ZN153_$LT$core..result..Result$LT$T$C$F$GT$$u20$as$u20$core..ops..try_trait..FromResidual$LT$core..result..Result$LT$core..convert..Infallible$C$E$GT$$GT$$GT$13from_residual17h7fc30301c010dde1E"(i64 %.fca.1.extract2)
  br label %bb6

bb6:                                              ; preds = %start, %bb4
  %.sroa.3.0 = phi i64 [ %1, %bb4 ], [ %.fca.1.extract2, %start ]
  %.sroa.0.0 = phi i64 [ 1, %bb4 ], [ 0, %start ]
  %2 = insertvalue { i64, i64 } undef, i64 %.sroa.0.0, 0
  %3 = insertvalue { i64, i64 } %2, i64 %.sroa.3.0, 1
  ret { i64, i64 } %3
}

We'd need either InstCombine or SimplifyCFG to fold %.sroa.0.0 into %.fca.0.extract1.

I believe SimplifyCFG would do that as part of ForwardSwitchConditionToPHI, but this is only done at the very end of the pipeline.

@nikic
Copy link
Contributor

nikic commented Feb 27, 2022

Just remembered that we have https://github.com/llvm/llvm-project/blob/fdfe26ddbeb1a5eb6106a6a27d6f8240deca543f/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp#L1256 in InstCombine. I guess it would be natural to extend to that to switches and not just conditional branches.

nikic pushed a commit to nikic/llvm-project that referenced this issue Mar 2, 2022
…nge comparison and branch until after at least the IPSCCP

That transformation is lossy, as discussed in
llvm#53853
and rust-lang/rust#85133 (comment)

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes llvm#53853
Refs. rust-lang/rust#85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854

(cherry picked from commit 371fcb7)
tstellar pushed a commit to llvmbot/llvm-project that referenced this issue Mar 9, 2022
…nge comparison and branch until after at least the IPSCCP

That transformation is lossy, as discussed in
llvm#53853
and rust-lang/rust#85133 (comment)

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes llvm#53853
Refs. rust-lang/rust#85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
@nikic
Copy link
Contributor

nikic commented Mar 18, 2022

Upstream fix is llvm/llvm-project@4010a7a.

@nikic nikic self-assigned this Mar 18, 2022
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Mar 18, 2022
@nikic
Copy link
Contributor

nikic commented Aug 30, 2022

The LLVM side of this is fixed now, and #100693 added a codegen test. Is a MIR transform still needed?

@nikic nikic removed their assignment Aug 30, 2022
@scottmcm scottmcm changed the title Wishlist: a MIR optimization to simplify the new RFC3058 ? desugaring LLVM/MIR optimizations to simplify the new RFC3058 ? desugaring Aug 30, 2022
@scottmcm
Copy link
Member Author

I'll close this one because it does look like the match-of-match, as ? does, is now working great.

Unfortunately, however, when just matching the result directly and rebuilding, it seems to only work for some widths. The codegen test works with Result<i32, u32>, but doesn't optimize away any more for Result<i64, u64>. I've opened #101210 to track that separately, @nikic .

@scottmcm scottmcm removed the E-help-wanted Call for participation: Help is requested to fix this issue. label Aug 30, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
…nge comparison and branch until after at least the IPSCCP

That transformation is lossy, as discussed in
llvm/llvm-project#53853
and rust-lang/rust#85133 (comment)

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes llvm/llvm-project#53853
Refs. rust-lang/rust#85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. F-try_trait_v2 Tracking issue for RFC#3058 I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

6 participants