Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization with short circuit AND #103327

Closed
y21 opened this issue Oct 20, 2022 · 7 comments · Fixed by #109895
Closed

Missed optimization with short circuit AND #103327

y21 opened this issue Oct 20, 2022 · 7 comments · Fixed by #109895
Assignees
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@y21
Copy link
Member

y21 commented Oct 20, 2022

Consider this code:

pub fn other(a: i32, b: i32) -> bool {
    let c1 = (a >= 0) & (a <= 10);
    let c2 = (b >= 0) & (b <= 20);

    if c1 & c2 {
        a + 100 != b
    } else {
        true
    }
}

This compiles to just:

example::other:
        mov     al, 1
        ret

(Godbolt)
However, changing any of the & to && breaks this optimization and a bunch of cmps are generated as well as the addition, which seems unnecessary in this case, because &/&& shouldn't make any difference as short circuit can't be observed.

example::other:
        mov     al, 1
        cmp     edi, 10
        ja      .LBB0_3
        cmp     esi, 20
        ja      .LBB0_3
        add     edi, 100
        cmp     edi, esi
        setne   al
.LBB0_3:
        ret

(Godbolt)
This works fine with Clang

Doesn't seem like an LLVM issue (to me?). The LLVM IR that Rust emits The resulting LLVM IR:

define zeroext i1 @test(i32 %a, i32 %b) unnamed_addr #0 {
start:
  %0 = icmp ult i32 %a, 11
  %1 = icmp ult i32 %b, 21
  %_13.0 = select i1 %0, i1 %1, i1 false
  br i1 %_13.0, label %bb4, label %bb6

bb4:                                              ; preds = %start
  %_16 = add nuw nsw i32 %a, 100
  %2 = icmp ne i32 %_16, %b
  br label %bb6

bb6:                                              ; preds = %start, %bb4
  %.0 = phi i1 [ %2, %bb4 ], [ true, %start ]
  ret i1 %.0
}

correctly gets optimized to this with opt -O3 in -S -o out

define zeroext i1 @test(i32 %a, i32 %b) unnamed_addr #0 {
start:
  ret i1 true
}

(Godbolt)

@y21 y21 added the C-bug Category: This is a bug. label Oct 20, 2022
@chenyukang
Copy link
Member

@rustbot claim

@Rageking8
Copy link
Contributor

@rustbot label +T-compiler +A-codegen +I-slow

@rustbot rustbot added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 21, 2022
@chenyukang chenyukang removed their assignment Oct 21, 2022
@y21
Copy link
Member Author

y21 commented Oct 22, 2022

Seems like I misunderstood what the "Build LLVM IR" button on the Rust playground does (I thought it shows the LLVM IR that rustc emits and not the resulting, final IR after optimization passes).

It looks like LLVM started recognizing this pattern in version 13 (I'm assuming opt 13.0.0 on godbolt means it uses LLVM version 13):

Though that still doesn't explain why this doesn't happen when compiled with rust. I'm on 1.66-nightly and that uses LLVM version 15.0.2 as far as I can tell (rustc -Vv), so presumably it should be able to recognize this pattern and further reduce it to ret true like it does with opt 13. Does opt do additional passes that rustc doesn't do that could prevent this?

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 22, 2022
@nikic
Copy link
Contributor

nikic commented Oct 22, 2022

This pattern is currently only handled by IPSCCP, which only runs very early. CVP doesn't handle it, because it currently only optimizes conditions with a constant operand. In principle, it can be extended to handle this with some extensions to LVI.

The alternative is to get the IR into the necessary form prior to IPSCCP. I believe in this case the problem is that early SimplifyCFG runs before SROA, and we need SROA to simplify the conditions stored into variables first. Swapping the passes would probably fix this, but I vaguely remember there being some complication with doing that.

@nikic
Copy link
Contributor

nikic commented Oct 24, 2022

The alternative is to get the IR into the necessary form prior to IPSCCP. I believe in this case the problem is that early SimplifyCFG runs before SROA, and we need SROA to simplify the conditions stored into variables first. Swapping the passes would probably fix this, but I vaguely remember there being some complication with doing that.

The reason this doesn't work is that we currently rely on SimplifyCFG not optimizing branches into selects this early, so IPSCCP still sees proper branches. Would have to add an extra option to disables this in the early SimplifyCFG run.

@nikic
Copy link
Contributor

nikic commented Nov 2, 2022

Upstream patch: https://reviews.llvm.org/D137253

@nikic nikic self-assigned this Nov 2, 2022
@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Apr 3, 2023
@nikic
Copy link
Contributor

nikic commented Apr 3, 2023

Fixed by LLVM 16 upgrade.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 11, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 12, 2023
@bors bors closed this as completed in 73f40d4 Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants