Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient codegen for eq of enum with identical variants #113506

Open
adrian17 opened this issue Jul 9, 2023 · 7 comments
Open

Inefficient codegen for eq of enum with identical variants #113506

adrian17 opened this issue Jul 9, 2023 · 7 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/

Comments

@adrian17
Copy link

adrian17 commented Jul 9, 2023

Given an enum:

type Pointer = *const ();
#[derive(PartialEq, Eq)]
pub enum MyEnum {
    A(Pointer),
    B(Pointer),
    C(Pointer),
    D(Pointer),
    E(Pointer),
    F(Pointer),
    Any,
}

The generated comparison uses jump table despite every jump target being identical:
https://godbolt.org/z/6nWW53eb5
As far as I'm aware, this jump should have been optimized away entirely, probably to something like

example::compare:
        mov     rax, qword ptr [rdi]
        cmp     rax, qword ptr [rsi]
        jne     .LBB0_1
        cmp     rax, 5
        ja      .LBB0_3
-        lea     rcx, [rip + .LJTI0_0]
-        movsxd  rax, dword ptr [rcx + 4*rax]
-        add     rax, rcx
-        jmp     rax
-.LBB0_5:
        mov     rax, qword ptr [rdi + 8]
        cmp     rax, qword ptr [rsi + 8]
        sete    al
        ret

With WASM target, it's even worse, as it doesn't deduplicate the identical match arms at all:
https://godbolt.org/z/Gf16rM4MY

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 9, 2023
@saethlin
Copy link
Member

saethlin commented Jul 9, 2023

Looks similar to #55030

I minimized this a bit to

#[derive(PartialEq, Eq)]
pub enum MyEnum {
    A(u64),
    B(u64),
    C(u64),
}

pub fn compare(a: &MyEnum, b: &MyEnum) -> bool {
    *a == *b
}

Which emits different LLVM IR with -O and -Copt-level=3. Both have a switch which leads to the same blocks. For 3 variants, if I make the payload u32 something cleans up the jump table when lowering to x86_64. Adding another variant inhibits that.

Here is the IR for the code above with -Copt-level=3:

define noundef zeroext i1 @_ZN7example7compare17had18f8c96abc6decE(ptr noalias nocapture noundef readonly align 4 dereferenceable(8) %a, ptr noalias nocapture noundef readonly align 4 dereferenceable(8) %b) unnamed_addr #0 !dbg !6 {
start:
  %a.val = load i32, ptr %a, align 4, !dbg !11, !range !12, !noundef !10
  %0 = getelementptr i8, ptr %a, i64 4, !dbg !11
  %a.val1 = load i32, ptr %0, align 4, !dbg !11
  %b.val = load i32, ptr %b, align 4, !dbg !11, !range !12, !noundef !10
  %1 = getelementptr i8, ptr %b, i64 4, !dbg !11
  %b.val2 = load i32, ptr %1, align 4, !dbg !11
  %_5.i = icmp eq i32 %a.val, %b.val, !dbg !13
  br i1 %_5.i, label %bb2.i, label %"_ZN56_$LT$example..MyEnum$u20$as$u20$core..cmp..PartialEq$GT$2eq17h895d84d9838af47eE.exit", !dbg !13

bb2.i:                                            ; preds = %start
  switch i32 %a.val, label %bb5.i [
    i32 0, label %bb4.i
    i32 1, label %bb6.i
    i32 2, label %bb7.i
  ], !dbg !13

bb5.i:                                            ; preds = %bb2.i
  unreachable, !dbg !19

bb4.i:                                            ; preds = %bb2.i
  %2 = icmp eq i32 %a.val1, %b.val2, !dbg !21
  br label %"_ZN56_$LT$example..MyEnum$u20$as$u20$core..cmp..PartialEq$GT$2eq17h895d84d9838af47eE.exit", !dbg !23

bb6.i:                                            ; preds = %bb2.i
  %3 = icmp eq i32 %a.val1, %b.val2, !dbg !24
  br label %"_ZN56_$LT$example..MyEnum$u20$as$u20$core..cmp..PartialEq$GT$2eq17h895d84d9838af47eE.exit", !dbg !23

bb7.i:                                            ; preds = %bb2.i
  %4 = icmp eq i32 %a.val1, %b.val2, !dbg !26
  br label %"_ZN56_$LT$example..MyEnum$u20$as$u20$core..cmp..PartialEq$GT$2eq17h895d84d9838af47eE.exit", !dbg !23

"_ZN56_$LT$example..MyEnum$u20$as$u20$core..cmp..PartialEq$GT$2eq17h895d84d9838af47eE.exit": ; preds = %start, %bb4.i, %bb6.i, %bb7.i
  %.0.i = phi i1 [ false, %start ], [ %4, %bb7.i ], [ %3, %bb6.i ], [ %2, %bb4.i ]
  ret i1 %.0.i, !dbg !28
}

It is technically possible to optimize out these duplicate match arms before LLVM, but probably waste of time given the current state of MIR. I think special-casing the derive macro might be better, if we don't want to wait for LLVM.

@saethlin saethlin added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. I-heavy Issue: Problems and improvements with respect to binary size of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 9, 2023
@dianqk
Copy link
Member

dianqk commented Jul 22, 2023

Upstream issue: llvm/llvm-project#64031.

@dianqk
Copy link
Member

dianqk commented Aug 10, 2023

@rustbot claim

@dianqk
Copy link
Member

dianqk commented Sep 20, 2023

@rustbot label llvm-fixed-upstream

@rustbot
Copy link
Collaborator

rustbot commented Sep 20, 2023

Error: Label llvm-fixed-upstream can only be set by Rust team members

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@Noratrieb Noratrieb added the llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes label Sep 20, 2023
@nikic nikic removed the llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes label Feb 14, 2024
@nikic
Copy link
Contributor

nikic commented Feb 14, 2024

This did not get fixed by the LLVM 18 upgrade. I believe the problem is that we can only hoist instructions from some of the successors, so we'd need to perform some kind of edge splitting.

@dianqk
Copy link
Member

dianqk commented Mar 6, 2024

This did not get fixed by the LLVM 18 upgrade. I believe the problem is that we can only hoist instructions from some of the successors, so we'd need to perform some kind of edge splitting.

Well, indeed, now otherwise it's not unreachable. I'm not sure splitting edge is always beneficial, but for this example is ok. I think either #121397 or #120268 might fix it.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/
Projects
None yet
Development

No branches or pull requests

7 participants