Missed optimization on array comparison #62531

The godbolt link: https://godbolt.org/z/dc9o3x

I think this snippet should just return `true`, but the asm generated with `opt-level=3` still performs the comparison at runtime.
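The snippet itself isn't quoted above, but it can be reconstructed from the MIR dump later in the thread; a best-effort reconstruction (variable names inferred from the MIR, not the verbatim original):

```rust
pub fn compare() -> bool {
    // 12.5f32 has the bit pattern 0x41480000.
    let bytes = 12.5f32.to_ne_bytes();
    bytes == if cfg!(target_endian = "big") {
        [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    }
}
```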
I think there are 2 issues. See this modified example: https://godbolt.org/z/86WRHF
I may misunderstand, but in the C version, does clang require similar conditions to optimize the code?
Those are preprocessor directives and are handled at compile time, so their Rust equivalent would be `#[cfg]`:
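For illustration, such an equivalent might look like this (a sketch; the commenter's exact snippet was not preserved, and `#[cfg]` on the `let` statements is my assumption):

```rust
// `#[cfg]` is resolved at compile time, so only one arm is ever
// compiled, just like the C preprocessor's #if.
pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    #[cfg(target_endian = "big")]
    let expected = [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    let expected = [0x00, 0x00, 0x48, 0x41];
    bytes == expected
}
```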
https://doc.rust-lang.org/nightly/std/macro.cfg.html states that, unlike `#[cfg]`, `cfg!` does not remove any code and only evaluates to `true` or `false` at compile time.
Good point, those two are equal. I don't know enough about `const` and I could be totally wrong, but I think in the first case `to_ne_bytes` is not a `const fn`, so it cannot make the array a compile-time constant.
Interesting: if I look at the MIR output, I think rustc already has enough information (note `_5 = const false` below):

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn compare() -> bool {
let mut _0: bool; // return place in scope 0 at src/lib.rs:2:21: 2:25
let _1: [u8; 4]; // "bytes" in scope 0 at src/lib.rs:3:9: 3:14
let mut _2: &[u8; 4]; // in scope 0 at src/lib.rs:4:5: 4:10
let mut _3: &[u8; 4]; // in scope 0 at src/lib.rs:4:14: 8:6
let _4: [u8; 4]; // in scope 0 at src/lib.rs:4:14: 8:6
let mut _5: bool; // in scope 0 at src/lib.rs:4:17: 4:44
scope 1 {
}
bb0: {
StorageLive(_1); // bb0[0]: scope 0 at src/lib.rs:3:9: 3:14
_1 = const core::f32::<impl f32>::to_ne_bytes(const 12.5f32) -> bb1; // bb0[1]: scope 0 at src/lib.rs:3:17: 3:38
// ty::Const
// + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
// + val: Scalar(<ZST>)
// mir::Constant
// + span: src/lib.rs:3:25: 3:36
// + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
// + literal: Const { ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}, val: Scalar(<ZST>) }
// ty::Const
// + ty: f32
// + val: Scalar(0x41480000)
// mir::Constant
// + span: src/lib.rs:3:17: 3:24
// + ty: f32
// + literal: Const { ty: f32, val: Scalar(0x41480000) }
}
bb1: {
StorageLive(_2); // bb1[0]: scope 1 at src/lib.rs:4:5: 4:10
_2 = &_1; // bb1[1]: scope 1 at src/lib.rs:4:5: 4:10
StorageLive(_3); // bb1[2]: scope 1 at src/lib.rs:4:14: 8:6
StorageLive(_4); // bb1[3]: scope 1 at src/lib.rs:4:14: 8:6
StorageLive(_5); // bb1[4]: scope 1 at src/lib.rs:4:17: 4:44
_5 = const false; // bb1[5]: scope 1 at src/lib.rs:4:17: 4:44
// ty::Const
// + ty: bool
// + val: Scalar(0x00)
// mir::Constant
// + span: src/lib.rs:4:17: 4:44
// + ty: bool
// + literal: Const { ty: bool, val: Scalar(0x00) }
switchInt(_5) -> [false: bb2, otherwise: bb3]; // bb1[6]: scope 1 at src/lib.rs:4:14: 8:6
}
bb2: {
_4 = [const 0u8, const 0u8, const 72u8, const 65u8]; // bb2[0]: scope 1 at src/lib.rs:7:9: 7:33
// ty::Const
// + ty: u8
// + val: Scalar(0x00)
// mir::Constant
// + span: src/lib.rs:7:10: 7:14
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x00) }
// ty::Const
// + ty: u8
// + val: Scalar(0x00)
// mir::Constant
// + span: src/lib.rs:7:16: 7:20
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x00) }
// ty::Const
// + ty: u8
// + val: Scalar(0x48)
// mir::Constant
// + span: src/lib.rs:7:22: 7:26
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x48) }
// ty::Const
// + ty: u8
// + val: Scalar(0x41)
// mir::Constant
// + span: src/lib.rs:7:28: 7:32
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x41) }
goto -> bb4; // bb2[1]: scope 1 at src/lib.rs:4:14: 8:6
}
bb3: {
_4 = [const 65u8, const 72u8, const 0u8, const 0u8]; // bb3[0]: scope 1 at src/lib.rs:5:9: 5:33
// ty::Const
// + ty: u8
// + val: Scalar(0x41)
// mir::Constant
// + span: src/lib.rs:5:10: 5:14
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x41) }
// ty::Const
// + ty: u8
// + val: Scalar(0x48)
// mir::Constant
// + span: src/lib.rs:5:16: 5:20
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x48) }
// ty::Const
// + ty: u8
// + val: Scalar(0x00)
// mir::Constant
// + span: src/lib.rs:5:22: 5:26
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x00) }
// ty::Const
// + ty: u8
// + val: Scalar(0x00)
// mir::Constant
// + span: src/lib.rs:5:28: 5:32
// + ty: u8
// + literal: Const { ty: u8, val: Scalar(0x00) }
goto -> bb4; // bb3[1]: scope 1 at src/lib.rs:4:14: 8:6
}
bb4: {
_3 = &_4; // bb4[0]: scope 1 at src/lib.rs:4:14: 8:6
_0 = const <[u8; 4] as std::cmp::PartialEq>::eq(move _2, move _3) -> bb5; // bb4[1]: scope 1 at src/lib.rs:4:5: 8:6
// ty::Const
// + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
// + val: Scalar(<ZST>)
// mir::Constant
// + span: src/lib.rs:4:5: 8:6
// + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
// + literal: Const { ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}, val: Scalar(<ZST>) }
}
bb5: {
StorageDead(_3); // bb5[0]: scope 1 at src/lib.rs:8:5: 8:6
StorageDead(_2); // bb5[1]: scope 1 at src/lib.rs:8:5: 8:6
StorageDead(_1); // bb5[2]: scope 0 at src/lib.rs:9:1: 9:2
StorageDead(_5); // bb5[3]: scope 0 at src/lib.rs:9:1: 9:2
StorageDead(_4); // bb5[4]: scope 0 at src/lib.rs:9:1: 9:2
return; // bb5[5]: scope 0 at src/lib.rs:9:2: 9:2
}
}
@mati865 Whether a function is marked `const` makes (almost) no difference to runtime code generation; `const` only additionally allows the function to be evaluated in compile-time contexts.
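A minimal illustration of that point (not from the thread):

```rust
// `const fn` only *permits* compile-time evaluation in const contexts;
// an ordinary runtime call compiles exactly like a non-const function.
const fn answer() -> u32 {
    42
}

// Forced to be evaluated at compile time:
const AT_COMPILE_TIME: u32 = answer();

fn main() {
    // Ordinary runtime call; the optimizer may or may not fold it,
    // independently of the `const` marker.
    let at_runtime = answer();
    assert_eq!(AT_COMPILE_TIME, at_runtime);
}
```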
Rust now generates the following (almost optimal) assembly for the OP's code:

example::compare:
        push rax
        mov al, 1
        pop rcx
        ret

I would guess that recent improvements to const propagation in MIR are responsible for this (thanks @wesleywiser!), but it might also have been a change in LLVM. This seems like a decent candidate for a codegen regression test, although an equivalent one might already exist.
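A sketch of what such a codegen test could look like, in the FileCheck style used by rustc's codegen test suite (file layout and directives assumed):

```rust
// compile-flags: -O
#![crate_type = "lib"]

// CHECK-LABEL: @compare(
#[no_mangle]
pub fn compare() -> bool {
    // The whole comparison should fold to a constant:
    // CHECK: ret i1 true
    let bytes = 12.5f32.to_ne_bytes();
    bytes == if cfg!(target_endian = "big") {
        [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    }
}
```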
It is better now. But are the `push rax` and `pop rcx` still necessary?
I assume it is due to a difference in calling convention, but it does seem kind of silly. You'll need to ask someone more knowledgeable, unfortunately.
@lzutao A quick google leads me to believe that the `push`/`pop` pair is there to keep the stack 16-byte aligned.
So does this answer: https://stackoverflow.com/a/37774474/5456794. Looks like only regression tests are needed.
This version doesn't need any stack alignment: https://godbolt.org/z/6donWz

pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}
I tried to leave this alone since I'm not very knowledgeable in this area, but curiosity got the better of me. The following might be incorrect, but it's my best guess. In the optimized LLVM IR for Rust, a call to […]. At some point during translation to x86 assembly, the […]. Perhaps someone more familiar with the various LLVM passes could say more.

BTW, a fairer C++ comparison is the following, but this gives the same results:

#include <stdint.h>

/* Assumed shape of the union from the original example
   (its definition was not quoted in the thread): */
typedef union {
  float value;
  uint8_t bytes[4];
} ieee_float_shape_type;

int compare() {
  uint8_t big[4] = {0x41, 0x48, 0x00, 0x00};
  uint8_t lit[4] = {0x00, 0x00, 0x48, 0x41};
  ieee_float_shape_type gf_u;
  gf_u.value = 12.5;
  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) {
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)big;
  } else {
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)lit;
  }
}
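For reference, the same "compare the four bytes as one u32" idea can be written in safe Rust (a sketch, not from the thread; `u32::from_ne_bytes` plays the role of the C pointer cast):

```rust
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    // Reinterpret both sides as a single u32, like the C `*(uint32_t *)` cast.
    let lhs = u32::from_ne_bytes(gf_u.to_ne_bytes());
    let rhs = u32::from_ne_bytes(if cfg!(target_endian = "big") {
        [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    });
    lhs == rhs
}
```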
There's some discussion of this on Zulip. I was indeed wrong, BTW 😄.
This issue appears to be fixed on beta. Maybe we need a codegen test for it. Edit: it seems that the LLVM IR generated by rustc is the same between beta and stable.
The issue seems to have re-appeared in master :(
Both now generate:

example::compare:
        mov al, 1
        ret
Add codegen test for array comparison opt (fixed since Rust 1.55; closes rust-lang#62531)