-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Description
Given val: u8, slice: &[u8; 8] and arr: [u8; 8], I expected the following statements to compile down to the same thing :
// a
slice.contains(&val);
// b
slice[0] == val
|| slice[1] == val
|| slice[2] == val
|| slice[3] == val
|| slice[4] == val
|| slice[5] == val
|| slice[6] == val
|| slice[7] == val;
// c
arr.contains(&val);
// d
arr[0] == val
|| arr[1] == val
|| arr[2] == val
|| arr[3] == val
|| arr[4] == val
|| arr[5] == val
|| arr[6] == val
|| arr[7] == val;However, the resulting assembly differs quite a lot:
- The
astatement compiles down to a loop, checking one element at a time, except forT = u8|i8andN < 16where it instead call fall on the fast path ofmemchrwhich gets optimized a little bit better. - The
bstatement compiles down to a unrolled-loop, checking one element at a time in a branchless fashion. Most of the time it doesn't give any SIMD instructions. - The
cstatement always compiles down to a loop, checking one element at a time, except forT = u8|i8andN >= 16where it instead callmemchr_general_case - The
dstatement always compiles down to a few branchless SIMD instructions for any primitive type used and any array size.
Because the slice/array size is known at compile-time and the type checker guarantees that it will be respected by any calling function, I expected the compiler to take this into account while optimizing the resulting assembly. However, this information seems to be lost at some point when using the contains method.
arr.contains(&val) and slice.contains(&val) are simplified as arr.as_ref().iter().any(|e| *e == val) and slice.iter().any(|e| *e == val) if I'm not mistaken (which is wierd because for some N and T, they don't yield the same assembly). The compiler does not seem to be able to unroll this case.
godbolt links for
T=u8; N=8
T=u16; N=8
T=u32; N=8
T=u64; N=8