AVX512 code generated for i32 array sum is worse than code by clang 5 #48287
Comments
Funny, when I change 16 to 17 in the Rust code:
pub struct v {
val:[i32;17]
}
pub fn test(a:v, b:v) -> v {
let mut res = v { val : [0;17] };
for i in 0..17 {
res.val[i] = a.val[i] + b.val[i];
}
return res;
}
I get:
example::test:
push rbp
mov rbp, rsp
sub rsp, 72
mov dword ptr [rbp - 8], 0
mov qword ptr [rbp - 16], 0
vmovdqu32 zmm0, zmmword ptr [rdx]
vpaddd zmm0, zmm0, zmmword ptr [rsi]
vmovdqu32 zmmword ptr [rbp - 72], zmm0
mov eax, dword ptr [rdx + 64]
add eax, dword ptr [rsi + 64]
mov dword ptr [rbp - 8], eax
mov dword ptr [rdi + 64], eax
vmovdqu ymm0, ymmword ptr [rbp - 72]
vmovdqu ymm1, ymmword ptr [rbp - 40]
vmovdqu ymmword ptr [rdi + 32], ymm1
vmovdqu ymmword ptr [rdi], ymm0
mov rax, rdi
add rsp, 72
pop rbp
ret
Is this closer to the clang instructions?
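To make the listing easier to read: the single `vpaddd` on a `zmm` register covers elements 0..16 (16 × 4 bytes = 64 bytes), and the scalar `add` at byte offset 64 covers the 17th element. A minimal Rust sketch of that split follows. The function name `add17` is hypothetical and is not taken from the issue; the compiler is free to emit something different for it.

```rust
// Illustrative sketch only; `add17` is a hypothetical name, not code from the issue.
pub fn add17(a: &[i32; 17], b: &[i32; 17]) -> [i32; 17] {
    let mut res = [0i32; 17];
    // Elements 0..16 occupy 64 bytes, which is what one 512-bit vector add can cover.
    for i in 0..16 {
        res[i] = a[i] + b[i];
    }
    // Element 16 lives at byte offset 64 and is left over as a scalar add.
    res[16] = a[16] + b[16];
    res
}
```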
The referenced issue #48293 has a better explanation of what is happening.
I was just about to post this issue here, good thing someone else already did. Clang only produces this "good" code for C++, not for C. On Reddit, people came to the conclusion that this is due to copy elision (in particular, return value optimization), which is done in C++ but apparently not in C or Rust.
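As a rough illustration of the copy-elision point, the sketch below contrasts a by-value return with a version that stores into a caller-provided destination, which is roughly what return value optimization arranges automatically. The names `V`, `add_by_value`, and `add_into` are hypothetical and not from the issue, and whether the two versions compile to different assembly depends on the compiler version and flags.

```rust
// Hypothetical names throughout; this is a sketch, not code from the issue.
pub struct V {
    pub val: [i32; 16],
}

/// By-value version: the result is built in a local and then returned,
/// which may involve a stack temporary that is copied to the return slot.
pub fn add_by_value(a: &V, b: &V) -> V {
    let mut res = V { val: [0; 16] };
    for i in 0..16 {
        res.val[i] = a.val[i] + b.val[i];
    }
    res
}

/// Out-parameter version: the sums are stored straight into the caller's
/// buffer, so there is no local result that would need copying out.
pub fn add_into(a: &V, b: &V, out: &mut V) {
    for i in 0..16 {
        out.val[i] = a.val[i] + b.val[i];
    }
}
```

Both functions compute the same result; the second only makes the destination explicit so that the stores can target it directly.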
In that case, #47954 might help, right?
This no longer seems to be a problem with the latest versions of both rustc and clang: https://gcc.godbolt.org/z/c4187cno3
It looks like this has been the case for quite a while already, since 1.52. It is worth mentioning that LLVM intentionally does not use 512-bit vectors here by default.
Demo: https://godbolt.org/g/vqB6oj
I tried this code:
Compiled it with
rustc --crate-type=lib -C opt-level=3 -C target-cpu=skylake-avx512 --emit asm test.rs
I expected to see this happen:
Instead, this happened:
Meta