_mm256_loadu_si256 only loads 128 bits when compiled with default cargo build --release #52636
Behavior on Windows is kind of strange: debug builds run fine, but release builds error out.
I am not a Windows developer, so I have no clue why or what happens :D Maybe some register clobbering. I cannot confirm this on the playground, though :/ It also breaks on my Linux machine 😮
With Level 2:
Level 3:
I don't think this analysis is correct, the More importantly, this combined with #50154 is why the store doesn't work. tl;dr: duplicate of #50154.
@rkruppe You're right that this is related to #50154, but based on your comments here, we might have established the true cause of these AVX bugs. My use of
You appear to have

I find the idea that registers should be "demoted" to memory regions to fit an ABI suspect. If the target ABI does not allow for the use of said registers, it would be more ergonomic for the compiler to fail early and often, notifying the user that their target ABI does not support the registers they are attempting to use.
I don't see how that's relevant to this issue.
I am talking about the LLVM IR and assembly generated, not about the source code. However, it appears we're talking about different machine code: on the playground and in @hellow554's experiments, the body of
while your original post quotes a disassembly that contains only In any case, #50154 would also explain why
They have, not just in the sense required for allocating them to ymm registers and using AVX instructions on them, but yes also in the sense you mean here: they are passed in memory rather than as immediates.
Unfortunately, the ABI mismatch problems are real and difficult to solve, and we can't very well ignore them, so this is the only feasible approach for the time being.
@rkruppe The goofy AT&T ASM syntax always trips me up. In Intel-flavor mnemonics,
For some reason,

EDIT: It's worth noting that in order to ensure it was the dereference causing issues and not explicitly written memory copying, I rewrote the test program earlier using only

The updated program:

```rust
use std::arch::x86_64;

// 32-byte alignment is required by the aligned `_mm256_load_si256` /
// `_mm256_store_si256` intrinsics; `store_bytes` sits at offset 32, so it is aligned too.
#[repr(align(32))]
struct BytePair {
    load_bytes: [u8; 32],
    store_bytes: [u8; 32],
}

fn main() {
    let mut byte_pair = BytePair {
        load_bytes: [0x0f; 32],
        store_bytes: [0; 32],
    };

    // Load all 32 bytes into a 256-bit register.
    let lb_ptr = byte_pair.load_bytes.as_ptr();
    let reg_load = unsafe {
        x86_64::_mm256_load_si256(lb_ptr as *const x86_64::__m256i)
    };
    println!("{:?}", reg_load);

    // Store the same register back into the second array.
    let sb_ptr = byte_pair.store_bytes.as_mut_ptr();
    unsafe {
        x86_64::_mm256_store_si256(sb_ptr as *mut x86_64::__m256i, reg_load);
    }

    // The arrays match only if all 256 bits survived the round trip.
    assert_eq!(&byte_pair.load_bytes, &byte_pair.store_bytes);
}
```

EDIT 2: The disassembly of
And the contents of
I've confirmed that this is the same bug as #50154, which is the same as upstream LLVM https://bugs.llvm.org/show_bug.cgi?id=37358, as @rkruppe mentioned.
@alexcrichton The LLVM bug report was opened in May 2018 with seemingly no progress since then. In the intervening period, the SIMD features were marked "stable" in general (not just at the API level) and shipped even with this bug present. Are there any plans to address the bug at the
@djsweet I don't personally know how we could fix this at the rustc level, but it may be good to ping the LLVM issue if you're interested in stirring up activity!
rustc: Fix (again) simd vectors by-val in ABI

The issue of passing around SIMD types as values between functions has seen [quite a lot] of [discussion], and although we thought [we fixed it][quite a lot] it [wasn't]! This PR is a change to rustc to, again, try to fix this issue.

The fundamental problem here remains the same: if a SIMD vector argument is passed by-value in LLVM's function type, then if the caller and callee disagree on target features a miscompile happens. We solve this by never passing SIMD vectors by-value, but LLVM will still thwart us with its argument promotion pass to promote by-ref SIMD arguments to by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before codegen, will take yet another look at the LLVM module and demote any by-value SIMD arguments we see. This is a very manual attempt by us to ensure the codegen for a module keeps working, and it unfortunately is likely producing suboptimal code, even in release mode. The saving grace for this, in theory, is that if SIMD types are passed by-value across a boundary in release mode it's pretty unlikely to be performance sensitive (as it's already doing a load/store, and otherwise perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely copied from LLVM's own argument promotion pass, only doing the reverse. In local testing this...

Closes #50154
Closes #52636
Closes #54583
Closes #55059

[quite a lot]: #47743
[discussion]: #44367
[wasn't]: #50154
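To make the by-value versus by-reference distinction concrete at the source level, here is a minimal sketch, assuming an x86_64 target, in which an `__m256i` only ever crosses a function boundary behind a reference. A caller compiled without AVX therefore never receives or passes the vector by value; it just holds it in memory and hands out pointers. This is a hand-written analogue of the demotion described in the commit message, not rustc's C++ pass itself, and `load_into`, `store_from`, and `copy_via_refs` are hypothetical names.

```rust
#[cfg(target_arch = "x86_64")]
mod avx {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};

    /// Hypothetical helper: loads 32 bytes into `*out`. The vector leaves
    /// this function through a reference, not as a by-value return.
    #[target_feature(enable = "avx")]
    pub unsafe fn load_into(src: &[u8; 32], out: &mut __m256i) {
        unsafe {
            *out = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
        }
    }

    /// Hypothetical helper: stores `*v` into `dst`. The vector arrives
    /// behind a reference, not as a by-value argument.
    #[target_feature(enable = "avx")]
    pub unsafe fn store_from(v: &__m256i, dst: &mut [u8; 32]) {
        unsafe {
            _mm256_storeu_si256(dst.as_mut_ptr() as *mut __m256i, *v);
        }
    }
}

/// Caller compiled without AVX enabled: it keeps the vector in a stack slot
/// and only passes pointers to it across the target-feature boundary.
fn copy_via_refs(src: &[u8; 32]) -> [u8; 32] {
    let mut dst = [0u8; 32];
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // An all-zero bit pattern is a valid `__m256i`; this just reserves the slot.
            let mut reg: std::arch::x86_64::__m256i = unsafe { std::mem::zeroed() };
            // Safety: AVX support was confirmed at runtime just above.
            unsafe {
                avx::load_into(src, &mut reg);
                avx::store_from(&reg, &mut dst);
            }
            return dst;
        }
    }
    // Fallback when AVX (or x86_64 itself) is unavailable.
    dst.copy_from_slice(src);
    dst
}

fn main() {
    let src: [u8; 32] = [0xa5; 32];
    assert_eq!(copy_via_refs(&src), src);
    println!("copied 32 bytes via references");
}
```

Passing a pointer in a general-purpose register means caller and callee no longer have to agree on how a 256-bit value is laid out in registers, which is exactly the disagreement the commit message blames for the miscompile.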
The AVX intrinsic `_mm256_loadu_si256` loads all 256 bits from memory into the register when compiled without any optimization, but only loads 128 bits when compiled with the default `cargo build --release` option. This small program exhibits the issue:

When I run `cargo run`, this is the output:

However, with `cargo run --release`, this is the output:

I am on macOS 10.13.6 with a Core i7-4960HQ, and the output of `rustc --version --verbose` is:

Curiously, when inspecting the assembly of `main`, the call to `_mm256_loadu_si256` is not inlined, but instead generates this function:

Note the `vzeroupper` instruction, which zeroes the upper 128 bits of every YMM register. This is incorrect behavior: `__m256i` requires the full YMM register to be loaded unmodified. A similar spurious `vzeroupper` is also present in the assembly generated for `_mm256_storeu_si256`, but after the register is stored into memory.
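For reference, the expected semantics can be checked with a small round trip. The sketch below is a workaround pattern, not the fix that eventually landed in rustc. It assumes an x86_64 target, keeps both the load and the store inside a single `#[target_feature(enable = "avx")]` function so the `__m256i` never crosses a function boundary by value, and checks for AVX at runtime. The helper names `roundtrip_avx` and `roundtrip` are hypothetical. Because the source bytes are all distinct, a load that kept only the low 128 bits would leave bytes 16 through 31 of the destination zero and fail the comparison.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};

/// Hypothetical helper: copies 32 bytes through one YMM register.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn roundtrip_avx(src: &[u8; 32], dst: &mut [u8; 32]) {
    unsafe {
        // Unaligned 256-bit load; all 32 bytes should reach the register.
        let v = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
        // Unaligned 256-bit store of the same register.
        _mm256_storeu_si256(dst.as_mut_ptr() as *mut __m256i, v);
    }
}

/// Copies `src` using AVX when available, otherwise with a plain copy.
fn roundtrip(src: &[u8; 32]) -> [u8; 32] {
    let mut dst = [0u8; 32];
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safety: AVX support was verified at runtime just above.
            unsafe { roundtrip_avx(src, &mut dst) };
            return dst;
        }
    }
    // Scalar fallback for CPUs or architectures without AVX.
    dst.copy_from_slice(src);
    dst
}

fn main() {
    // Distinct byte values 0, 1, ..., 31.
    let mut src = [0u8; 32];
    for (i, b) in src.iter_mut().enumerate() {
        *b = i as u8;
    }
    let dst = roundtrip(&src);
    assert_eq!(src, dst);
    println!("all 32 bytes preserved");
}
```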