Miscompilation of AVX2 code under --release #79865
Comments
If this is helpful, this is the code closest to where the problem is occurring:
    let v0 = _mm256_add_epi64(
        _mm256_and_si256(v0, _mm256_set_epi64x(-1, 0x3ffffff, 0x3ffffff, 0x3ffffff)),
        _mm256_permute4x64_epi64(
            _mm256_srlv_epi64(v0, _mm256_set_epi64x(64, 26, 26, 26)),
            set02(2, 1, 0, 3),
        ),
    );
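For readers who don't have the intrinsics memorized, here is a rough scalar model of what that expression is expected to compute, which can help when comparing against the miscompiled output. It uses a plain `[u64; 4]` with index 0 as the lowest lane. Note that `carry_step_model` is a hypothetical name, and the lane mapping assumes `set02(2, 1, 0, 3)` produces the same immediate as `_MM_SHUFFLE(2, 1, 0, 3)`; the definition of `set02` is not shown in this thread, so treat that as an assumption.

```rust
/// Scalar model of the quoted AVX2 expression (hypothetical helper, not from the crate).
///
/// ASSUMPTION: `set02(2, 1, 0, 3)` encodes the same immediate as
/// `_MM_SHUFFLE(2, 1, 0, 3)`, i.e. output lane i reads input lane SRC[i].
fn carry_step_model(v: [u64; 4]) -> [u64; 4] {
    // _mm256_set_epi64x lists lanes from highest to lowest, so
    // (-1, 0x3ffffff, 0x3ffffff, 0x3ffffff) leaves lane 3 unmasked.
    const MASK: [u64; 4] = [0x3ff_ffff, 0x3ff_ffff, 0x3ff_ffff, u64::MAX];
    // Likewise (64, 26, 26, 26) gives lane 3 a shift count of 64.
    const SHIFT: [u32; 4] = [26, 26, 26, 64];
    // Assumed permutation: lane 0 <- lane 3, lane 1 <- lane 0, and so on.
    const SRC: [usize; 4] = [3, 0, 1, 2];

    // _mm256_srlv_epi64 is a per-lane logical right shift; a count of 64
    // or more yields 0, which checked_shr lets us model directly.
    let mut carries = [0u64; 4];
    for i in 0..4 {
        carries[i] = v[i].checked_shr(SHIFT[i]).unwrap_or(0);
    }

    // Masked lanes plus the permuted carries, with wrapping addition
    // (matching the wrapping behaviour of _mm256_add_epi64).
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = (v[i] & MASK[i]).wrapping_add(carries[SRC[i]]);
    }
    out
}
```

Under that assumption, each of the three low lanes keeps its low 26 bits and passes the remainder up to the next lane, while lane 3 is left unmasked and contributes nothing to lane 0.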
@rustbot ping llvm
I found a workaround for this issue which suffices for my purposes, but also hopefully helps track down the bug. The miscompilation was occurring inside of a lambda. I extracted the lambda(s) into named functions and that worked around the problem successfully:
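The actual change is not reproduced in this thread. As a rough illustration of that kind of refactor only, here is a minimal sketch with hypothetical names (`sum_squares_closure`, `sum_squares_named`, `square_block`) and trivial arithmetic standing in for the AVX2 code:

```rust
// Before: the per-block arithmetic lives in a closure defined inside the caller.
// (Hypothetical stand-in for the affected code; not the crate's real logic.)
fn sum_squares_closure(blocks: &[u64]) -> u64 {
    let square = |x: u64| x.wrapping_mul(x);
    blocks.iter().fold(0u64, |acc, &b| acc.wrapping_add(square(b)))
}

// After: the same arithmetic extracted into a named function, so the
// compiler sees an ordinary function item instead of a closure body.
fn square_block(x: u64) -> u64 {
    x.wrapping_mul(x)
}

fn sum_squares_named(blocks: &[u64]) -> u64 {
    blocks
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_add(square_block(b)))
}
```

Both versions should compute the same result; the point of the workaround is only that the affected code ends up in a different kind of item, not that the logic changes.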
This is unsound and it impacts cryptographic code, so we decided on Assigning
meta: I thought
@cuviper I think
I've removed I may have inadvertently removed a couple target features specific-code or attributes, or too much of the cpu autodetection, as it now doesn't build with RUSTFLAGS="-C target-feature=+avx2" ? The error is still the same as the OP (and reproduces on the playground) so maybe it is useful to @tarcieri. I haven't looked at the issue much though, and am not sure what to expect here (especially if code using target features is built without them enabled).
@rustbot ping cleanup Would be nice to get an MCVE for this one.
Edit: down to 55 lines
This is the smallest version I can get
Looks like this still reproduces on nightly, so not fixed by LLVM 12.
I believe this is an ABI mismatch problem. Argument promotion converts the by-pointer arguments into by-value arguments, so we pass
This sounds very similar to why we are passing the SIMD arguments by-pointer in the first place – in attempt to avoid this exact kind of ABI mismatch.
It looks like ArgPromotion does check for ABI compatibility: https://github.com/llvm/llvm-project/blob/237526319cb3a17852a0e732f85f1562e42d73cc/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L843 Either it's not doing that right, or there's some more complex interaction here.
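To make the shape of the problem concrete, here is a minimal sketch of the calling pattern being discussed: a caller compiled without AVX2 passing 256-bit vectors by reference to a callee compiled with `#[target_feature(enable = "avx2")]`. This is not the MCVE from this thread, and on a fixed compiler it simply produces the correct sums. The names `avx2_demo`, `add_lanes` and `run` are hypothetical, and the non-x86_64 fallback exists only so the sketch builds everywhere.

```rust
#[cfg(target_arch = "x86_64")]
mod avx2_demo {
    use core::arch::x86_64::*;

    /// Callee compiled with AVX2 enabled. The vectors arrive by reference,
    /// so at the source level no 256-bit value crosses the call boundary.
    #[target_feature(enable = "avx2")]
    unsafe fn add_lanes(a: &__m256i, b: &__m256i) -> [i64; 4] {
        unsafe {
            let sum = _mm256_add_epi64(*a, *b);
            let mut out = [0i64; 4];
            // Unaligned store, so `out` needs no special alignment.
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
            out
        }
    }

    /// Caller compiled WITHOUT the avx2 target feature (unless it is enabled
    /// globally). Returns None if the host CPU lacks AVX2.
    pub fn run() -> Option<[i64; 4]> {
        if !is_x86_feature_detected!("avx2") {
            return None;
        }
        // SAFETY: AVX2 support was checked at runtime just above.
        unsafe {
            // _mm256_set_epi64x lists lanes from highest to lowest.
            let a = _mm256_set_epi64x(4, 3, 2, 1);
            let b = _mm256_set_epi64x(40, 30, 20, 10);
            Some(add_lanes(&a, &b))
        }
    }
}

// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
mod avx2_demo {
    pub fn run() -> Option<[i64; 4]> {
        None
    }
}
```

If an optimization rewrites `add_lanes` to take its vectors by value, the caller and callee can disagree about how a 256-bit value is passed, which is the kind of mismatch the comments above describe.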
Currently on both stable and nightly:
At last, consistent behavior between versions.
This issue shares some of the reason for its problems with #58279
This is fixed upstream (cf llvm/llvm-project#52660) and will be pulled in with the LLVM 14 update.
Fantastic! Thank you!
This is fixed by the LLVM 14 upgrade on beta/nightly.
I think we still want a regression test for this issue, right?
…r=oli-obk Add regression test for issue rust-lang#79865 Closes rust-lang#79865
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#116099 (Add regression test for issue rust-lang#79865) - rust-lang#116102 (Correct codegen of `ConstValue::Indirect` scalar and scalar pair) - rust-lang#116131 (Rename `cold_path` to `outline`) - rust-lang#116144 (subst -> instantiate) - rust-lang#116151 (Fix typo in rustdoc unstable features doc) - rust-lang#116153 (Update books) - rust-lang#116162 (Gate and validate `#[rustc_safe_intrinsic]`) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#116099 - eduardosm:issue-79865-regression, r=oli-obk Add regression test for issue rust-lang#79865 Closes rust-lang#79865
Apologies for not having a minimal reproduction, but this bug was extremely difficult to even isolate, as it occurs inside some complicated AVX2 code.
The bug is causing the wrong values to be computed. Whether or not it occurs depends on the following conditions:
- target-cpu unset: the bug does NOT occur in debug builds, but DOES occur with --release
- target-cpu=haswell: the bug does NOT occur in --release builds, and both debug and release builds are OK
I can attempt to further isolate and reduce the problem, but there's a lot of spooky-action-at-a-distance happening, making that rather difficult.
For now, here is the best reproduction I can provide:
EDIT: I've deleted the poly1305/avx2-bug branch as there is now a much smaller repro, but so long as GitHub hasn't GC'd it, here's the original commit: RustCrypto/universal-hashes@7485010
NOTE: if you git show from here, I've included lots of notes about the bug in the latest commit's message. The commit also contains comments indicating lines you can comment or uncomment to make the tests succeed or fail.
Commands to run which DON'T trigger the bug
cargo test donna_self_test1 -- --nocapture
RUSTFLAGS="-Ctarget-cpu=haswell" cargo test donna_self_test1 --release -- --nocapture
Commands to run which DO trigger the bug
NOTE: as this is a bug in the AVX2 backend, you'll need to run it on an AVX2-capable host to trigger the bug.
cargo test donna_self_test1 --release -- --nocapture
This test fails with a miscomputed result (as do all of the tests across the board if you run the whole suite):
Things which mysteriously make the tests pass
The aforementioned cargo test ... --release ... will pass if any of the following things, which are documented in the commit message and comments of commit 74850109 (git show), are changed:
- The dbg! statement near the first observation of the miscompilation is uncommented (heisenbug!)
- The #[target_feature(enable = "avx2")] attribute on the finalize function is commented out. This function is in a completely different module, hence my descriptions of "spooky action at a distance" (the function in which the bug is occurring is annotated #[inline(always)], but the bug still occurs if that attribute is commented out)
Meta
This bug is easily reproducible and occurs on all versions of the Rust compiler and all operating systems I've tried. I've reproduced it locally on macOS and it also occurred on Linux/Ubuntu via GitHub Actions.
Here are some compiler versions I've tried:
Latest nightly as of opening this ticket:
It also broke in CI which tests it under the MSRV of 1.41.0.