-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Regressions with large (2-4GB) stack arrays on large stacks #83060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This regressed in nightly-2021-01-17, which is bc39d4d...8a65184. I guess it's #77885. |
@hyd-dev Just to confirm, it looks like the commits you tracked this down to are indeed not included in 1.50.0 stable. So that may potentially be the cause of the regression from stable to nightly. Separate from that, there's an apparent soundness bug in 1.50.0 stable. It may be that the change in nightly is masking the soundness issue in stable. |
Does this attempt to remove |
@nagisa Yes, that looks like the same wrongness. That example may not capture all the cases that mishandle large stack offsets, but it captures some of them, enough so to serve as a reduced testcase for at least part of the problem. |
Also, looking more closely at the generated code, I realized that for stable-debug, diffing 1gb to 2gb would be more helpful than diffing 2gb to 4gb, even though 2gb appears to "work". 1gb has large positive offsets that may make sense. 2gb has large negative offsets that look wrapped. 4gb has tiny positive offsets that look like they've been truncated. The overflow point appears to be just under the 2GB mark. Here's a diff close to the wrap point. (Note that tweaking the values to be closer together may result in a subset of the values wrapping, because they're at different stack offsets.) |
Yeah, 2GB case appears to be where the issues begin cropping up. It is taking addresses to stack slots that are way below the already adjusted down stack. |
Upstream report: https://bugs.llvm.org/show_bug.cgi?id=49567 |
Let's see the follow-up to the upstream issue report. Assigning |
The cause of this regression has been reverted on stable 1.52, beta 1.53 and nightly 1.54. |
@pietroalbini I don't think this should be closed:
The soundness issue ("prints a random different value every time (not 123)") still exists, AFAIK. |
Thought: could we take an upper-bound estimate of the stack size used by each monomorphisation of a function, and then abort compilation saying we don't support stack frames that large if it's more than 2GB? (unsized locals might make that difficult, not sure.). Not sure if 2GB is the right size on all supported platforms too. It wouldn't fix the issue but it would fix it being a soundness issue. |
I ran into a variation of this issue today, while trying to safely construct a large struct on the heap. On release builds, use std::{hint, mem::MaybeUninit};
#[repr(C)]
struct Big {
x: u64,
y: u64,
pad: MaybeUninit<[u8; 0xffff_fff0]>,
z: u64,
}
#[inline(never)]
fn broken(out: fn(&u64)) {
let big = Box::new(Big {
x: 1,
y: 2,
pad: MaybeUninit::uninit(),
z: 3,
});
out(&big.y);
}
fn main() {
let out: fn(&u64) = |y| println!("{y}");
broken(hint::black_box(out));
} |
Discussed during T-compiler 2023 Q1 P-high review, as part of visiting #100914 We think the right answer is to:
@rustbot label: +P-high -P-medium |
@rustbot label: +E-help-wanted +E-mentor +E-medium pnkfelix is willing to mentor at least the diagnostic addition part of this. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
Assigning myself to work on the LLVM part of this issue. |
I had an open question on this on how to fix the CI failure. If someone can help with that, I can finish this up soon. |
The LLVM part should be fixed by llvm/llvm-project#101840 |
In the course of responding to someone's question, I was experimenting with large (1-4 GB) arrays on the stack, using threads with large stacks created using
std::thread::Builder::stack_size
.Code
Here's the baseline test program I used:
This creates a thread with a 5GB stack, allocates a 4GB array on it, sets one value in that array, reads that value back, and prints it. It should always print
123
.Call this the 4GB variation. I also tested 2GB and 1GB variations, by changing both references to
CAP
toCAP>>1
andCAP>>2
respectively. (I could have changed the constant itself rather than dividing it, but I'm preserving exactly the code I used in case it affects the result.)Rust version info
I tested two versions of Rust: stable 1.50 (
rustc 1.50.0 (cb75ad5db 2021-02-10)
) and nightly (rustc 1.52.0-nightly (4a8b6f708 2021-03-11)
).stable `rustc --version --verbose`
nightly `rustc --version --verbose`
Results
I tested each of these three variations (1GB, 2GB, 4GB) on stable-debug (
cargo run
), stable-release (cargo run --release
), and nightly-release (cargo +nightly run --release
), and got three different results.On stable-release, all three variations worked as expected, printing
123
.On stable in debug mode, it works as expected at 1GB and 2GB, but the 4GB version prints a random different value every time (not 123). This seems likely to be a soundness issue, so I'm labeling this I-unsound.
On nightly in release mode, the 1GB version works as expected, but the 2GB and 4GB versions both run for an unexpectedly longer time (many seconds) and then get killed by the Linux OOM killer. This is a regression from stable to nightly, so I'm labeling this accordingly.
I don't know if these come from one or multiple underlying issues. I'm reporting all the details here, but this may need to be split into multiple issues once analyzed further.
I dumped the generated code of all nine builds using
objdump -d
, and I'm attaching those. The diffs between working and non-working versions seem unusual and potentially relevant.(I recommend filtering the objdumps through
sed 's/anon[0-9a-f.]*llvm[0-9a-f.]*/anon-ELIDED-llvm-ELIDED/'
to reduce spurious diff noise; that doesn't eliminate differences in code addresses, but it does eliminate differences in anonymous LLVM symbol names.)objdump-d-stable-debug-1gb.txt
objdump-d-stable-debug-2gb.txt
objdump-d-stable-debug-4gb.txt
objdump-d-stable-release-1gb.txt
objdump-d-stable-release-2gb.txt
objdump-d-stable-release-4gb.txt
objdump-d-nightly-release-1gb.txt
objdump-d-nightly-release-2gb.txt
objdump-d-nightly-release-4gb.txt
The diff from nightly-release-1gb to nightly-release-2gb shows what look like signs of incorrect sign extension of large values. Note how
sub $0x40000000,%r11
has becomesub $0xffffffff80000000,%r11
. Also note in the stack cleanup at the end thatadd $0x40000040,%rsp
has become a two-stepadd $0x7fffffff,%rsp
thenadd $0x41,%rsp
; looks like something in the cleanup broke the large offset into two steps (which isn't necessarily a problem but seems notable).highlights from diff of nightly-release-1gb to nightly-release-2gb
The diff from nightly-release-2gb to nightly-release-4gb shows further signs of incorrect handling of large values; note how
sub $0xffffffff80000000,%r11
has now becomesub $0x0,%r11
. Also note in the cleanup that the two-step add has now become a three-step add (add $0x7fffffff,%rsp
twice thenadd $0x42,%rsp
). I tested, and if I build code that uses a much larger 200GB array, LLVM does generate amovabs
into a register and then adds that register. So it looks like the stack cleanup code is correct with large stack arrays, but there's something wrong with the stack setup code and stack offset code.highlights from diff of nightly-release-2gb to nightly-release-4gb
The diff from stable-debug-2gb to stable-debug-4gb is much noisier due to code addresses, but filtering out code-address-related differences, I want to highlight a portion of the diff that again looks like issues with incorrect handling of large values. Note the changes from large negative stack offsets like
-0x7fffff98(%rsp)
and-0x7fffff90(%rsp)
to small positive stack offsets like0x68(%rsp)
and0x70(%rsp)
. The latter look like they should be larger negative values, but they wrapped. I'm wondering if something has treated these stack offsets as 32-bit values.highlights from diff of stable-debug-2gb to stable-debug-4gb
The text was updated successfully, but these errors were encountered: