Investigate how LLVM deals with huge stack frames #18072

Open
arielb1 opened this issue Oct 15, 2014 · 8 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@arielb1
Contributor

arielb1 commented Oct 15, 2014

According to @thestinger, LLVM has UB when dealing with stack frames whose size is on the order of the size of the address space. Investigate this and add the needed fixes.

This is related to #17913 and not dealt with in #18041.

@kmcallister kmcallister added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-codegen Area: Code generation I-wrong labels Oct 16, 2014
@steveklabnik
Member

Triage: I'm not aware of anyone having done any investigation since.

@brson brson added P-low Low priority I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Jun 1, 2017
@bstrie
Contributor

bstrie commented Jun 2, 2017

Is any of this actionable? Is the UB documented somewhere, or described by a bug report? Given that this bug is going on three years old, it's also possible that this isn't relevant anymore. Can someone demonstrate this with a test case? If not, I'd consider closing this.

@Mark-Simulacrum Mark-Simulacrum added C-bug Category: This is a bug. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. I-wrong labels Jul 22, 2017
@bstrie
Contributor

bstrie commented Nov 17, 2017

It's been three years since this was filed and we still have no clue what this is referring to, no reproduction, and no way to know whether this has long since been fixed on LLVM's side. Closing this, but feel free to reopen if anyone can find a relevant LLVM bug (though #45839 (comment) has set the precedent of "It doesn't make sense to have a bug for every LLVM bug", so use your discretion).

@bstrie bstrie closed this as completed Nov 17, 2017
@sunfishcode
Member

I'm not aware of any specific undefined behavior, though there could well be bugs. Here's an experiment:

// `dyn` is required for trait objects in current Rust editions.
pub fn bar(foo: &dyn Fn(&[u8], &[u8], &[u8], &[u8], &[u8], &[u8])) {
    let a = [0u8; 0x7fff_ffff];
    let b = [0u8; 0x7fff_ffff];
    let c = [0u8; 0x7fff_ffff];
    let d = [0u8; 0x7fff_ffff];
    let e = [0u8; 0x7fff_ffff];
    let f = [0u8; 0x7fff_ffff];
    foo(&a, &b, &c, &d, &e, &f);
}

Compiling this with rustc --crate-type rlib --emit asm -O for a 32-bit x86 target produces this prologue:

        pushl   %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        movl    $12884901884, %eax
        calll   __rust_probestack
        subl    %eax, %esp

12884901884 (0x2fffffffc) is the size we need, but it does not fit in a 32-bit immediate, so this isn't valid 32-bit x86. llvm-mc silently wraps it, which is also what happens when LLVM uses it in integrated-as mode. GAS also wraps it, though it at least gives a warning:

t.s:21: Warning: 00000002fffffffc shortened to 00000000fffffffc
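
For readers who want to check the numbers in that warning, here is a small sketch of the arithmetic. The constant and function names are made up for illustration; the only values taken from the thread are the array length and the immediate from the prologue. The frame is slightly larger than the six arrays alone, and the sketch does not try to account for the difference.

```rust
/// Length of each local array in the experiment above.
const ARRAY_LEN: u64 = 0x7fff_ffff;

/// Frame size taken from the `movl` immediate in the emitted prologue.
const FRAME_SIZE: u64 = 12_884_901_884;

/// Keep only the low 32 bits, which is what the assemblers do to the immediate.
fn wrap_to_32_bits(value: u64) -> u32 {
    value as u32
}

fn main() {
    // Six arrays of 0x7fff_ffff bytes each already exceed 32 bits.
    println!("six arrays: {} bytes", 6 * ARRAY_LEN);
    println!("frame size: {:#x}", FRAME_SIZE);
    println!("after wrap: {:#x}", wrap_to_32_bits(FRAME_SIZE));
}
```

This prints 12884901882 for the arrays, 0x2fffffffc for the frame, and 0xfffffffc after truncation, which matches the GAS warning.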

I've now filed https://bugs.llvm.org/show_bug.cgi?id=35345 concerning this.

Since this code is entirely pathological, it seems reasonable to have compilers just reject it. I assume the fix for the above should be to have LLVM use report_fatal_error. I don't know when that bug will be fixed though, so it may be desirable to also issue an error, based on a conservative approximation, before the code reaches LLVM.
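
To make the "conservative approximation" idea concrete, here is a rough sketch of what such a check could look like. Everything in it is hypothetical: `locals_fit_in_frame` is not a rustc or LLVM function, and the choice of `isize::MAX` bytes for the target as the limit is an assumption, picked to sit well below the full address space so there is slack for spills, padding, and saved registers that the sum does not count.

```rust
/// Hypothetical pre-codegen check (not an existing rustc API): do the
/// fixed-size locals of one function fit in a single stack frame on a
/// target with `pointer_bits`-bit pointers (assumed to be 16, 32, or 64)?
fn locals_fit_in_frame(local_sizes: &[u64], pointer_bits: u32) -> bool {
    // Assumed limit: the target's isize::MAX, i.e. half the address space.
    let limit = (1u64 << (pointer_bits - 1)) - 1;
    let mut total: u64 = 0;
    for &size in local_sizes {
        // checked_add guards the running sum itself against overflow.
        total = match total.checked_add(size) {
            Some(sum) if sum <= limit => sum,
            _ => return false,
        };
    }
    true
}

fn main() {
    let locals = [0x7fff_ffffu64; 6];
    println!("fits on 32-bit: {}", locals_fit_in_frame(&locals, 32));
    println!("fits on 64-bit: {}", locals_fit_in_frame(&locals, 64));
}
```

The six arrays from the experiment are rejected for a 32-bit target and accepted for a 64-bit one. A real check would have to live wherever the compiler first knows the sizes of a function's locals, which this sketch does not try to identify.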

If Rust ever has the ability to emit dynamic-sized allocas, that may introduce other concerns not covered here.

@bstrie bstrie reopened this Nov 17, 2017
@bstrie
Contributor

bstrie commented Nov 17, 2017

@sunfishcode Thank you for investigating! I've reopened this for the time being, now that we have something to actually go on.

As for alloca, you may be interested in this RFC: rust-lang/rfcs#1909, which is currently in its final comment period. Your comments would be very welcome. :)

@Elinvynia Elinvynia added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@LeSeulArtichaut LeSeulArtichaut removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020
@lahwran

lahwran commented Feb 19, 2021

A few questions I have on reading this issue and discussion:

  1. Will Miri detect this issue now? If not, should it?
  2. Has this ever been encountered in practice, or what code would be most likely to accidentally generate this situation?
  3. can valid, useful code ever generate this situation, or does encountering this situation necessarily entail invalid code? I don't think it makes sense to have a stack frame greater than the size of your address space, unless I've missed something :)
  4. how can we establish a guarantee around what behavior in LLVM caused the unsoundness? eg if I was going to understand this issue through its entire compilation pipeline as a motivated but new contributor, what test setup do I need and how do I ensure I have a full taxonomy of relevant branches in llvm's internals that affect its understanding of its input?
    -> I'm guessing this is going to be eg a full build of rustc+llvm with debug symbols and then step through to find the place where code is generated, then look that up in the llvm source and figure out why the invalid output isn't prevented. Is there an easier version of that, eg does anyone know where to start reading?

@RalfJung
Member

I don't think Miri can do anything here; Miri doesn't really have the concept of a stack frame with a given "size". Similar to "protect against stack and heap overlapping", this is below the level of abstraction that Miri works on, and needs to be handled by the codegen backend.

> If Rust ever has the ability to emit dynamic-sized allocas, that may introduce other concerns not covered here.

Note that Rust nowadays has that ability (for the unstable unsized_locals feature).

Cc @rust-lang/wg-llvm

@workingjubilee
Member

> 3. can valid, useful code ever generate this situation, or does encountering this situation necessarily entail invalid code? I don't think it makes sense to have a stack frame greater than the size of your address space, unless I've missed something :)

This situation can be generated by safe Rust code. So although such a frame makes no sense a priori, we should emit an error here, not miscompile. That is, the problem is not the attempt to allocate more space than the address space can contain. The problem is that we may fail to make that attempt and crash at that point, and instead silently wrap the size. The program then keeps running as if its earlier allocations had succeeded, and accesses wrong locations that may somehow still be mapped into memory.

In practice, what I just described is a very unlikely sequence of events, but still: if a Rust program expresses a desire to go to a hell of its own making, we are honor-bound to send it there.
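
As a rough illustration of the wrapping scenario, the sketch below models only the `subl %eax, %esp` step from the prologue posted earlier, using the frame size from that experiment. The starting stack pointer and the function name are invented for the example, and the model ignores the call to __rust_probestack, which may well fault before the subtraction is ever reached.

```rust
/// Model of `subl %eax, %esp` on 32-bit x86: subtraction modulo 2^32.
/// `frame_size_immediate` is the size the compiler asked for.
fn adjust_stack_pointer(esp: u32, frame_size_immediate: u64) -> u32 {
    // The assembler keeps only the low 32 bits of the immediate.
    let eax = frame_size_immediate as u32;
    esp.wrapping_sub(eax)
}

fn main() {
    // Hypothetical starting stack pointer, chosen only for illustration.
    let esp_before: u32 = 0xbfff_f000;
    let esp_after = adjust_stack_pointer(esp_before, 12_884_901_884);
    println!("{:#x} -> {:#x}", esp_before, esp_after);
}
```

With these inputs the stack pointer ends up at 0xbffff004, four bytes above where it started, so no space is reserved at all while the function body still assumes roughly 12 GiB of locals below it.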

> 4. how can we establish a guarantee around what behavior in LLVM caused the unsoundness? eg if I was going to understand this issue through its entire compilation pipeline as a motivated but new contributor, what test setup do I need and how do I ensure I have a full taxonomy of relevant branches in llvm's internals that affect its understanding of its input?

Yeah you nailed it pretty much, there. That would be the way to get full understanding.
However, I believe that examining how we build code and thus emit LLVM IR, via our bindings to codegen backends and in particular LLVM (rustc_codegen_ssa, rustc_codegen_llvm, and especially rustc_llvm), may reveal a spot in pure Rust control flow where we attempt to emit a request for nonsense code, and allow us to block it there.

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023