Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

probe-stack=inline-asm generates wrong DWARF information #83139

Closed
YangKeao opened this issue Mar 15, 2021 · 13 comments
Closed

probe-stack=inline-asm generates wrong DWARF information #83139

YangKeao opened this issue Mar 15, 2021 · 13 comments
Assignees
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@YangKeao
Copy link

I'm the author of pprof-rs, which is a library to perf rust programs by sampling backtraces in signal handler. After #77885 merged, it causes Segmentation Fault. After investigating this bug, I found that it was caused by wrong DWARF information. In the old version of rustc, __rust_probstack doesn't have DWARF information, so when a signal hits __rust_probstack, _Unwind_Backtrace will get a short backtrace (only containing the signal handler parts) but not break. However, after using inline-asm with a wrong DWARF information, _Unwind_Backtrace will get a wrong context->ra and dereferencing it will cause a Segmentation Fault

I tried this code:

#[inline(never)]
fn big_stack() {
    let _two_page_stack = [2u8; 8192];
    return
}

I expected to see this happen: The dwarf information of this function works well

Instead, this happened: In the probe-stack part of codes, the dwarf information is wrong and _Unwind_Backtrace will cause Segmentation Fault by dereferencing an unexpected address (actually, the address is "0x0202020202020202").

Part of big_stack assembly codes:

349:00000000000f42f0 <_ZN9big_stack9big_stack17h672b13eba3b25c0dE>:
350-   f42f0:	48 81 ec 00 10 00 00 	sub    $0x1000,%rsp
351-   f42f7:	48 c7 04 24 00 00 00 	movq   $0x0,(%rsp)
352-   f42fe:	00 
353-   f42ff:	48 81 ec 00 10 00 00 	sub    $0x1000,%rsp
354-   f4306:	48 c7 04 24 00 00 00 	movq   $0x0,(%rsp)
355-   f430d:	00 
356-   f430e:	48 81 ec 08 00 00 00 	sub    $0x8,%rsp
357-   f4315:	48 8d 7c 24 08       	lea    0x8(%rsp),%rdi
358-   f431a:	be 01 00 00 00       	mov    $0x1,%esi
359-   f431f:	ba 00 20 00 00       	mov    $0x2000,%edx

The DWARF information about this part is quite simple:

00000138 00000014 0000010c FDE cie=00000030 pc=0004ceb0...0004cef1
  Format:       DWARF32
  DW_CFA_advance_loc: 37
  DW_CFA_def_cfa_offset: +8208
  DW_CFA_advance_loc: 27
  DW_CFA_def_cfa_offset: +8

Though, the probe-stack part of this codes modified the rsp, the corresponding DWARF information doesn't exist.

Should I also file an issue in LLVM bugzilla? It seems to be a problem of LLVM. (The simplest way to fix it in my imagination is to use another register to loop over the stack, but not using rsp directly.)

Meta

rustc --version --verbose:

rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)
binary: rustc
commit-hash: 1d0d76f8dd4f5f6ecbeab575b87edaf1c9f56bb8
commit-date: 2021-01-24
host: x86_64-unknown-linux-gnu
release: 1.51.0-nightly
LLVM version: 11.0.1
@YangKeao YangKeao added the C-bug Category: This is a bug. label Mar 15, 2021
@jonas-schievink jonas-schievink added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. regression-untriaged Untriaged performance or correctness regression. labels Mar 15, 2021
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 15, 2021
@JohnTitor JohnTitor added regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed regression-untriaged Untriaged performance or correctness regression. labels Mar 15, 2021
@nagisa
Copy link
Member

nagisa commented Mar 15, 2021

Please check with a more recent nightly version. I mildly recall hearing a similar issue which might have been fixed since.

@nagisa
Copy link
Member

nagisa commented Mar 16, 2021

Upstream report: https://bugs.llvm.org/show_bug.cgi?id=49600

@YangKeao
Copy link
Author

Please check with a more recent nightly version. I mildly recall hearing a similar issue which might have been fixed since.

Tried with rustc 1.52.0-nightly (107896c32 2021-03-15), and it's not fixed yet.

@YangKeao
Copy link
Author

I've tried to fix this issue in llvm with this patch. After building rustc with this patch, it seems to be fine and the segmentation fault is also solved. I'll test it with bigger project (like tikv-server) tomorrow.

@apiraino
Copy link
Contributor

Assigning P-critical as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

Will be mentioned during today's meeting.

@rustbot label -I-prioritize +P-critical

@rustbot rustbot added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Mar 18, 2021
@pnkfelix pnkfelix self-assigned this Mar 18, 2021
@pnkfelix
Copy link
Member

discussed in T-compiler meeting.

We plan to revert PR #77885 (or at least a portion of it) on beta before next week's release, in order to buy us more time to evaluate tradeoffs here.

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 23, 2021
…7885-for-issue-83139, r=nagisa

[stable] probe-stack=call everywhere again, for now.

To buy time on issue 83139, revert effect of PR 77885: We will not conditionally
enable probe-stack=inline-asm on LLVM 11+ anymore on any of our targets that
opted into doing so on PR rust-lang#77885 (and were subsequently configured to do so in a
fine grained manner on PR rust-lang#80838).

After we resolve 83139 (potentially by backporting a fix to LLVM, or potentially
by deciding that one cannot rely on the quality of our DWARF output in the
manner described in issue 83139), we can change this back.

cc rust-lang#83139
@apiraino
Copy link
Contributor

Issue priority downgraded as discussed during the team compiler meeting

@rustbot label +P-high -P-critical

@rustbot rustbot added P-high High priority and removed P-critical Critical priority labels Mar 25, 2021
@pnkfelix
Copy link
Member

pnkfelix commented Apr 8, 2021

discussed at team compiler meeting. A partial solution, addressing solely the case where the loop is unrolled, has landed in LLVM in https://reviews.llvm.org/D99579

We would probably be in favor of backporting D99579, even though it is only a partial solution to the problem here.

nagisa added a commit to nagisa/rust that referenced this issue Apr 17, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 17, 2021
Include a backport for wrong DWARF information

A partial fix for rust-lang#83139

r? `@cuviper` or `@nikic`
This was referenced Apr 29, 2021
bors pushed a commit to rust-lang-ci/rust that referenced this issue Apr 29, 2021
…in, for now.

We had already reverted the change on stable back in PR rust-lang#83412.

Since then, we've had some movement on issue rust-lang#83139, but not a 100% fix.

But also since then, we had bug reported, issue rust-lang#84667, that looks like outright
codegen breakage, rather than problems confined to debuginfo issues.

So we are reverting PR rust-lang#77885 on stable and beta. We'll reland PR rust-lang#77885 (or some
variant) switching back to an LLVM-dependent selection of out-of-line call vs
inline-asm, after these other issues have been resolved.
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 29, 2021
…ark-Simulacrum

Revert PR 77885 everywhere

Change to probe-stack=call (instead of inline-or-call) everywhere again, for now.

We had already reverted the change on stable back in PR rust-lang#83412.

Since then, we've had some movement on issue rust-lang#83139, but not a 100% fix.

But also since then, we had bug reported, issue rust-lang#84667, that looks like outright codegen breakage, rather than problems confined to debuginfo issues.    So we are reverting PR rust-lang#77885 on stable and beta. We'll reland PR rust-lang#77885 (or some    variant) switching back to an LLVM-dependent selection of out-of-line call vs    inline-asm, after these other issues have been resolved.
@pnkfelix
Copy link
Member

Discussed in T-compiler meeting today.

Due to this issue, and issue #84667, we are reverting PR #77885 on master and beta.

I will look into making regression tests for these issues.

@pietroalbini
Copy link
Member

A fix for this is available from 1.53 onwards (nightly and soon enough beta), and it's been backported to 1.52 stable. Closing the issue. If this is still happening please leave a comment and we'll reopen it!

@erikdesjardins
Copy link
Contributor

erikdesjardins commented Jan 11, 2022

Note: the underlying issue has been fixed upstream, see llvm/llvm-project#48944 (comment).

(Once LLVM 14 is released with those changes, I will revisit enabling inline asm stack probes on x86.)

Edit: actually #84667 / https://bugs.llvm.org/show_bug.cgi?id=50165 / llvm/llvm-project#49509 is still a problem

@bstrie
Copy link
Contributor

bstrie commented Mar 23, 2022

@erikdesjardins You say that #84667 is still a problem, but that issue is closed. Should it be reopened?

@erikdesjardins
Copy link
Contributor

I should've been clearer: #84667 occurred due to a bug in LLVM codegen related to inline-asm stack probes. It is not a problem for current-day Rust users, because we disabled inline-asm stack probes. But the underlying LLVM bug still exists, which prevents us from enabling inline-asm stack probes.

(I do not know if this means it should be reopened, presumably not, but hopefully this gives you enough to context to tell)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants