Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop implementations not being called on panic with Fat LTO. #64655

Closed
cstorey opened this issue Sep 21, 2019 · 10 comments · Fixed by #65020
Closed

Drop implementations not being called on panic with Fat LTO. #64655

cstorey opened this issue Sep 21, 2019 · 10 comments · Fixed by #65020
Assignees
Labels
A-destructors Area: Destructors (`Drop`, …) C-bug Category: This is a bug. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@cstorey
Copy link

cstorey commented Sep 21, 2019

Hi,

It looks like calls to Drop::drop() are getting removed when using lto = "fat" for a release build. It doesn't seem to occur with lto = "thin" or lto = "no".

I've got a test case here: https://github.com/cstorey/fat-lto-drop-repro, which can be reproduced like this:

: cez@ceri-no-mini; cargo --version; rustc --version; cargo run; cargo run --release
cargo 1.37.0 (9edd08916 2019-08-02)
rustc 1.37.0 (eae3437df 2019-08-13)
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/lto-drop`
About to panic
thread 'main' panicked at '???', src/libcore/option.rs:1034:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Dropping a Droppable
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/lto-drop`
About to panic
thread 'main' panicked at '???', src/libcore/option.rs:1034:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
: cez@ceri-no-mini;

(more details in the transcript)

For context, I first noticed this in a project where I am using an mpsc::channel to communciate between threads, where a producer thread reads messages from the network, and the main thread saves those to a database.

I was relying on the fact that the Drop implementation for a channel will in effect close the channel, so the main thread can find out that the producer has failed, and so we should kill the process. What I saw instead was the producer thread panicing, but the main thread would never get woken up.

I've seen this on macOS Darwin 17.7.0 x86_64 and Linux 5.0.0-25-generic #26-Ubuntu SMP Thu Aug 1 12:04:58 UTC 2019 x86_64.

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-destructors Area: Destructors (`Drop`, …) labels Sep 21, 2019
@pnkfelix
Copy link
Member

triage: P-high. Assigning to self. Removing nomination label.

@pnkfelix pnkfelix self-assigned this Sep 26, 2019
@pnkfelix pnkfelix added P-high High priority and removed I-nominated labels Sep 26, 2019
@pnkfelix
Copy link
Member

Doing some bisection. This was introduced between Rust 1.32 and 1.33

@pnkfelix pnkfelix added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Sep 30, 2019
@pnkfelix
Copy link
Member

pnkfelix commented Sep 30, 2019

More specifically, this was introduced between nightly-2018-12-08 and nightly-2018-12-14:

% rm -f main && rustc +nightly-2018-12-08 -C opt-level=1 -C lto=fat fat-lto-drop-repro/src/main.rs  && ./main
rm -f main && rustc +nightly-2018-12-08 -C opt-level=1 -C lto=fat fat-lto-drop-repro/src/main.rs  && ./main
About to panic
thread 'main' panicked at '???', src/libcore/option.rs:1008:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Dropping a Droppable
% rm -f main && rustc +nightly-2018-12-14 -C opt-level=1 -C lto=fat fat-lto-drop-repro/src/main.rs  && ./main
rm -f main && rustc +nightly-2018-12-14 -C opt-level=1 -C lto=fat fat-lto-drop-repro/src/main.rs  && ./main
About to panic
thread 'main' panicked at '???', src/libcore/option.rs:1008:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
% rustc +nightly-2018-12-08 --version
rustc +nightly-2018-12-08 --version
rustc 1.32.0-nightly (4a45578bc 2018-12-07)
% rustc +nightly-2018-12-14 --version
rustc +nightly-2018-12-14 --version
rustc 1.32.0-nightly (f4a421ee3 2018-12-13)
% 

(unfortunately no nightlies are available between those two points in the rustup archive.)

@pnkfelix
Copy link
Member

These are the PR's that bors merged between those two points:

% git log --author=bors --format=oneline 4a45578bc..f4a421ee3
f4a421ee3cf1259f0750ac7fabd19da1d8551e4c Auto merge of #56783 - alexcrichton:pinentry-mode, r=Mark-Simulacrum
7489ee9c6f92050a12a3a3097df0a7d3737d82ec Auto merge of #56461 - oli-obk:alloc_ids, r=RalfJung
9fe5cb5342244a716055fa0162e795deabd4985c Auto merge of #56161 - RalfJung:vecdeque-stacked-borrows, r=SimonSapin
ced7cc5c6523ff478599ed9188df37e91fd96c68 Auto merge of #56090 - nnethercote:filesearch, r=eddyb
2f35a1016b0c0cc1132c19875dcd88d7b2825eae Auto merge of #55982 - alexcrichton:panic-extern-abort, r=zackmdavis
0076f58d5333f24f709aa46b4bad760ffb51b9b0 Auto merge of #55992 - cramertj:pin-docs, r=alexcrichton
dd8fc7dc06dea00afbd365468cf4804f68a3531c Auto merge of #56735 - Mark-Simulacrum:fix-sign, r=alexcrichton
bd47d6825bf4090517549d33cfef10d3300b4a75 Auto merge of #56092 - alexcrichton:no-more-std-subodules, r=Mark-Simulacrum
ae3833db3b0acd5d7f06cad333e61e21e39be295 Auto merge of #56039 - ljedrz:sorted_map_upgrades, r=matthewjasper
a64cdec1b48b0d042e5f0e38634a7c438c104b85 Auto merge of #56010 - euclio:intra-doc-spans, r=QuietMisdreavus
8375ab4ff43474c73e3572c2b226560f8cc8e695 Auto merge of #53497 - fukatani:test-debuginfo-function-call, r=tromey
3499575282b5cda1e98220baae4f6c87e1863926 Auto merge of #56243 - RalfJung:test-deterministic, r=alexcrichton
3a3121337122637fa11f0e5d42aec67551e8c125 Auto merge of #56703 - alexcrichton:fix-tools, r=Mark-Simulacrum
4c0116e13ffd4b84e6691cd3b1f09269c4e76728 Auto merge of #56627 - alexcrichton:update-cargo, r=alexcrichton
da1527cb06c7245f83ca51903ea2a27631820215 Auto merge of #56688 - GuillaumeGomez:rollup, r=GuillaumeGomez
1137d29d5e551e377579c5a601fe7c444057d00c Auto merge of #56666 - Xanewok:rustfmt, r=kennytm
3a75e80557a103497cffbcab395a2f37061a77ea Auto merge of #56157 - RalfJung:park, r=nagisa
9567a1cf5993d46c00ee2f2b363f3eabe90b2a0e Auto merge of #56624 - RalfJung:miri, r=oli-obk
286dc37d1bd30ecd419e889c7f3888575deac5fc Auto merge of #56369 - nnethercote:rm-Delimited, r=petrochenkov
e2c329c72c3d764423c3909c7483cf2fd6659626 Auto merge of #56269 - nnethercote:_match-Matrix-SmallVec, r=simulacrum
9cb38a84e7d593106946cae8e25d9cdbf24751ee Auto merge of #56463 - ljedrz:slice_concat_join, r=nikic
b755501043d5b27b39f94bcadd57c8d5dedfd6ba Auto merge of #56444 - petrochenkov:uifull, r=davidtwco
850fc6a4791b3b0ab668f9fb67c35dddd838b01f Auto merge of #56644 - jens1o:patch-1, r=pietroalbini
ea007c6b10e728de9dfc1fe78b8e0aed4f08f1ab Auto merge of #56631 - matthiaskrgr:clippy, r=nikic
b7da2c6e12a6b2a2343e2f1e66fe4f6a1ad55463 Auto merge of #56630 - sinkuu:core_iter, r=kennytm
d7a9d961c349bb7826ece463e51f10667a9fb851 Auto merge of #56615 - integer32llc:update-book, r=GuillaumeGomez
8db23425a345381ec5f8d40e582c119a06b8cb3d Auto merge of #56616 - estebank:issue-56539, r=davidtwco
bdef56a3245460d1fb28c3b4d8ac4b17373c66ce Auto merge of #56632 - Eijebong:synup, r=Mark-Simulacrum
9772d02774534aa4ccd0b328364403d5b6cda1d0 Auto merge of #56623 - Centril:rollup, r=Centril
1ccb5b219d50b1bc96dbb85e82a8473f16422582 Auto merge of #56583 - RalfJung:vergen, r=oli-obk
059e6a6f57f4e80d527a3cd8a8afe7f51f01af8e Auto merge of #56578 - alexreg:cosmetic-1, r=alexreg
0a7798079608b4ff014471ae64b6c8201aa59cdf Auto merge of #56258 - euclio:fs-read-write, r=euclio

@pnkfelix
Copy link
Member

According to my bisection via local builds, this was injected by #55982 (!)

@pnkfelix
Copy link
Member

pnkfelix commented Sep 30, 2019

(Have I mentioned before that we probably need to be doing more testing of the various LTO configurations? Would it be feasible to run our test suite in all of the lto modes from {lto=no, lto=thin, lto=fat} ?)

@pnkfelix
Copy link
Member

pnkfelix commented Oct 1, 2019

For what its worth, @RalfJung 's PR #63909 fixes this

  • (I think; I need to double-check atop a rebased-on-master version).
  • yes, confirmed now atop rebased version

@cstorey
Copy link
Author

cstorey commented Oct 1, 2019

Thanks for tracking this down @pnkfelix!

@pnkfelix
Copy link
Member

pnkfelix commented Oct 2, 2019

Okay I added some instrumentation to the compiler and I think I know where the problem here is coming from.

Here's the executive summary:

I myself was confused by this comment, because it looked to me like the PR as written was taking pains to return unwind=true for abis that match Abi::Rust and Abi::RustCall.


So why is the test crate in this issue illustrating the problem outlined by @RalfJung above?

First of all, here is a standalone main.rs that reproduces the problem for me. (You still have to turn on LTO=fat to observe the issue, but no other optimizations; e.g. -C opt-level=0 -C lto=fat still reproduces the issue for me):

#![feature(core_panic)]
#![no_std]
extern crate std;

struct Droppable;

impl Drop for Droppable {
    fn drop(&mut self) {
        let msg = "Dropping a Droppable\n";
        extern "C" { fn putchar(b: u8); }
        for c in msg.chars() {
            unsafe { putchar(c as u8); }
        }
    }
}

fn main() {
    let _guard = Droppable;
    core::panicking::panic(&("???", "downstream.rs", 17, 4))
}

Running the compiler with -C lto=fat -C save-temps gave me intermediate .bc files that I could disassemble via llvm-dis and then inspect the resulting .ll files. This led me to notice that some functions had nounwind on them that I did not expect.

Second, I instrumented a local rustc to report (via println!) whenever it used either of these two cases:

if cx.tcx.is_foreign_item(id) {
// Foreign items like `extern "C" { fn foo(); }` are assumed not to
// unwind
false
} else if sig.abi != Abi::Rust && sig.abi != Abi::RustCall {
// Any items defined in Rust that *don't* have the `extern` ABI are
// defined to not unwind. We insert shims to abort if an unwind
// happens to enforce this.
false
} else {

Just imagine the above, with two inserted println! calls that look like:

        println!("choosing unwind=false for foreign item {:?} with sig: {:?}", id, sig);

and

        println!("choosing unwind=false for non-foreign {:?} with sig: {:?}", id, sig);

And then I reviewed the output this spits out during both the stage1 bootstrapping and the actual compilation of the test case above.

Here's the smoking gun from that output, I think:

choosing unwind=false for foreign item DefId(0:3120 ~ core[1ff9]::panicking[0]::panic_fmt[0]::[0]::panic_impl[0]) with sig: ([&panic::PanicInfo]; c_variadic: false)->!
  • Unfortunately, the fmt::Debug impl for FnSig does not include the abi (nor the unsafety). I'm guessing we'd see "Rust" there if that were included.

namely, we are going through that first branch, already mentioned by @RalfJung , when we compile this declaration:

// NOTE This function never crosses the FFI boundary; it's a Rust-to-Rust call
extern "Rust" {
#[lang = "panic_impl"]
fn panic_impl(pi: &PanicInfo<'_>) -> !;
}

This causes the generated declaration for panic_impl to be tagged nounwind, even though that function is pretty much the function that is going to unwind.

(But you need to have a somewhat nasty combination of factors to actually observe the optimization performed by LLVM in this case, which is how I guess this has gone relatively unobserved for so long, apart from what @RalfJung noticed in their own work.)

@pnkfelix
Copy link
Member

pnkfelix commented Oct 2, 2019

The short version of my previous comment is that I bet I could make a small change to mark extern fn's with Rust or RustCall ABIs as unwinding, which would have less impact than @RalfJung's proposal and thus be a better candidate for backport.

  • Update: Confirmed locally; yes we can fix this via a small targeted change. (Not sure if it will be the right long term solution, but it certainly seems like the right short-term step if only for backport purposes.)

Centril added a commit to Centril/rust that referenced this issue Oct 3, 2019
…arking-rust-abi-unwind-issue-64655, r=alexcrichton

Always mark rust and rust-call abi's as unwind

PR rust-lang#63909 identified a bug that had been injected by PR rust-lang#55982. As discussed on rust-lang#64655 (comment) , we started marking extern items as nounwind, *even* extern items that said they were using "Rust" or "rust-call" ABI.

This is a more targeted variant of PR rust-lang#63909 that fixes the above bug.

Fix rust-lang#64655

----

I personally suspect we will want PR rust-lang#63909 to land in the long-term

But:
 *  it is not certain that PR rust-lang#63909 *will* land,
 * more importantly, PR rust-lang#63909 almost certainly will not be backported to beta/stable.

The identified bug was more severe than I think anyone realized (apart from perhaps @gnzlbg, as noted [here](rust-lang#63909 (comment))).

Thus, I was motivated to write this PR, which fixes *just* the issue with extern rust/rust-call functions, and deliberately avoids injecting further deviation from current behavior (you can see further notes on this in the comments of the code added here).
Centril added a commit to Centril/rust that referenced this issue Oct 3, 2019
…arking-rust-abi-unwind-issue-64655, r=alexcrichton

Always mark rust and rust-call abi's as unwind

PR rust-lang#63909 identified a bug that had been injected by PR rust-lang#55982. As discussed on rust-lang#64655 (comment) , we started marking extern items as nounwind, *even* extern items that said they were using "Rust" or "rust-call" ABI.

This is a more targeted variant of PR rust-lang#63909 that fixes the above bug.

Fix rust-lang#64655

----

I personally suspect we will want PR rust-lang#63909 to land in the long-term

But:
 *  it is not certain that PR rust-lang#63909 *will* land,
 * more importantly, PR rust-lang#63909 almost certainly will not be backported to beta/stable.

The identified bug was more severe than I think anyone realized (apart from perhaps @gnzlbg, as noted [here](rust-lang#63909 (comment))).

Thus, I was motivated to write this PR, which fixes *just* the issue with extern rust/rust-call functions, and deliberately avoids injecting further deviation from current behavior (you can see further notes on this in the comments of the code added here).
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Oct 4, 2019
…arking-rust-abi-unwind-issue-64655, r=alexcrichton

Always mark rust and rust-call abi's as unwind

PR rust-lang#63909 identified a bug that had been injected by PR rust-lang#55982. As discussed on rust-lang#64655 (comment) , we started marking extern items as nounwind, *even* extern items that said they were using "Rust" or "rust-call" ABI.

This is a more targeted variant of PR rust-lang#63909 that fixes the above bug.

Fix rust-lang#64655

----

I personally suspect we will want PR rust-lang#63909 to land in the long-term

But:
 *  it is not certain that PR rust-lang#63909 *will* land,
 * more importantly, PR rust-lang#63909 almost certainly will not be backported to beta/stable.

The identified bug was more severe than I think anyone realized (apart from perhaps @gnzlbg, as noted [here](rust-lang#63909 (comment))).

Thus, I was motivated to write this PR, which fixes *just* the issue with extern rust/rust-call functions, and deliberately avoids injecting further deviation from current behavior (you can see further notes on this in the comments of the code added here).
@bors bors closed this as completed in 4b42e91 Oct 12, 2019
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Oct 22, 2019
…king-rust-abi-unwind-issue-64655, r=alexcrichton

Always mark rust and rust-call abi's as unwind

PR rust-lang#63909 identified a bug that had been injected by PR rust-lang#55982. As discussed on rust-lang#64655 (comment) , we started marking extern items as nounwind, *even* extern items that said they were using "Rust" or "rust-call" ABI.

This is a more targeted variant of PR rust-lang#63909 that fixes the above bug.

Fix rust-lang#64655

----

I personally suspect we will want PR rust-lang#63909 to land in the long-term

But:
 *  it is not certain that PR rust-lang#63909 *will* land,
 * more importantly, PR rust-lang#63909 almost certainly will not be backported to beta/stable.

The identified bug was more severe than I think anyone realized (apart from perhaps @gnzlbg, as noted [here](rust-lang#63909 (comment))).

Thus, I was motivated to write this PR, which fixes *just* the issue with extern rust/rust-call functions, and deliberately avoids injecting further deviation from current behavior (you can see further notes on this in the comments of the code added here).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-destructors Area: Destructors (`Drop`, …) C-bug Category: This is a bug. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants