Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic in a no-unwind function leads to not dropping local variables #123231

Open
RalfJung opened this issue Mar 30, 2024 · 96 comments
Open

panic in a no-unwind function leads to not dropping local variables #123231

RalfJung opened this issue Mar 30, 2024 · 96 comments
Labels
F-c_unwind `#![feature(c_unwind)]` T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Mar 30, 2024

Raised by @CAD97:

I actually find the current behavior of #![feature(c_unwind)] unwinding in extern "C" somewhat strange. Specifically, any unwinding edges within the extern "C" fn get sent to core::panicking::panic_cannot_unwind, meaning that the unwind happens up to the extern "C" fn, but any locals in said function don't get dropped. I would personally not expect wrapping the body of an extern "C" function in an inner extern "Rust" function to change behavior, but it does.

Reproducing example:

#![feature(c_unwind)]

struct Noise;
impl Drop for Noise {
    fn drop(&mut self) {
        eprintln!("Noisy Drop");
    }
}

extern "C" fn test() {
    let _val = Noise;
    panic!("heyho");
}

fn main() {
    test();
}

I would expect "Noisy Drop" to be printed, but it is not.

IMO it'd make most sense to guarantee that with panic=unwind, this destructor is still called. @nbdd0121 however said they don't want to guarantee this.

What is the motivation for leaving this unspecified? The current behavior is quite surprising. If I understand @CAD97 correctly, we currently could make "Noisy Drop" be executed by tweaking the MIR we generate.

Tracking:

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 30, 2024
@RalfJung RalfJung added the F-c_unwind `#![feature(c_unwind)]` label Mar 30, 2024
@RalfJung RalfJung added the I-lang-nominated Nominated for discussion during a lang team meeting. label Mar 30, 2024
@RalfJung
Copy link
Member Author

Nominating for t-lang discussion to get their vibe on this question.

@nbdd0121
Copy link
Contributor

@RalfJung
Copy link
Member Author

Wow that was years ago. What's the take-away from that discussion?

@nbdd0121
Copy link
Contributor

As pointed out in the discussion, changing how we insert abort to functions is sufficient to change the observed behaviour of the implementation, but the key is to decide what's the allowed behaviour of any implementation.

Options:

  • To not call any destructors if panic happens in no-unwind context.
    • Is quite desirable and can be helpful for debugging
    • This is currently the behaviour for MSVC SEH in cleanup context.
    • We probably can't implement this behaviour for all platforms.
  • To call all destructors before the unwind.
    • Additional code needs to be injected to guarantee this (especially for MSVC SEH cleanup case)
    • Questionable value about running all destructors if abort is to happen.
  • Leave it unspecified whether destructors will be called at all, or how many call frames are to be unwound before aborting.
    • Maximum flexibility for implementation
    • Consistent with C++'s std::terminate specification.
    • Can avoid adding landing pads and destructor code if we know for certain abort is about to happen.

Some additional complexity involves a foreign exception. For example, if we have a mixture of stack frames with C++ and Rust then any specification may result in surprises. E.g. when a nounwind frame is introduced by a C++ noexcept function, it's up to C++ personality function to decide whether a Rust panic may trigger C++ std::terminate in phase 1 unwinding before any Rust destructor is called, or trigger in phase 2 after Rust destructor is called and terminate when it reaches C++. If we have a C++ -> Rust -> C++ -> Rust call stack if will mean that we might have some Rust frames have destructors called but not other Rust frames.

cc @rust-lang/wg-ffi-unwind

@RalfJung
Copy link
Member Author

RalfJung commented Mar 30, 2024

To not call any destructors if panic happens in no-unwind context.

Judging from prior discussion, "no-unwind context" is a very specific technical term here and not something visible in Rust? Or what exactly do you mean?

Is quite desirable and can be helpful for debugging

Why? If by this you mean the current behavior, I find it undesirable and confusing and thus hindering debugging.

Or do you mean that the panic will somehow predict whether it will during its unwinding hit a no-unwind stack frame and then change behavior early on? That's spooky-action-at-a-distance, so I also don't think that's desirable.

Maximum flexibility for implementation

That's a pretty poor argument IMO, our job is to provide consistent and predictable semantics to our users whenever that is possible with reasonable performance.

Some additional complexity involves a foreign exception.

For this issue I only care about Rust panics.

@saethlin saethlin added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 30, 2024
@nbdd0121
Copy link
Contributor

Let me explain in greater detail, this'll be long..

For unwinding, there are a few type of cases for nounwind/noexcept/whatever:

  1. A stack frame contains no unwind metadata / reaches end of frame
  2. Personality function determines the callsite is nounwind and unwinding should terminate
  3. There's a try/catch and the exception handler calls terminate.

Unwinders can either do single phase unwinding, or do two phase unwinding. For the latter, it do a unwind first without calling any destructors to find the frame that will catch the exception, and then starts an unwind with cleanups. If a catching frame cannot be found, the unwinding process fails at phase 1 (Let's ignore forced unwind for now). The Itanium C++ ABI unwinder has two phases.

So if you have a C code (compiled without unwind tables) that calls into Rust C-unwind function and panics, then it's case (1), and phase 1 will fail. No destructor will be executed.

A C++ noexcept function is of case (2) in GCC. With GCC's personality function implementation, the phase 1 will consider a noexcept function as a catching frame, and complete (stop at this frame) without failing. When unwind happens into the noexcept frame, no cleanup is performed and a termination happens immediately (similar to Rust's behaviour today).

A C++ noexcept function is of case (3) in Clang because it doesn't yet support encoding the information to be exposed to personality function. noexcept is codegened as try {...} catch(...) { terminate() } (roughly). Since it's a catch, phase 1 will stop at the frame. Upon phase 2 reaching this frame, it will execute terminate() without calling any destructors in the final frame (since it's a try/catch and thus terminate is executed with the variables still in-scope, so rightfully the destructor of the last frame is skipped). This is also similar to Rust's behaviour today.

There's a subtle difference between (2) and (3) w.r.t to optimisation. Say we have this code:

#include <cstdio>

struct D {
    ~D() {
        fprintf(stderr, "Drop!\n");
    }
};

static void foo() {
    D d;
    throw "";
}

static void bar() noexcept {
    D d;
    foo();
}

int main() {
    bar();
}

in both GCC and Clang, you will get one Drop! print only. If you turn on GCC's optimisation though, you will get no Drop! prints. You still get one Drop! print with Clang + opt. In no cases you get two prints. They're all acceptable behaviour because C++ spec says in this case it is unspecified whether the destructors are called.

The Rust behaviour today is very similar to clang's behaviour in the example.


Now to answer your questions

Or do you mean that the panic will somehow predict whether it will during its unwinding hit a no-unwind stack frame and then change behavior early on? That's spooky-action-at-a-distance, so I also don't think that's desirable.

Yes, I mean this. Since unwind is of two phases, the phase 1 can determine if unwind is possible and can skip phase 2 entirely. It's already the case that Rust panic will cause nounwind at all if it escapes into functions with no unwind tables, or, with MSVC SEH, into cleanup code.

It's helpful for debugging because all the stack frames are intact so you can inspect all frames upon abort. Currently when we unwind, hit an extern "C" frame, and print a panic-cannot-unwind message. If RUST_BACKTRACE is not enabled, then the first panic prints no stack trace and you already lost the information when abort happens. What could been done is to use phase 1 to figure out that unwind will cause terminate, and then print stack trace, along with an indication of which frame prevents the unwinding from happening.

That's a pretty poor argument IMO, our job is to provide consistent and predictable semantics to our users whenever that is possible with reasonable performance.

Given FFI is very important regard to extern "C" and all unwindable ABIs, I think it's also important to consider consistency with other languages. At detailed above, the implementation today is very consistent with C++ implementation's behaviour.

We certainly can define the behaviour to be "all destructors" being executed. It'll simply be requiring adding try/catch around all extern "C" functions. However, I am not sure it'll be desirable. When GCC backend is getting better or if LLVM adds support for encoding an "terminate" action, this will prevent us from moving over from case (3) to case (2) which can reduce code size quite significantly.

If leaving this unspecified allows better optimisation w.r.t. landing pad sizes, then IMO we should allow such optimisation given that a panic escaping to unwindable FFI interface is very rare and almost always a bug (given abort is imminent). If one wants all destructor to be called, they can very easily implement that behaviour with catch_unwind.

Some additional complexity involves a foreign exception.

This will happen with Rust panic in presence with foreign frames, as well as foreign exception with Rust frames. I think it'll less consistent if our specification of Rust panic behaviour depends on whether a foreign frame is present or not.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 30, 2024 via email

@chorman0773
Copy link
Contributor

chorman0773 commented Mar 30, 2024

The Itanium C++ ABI is the abi used by gcc and clang on most non-windows targets.
The ABI was standardized for IA-64, hense the name, but the spec is used on many platforms.

Rust already uses Itanium EH for panics on the same set of targets.

@bjorn3
Copy link
Member

bjorn3 commented Mar 30, 2024

The Itanium C++ ABI unwinder has two phases.

Itanium is dead, why should we care?

Basically every UNIX uses the same unwinder ABI as replacement for the old SjLj unwinder ABI which had non-zero overhead even when not throwing any exceptions. arm32 iOS is the only SjLj target we support(ed).

@RalfJung
Copy link
Member Author

Thanks for explaining the Itanium thing.

While exploring what C++ does is interesting, I don't think C++ is necessarily a good guiding star to follow. We tend to value cross-platform consistency and predictability much more than C++ does. Having the number of drops depend on the optimization level sounds completely unacceptable to me.

So, ignoring what C++ does -- what are the downsides to saying that consistently, everything must be dropped until the boundary, i.e. even in the last stackframe?

Yes, I mean this. Since unwind is of two phases, the phase 1 can determine if unwind is possible and can skip phase 2 entirely. It's already the case that Rust panic will cause nounwind at all if it escapes into functions with no unwind tables, or, with MSVC SEH, into cleanup code.

What exactly does this mean? You are assuming that I know what all these words mean. :) Can you state this in terms of what the Rust programmer sees as end-to-end behavior?

It's helpful for debugging because all the stack frames are intact so you can inspect all frames upon abort.

That argument applies to all panics. You are suggesting to make debugging better for some small subclass of panics. I don't think it's worth doing this only for "panics that happen to lead to an abort later". In fact I think that makes debugging worse because for some panics you'll see the full stack and for some you won't.

Instead, just set a breakpoint on some symbol inside the panic machinery. (AFAIK we have a dedicated symbol for that?) Or set panic=abort. In both cases the debugger will reliably trap before unwinding begins.

Given FFI is very important regard to extern "C" and all unwindable ABIs, I think it's also important to consider consistency with other languages. At detailed above, the implementation today is very consistent with C++ implementation's behaviour.

I think consistency with C++ is just as often something we explicitly don't want as we disagree with the C++ design philosophy. I also doubt most C++ programmers will even know that this is how C++ behaves, so the consistency only helps those few people that know the ins and outs of how unwinding is implemented.

If leaving this unspecified allows better optimisation w.r.t. landing pad sizes, then IMO we should allow such optimisation given that a panic escaping to unwindable FFI interface is very rare and almost always a bug (given abort is imminent). If one wants all destructor to be called, they can very easily implement that behaviour with catch_unwind.

All panics are always a bug.

The question is whether those landing pad size wins are worth it for the extra confusion that inconsistent behavior will cause. And as I said above I think making this opt-level-dependent is completely inacceptable. That would mean if I see a panic in my release build and then try to debug it in a debug build it will behave completely differently! Maybe it's okay to say that behavior can differ between targets and between Rust versions, but I don't think we want any more variability than that.

@RalfJung
Copy link
Member Author

@CAD97 also makes a good point:

I've personally settled into accepting turning an unwind through drop glue into an abort (e.g. by adding an abort call into the unwinding pads) as a practical choice. But not if an abort occurs without ever saying why, or at least calling the unwind handler with can_unwind=false.

If we allow the "unwind phase 1" to determine that unwinding will be skipped entirely, we end up with a can_unwind=true panic leading to an immediate abort. It can be quite hard for users to figure out what is even going on there, and why their destructors are not being executed.

In contrast, today we get a pretty nice error, where first there's a regular panic message and then when it hits a nounwind function (or drop, with the flag enabled), it prints a secondary message explaining why the panic was turned into an abort. (At least we get that on Linux. No idea if we reliably get it on all targets.)

@bjorn3
Copy link
Member

bjorn3 commented Mar 31, 2024

Instead, just set a breakpoint on some symbol inside the panic machinery. (AFAIK we have a dedicated symbol for that?) Or set panic=abort. In both cases the debugger will reliably trap before unwinding begins.

That won't work if you didn't have a debugger attached from the start, but are relying on a coredump produced at the point of the SIGILL.

@BatmanAoD
Copy link
Member

..., our job is to provide consistent and predictable semantics to our users whenever that is possible with reasonable performance.

It's difficult for me to object to this, because I think the principle is good. But I'm not sure that I agree it applies to specifying the precise behavior of a program that is already in the process of early termination.

A primary goal expressed to the working group from the start was to preserve unwind implementation flexibility, at least in terms of what is formally guaranteed. In fact, RFC-2945 originally did not even guarantee that a foreign exception entering Rust via extern "C" would be caught and trigger an abort.

It was also expressed that one downside of proposed changes to extern "C" is that it makes the ABI string a sort of half-baked effects system for exceptions, which is not really part of the proper role of an effect system. (Similar feedback also came from outside the Rust project; a Clang developer said something like "the ABI is not a sandbox" to me.)

The person inside the Rust project who expressed these concerns is no longer active in the project. But I nevertheless think we should be very cautious about introducing strong guarantees around the "abort" behavior unless we are very confident that they can be upheld on every platform on which we might wish for Rust to run, without jeopardizing performance.

All panics are always a bug.

Unfortunately, I'm not sure this is actually true, especially in the context of cross-FFI unwinding. One simple counterexample is allocator exhaustion: yes, this can often indicate a memory leak, but it's also possible that someone is simply running too much on a particular device.

@workingjubilee
Copy link
Member

As I have cheerily mentioned several times: I work on a library that catches longjmps, translates them into panics, and then translates them back into longjmps, using a baroque mechanism for making this actually conform to Rust's expected control flow semantics. A panic does not necessarily indicate a bug in the code that anyone using my library can actually control, because they do not necessarily have that much input into when the C code decides it wants to throw its home-baked "exceptions", and the alternatives to panicking when we run into these tend to be... worse. So I would turn it around to a different angle:

Even assuming it is "always" a bug, what should anyone do about it?

@RalfJung
Copy link
Member Author

RalfJung commented Apr 1, 2024

All panics are always a bug.

I said this in reply to a claim that "panics that will abort are a bug, therefore we can do weird semantics that make little sense unless one has studied unwinding ABIs for years". (I may have rephrased the argument a bit. ;) I don't think that's a valid argument, because sane behavior is important even in the presence of bugs. That's why UB is so nasty, it's the kind of bug where we don't have sane behavior any more. But here we're talking about cases which are explicitly not UB, they abort in a safe way, and I really don't think we should have UB-level of "spooky action at a distance" here -- something like the GCC behavior described above where with more optimizations, fewer destructors run. In terms of being able to debug and make sense of the situation, that's almost as bad as the nastiness one can see with UB. We have to ensure that will never happen.

It's difficult for me to object to this, because I think the principle is good. But I'm not sure that I agree it applies to specifying the precise behavior of a program that is already in the process of early termination.

The problem is, your definition of "being in the process of early termination" requires predicting the future. That goes entirely against the basic principles of an operational semantics, where we define step by step what happens.

I would like for unwinding to be a step-by-step process that just proceeds stack frame by stack frame. That would be a sane semantics people can understand and Miri can implement. But having to predict whether we will abort due to a condition that only becomes apparent later is a complete mess.

Basically, I am objecting to including anything like 2-phase-unwind into the opsem of Rust. 2-phase-unwind is an implementation detail, I don't see good reason why it should be in the spec. And without 2-phase-unwind, it must be the case that a panic that will abort 10 stack frames down, and a panic that will not abort, behave the same, since we can't predict the future.

(And even worse than having 2-phase-unwind in the spec would be having it in the spec only sometimes. That's just a nightmare scenario. And it seems like only some target do 2-phase-unwind so we couldn't even make the spec say that we always do 2-phase unwind.)

@RalfJung
Copy link
Member Author

RalfJung commented Apr 1, 2024

It was also expressed that one downside of proposed changes to extern "C" is that it makes the ABI string a sort of half-baked effects system for exceptions, which is not really part of the proper role of an effect system. (Similar feedback also came from outside the Rust project; a Clang developer said something like "the ABI is not a sandbox" to me.)

(I think this is not really related to this discussion, but I can't help but reply.^^)

I agree the extern "C" behavior here is somewhat surprising; I didn't expect this outcome either when the C-unwind work started. However it is a direct consequence of having ABIs where unwinding is UB, having panics in safe code, and having memory safety. (So I am not surprised a clang developer felt this looked strange, since I would not expect them to consider the memory safety implications.) We only have two choices: make extern "C" unsafe to write and put the responsibility of catching all panics onto the user, or make it safe and put that responsibility onto the compiler. I think you made the right choice here.

Also this is not like an effect system. An effect system would track which functions may or may not unwind, and just reject the code when you call a may-unwind function in an extern "C" function. That would be a third alternative besides "make it unsafe" and "make the compiler catch unwinding".

Using an optimized ABI is totally a possible role of an effect system, just like it is a role of a type system to enable optimized data representations. (Or did you mean to say "not really part of the proper role of an ABI"?)

@workingjubilee
Copy link
Member

workingjubilee commented Apr 1, 2024

honestly I think, ironically, that C++ sometimes is a good model for unwinding...

...specifically, MSVC.

my understanding is Windows adopts a different approach than the Itanium ABI unwinding, from the ground level up: the by-default behavior is to unify all mechanisms of nonlocal control flow for all languages compiled on it. that means it doesn't matter if you are a Rust panic, C++ exception, C longjmp, or even bare assembly: you get to participate in structured exception handling. throwing an exception, longjmp, and so on are all the same unwinding mechanism, by default, so everything works the same and everyone can catch and rethrow and in general understand each other's errors, even if C only sees all exceptions as int. and as far as I can tell the semantics tend to be about as mercilessly straightforward as Ralf would like, instead of having odd 2-phase properties.

the implementation of Visual C++ does have a compile option that allows C++ code to choose whether the try-catch interacts with the same exceptions C does by default, for reasons that are not clear to me. I believe the C++ code can still use __try and __except like C can, however.

for platforms that have this quirk it is probably very useful to preserve it.

@chorman0773
Copy link
Contributor

Basically, I am objecting to including anything like 2-phase-unwind into the opsem of Rust. 2-phase-unwind is an implementation detail, I don't see good reason why it should be in the spec. And without 2-phase-unwind, it must be the case that a panic that will abort 10 stack frames down, and a panic that will not abort, behave the same, since we can't predict the future.

This means that, for it to work with Itanium EH (again, used by almost every non-windows platform) and exceptions thrown by C++, the abort edge in an extern "C" function must be a handler (which is costly, and might cause some additional issues). I had expected being able to handle it the same way as a noexcept function in C++, thus using a smaller frame table (with fewer edge cases than effectively being a bloody catch handler).
It's hard to be consistent with semantics across all platforms when unwinding itself isn't.

@ChrisDenton
Copy link
Member

I believe the C++ code can still use __try and __except like C can, however.

This is necessary for some OS APIs. In Rust we have to resort to C shims to use such APIs, which isn't great.

@RalfJung
Copy link
Member Author

RalfJung commented Apr 1, 2024

This means that, for it to work with Itanium EH (again, used by almost every non-windows platform) and exceptions thrown by C++, the abort edge in an extern "C" function must be a handler (which is costly, and might cause some additional issues). I had expected being able to handle it the same way as a noexcept function in C++, thus using a smaller frame table (with fewer edge cases than effectively being a bloody catch handler).
It's hard to be consistent with semantics across all platforms when unwinding itself isn't.

I am primarily concerned with unwinding from Rust panics, not exceptions triggered by other languages. I don't have as strong opinions on how C++ exceptions should behave as they pass through Rust stack frames. AFAIK we don't currently say much about what the rules even are there?

@chorman0773
Copy link
Contributor

chorman0773 commented Apr 1, 2024

SEH rules are indeed nice and consistent, but we aren't ever going to have a world where they're the only rules.

don't have as strong opinions on how C++ exceptions should behave as they pass through Rust stack frames

Frankly, I'd expect them to interact with destructors the same way for consistency, both as a user, and an implementor here.

@Mark-Simulacrum
Copy link
Member

From #123231 (comment):

must be a handler (which is costly, and might cause some additional issues)

Don't we expose handlers via std::panic::catch_unwind already? My expectation is that at least in some cases I will be applying that to the body of every extern "C" exposed function to propagate panics as error codes. If there are problems with that (UB in some cases? Slow code? etc.) that would be great to know + document somewhere.

My mental model here aligns pretty closely with @RalfJung's I think: if the "extra cost" to making Drop behave the same way (e.g., if I'm debugging and stick an eprintln! in a Drop, I want to see it, even if the program aborts some amount of frames later!) regardless of whether there's some extern "C" function somewhere is just a bit of extra code in the binary, then I'd happily pay that price. Presumably, that code can be removed if LLVM (or Rust, via an effect system eventually) is able to statically prove a lack of Drop-requiring objects, too, and is necessary in every other function defined in Rust, right?

@chorman0773
Copy link
Contributor

chorman0773 commented Apr 1, 2024

Don't we expose handlers via std::panic::catch_unwind already? My expectation is that at least in some cases I will be applying that to the body of every extern "C" exposed function to propagate panics as error codes. If there are problems with that (UB in some cases? Slow code? etc.) that would be great to know + document somewhere.

It's EH Table size, and function size mostly IIRC. Compared to catch_unwind that inserts one handler somewhere (and doesn't even fully try to handle foreign exceptions, at least in the rustc impl), that's a lot nicer than adding a bunch of extra full handlers should interact properly with foreign exceptions, including C++. Specifically, we need to actually act in the two-phase system, returning _URC_HANDLER_FOUND in the search phase, and then actually jumping to the landing pad in the unwind phase.

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

This PR doesn't change the behaviour for cleanup blocks. Particularly, trying to initiate a panic in cleanup block would cause immediate abort in SEH.

I thought that aborting on cleanup destructors that panic was an intentional step to reduce code size (do we have an RFC for that?). There are some vibes for making all panics from destructors abort for code size/optimization reasons.

Also, I have an instinct that landing pad optimizations in extern "C" functions are not that interesting, since I expect that an extern "C" function + everything inlined into it to be quite small. Do we have any data?

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

In the hack.md option 1, what is the "extern "C-unwind" to extern "C" causes more unwinding" case? Is it that an extern "C" function prevents phase-1 abort?

What option would be "have an extern "C" function behave like a function with a local that has a destructor that aborts" (so if a phase-1 abort would happen, it would still happen, but if a phase-1 abort does not happen, all destructors are run and then the code aborts)? Ed: is the answer that we don't want even phase-1 unwinding to propagate past "extern C"?

@RalfJung
Copy link
Member Author

RalfJung commented Aug 26, 2024

@joshtriplett

Before Wednesday when we evaluate this, could we get a summary of what would go horribly wrong if we wanted to guarantee that?

This comment is a great summary. This one also argues against making this a hard guarantee. The summary of the summary is that the arguments against the guarantees are: consistency with C++, and code size, and a concern that we may not be able to implement the "guaranteed destructor execution" semantics on all targets. (I hope I got this right, @nbdd0121 please correct me if this is wrong.)

One point that came up later is that if a Rust panic escapes into another language, and that other languages has stack frames that indicate "unwinding aborts here", then this is outside of our control -- the unwind mechanism may decide to not unwind and just abort the process immediately. So there'll always be some cases where destructors get skipped for surprising reasons, the only question we have the power to decide is whether this will happen even in Rust programs that never let a panic escape into non-Rust code. See here for a proposed spec that accounts for that, but also see the reply by @nbdd0121 saying they'd prefer a weaker spec that lets us skip destructors even if the unwind stays within "Rust land". Here is another proposed spec, that frames the case of "skipping destructors because unwind turned into abort" as a case of "turning an unwinding panic into an aborting panic" -- which is quite elegant.

However, as this example shows, neither of these specs describes the current behavior: when an unwind stays entirely inside Rust but ends at an extern "C" fn, then some destructors are run but others are not. So we'd need a different spec that allows for this, like:

  • When an unwind event (unwinding panic or foreign exception) enters a Rust stack frame or is generated by Rust, then
    • Either that unwind is handled at some point (via catch_unwind in Rust, or similar mechanisms in another language). In that case the destructors of all stack frames that the unwind passes through are being executed.
    • Or the unwind eventually turns into an abort (e.g. an extern "C" Rust function). In that case the destructors of some prefix of the stack frames that the unwind passes through are being executed. (In current practice, it'll be either "no stack frame" or "all but the last stack frame" that have destructors executed.)

The reason this happens is that MIR building explicitly turns any unwinding reaching an extern "C" stack frame into an immediate abort, without even running any destructors for variables in that stack frame. (However, if other variables get inlined into this function, their destructors are still being executed.) This has nothing to do with the unwind runtime realizing that the unwind will turn into an abort down the road.

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

I am not sure if this accurately describes current behavior -- with transformations like outlining, could it be the case that some destructors in a stack frame run and others do not?

I believe LLVM optimizations are behavior-preserving, even in the presence of unwinding.

@RalfJung
Copy link
Member Author

I was about to say MIR inlining does not preserve behavior, but it seems like it does. I have no idea why, though -- after inlining, why do we execute the drop of test_inner's local variable but not the one in test, even though they are now both in a nounwind function?

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

It's because extern "C" functions today generate a terminator that immediately aborts

@nbdd0121
Copy link
Contributor

Thanks @RalfJung, I couldn't explain it better.

  • When an unwind event (unwinding panic or foreign exception) enters a Rust stack frame or is generated by Rust, then

    * Either that unwind is handled at some point (via `catch_unwind` in Rust, or similar mechanisms in another language). In that case the destructors of all stack frames that the unwind passes through are being executed.
    * Or the unwind eventually turns into an abort (e.g. an `extern "C"` Rust function). In that case the destructors of some prefix of the stack frames that the unwind passes through are being executed.
    

I am not sure if this accurately describes current behavior -- with transformations like outlining, could it be the case that some destructors in a stack frame run and others do not?

I think this is the correct description of the current behaviour.

I was about to say MIR inlining does not preserve behavior, but it seems like it does. I have no idea why, though -- after inlining, why do we execute the drop of test_inner's local variable but not the one in test, even though they are now both in a nounwind function?

The MIR inlining happens after the aborting unwinding call MIR pass, which explains this behaviour.

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

The MIR inlining happens after the aborting unwinding call MIR pass, which explains this behaviour.

Right, after the "aborting unwinding call MIR pass", which for semantics purposes counts as a part of MIR generation, the code is

.start:
x = Noisy
foo() [finish=.done, unwind=.die]

.done:
drop(x) [finish=.ret, unwind=.die]

.ret:
return

.die:
abort

Unwinding in MIR behaves like a normal control-flow edge. You can have some platform-dependent behaviors when calling panic causes an abort instead of an unwind, but in that case you never start unwinding.

@chorman0773
Copy link
Contributor

chorman0773 commented Aug 26, 2024 via email

@chorman0773
Copy link
Contributor

chorman0773 commented Aug 26, 2024

(In lccc for extern "C" et. al, I'd emit either a phase 1 stop attribute or a phase 2 stop attribute in xlang, rather than using something like unwind abort, so either behaviour would make no difference to me - the result of the phase 1 stop is that no destructors are executed, and the phase 2 stop is that all destructors are executed up to and including the ones in the frame with the stop attribute)

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

I think "unwind edges are normal control-flow edges, panic might abort for implementation-defined reasons" is the right model of thinking about things. Which probably means we want to take the hack.md option (1) or (2) because the Rust 1.81 control flow edges are weird.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 26, 2024

Right, after the "aborting unwinding call MIR pass", which for semantics purposes counts as a part of MIR generation

So there are actually two reason for destructors being skipped?

  1. The Itanium ABI two-phase thing. This will either run all destructors or none of them.
  2. Our own MIR generation, which will skip the destructors in the "last" stack frame.

I that case I think we should definitely change MIR generation to remove item 2 from this list.

FTR, C++ promises all-or-nothing for when destructors are run before a
std::terminate

Doesn't that contradict the observations made here?

Which probably means we want to take the hack.md option (1) or (2) because the Rust 1.81 control flow edges are weird.

Which hack.md?

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

So there are actually two reason for destructors being skipped?

Yes

hack.md=https://hackmd.io/pNOSw5QoRSeiJhIEyJB7dA

@RalfJung
Copy link
Member Author

Ah, so @nbdd0121 is preparing a summary as well, great. :)

Another case where it's not allowed to unwind past is cleanup blocks, which we use similar MIR construct today:

I would say behavior is quite different today between that and extern "C". For unwinding out of a destructor during cleanup, we abort the process when the destructor that we called during cleanup itself unwinds -- i.e., we abort after that "may not unwind" call is done. Any destructors inside that destructor are called.

In contrast, with extern "C" we abort the process when unwinding enters the "may not unwind" stack frame. That doesn't feel like the same situation at all.

@nbdd0121
Copy link
Contributor

Theoretically the destructor that we called is not the boundary that does not permit unwind, but the cleanup blocks. Similarly, the extern "C" function calls could unwind, but not out from extern "C" functions. So I think there are some similarity there.

@nbdd0121
Copy link
Contributor

I'd hope so. FTR, C++ promises all-or-nothing for when destructors are run before a std::terminate (though its per throw). If an exception would hit a terminate boundary (noexcept function, top of stack, or destructor called during stack unwinding), the compiler gets to choose whether or not stack unwinding occurs, and then, if it does, it requires it to unwind the stack straight up to the unwind boundary - it doesn't allow the compiler to only unwind certain frames. It can, however, distinguish per-exception, so the case of throw inside a noexcept function with no catch could cause an immediate abort even when a throw one call deep would unwind then abort, but in the one-call-deep case, it can't unwind the callee then see the noexcept in the caller, skip its destructors, and terminate immediately.

https://eel.is/c++draft/except.terminate#2:

In the situation where the search for a handler ([except.handle]) encounters the outermost block of a function with a non-throwing exception specification ([except.spec]), it is implementation-defined whether the stack is unwound, unwound partially, or not unwound at all before the function std​::​terminate is invoked.

@chorman0773
Copy link
Contributor

chorman0773 commented Aug 26, 2024

Yeah, I just found that. I stand corrected in relation to non-throwing exception specifications.

Though, this sentance sticks out to me

An implementation is not permitted to finish stack unwinding prematurely based on a determination that the unwind process will eventually cause an invocation of the function std​::​terminate.

@RalfJung
Copy link
Member Author

Theoretically the destructor that we called is not the boundary that does not permit unwind, but the cleanup blocks. Similarly, the extern "C" function calls could unwind, but not out from extern "C" functions. So I think there are some similarity there.

I don't agree with that framing. From an end user perspective, there's no such thing as a "cleanup block" -- that's an implementation detail. What matters is: when a destructor gets called during unwinding, it may not itself unwind. An unwind out of a destructor called during unwinding leads to an immediate abort. Those semantics make a lot of sense intuitively at a high level, without going into MIR details --and they match the currently implemented semantics.

It is only the extern "C" case where the implemented semantics are not intuitive.

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

@RalfJung

At the end of the drop glue, not destructor.

e.g. this (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=dfa1c8e8df671512b6cc9bf1aae964e9):

struct Noisy1(Noisy2);

struct Noisy2(u32);

impl Drop for Noisy1 {
    fn drop(&mut self) {
        eprintln!("Noisy1(Noisy2({}))", self.0.0);
        panic!("unwind 2")
    }
}

impl Drop for Noisy2 {
    fn drop(&mut self) {
        eprintln!("Noisy2({})", self.0)
    }
}

fn main() {
    let _m = Noisy2(0);
    let _n = Noisy1(Noisy2(1));
    
    panic!("unwind")
}

Prints:

thread 'main' panicked at src/main.rs:22:5:
unwind
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Noisy1(Noisy2(1))
thread 'main' panicked at src/main.rs:8:9:
unwind 2
stack backtrace: <backtrace>
Noisy2(1)
thread 'main' panicked at library/core/src/panicking.rs:227:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.

Only the Noisy2(0) destructor gets skipped since it's a different drop glue, the Noisy2(1) destructor is still run.

@RalfJung
Copy link
Member Author

When I say "destructor" I think I mean what you call "drop glue": I am referring to drop_in_place.

@arielb1
Copy link
Contributor

arielb1 commented Aug 26, 2024

yes standard confusion between "destructor=Drop::drop" and "destructor=drop_in_place". Does the reference have a standard for the naming there (I think it does, calling drop_in_place "destructor" and not giving Drop::drop a special name)?

I find everything other than "drop impls are always nounwind" or "double-panics insta-abort" weird in some cases, and "drop impls are always nounwind" and "double-panics insta-abort" have their own disadvantages (tho I'm still not convinced they are not the right solution). But this is a digression.

@tmandry
Copy link
Member

tmandry commented Aug 27, 2024

I think there are two questions here:

  1. What should we guarantee
  2. What should the implementation do

Do we have any examples of real code that was depending on the current behavior? As I understand it, the point of this stabilization is to take code that was always technically UB but had no way to be correctly written, so I think we should try to be accommodating here.

@arielb1
Copy link
Contributor

arielb1 commented Aug 27, 2024

I think that "any given instance of unwinding might abort for implementation-defined reasons" is descriptive, but that as long as unwinding does not abort, unwinding control flow should be very well-defined.

And if it's well-defined, I think the version after #129582 is a more obvious control flow than the version in 1.81.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 27, 2024

@tmandry

What should we guarantee

I think we should guarantee that we run all destructors during unwind, and leave room for "unwind might fail to initiate and abort immediately instead" to account for 2-phase unwinding. That's reasonably easy to understand. This is currently not the case but #129582 implements that, IIUC.

I'm not aware of any code that would rely on the abort happening "early", i.e. skipping some destructors. On current stable, the destructor in the OP example actually does run, so the proposed guarantee (implemented by #129582) is also closer to the status quo than what happens in current beta.

What should the implementation do

It should implement the guarantee. :)

@RalfJung
Copy link
Member Author

@nbdd0121 what are the downsides of #129582?
Previously, arguments against "guaranteed destructor" semantics included

  • consistency with C++
  • code size
  • we may not be able to implement these semantics on all targets

The last point doesn't seem to apply for this PR as it is entirely target-independent. C++ AFAIK allows these destructors to be run, but gcc and clang decide against it. I would guess most C++ programmers are not aware of this, and it seems unlikely anyone would rely on this. Code size of course is increased if we generate more unwinding code, but people that build their code with -Cpanic=unwind presumably want that code to be emitted -- we have -Cpanic=abort for those that want those code size reductions.

@nbdd0121
Copy link
Contributor

As long as "unwind might fail to initiate and abort immediately instead" is allowed:

  • Last point won't apply
  • I think we can still omit landing pads, as long as we change the personality function to ensure that the unwind will always fail to initiate when landing pads are omitted, so code size improvement is still theoretically possible.

@arielb1
Copy link
Contributor

arielb1 commented Aug 27, 2024

I think we can still omit landing pads, as long as we change the personality function to ensure that the unwind will always fail to initiate when landing pads are omitted, so code size improvement is still theoretically possible.

This would be compliant with the specification, but I don't think the code size improvements in that case are worth it unless proven otherwise.

@traviscross
Copy link
Contributor

@rustbot labels -I-lang-nominated

We now have nominated:

So we can handle this in that nomination.

@rustbot rustbot removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-c_unwind `#![feature(c_unwind)]` T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests