Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Well-defined unwinding through FFI boundaries #2699

Closed
wants to merge 23 commits into from
Closed
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
286 changes: 286 additions & 0 deletions text/XXX-unwind-FFI.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,286 @@
- Feature Name: `unwind-through-FFI`
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: ???
- Rust Issue: ???

# Summary
[summary]: #summary

Provide a well-defined mechanism for unwinding through FFI boundaries.

* Add an `#[unwind(allowed)]` function attribute that's supported on `extern` functions and `extern` function declarations.
which explicitly permits `extern` functions
to unwind (`panic`) without aborting.
* If this annotation is used anywhere in the dependency tree,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very undesirable.

For the libjpeg case, if the binary wants to abort, that's perfectly fine. libjpeg's default error handler even calls abort() itself. libjpeg only cares that the error handler function doesn't return normally, not whether it does it via abort or unwind.

OTOH I would not like mere presence of libjpeg wrapper anywhere in the executable to become a global configuration problem for the whole product.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kornelski In that case, I am somewhat confused as to why it's important to have a way to permit panics to propagate past FFI boundaries. In the post that called my attention to the problem, you wrote:

These two popular libraries are specifically designed to be unwound....

Rust 1.33 effectively killed my mozjpeg crate...

If aborting is "perfectly fine", why does mozjpeg need a way to unwind past FFI boundaries?

If indeed libjpeg wrappers do need to let unwind operations cross FFI boundaries, that is already a global configuration issue that must be solved for the whole binary (which may be a dynamic library rather than a final executable). This is because such unwinding is not safe unless the behavior of the unwind runtime is stable and guaranteed, and the default panic runtime does not and will not provide such a guarantee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kornelski I think part of the problem is what happens if you compile the app with panic=abort but the dynamic lib was compiled with panic=unwind. In that case the unwind machinery won't exist. I can think of two ways to solve this:

  1. Add a feature to the (mozjpeg) lib so users can opt out of #[unwind(allowed)] attributes if they binary is being compiled with panic=abort. I'm not sure how this would work if mozjpeg is an indirect dependency... I haven't ever tried to fiddle with a transitive denedency's feature flags.
  2. Modify this RFC so it supports #[unwind(allowed)] with panic=abort. This would require the compiler to implicitly wrap calls to the extern panic-able function in a catch block to catch any exception and immediately call abort. I'm not sure whether or not this is feasible in all situations (e.g., weird no_std embedded environments).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjbshaw I don't understand your second suggestion. If a dependency graph has a crate with panic=abort in it, then all panics will abort, and, if #[unwind(allowed)] is permitted (as it currently is in Nightly), it will have no effect whatsoever.

As for no_std environments, those currently can't panic without a custom panic_runtime crate (which is the main reason #![panic_runtime] exists).

generation of the final (binary) product will fail
unless the panic strategy is `unwind`
and a non-default panic runtime is specified.
* Stabilize the `#![panic_runtime]` annotation (from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? It seems like a major feature, much beyond just not adding nounwind attribute to functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because it's the only way to guarantee compatibility between the Rust unwind mechanism and the C/C++ stack frames.

[RFC #1513](less-unwinding)).
* Provide an `unwind` runtime in the standard library
that guarantees compatibility with the native exception mechanism
provided by the compiler backend.
* Provide a Cargo option under `[profile.foo]` to specify a `panic` runtime.

[less-unwinding]: https://github.com/rust-lang/rfcs/blob/master/text/1513-less-unwinding.md

# Motivation
[motivation]: #motivation

This will enable resolving
[rust-lang/rust#58794](rust-ffi-issue)
BatmanAoD marked this conversation as resolved.
Show resolved Hide resolved
without breaking existing code.

Currently, unwinding through an FFI boundary is always undefined behavior.
We would like to make the behavior safe
by aborting when the stack is unwound to an FFI boundary.
However, there are existing Rust crates
(notably, wrappers around the `libpng` and `libjpeg` C libraries)
that rely on the current implementation's compatibility
with the native exception handling mechanism in
GCC, LLVM, and MSVC.

This RFC will give crate authors the ability
to initiate stack-unwinding
that can propagate through other languages' stack frames,
as well as tools for ensuring compatibility with those languages.

[rust-ffi-issue]: https://github.com/rust-lang/rust/issues/58794

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Consider an `extern "C"` function that may `panic`:

```rust
extern "C" fn may_panic(i: i32) {
if i < 0 {
panic!("Oops, I should have used u32.");
}
}
```

In a future (TBD) version of Rust,
calling this function with an argument less than
zero will cause the program to be aborted.
This is the only way the compiler can ensure that the function is safe,
because there is no way for it to know whether or not the calling code
supports the same implementation of stack-unwinding used for Rust.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is assumed to never panic!, and this function, as well as code calling it, will be optimized accordingly.

A valid optimization on this function is therefore to remove the if i < 0 { panic!() } brach, since that code cannot ever be reached without invoking undefined behavior. That is, you end up with extern "C" fn may_panic(i: i32) { }, and calling that from C code is actually "fine", because no unwinding can happen.

This section has to somehow be reworked to account for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Would adding opt-level=0 be sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, undefined behavior is optimization level independent. I think the content in the section is just incorrect and needs to be corrected. As in, the undefined behavior here has nothing to do with FFI. The following snippet has undefined behavior, and no FFI was involved:

extern "Rust" fn foo() { panic!() } 
fn bar () { foo() }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg The Rust reference says: "unwinding past the end of an extern function will cause the process to abort", which seems well-defined to me, not UB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, in a future version of Rust that will happen. Right now it is UB. It is unclear to me when the abort happens. I'll ask in the tracking issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg I'm not sure what point you're trying to make. You mention "a valid optimization" that causes the undefined behavior not to interact with the caller's runtime, but of course opt-level=0 wouldn't permit such an optimization.

As for the Rust-only, no-FFI version, that's only undefined behavior from the perspective of Rust users because Rust itself doesn't yet define the behavior, and it's only undefined behavior from the perspective of rustc itself because of the nounwind attribute. The point I am trying to make is that even without the nounwind attribute, for an FFI function, there would still be the potential for undefined behavior, because "there is no way for [the compiler] to know whether or not the calling code supports the same implementation of stack-unwinding used for Rust."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg I'm not sure what point you're trying to make. You mention "a valid optimization" that causes the undefined behavior not to interact with the caller's runtime, but of course opt-level=0 wouldn't permit such an optimization.

We do not guarantee that we don't optimize with -C opt-level=0, at least not in the language reference. Optimization levels are not part of the language, and undefined behavior is undefined, independently of which optimization level you are using. Otherwise, you could claim for any code that works at opt-level=0 that it has defined semantics, but this is not the case (one does not follow from the other).

The point I am trying to make is that even without the nounwind attribute, for an FFI function, there would still be the potential for undefined behavior, because "there is no way for [the compiler] to know whether or not the calling code supports the same implementation of stack-unwinding used for Rust."

I think the proof obligation here is in the code declaring the function, not the code defining it. Some code defines an extern "C" function like this, which is safe:

extern "C" fn foo() { ... }  // this is safe

Calling foo from that module is safe.

However, some other Rust code declares the function to be able to use it:

extern "C" { fn foo(); }

Calling foo from this code is unsafe because there is no way for the compiler to check whether this declaration matches the definition.

If you consider the panic-runtimes as part of the function ABI, then in the definition the compiler can make sure that everything is correct, so I don't see why that would need to be unsafe.

In the declaration, the compiler cannot check anything anyways, so that needs to be unsafe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it is UB. It is unclear to me when the abort happens. I'll ask in the tracking issue.

The implementation of aborting-at-FFI boundary makes it so that unwinding aborts at the FFI boundary after all Rust frames above the FFI boundary are unwound and all scheduled cleanups are executed.


As a concrete example,
this function may be invoked from C code on Linux or BSD
compiled without support for native stack-unwinding.
The C runtime would lack the necessary metadata
to properly propagate the unwinding operation,
so it would be undefined behavior to let the runtime attempt
to unwind the C stack frames.

However, if the C code is compiled
with support for the same stack-unwinding mechanism
used to by the Rust code,
unwinding across the FFI boundary is well-defined and safe,
and in fact it can be a useful way to handle errors
when working with certain C libraries, such as `libjpeg`.

Thus, the `may_panic` function may be makred `#[unwind(allow)]`,
BatmanAoD marked this conversation as resolved.
Show resolved Hide resolved
which ensures that the unwinding operation will be propagated to the caller
rather than aborting the process.
This annotation will also prevent the compiler
from marking the function as `nounwind`
(which permits the backend toolchain from optimizing based on the assumption
that an unwinding operation cannot escape from a function).

This annotation can only be used with an `unsafe` function,
since the compiler is unable to guarantee
that the caller will be compiled with a compatible stack-unwinding mechanism.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extern "C" functions can be also called from Rust as well, and there is no guarantee that they will ever be called from C at all.

AFAICT, if they were to be called from C, it would be up-to-C to mark them unsafe, and make sure that all proof-obligations are satisfied before calling them.

So I'm not sure what the point of requiring unsafe here is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought process was that every piece of code imposing a safety-related contract that cannot be checked by rustc should be marked unsafe. Permitting unwind out of an extern function imposes such a contract: the calling code must be able to catch or propagate the unwind safely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...though I hadn't thought about the inconvenience of statically linking such a function against other Rust code, which would be perfectly safe but would require an unsafe block anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought process was that every piece of code imposing a safety-related contract that cannot be checked by rustc should be marked unsafe

I'm not sure I follow your train of thought. What makes:

pub #[unwind(allow)] extern "Rust" fn foo() { ... }

special when compared with

pub fn bar() { ... }

?

Both can panic, and both can be called from other Rust crates. Why does one need to be unsafe, but the other does not ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, only the extern version can be called from a dynamically-loaded shared library. Is it possible to call pub fn bar() in this way? If so, then, since Rust permits custom panic runtimes, it should be possible to have undefined behavior by calling bar by compiling the shared library and the executable itself with different panic runtimes. If that's the case...I'm not sure what the solution would be.


This RFC does, however, provide a tool that will enable users
to provide this guarantee themselves.
We will stabilize the `!#[panic_runtime]` annotation,
which [designates a crate as the provider of the final product's panic runtime]
(less-unwinding).
Additionally, the standard library's current `panic=unwind` runtime crate,
`libpanic_unwind`, which is compatible with native C++ style exceptions,
will be provided under another name (such as `libpanic_native`)
that guarantees the implementation will remain compatible with C++ exceptions
(while the implementation of `libpanic_unwind` itself may change
to no longer maintain that compatibility).

In order for Cargo users to be able
to specify the C++ compatible `panic_runtime` implementation,
a new optional value, runtime, will be added to the `profile.foo.panic` option:

```toml
[profile.dev]
panic = { 'unwind', runtime = 'native' }
BatmanAoD marked this conversation as resolved.
Show resolved Hide resolved
```

`panic.runtime` may only be specified if the panic strategy is `unwind`.
And just as there may only be one strategy
in the dependency graph for a final product,
there may only be one `runtime`.

For now, the only valid `runtime` keys will be:

* `default` - identical to `panic = 'unwind'` with no `runtime` selected.
This will use Rust's default unwinding runtime,
unless another crate in the final product's dependency graph
specifies `panic = 'abort'`,
in which case the `abort` strategy will take precedence as usual.
which is not guaranteed to be compatible with native exceptions.
* `native` - as discussed above, this will preserve the behavior of the current
implementation of `libpanic_unwind`.
* `crate` - indicates that a non-`std` crate will use `#![panic_runtime]` to
provide the runtime.
* `self` - indicates that this crate itself provides `#![panic_runtime]`.

If the `native` or `crate` key is specified
anywhere in a final product's dependency graph,
no crate in that dependency graph may specify the `panic = abort` strategy;
this mismatch will cause the build to fail.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lacks motivation. abort is strictly more compatible than any implementation of unwinding. abort also matches the intended semantics of unwinding in Rust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about your "intended semantics of unwinding" comment has led me to rethink the motivation of the whole RFC.

In another Internals thread, I wrote,

I think Rust should be able to interoperate with existing libraries that use standard non-local control flow constructs...

So, to me, the fundamental semantic mismatch is that C++ treats stack-unwinding as a control-flow construct, while Rust treats it as a way of handling bugs. I think Rust's philosophy on this is better, but I stand by the implied statement in my quote above that C++'s view is "standard". (In fact, I believe I've read a Ruby book that explicitly explains exceptions in terms of "nonlocal control flow" rather than in terms of error handling.)

So, to me, the fundamental questions that must be answered are:

  • Should Rust have a means of interfacing with foreign APIs that use stack-unwinding as a means of (semantically) non-exceptional control flow?
  • If so, how?

I would be okay with putting this RFC PR on hold in order to go completely back to the drawing board on these questions, but I really feel like I need more help than I'm getting currently; I don't know nearly enough about how stack unwinding is actually implemented (and some of the other issues you've pointed in the RFC out reveal this ignorance), and unfortunately I'm not sure I can commit the time to learning these details in any sort of timely manner. (At my current job nearly all our code runs on the JVM, so my work on this PR is in no way relevant to my "real" work.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do go back to the drawing board, here are two thoughts I would like to take the time to consider in more depth:

  • In multi-threaded code, panic is actually semantically more like throw than in single-threaded code, since by default it only terminates one thread, and the "exception" information (i.e. the Error returned from join) may be useful to the parent thread. Perhaps it would be good to extend this model, permitting native exceptions to be caught by join but not by catch_panic (i.e. effectively limiting FFI-stack-unwinding to child threads).
  • The only other non-local control flow construct I can think of in Rust that permits returning to a prior stack frame is await. Maybe the most idiomatic (albeit nonstandard compared to existing languages) way to deal with exceptions-as-control-flow would be to model FFI code that unwinds as something akin to a generator that may yield an error rather than a valid value. (I realize this idea is at best half-baked.)


The function annotation `#[panic(allowed)]` will only be permitted in crates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean #[unwind(allowed)]? The #[panic] attribute hasn't been explained so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. My mistake.

that specify a non-default `panic.runtime`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't #[panic(allowed)] be allowed on crates that use panic = abort ? AFAICT that would mean that no panics do ever propagate, which would be correct for #[panic(allowed)] functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a change in semantics from the existing unsafe attribute, but I think #[panic(allowed)] should only be used where unwinding is required for correct behavior (as is the case with libjpeg).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is #[panic(allowed)] different from #[unwind(allowed)] ? I think I maybe confusing what the point of this attribute is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb mistake. I meant unwind(allowed). I also made that mistake once in the RFC text itself, so I've corrected that.


Crates that do not use the `profile.foo.panic` option at all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean [profile] of dependencies is used? AFAIK that's not currently the case (all profiles of all crates are ignored, except profile of the top-level target that overrides everything).

Also when Cargo gets custom profiles, it'd be impossible for a dependency to cooperate with users' custom profiles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure; from my reading of RFC 1513, I thought there was already some interdependence between different crates' profile sections for a single build.

will remain compatible with any `profile.foo.panic` configuration
used to generate the final product.

For non-Cargo users, equivalent `rustc` flags will be provided
(which will be how Cargo itself implements the option).

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Unwinding for functions with the `#[unwind(allowed)]` annotation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "functions" include declarations or is it only applicable to definitions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only intended it to be applicable to definitions. If the RFC is modified to permit it on declarations, my knee-jerk reaction would be to somehow make it required on declarations (i.e. it would be an error if the definition uses unwind(allowed) but the declaration doesn't repeat the annotation). In any case, I think it would need to be an error to specify it on a declaration if the corresponding definition doesn't specify it (I don't see how the implementation could make the abort behavior dependent on the declaration).

is performed as if the function were not marked `extern`.
This annotation is not permitted on functions not marked `extern`.
It has no observable effect unless the marked function `panic`s
(e.g. it has no observable effect
when a function returns normally or enters an infinite loop).
The LLVM IR for such functions must not be marked `nounwind`.
BatmanAoD marked this conversation as resolved.
Show resolved Hide resolved

The compiler will have a new stable flag, `-C panic.runtime`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the first flag with a . in it.

which will be required to enable the `#[unwind(allowed)]` annotation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does one need to pass a panic-runtime for #[unwind(allowed)] extern "Rust" fn foo(...) -> ... { ... } ? AFAICT this function can only be called from other Rust code, and that code must always have a compatible panic-runtime ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Statically-linked Rust code must have a compatible panic-runtime; but AFAIK we can't provide any guarantees about whether dynamically-linked code shares the same runtime (especially with this proposal, which makes it easy to use the new native runtime).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes meaning of #[unwind(allowed)] more like #[unwind(strictly-required)].

To me "allowed" implies something is optional — it may be used, but doesn't have to. But if you start requiring a very specific global setting just to use this attr, then it's more than "allowed". It requires a runtime.

Still, I'd prefer to avoid using an attribute in a dependency that require a specific setting for the whole binary or may cause a compilation failure.

Could it be more graceful? e.g.

  • The attribute can always be used anywhere
  • If the binary has a compatible unwinding runtime, then it allows unwinding
  • If the binary isn't built with a compatible runtime, emit a warning, and ignore the attribute (causing an abort instead of unwinding)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how does this work today? IIUC, one can already provide custom panic run-times, and use those to compile libraries and link them, both statically and dynamically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kornelski The whole problem is that, for dynamically-linkable functions, the compiler can't know whether the caller will have a "compatible" runtime, i.e., it can't know what "compatible runtime" even means. So, from my perspective, the purpose of the unwind(allowed) attribute is to let the user say "trust me, the runtime I've selected is compatible with the runtime of any callers of this code." To make this assertion without actually selecting a runtime seems like a logic error from my perspective.

With that said, it does seem like it might be useful to have utility crates that can be used with binaries that use either unwind.runtime = "native" or unwind.runtime = "crate". Maybe instead of crate, there should be an option that just means "non-default" (compatible with self and native, but not with default).

@gnzlbg Yes, RFC 1513 provides this capability, but it explains that the dependency graph of a binary can only have one panic runtime.

as explained above, the flag will specify the expected source
of the panic runtime to be used in the final (binary) product.
If the source is `default`,
the `libpanic_unwind` crate will provide the runtime,
and the `#[unwind(allowed)]` annotation will not be permitted in that crate.

Different values of `panic.runtime` will not be permitted for different crates
in a single dependency graph,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if two objects are not part of the same dependency graph, but, e.g., are dynamically linked ? E.g. some Rust binary is dynamically linked against a Rust dynamic library that exposes a #[unwind(allowed)] extern "Rust" fn foo() {} function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, this is entirely out of our control, as I mentioned above. (The decision to link or not is essentially in the hands of the target platforms binary loader, right?) That's part of why the annotation needs unsafe.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "right" answer here, I think, is to define that the runtime is part of the ABI of the function. Allowing the function declaration to define what panic runtime it's using would solve the dynamic linking problem safely, at the cost of forcing us to stabilize the details of the ABI. For panic.runtime = "native", those details are already forced onto us anyways, but I'm not sure we want to force ourselves to take the step for future runtimes as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcranmer I may be misunderstanding you, but individual functions within the same shared library can't have different ABIs, can they? What would it mean for the runtime to be "part of the ABI of the function"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but individual functions within the same shared library can't have different ABIs, can they?

Of course, extern "C", extern "fastcall", extern "Rust, etc.

What would it mean for the runtime to be "part of the ABI of the function"?

Exactly that. In the same way one has to declare a function extern "C" when calling a function following that ABI, and using some other extern "X" when that function uses a different ABI, one would need to specify what the panic ABI of a function that can panic is.

That's what #[unwind(Rust)] did. It specified that this function can panic "with the Rust ABI". Since there can only be one panic implementation for all dynamically and statically linked Rust libraries, #[unwind(Rust)] meant "whatever ABI that has".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can system linkers actually detect ABI mismatches? How is this done?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can system linkers actually detect ABI mismatches?

No. This is not a new problem, if you declare a function as extern "C", and that function happens to have a different ABI, you often don't get linker errors either.

with the exception of `self`, which may only be used once
in a dependency graph,
and is only compatible with crates using the `crate` value.
Because the `panic` runtime is lazily selected and linked,
crates that do not specify a value for `panic.strategy`
are compatible with crates using any of the four values.
Crates that explicitly specify the `default` runtime, however,
Copy link
Member

@nagisa nagisa Jul 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially makes such crates incompatible with libstd/libcore etc, which come precompiled and use the default panic strategy AFAIR. Using such crates realistically would require user to compile their own libstd or libcore – or some other mechanism for distribution will need to happen (such as metadata-only-rlibs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 1513 says:

The standard library will need a runtime and will be lazily linked to a crate which has #![panic_runtime].

I thought the "lazily" was intended to imply that the standard library could be used with any panic runtime; am I misunderstanding the RFC? I intended the wording of my RFC to use the same "laziness" strategy to ensure compatibility between all unwind crates that don't specify conflicting strategies.

are not compatible with crates using other runtimes.

In order to verify that all crates in a dependency graph
are compatible in this way,
new metadata will need to be added to the `rlib` format.
Copy link
Member

@nagisa nagisa Jul 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, like the concept of metadata, is a compiler implementation detail. Rather it is sufficient for a normative document to only specify what is checked, what is allowed and what is the expected outcomes in all sorts of situations are – just like you did in the section above.

Holistically following the RFC process (which we usually don’t, but that is not the point), this section may force our hand in implementing something in a way that’s not necessarily the easiest/best/etc approach.

I recommend removing this paragraph entirely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; I was trying to follow the example set by RFC 1513, which states,

This codegen option will default to unwind if not specified (what happens
today), and the value will be encoded into the crate metadata. This option is
planned with extensibility in mind to future panic strategies if we ever
implement some (return-based unwinding is at least one other possible option).

Would rephrasing to refer to "crate metadata" instead of "the rlib format" specifically be preferable to removing it?

It may be possible to combine this metadata with the `panic` strategy metadata.

# Drawbacks
[drawbacks]: #drawbacks

* This change involves several elements of the toolchain,
from Cargo all the way to codegen and linking.
It also involves the simultaneous stabilization of multiple features.
Although it may be possible to stabilize some of these features
independently of the others,
the proposed changes are still somewhat complex,
even though implementation should be fairly simple.
* The rules regarding the interaction between
`-C panic`, `-C panic.strategy`, and `#[unwind(allowed)]`
are somewhat complex.
* Even with custom `panic` runtimes,
users may still inadvertently cause undefined behavior
by trying to link shared libraries that use different unwind runtimes.
For instance, there is no easy way to know whether a C shared library
on a non-Windows system
was compiled with the `-fexceptions` GCC or LLVM flag;
with this flag, such a library would be unwind-runtime-compatible
with a Rust shared library compiled with `-C panic.strategy=native`,
but without this flag, any attempt to unwind through the C stack frames
(regardless of the runtime used)
would be undefined behavior.
There is no way for Rust to guard against this scenario.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This version of this RFC replaces a prior version
that suggested a much more minimal change.
Specifically, it suggested introduing a different function annotation,
`#[unwind(Rust)]`, which would simply ensure that the marked function
would not be marked `noexcept` and would not `abort` on `panic`.
Because the current implementation of `libpanic_unwind`
is compatible with native (C++ style) exceptions,
no further guarantees were made
regarding the behavior of the unwinding operation itself.

Another possibility would be to add the `panic.runtime` options
as new `strategy` values for the `-C panic=foo` flag.
This may be simpler to teach and to implement,
but crates using `-C panic=native`
would be incompatible with all existing crates.

The current proposal ensures
that Rust provides full control over unwinding behavior
for maximum compatibility with other languages' runtimes
and with existing crates.
The interaction between the `#[unwind(allowed)]` annotation
and the `panic.strategy` option
provides the maximum safety that the toolchain can provide
without limiting Rust's ability to interoperate
with shared libraries in other languages
(particularly `libpng` and `libjpeg` as mentioned
[above](#motivation)).

# Prior art
[prior-art]: #prior-art

[RFC #1513](less-unwinding) introduced
the `#![panic_runtime]` attributes, the `-C panic` compiler flag, and
the corresponding `profile.foo.panic` Cargo option.

The `#[unwind(allowed)]` annotation already exists as an unstable feature,
though the current implementation does not impose any requirements
on the `panic` implementation.

As mentioned above, GCC and LLVM provide an `-fexceptions` flag
that makes the C++ exception mechanism interoperable with C stackframes.
The C standard does not refer to exceptions at all, however,
and the C++ standard does not make any guarantees
regarding the behavior of exceptions that escape from C++ stack frames
into stack frames written in another language.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

* Should the `panic.runtime` option and its permitted values be renamed?
* What is the minimal subset of features listed here that must be released
Copy link
Contributor

@kornelski kornelski Jul 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only need #[unwind(allow)] (or any variant of it) that:

  • stops adding nounwind LLVM attribute to the function
  • stops adding code that deliberately causes abort on unwinding
  • that's it, nothing else.

For the libjpeg case, C++ compatibility and mixing with other unwinding mechanisms are not needed, and not useful.

I'm very worried that this RFC even mentions C++, because proper C++ support would make this is a major new feature, a big complication, and has no relevance for the immediate problem of libjpeg compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kornelski That's already provided by #[unwind(allowed)] in Nightly. I believe the reason it can't be stabilized yet is that the default panic runtime is not guaranteed to be stable, and #[unwind(allowed)] would (as explained above) expose this runtime code that may be compiled with a different runtime.

Regarding C++, unwinding in C is C++ exception-propagating. That's why it requires the -fexceptions flag. C libraries compiled with GCC or LLVM without that flag will not properly support unwinding.

prior to stabilizing the FFI abort-on-`panic` logic
in order to prevent breaking crates that link with `libpng` and `libjpeg`?
* Would multiple `panic` runtimes within a single process space
be a potentially useful feature in the future?
If so, is this RFC forward-compatible with possible approaches
for providing this feature?
* Will C++ code be able to interact with `panic.runtime=native`
by catching the native exception?
What operations (`Drop`ing, re-throwing, introspecting...) would be safe
for the C++ runtime to perform?

# Future possibilities
[future-possibilities]: #future-possibilities

If it is safe for C++ code to interact with `panic` objects from Rust,
it may be useful to create a C header declaring an interface to do so.
Such a header would not necessarily be provided by the Rust team
or as part of the Rust toolchain distribution,
but the author of such a header may need assistance
from the author(s) of the native-unwinding runtime crate.