Unsafe Extern Blocks #3439

Closed
101 changes: 101 additions & 0 deletions text/0000-unsafe_extern.md
@@ -0,0 +1,101 @@

- Feature Name: `unsafe_extern`
- Start Date: 2023-05-23
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

In Edition 2024 it is `unsafe` to declare an `extern` function or static, but external functions and statics *can* be safe to use after the initial declaration.

# Motivation
[motivation]: #motivation

Simply declaring extern items, even without ever using them, can cause Undefined Behavior.
When performing cross-language compilation, attributes on one function declaration can flow to the foreign declaration elsewhere within LLVM and cause a miscompilation.
In Rust we consider all sources of Undefined Behavior to be `unsafe`, so declaring an extern block must itself be `unsafe`.
The upside of this change is that, in the new style, an extern fn can be declared safe to call once the enclosing block has been declared `unsafe`.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Rust can utilize functions and statics from foreign code that are provided during linking, though it is `unsafe` to do so.

An `extern` block can be placed anywhere a function declaration could appear (generally at the top level of a module), and must always be prefixed with the keyword `unsafe`.

Within the block you can declare the external functions and statics that you want to make visible within the current scope.
Each function declaration gives only the function's signature, similar to how methods for traits are declared.
If calling a foreign function is `unsafe` then you must declare the function as `unsafe fn`, otherwise you can declare it as a normal `fn`.
Copy link

Would it maybe make sense to always mark the declaration as unsafe or safe (i.e. you can never omit it) similar to how raw pointers always need const or mut? I feel like making safe the default here could cause people to accidentally not mark a function as unsafe when they should. In particular the transition from the current extern blocks to the new ones in a new edition would be error prone if people simply mark the extern blocks as unsafe and don't think about the fact that they also need to mark all / most of the functions as unsafe now too, simply because it suddenly starts compiling again and they move on.

Copy link
Contributor Author

cargo fix can automatically add "unsafe" to each fn listed in an extern block, but yes that's probably too error prone.

My initial thought was to default to safe so that the declarations would look as close to "normal" declarations elsewhere, but having a contextual "safe" keyword seems reasonable as well.

Each static declaration gives the name and type, but no initial value.

* If the `unsafe_code` lint is denied or forbidden at a particular scope it will cause the `unsafe extern` block to be a compilation error within that scope.
* Declaring an incorrect external item signature can cause Undefined Behavior during compilation, even if Rust never accesses the item.

```rust
use core::cell::UnsafeCell;
use core::ffi::c_char;

unsafe extern {
    // sqrt (from libm) can be called with any `f64`
    pub fn sqrt(x: f64) -> f64;

    // strlen (from libc) requires a valid pointer,
    // so we mark it as being an unsafe fn
    pub unsafe fn strlen(p: *const c_char) -> usize;

    // statics give a name and a type, but no initial value
    pub static IMPORTANT_BYTES: [u8; 256];

    pub static LINES: UnsafeCell<i32>;
}
```

Note: other rules for extern blocks, such as optionally including an ABI, are unchanged from previous editions, so those parts of the guide would remain.
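
To illustrate how the two kinds of declaration differ at the call site, here is a minimal sketch. It assumes a toolchain that accepts `unsafe extern` blocks with per-item qualifiers (Rust 1.82 or later) and writes `safe` explicitly, following the contextual keyword suggested in the review thread, rather than relying on the default proposed in this draft. The helpers `hypotenuse` and `c_len` are hypothetical names, and the `link` attribute assumes a Unix-like target where the math library is named `m`.

```rust
use std::ffi::{c_char, CStr};

// Assumption: the platform's C and math libraries are available at link time.
#[cfg_attr(unix, link(name = "m"))]
unsafe extern "C" {
    // Any `f64` is an acceptable argument, so the declaration vouches for safety.
    pub safe fn sqrt(x: f64) -> f64;

    // The caller must pass a valid, NUL-terminated pointer.
    pub unsafe fn strlen(p: *const c_char) -> usize;
}

/// Hypothetical helper: calls the safe foreign function without an `unsafe` block.
pub fn hypotenuse(a: f64, b: f64) -> f64 {
    sqrt(a * a + b * b)
}

/// Hypothetical helper: discharges the pointer obligation of `strlen` via `CStr`.
pub fn c_len(s: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid, NUL-terminated buffer.
    unsafe { strlen(s.as_ptr()) }
}
```

The `unsafe` on the block covers the correctness of the signatures; the `unsafe` on `strlen` covers the per-call obligation, which `c_len` meets by accepting only a `&CStr`.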

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This adjusts the grammar of the language to *require* the `unsafe` keyword before an `extern` block declaration (currently it's optional and syntactically allowed but semantically rejected).

Replace the *Functions* and *Statics* sections with the following:

### Functions
Functions within external blocks are declared in the same way as other Rust functions, with the exception that they must not have a body and are instead terminated by a semicolon. Patterns are not allowed in parameters; only IDENTIFIER or `_` may be used. The function qualifiers `const`, `async`, and `extern` are not allowed. If the function is unsafe to call, then the function must use the `unsafe` qualifier.
Copy link
Member

Does the rationale for making extern blocks unsafe by default no longer apply? rust-lang/rust#2628 (implemented by rust-lang/rust#4599)

Copy link
Contributor Author

That issue is so old I'm not even entirely clear what it's referring to, but the declaration alone can be dangerous so that's what needs to be the unsafe part.


If the function signature declared in Rust is incompatible with the function signature as declared in the foreign code it is Undefined Behavior.

Functions within external blocks may be called by Rust code, just like functions defined in Rust. The Rust compiler will automatically use the correct foreign ABI when making the call.

When coerced to a function pointer, a function declared in an extern block has type
```rust
extern "abi" for<'l1, ..., 'lm> fn(A1, ..., An) -> R
```
where `'l1`, ... `'lm` are its lifetime parameters, `A1`, ..., `An` are the declared types of its parameters and `R` is the declared return type.
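
The coercion can be sketched as follows, assuming a toolchain that accepts `unsafe extern` blocks with explicit `safe` and `unsafe` item qualifiers (Rust 1.82 or later). A function declared safe coerces to a plain `extern "C" fn` pointer, while one declared `unsafe` coerces to an `unsafe extern "C" fn` pointer. The helpers `map_all`, `roots`, and `len_via_pointer` are hypothetical names used only for illustration, and the `link` attribute assumes a Unix-like target.

```rust
use std::ffi::{c_char, CStr};

// Assumption: the platform's C and math libraries are available at link time.
#[cfg_attr(unix, link(name = "m"))]
unsafe extern "C" {
    pub safe fn sqrt(x: f64) -> f64;
    pub unsafe fn strlen(p: *const c_char) -> usize;
}

/// Hypothetical helper: applies a safe foreign function pointer to each element.
pub fn map_all(f: extern "C" fn(f64) -> f64, xs: &[f64]) -> Vec<f64> {
    xs.iter().map(|&x| f(x)).collect()
}

/// The function item `sqrt` coerces to `extern "C" fn(f64) -> f64` here.
pub fn roots(xs: &[f64]) -> Vec<f64> {
    map_all(sqrt, xs)
}

/// Returns the C string length, or `None` if the bytes are not a valid C string.
pub fn len_via_pointer(bytes_with_nul: &[u8]) -> Option<usize> {
    // An `unsafe` declaration coerces to an `unsafe` function pointer instead.
    let f: unsafe extern "C" fn(*const c_char) -> usize = strlen;
    // Validate the input first so the call below upholds `strlen`'s contract.
    let c = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    // SAFETY: `c` points to a valid, NUL-terminated buffer.
    Some(unsafe { f(c.as_ptr()) })
}
```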

### Statics
Statics within external blocks are declared in the same way as statics outside of external blocks, except that they do not have an expression initializing their value. It is unsafe to declare a static item in an extern block, whether or not it's mutable, because there is nothing guaranteeing that the bit pattern at the static's memory is valid for the type it is declared with.
Copy link
Contributor

Just to be clear, we're removing the ability for users to declare (immutable) statics that are unsafe to use. Would there ever be a use-case for keeping those? E.g. having the ability to write pub unsafe static FOO: u8;, where the user discharges the safety obligations instead of the library author?

Copy link
Member

one use case where we might still want unsafe statics is when doing stuff that's relatively common in embedded C -- having special global variables that are just a way to get an address from the linker, they aren't real variables (you can't necessarily read/write them):

extern int bss_start;
// linker assigns the address to be right after the end of the memory allocated
// for the bss segment, you can't actually read/write this variable
extern int bss_end;
void _start() { // called by cpu reset, we need to initialize ram before we run main()
    // clear bss
    for(int *p = &bss_start; p != &bss_end; p++)
        *p = 0;
    // initialize data
    for(int *p = &data_start, *src = &data_in_flash_start; p != &data_end; p++, src++)
        *p = read_flash(src);
    main();
    while(true) {}
}

Copy link
Member

Those can be emulated via static BSS_START: (); (or a named opaque type, I would have said static BSS_START: !;, but IIRC from discussion on rust-lang/rust#74840 that's not valid).

Is there any situation where reading the value is valid, but it's sometimes UB, which isn't covered by declaring it as static mut?

Copy link
Contributor

Possibly related: #2937

Copy link
Contributor

This is resolved in the new text, unsafe external statics are kept.


Extern statics can be either immutable or mutable just like statics outside of external blocks. An immutable static must be initialized before any Rust code is executed. It is not enough for the static to be initialized before Rust code reads from it. A mutable extern static is unsafe to access, the same as a Rust mutable static.
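
A minimal sketch of both kinds of extern static follows, assuming a toolchain that accepts `unsafe extern` blocks with item qualifiers and the `#[unsafe(no_mangle)]` attribute (Rust 1.82 or later). To keep the example self-contained, the hypothetical `provider` module stands in for foreign code by exporting two unmangled symbols from the same crate; in real use the symbols would come from another language at link time. The names `RFC_DEMO_ANSWER`, `RFC_DEMO_COUNTER`, `answer`, and `bump` are invented for illustration.

```rust
mod provider {
    // Stand-in for foreign code (an assumption of this sketch): both symbols
    // are initialized before any Rust code runs.
    #[unsafe(no_mangle)]
    pub static RFC_DEMO_ANSWER: i32 = 42;

    #[unsafe(no_mangle)]
    pub static mut RFC_DEMO_COUNTER: i32 = 0;
}

unsafe extern "C" {
    // Immutable and always initialized, so it is declared safe to read.
    pub safe static RFC_DEMO_ANSWER: i32;

    // Mutable extern statics stay unsafe to access.
    pub unsafe static mut RFC_DEMO_COUNTER: i32;
}

/// Reads the immutable extern static without an `unsafe` block.
pub fn answer() -> i32 {
    RFC_DEMO_ANSWER
}

/// Increments the mutable extern static and returns the new value.
///
/// # Safety
/// The caller must ensure no other thread accesses the counter concurrently.
pub unsafe fn bump() -> i32 {
    // SAFETY: exclusive access is guaranteed by the caller.
    unsafe {
        let p = std::ptr::addr_of_mut!(RFC_DEMO_COUNTER);
        *p += 1;
        *p
    }
}
```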

# Drawbacks
[drawbacks]: #drawbacks

* It is very unfortunate to have to essentially reverse the status quo.
* Hopefully, allowing people to safely call some foreign functions will make up for the churn caused by this change.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Incorrect extern declarations can cause UB in current Rust, but we have no way to automatically check that all declarations are correct, nor is such a thing likely to be developed. Making the declarations `unsafe` so that programmers are aware of the dangers and can give extern blocks the attention they deserve is the minimum step.

# Prior art
[prior-art]: #prior-art

None we are aware of.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

* Extern declarations are actually *always* unsafe and able to cause UB regardless of edition. This RFC doesn't have a specific answer on how to improve pre-2024 code.

# Future possibilities
[future-possibilities]: #future-possibilities

None are apparent at this time.