RFC for structs with unspecified layouts. #79
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,171 @@ | ||
- Start Date: 2014-05-17 | ||
- RFC PR #: | ||
- Rust Issue #: | ||
|
||
# Summary | ||
|
||
Leave structs with unspecified layout by default like enums, for | ||
optimisation purposes. Use something like `#[repr(C)]` to expose a | ||
C-compatible layout. | ||
|
||
# Motivation | ||
|
||
The members of a struct are always laid out in memory in the order in | ||
which they were specified, e.g. | ||
|
||
```rust | ||
struct A { | ||
x: u8, | ||
y: u64, | ||
z: i8, | ||
w: i64, | ||
} | ||
``` | ||
|
||
will put the `u8` first in memory, then the `u64`, the `i8` and lastly | ||
the `i64`. Due to the alignment requirements of various types, padding | ||
is often required to ensure the members start at an appropriately | ||
aligned byte. Hence the above struct is not `1 + 8 + 1 + 8 == 18` | ||
bytes, but rather `1 + 7 + 8 + 1 + 7 + 8 == 32` bytes, since it is | ||
laid out like | ||
|
||
```rust | ||
#[packed] // no automatically inserted padding | ||
struct AFull { | ||
x: u8, | ||
_padding1: [u8, .. 7], | ||
y: u64, | ||
z: i8, | ||
_padding2: [u8, .. 7], | ||
w: i64 | ||
} | ||
``` | ||
|
||
If the fields were reordered to | ||
|
||
```rust | ||
struct B { | ||
y: u64, | ||
w: i64, | ||
|
||
x: u8, | ||
z: i8 | ||
} | ||
``` | ||
|
||
then the struct is (strictly) only 18 bytes (but the alignment | ||
requirements of `u64` force it to take up 24). | ||
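
The padding arithmetic above can be checked with a small sketch. It is only an illustration, not part of the proposal: both structs are marked `#[repr(C)]` so that declaration order is guaranteed, and `c_layout_size` is a hypothetical helper that redoes the calculation by hand. On a target where `u64` is 8-byte aligned, the two sizes should come out as the 32 and 24 bytes quoted in the text.

```rust
use std::mem::{align_of, size_of};

// `repr(C)` pins both structs to declaration order, so the measured sizes do
// not depend on what the compiler does with default-layout structs.
#[allow(dead_code)]
#[repr(C)]
struct A { x: u8, y: u64, z: i8, w: i64 }

#[allow(dead_code)]
#[repr(C)]
struct B { y: u64, w: i64, x: u8, z: i8 }

/// Size of a struct whose fields, given as `(size, align)` pairs, are placed
/// in order with C-style padding. Hypothetical helper for illustration only.
fn c_layout_size(fields: &[(usize, usize)]) -> usize {
    let mut offset = 0;
    let mut struct_align = 1;
    for &(size, align) in fields {
        // Pad up to the field's alignment, then place the field.
        offset = (offset + align - 1) / align * align;
        offset += size;
        struct_align = struct_align.max(align);
    }
    // Trailing padding so that arrays of the struct stay aligned.
    (offset + struct_align - 1) / struct_align * struct_align
}

fn main() {
    // `i8`/`i64` have the same size and alignment as `u8`/`u64`.
    let byte = (size_of::<u8>(), align_of::<u8>());
    let word = (size_of::<u64>(), align_of::<u64>());

    println!("A: {} bytes", size_of::<A>());
    println!("B: {} bytes", size_of::<B>());

    // The hand calculation agrees with what the compiler reports.
    assert_eq!(size_of::<A>(), c_layout_size(&[byte, word, byte, word]));
    assert_eq!(size_of::<B>(), c_layout_size(&[word, word, byte, byte]));
}
```

The helper takes sizes and alignments as data rather than hard-coding them, so the comparison still holds on targets where `u64` has a smaller alignment.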
|
||
Having an undefined layout does allow for possible security | ||
improvements, like randomising struct fields, but this can trivially | ||
be done with a syntax extension that can be attached to a struct to | ||
reorder the fields in the AST itself. That said, there may be benefits | ||
from being able to randomise all structs in a program | ||
automatically/for testing, effectively fuzzing code (especially | ||
`unsafe` code). | ||
|
||
Notably, Rust's `enum`s already have undefined layout, and provide the | ||
`#[repr]` attribute to control layout more precisely (specifically, | ||
selecting the size of the discriminant). | ||
|
||
# Drawbacks | ||
|
||
Forgetting to add `#[repr(C)]` for a struct intended for FFI use can | ||
cause surprising bugs and crashes. There is already a lint for FFI use | ||
of `enum`s without a `#[repr(...)]` attribute, so this can be extended | ||
to include structs. | ||
|
||
Having an unspecified (or otherwise non-C-compatible) layout by | ||
default makes interfacing with C slightly harder. A particularly bad | ||
case is passing to C a struct from an upstream library that doesn't | ||
have a `repr(C)` attribute. This situation seems relatively similar to | ||
one where an upstream library type is missing an implementation of a | ||
core trait e.g. `Hash` if one wishes to use it as a hashmap key. | ||
|
||
It would be slightly better if structs had a specified-but-C-incompatible | ||
layout, *and* one has control over the C interface, because then one | ||
can manually arrange the fields in the C definition to match the Rust | ||
order. | ||
|
||
That said, this scenario requires: | ||
|
||
- A Rust struct needs to be passed into C/FFI code, where that FFI code | ||
actually needs to use things from the struct, rather than just pass | ||
it through, e.g., back into a Rust callback. | ||
- The Rust struct is defined upstream & out of your control, and not | ||
intended for use with C code. | ||
- The C/FFI code is designed by someone other than the upstream vendor, or | ||
otherwise not designed for use with the Rust struct (or else it is a | ||
bug in the vendor's library that the Rust struct can't be sanely | ||
passed to C). | ||
|
||
|
||
# Detailed design | ||
|
||
A struct declaration like | ||
|
||
```rust | ||
struct Foo { | ||
x: T, | ||
y: U, | ||
... | ||
} | ||
``` | ||
|
||
has no fixed layout, that is, a compiler can choose whichever order of | ||
fields it prefers. | ||
|
||
A fixed layout can be selected with the `#[repr]` attribute: | ||
|
||
```rust | ||
#[repr(C)] | ||
struct Foo { | ||
x: T, | ||
y: U, | ||
... | ||
} | ||
``` | ||
|
||
This will force a struct to be laid out like the equivalent definition | ||
in C. | ||
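
As a rough illustration of what "laid out like the equivalent definition in C" means in practice, the sketch below measures field offsets of a `#[repr(C)]` struct. The concrete field types (`u8`, `u32`, `u16`) stand in for the placeholders `T` and `U` and are assumptions made for the example, as is the helper `foo_offsets`.

```rust
// Concrete stand-in for the `Foo` above; the field types are chosen only
// for illustration.
#[repr(C)]
struct Foo {
    x: u8,
    y: u32,
    z: u16,
}

/// Byte offsets of `x`, `y` and `z` from the start of a `Foo`, measured by
/// comparing field addresses with the address of the struct itself.
fn foo_offsets() -> (usize, usize, usize) {
    let foo = Foo { x: 0, y: 0, z: 0 };
    let base = &foo as *const Foo as usize;
    (
        &foo.x as *const u8 as usize - base,
        &foo.y as *const u32 as usize - base,
        &foo.z as *const u16 as usize - base,
    )
}

fn main() {
    let (x, y, z) = foo_offsets();
    println!("x at {}, y at {}, z at {}", x, y, z);
    // With `repr(C)` the fields appear in declaration order.
    assert!(x == 0 && x < y && y < z);
}
```

Without the `#[repr(C)]` attribute the same measurement would be allowed to report the fields in any order under this proposal, so code must not rely on it.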
|
||
There would be a lint for the use of non-`repr(C)` structs in related | ||
FFI definitions, for example: | ||
|
||
```rust | ||
struct UnspecifiedLayout { | ||
// ... | ||
} | ||
|
||
#[repr(C)] | ||
struct CLayout { | ||
// ... | ||
} | ||
|
||
|
||
extern { | ||
fn foo(x: UnspecifiedLayout); // warning: use of non-FFI-safe struct in extern declaration | ||
// Review comment: How about if it was a pointer to | ||
// Reply: I would think that if the layout doesn't matter, then it should be typed as |
||
|
||
fn bar(x: CLayout); // no warning | ||
} | ||
|
||
extern "C" fn baz(x: UnspecifiedLayout) { } // warning: use of non-FFI-safe struct in function with C ABI | ||
``` | ||
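
For comparison, here is a minimal sketch of the "no warning" case written in present-day Rust syntax. It assumes the `improper_ctypes_definitions` lint, which in current compilers plays roughly the role of the lint proposed here for `extern "C"` function definitions; the names `CLayout` and `sum_fields` and the field types are made up for the example.

```rust
// A struct with C-compatible layout; field types chosen for illustration.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CLayout {
    pub a: u8,
    pub b: u32,
}

// Turning the lint into an error shows that a `repr(C)` struct passes it.
// Removing `#[repr(C)]` above is expected to make this definition fail to
// compile with a "not FFI-safe" diagnostic.
#[deny(improper_ctypes_definitions)]
pub extern "C" fn sum_fields(v: CLayout) -> u32 {
    u32::from(v.a) + v.b
}

fn main() {
    let v = CLayout { a: 1, b: 2 };
    println!("{}", sum_fields(v));
}
```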
|
||
|
||
# Alternatives | ||
|
||
- Have non-C layouts opt-in, via `#[repr(smallest)]` and | ||
`#[repr(random)]` (or similar). | ||
- Have layout defined, but not declaration order (like Java(?)), for | ||
example, from largest field to smallest, so `u8` fields get placed | ||
last, and `[u8, .. 1000000]` fields get placed first. The `#[repr]` | ||
attributes would still allow for selecting declaration-order layout. | ||
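
The "largest field to smallest" alternative can be sketched as a tiny layout routine. This is an assumption-laden illustration rather than a proposed algorithm: `Field` and `largest_first` are hypothetical names, sizes and alignments are passed in as plain numbers, and ties keep declaration order.

```rust
/// A field described by name, size and alignment (hypothetical inputs).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Field {
    name: &'static str,
    size: usize,
    align: usize,
}

/// Orders fields from largest to smallest, then returns each field's offset
/// together with the total padded size of the struct.
fn largest_first(fields: &[Field]) -> (Vec<(&'static str, usize)>, usize) {
    let mut sorted = fields.to_vec();
    // Stable sort: equally sized fields keep their declaration order.
    sorted.sort_by(|a, b| b.size.cmp(&a.size));

    let mut offset = 0;
    let mut struct_align = 1;
    let mut placed = Vec::new();
    for f in &sorted {
        // Pad up to the field's alignment before placing it.
        offset = (offset + f.align - 1) / f.align * f.align;
        placed.push((f.name, offset));
        offset += f.size;
        struct_align = struct_align.max(f.align);
    }
    // Trailing padding up to the struct's own alignment.
    let size = (offset + struct_align - 1) / struct_align * struct_align;
    (placed, size)
}

fn main() {
    // The fields of struct `A` from the Motivation section, assuming
    // 8-byte alignment for the 64-bit integers.
    let fields = [
        Field { name: "x", size: 1, align: 1 },
        Field { name: "y", size: 8, align: 8 },
        Field { name: "z", size: 1, align: 1 },
        Field { name: "w", size: 8, align: 8 },
    ];
    let (placed, size) = largest_first(&fields);
    println!("{:?} -> {} bytes", placed, size);
}
```

Applied to the fields of `A`, this ordering reproduces the hand-reordered struct `B` from the Motivation section.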
|
||
# Unresolved questions | ||
|
||
- How does this interact with binary compatibility of dynamic libraries? | ||
- How does this interact with DST, where some fields have to be at the | ||
end of a struct? (Just always lay out unsized fields last? | ||
(i.e. after monomorphisation if a field was originally marked | ||
`Sized?` then it needs to be last).) |

I think the whole idea is a good one; +1. Bikeshed: I'm not sure if `repr(C)` is the right notation - you might want a defined layout for other reasons - maybe `repr(fixed)`?

If we're going to bikeshed, maybe `repr(declaration)` or `repr(as_written)` would be more descriptive?

I think `repr(C)` is actually OK, because it's specifying that C layout rules should be used (i.e. declaration order), although it could easily be interpreted as "struct for C FFI". There are other possible "fixed" layouts (e.g. sorting by field size, or even alphabetically).

In any case, I don't particularly care about the name.

I had the same thought, but then it occurred to me that the only people who will insist on such control over representation would be coming from a C background anyway. I do like the analogy to `repr(C)` on enums, and it would be a shame to have `repr(C)` and `repr(fixed)` be aliases of each other.

I think `#[repr(C)]` is the only thing that makes sense for producing a C-compatible struct. If `#[repr(fixed)]` would produce something other than `#[repr(C)]` then it might be worth having, but if it produces the exact same layout as `#[repr(C)]` then it seems unnecessarily redundant.

Good points. `repr(C)` sounds like the best bet.