Commit a19d291

Add documentation
1 parent e90cec6 commit a19d291

2 files changed

+164
-0
lines changed

book/src/SUMMARY.md

+1
@@ -27,4 +27,5 @@
 - [Generating Bindings to Objective-c](./objc.md)
 - [Using Unions](./using-unions.md)
 - [Using Bitfields](./using-bitfields.md)
+- [Using Flexible Array Members](./using-fam.md)
 - [FAQ](./faq.md)

book/src/using-fam.md

+163
@@ -0,0 +1,163 @@
# Using C structures with Flexible Array Members

Since time immemorial, C programmers have been using what was called "the struct
hack". This is a technique for packing a fixed-size structure and a
variable-sized tail within the same memory allocation. Typically this looks
like:

```c
struct MyRecord {
    time_t timestamp;
    unsigned seq;
    size_t len;
    char payload[0];
};
```

Because this is so useful, it was standardized in C99 as "flexible array
members", using almost identical syntax:
```c
struct MyRecord {
    time_t timestamp;
    unsigned seq;
    size_t len;
    char payload[]; // NOTE: empty []
};
```

Bindgen supports these structures in two different ways.

## `__IncompleteArrayField`

By default, bindgen will generate the corresponding Rust structure:
```rust,ignore
#[repr(C)]
struct MyRecord {
    pub timestamp: time_t,
    pub seq: ::std::os::raw::c_uint,
    pub len: usize,
    pub payload: __IncompleteArrayField<::std::os::raw::c_char>,
}
```

The `__IncompleteArrayField` type is zero-sized, so this structure represents
the prefix without any trailing data. In order to access that data, it provides
the `as_slice` unsafe method:
```rust,ignore
// SAFETY: there are at least `len` bytes allocated and initialized after `myrecord`
let payload = unsafe { myrecord.payload.as_slice(myrecord.len) };
```
There's also `as_mut_slice`, which returns a mutable slice in the same way.

These are `unsafe` simply because it's up to you to provide the right length (in
elements of whatever type `payload` is) as there's no way for Rust or Bindgen to
know. In this example, the length is a very straightforward `len` field in the
structure, but it could be encoded in any number of ways within the structure,
or come from somewhere else entirely.

One big caveat with this technique is that `std::mem::size_of` (or
`size_of_val`) will *only* include the size of the prefix structure. If you're
working out how much storage the whole structure is using, you'll need to add
the suffix yourself.
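
As a rough illustration of that arithmetic, the sketch below uses a hand-written stand-in for the generated prefix type. `MyRecordPrefix`, the `i64` substitute for `time_t`, and `total_size` are assumptions made for this example, not bindgen output.

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_uint};

/// Hand-written stand-in for the generated prefix struct (hypothetical).
#[repr(C)]
pub struct MyRecordPrefix {
    pub timestamp: i64, // assumption: stands in for `time_t`
    pub seq: c_uint,
    pub len: usize,
    pub payload: [c_char; 0], // zero-sized, like `__IncompleteArrayField`
}

/// Bytes occupied by the prefix plus `len` trailing payload elements,
/// or `None` if the computation overflows `usize`.
pub fn total_size(len: usize) -> Option<usize> {
    len.checked_mul(size_of::<c_char>())?
        .checked_add(size_of::<MyRecordPrefix>())
}
```

Calling `total_size(record.len)` on a record with a 10-byte payload would give the prefix size plus 10, which is the figure to use when copying or allocating the whole record.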

## Using Dynamically Sized Types

If you invoke bindgen with the `--flexarray-dst` option, it will generate
something not quite like this:

```rust,ignore
#[repr(C)]
struct MyRecord {
    pub timestamp: time_t,
    pub seq: ::std::os::raw::c_uint,
    pub len: usize,
    pub payload: [::std::os::raw::c_char],
}
```
Rust has a set of types which are almost exact analogs for these Flexible Array
Member types: Dynamically Sized Types ("DSTs").

The structure above looks almost identical to a normal Rust structure, except
that the type of the `payload` field is a bare slice `[...]` rather than the
usual reference to a slice `&[...]`.

That `payload: [c_char]` is telling Rust that it can't directly know the total
size of this structure - the `payload` field takes an amount of space that's
determined at runtime. This means you can't directly use values of this type,
only references: `&MyRecord`.

In practice, this is very awkward. So instead, bindgen generates:
```rust,ignore
#[repr(C)]
struct MyRecord<FAM: ?Sized = [::std::os::raw::c_char; 0]> {
    pub timestamp: time_t,
    pub seq: ::std::os::raw::c_uint,
    pub len: usize,
    pub payload: FAM,
}
```

That is:
1. a type parameter `FAM` which represents the type of the `payload` field,
2. it's `?Sized`, meaning it can be unsized (i.e., a DST),
3. it has the default type of `[c_char; 0]` - that is, a zero-sized array of characters.

This means that referencing plain `MyRecord` will be exactly like `MyRecord`
with `__IncompleteArrayField`: it is a fixed-sized structure which you can
manipulate like a normal Rust value.
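
The effect of the default parameter can be tried on stable Rust with a hand-written struct of the same shape. This is only a sketch: `Record`, its `u8` payload, and `compare_variants` are made up for illustration, and the array-backed value is just a convenient way to obtain a DST reference without `unsafe`, not what bindgen's helpers do.

```rust
use std::mem::{size_of, size_of_val};

/// Hand-written struct with the same shape as the generated one (hypothetical).
#[repr(C)]
pub struct Record<FAM: ?Sized = [u8; 0]> {
    pub seq: u32,
    pub len: usize,
    pub payload: FAM,
}

/// Returns (size of the sized default, size of a DST value, DST payload length).
pub fn compare_variants() -> (usize, usize, usize) {
    // Plain `Record` means `Record<[u8; 0]>`: an ordinary sized type.
    let prefix_only = size_of::<Record>();

    // An array-backed instantiation, used here only to get a DST reference
    // without unsafe code.
    let backed: Record<[u8; 4]> = Record { seq: 1, len: 4, payload: [1, 2, 3, 4] };

    // `Record<[u8]>` is unsized, so it can only be used behind a reference.
    // The unsizing coercion records the payload length in the fat reference.
    let dst: &Record<[u8]> = &backed;

    (prefix_only, size_of_val(dst), dst.payload.len())
}
```

The first value is the size of the prefix alone, while the second also covers the four payload bytes (plus any padding), because the length travels with the reference.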

But how do you get to the DST part?

Bindgen will also implement a set of helper methods for this:

```rust,ignore
// Static sized variant
impl MyRecord<[::std::os::raw::c_char; 0]> {
    pub unsafe fn flex_ref(&self, len: usize) -> &MyRecord<[::std::os::raw::c_char]> { ... }
    pub unsafe fn flex_mut_ref(&mut self, len: usize) -> &mut MyRecord<[::std::os::raw::c_char]> { ... }
    // And some raw pointer variants
}
```
These will take a sized `MyRecord<[c_char; 0]>` and a length in elements, and
return a reference to a DST `MyRecord<[c_char]>` where the `payload` field is a
fully usable slice of `len` characters.

The magic here is that the reference is a fat pointer, which not only encodes
the address, but also the dynamic size of the final field, just like a reference
to a slice. This means that you get full bounds-checked access to the
`payload` field like any other Rust slice.

It also means that doing `mem::size_of_val(myrecord)` will return the *complete*
size of this structure, including the suffix.
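
The generated helpers depend on unstable features, so they can't be shown running on stable Rust. As a rough stand-in, the sketch below builds the same kind of fat pointer by casting a slice pointer, which is a different technique that happens to work on stable. `Record`, `to_fat` and `demo` are hypothetical names, and the array-backed `backing` value plays the role of memory received from C.

```rust
use std::mem::{size_of, size_of_val};
use std::ptr;

/// Hand-written struct with the same shape as the generated one (hypothetical).
#[repr(C)]
pub struct Record<FAM: ?Sized = [u8; 0]> {
    pub seq: u32,
    pub len: usize,
    pub payload: FAM,
}

/// Thin-to-fat pointer conversion (hypothetical helper, not bindgen's code).
/// Building the pointer is safe; dereferencing it is not.
pub fn to_fat(thin: *const Record<[u8; 0]>, len: usize) -> *const Record<[u8]> {
    // A slice pointer carries (address, length). The cast keeps both and
    // only changes the pointee type.
    ptr::slice_from_raw_parts(thin as *const u8, len) as *const Record<[u8]>
}

/// Returns (payload bytes, size of the whole DST value, size of the reference).
pub fn demo() -> (Vec<u8>, usize, usize) {
    // Stand-in for memory handed over by C: a prefix followed by 8 bytes.
    let backing: Record<[u8; 8]> = Record { seq: 7, len: 8, payload: *b"flexible" };
    let thin = &backing as *const Record<[u8; 8]> as *const Record<[u8; 0]>;

    // SAFETY: `thin` points into `backing`, which really does hold
    // `backing.len` initialized payload bytes after the prefix.
    let rec: &Record<[u8]> = unsafe { &*to_fat(thin, backing.len) };

    (rec.payload.to_vec(), size_of_val(rec), size_of::<&Record<[u8]>>())
}
```

In `demo`, the reference is two machine words wide (address plus length), and `size_of_val` reports the prefix together with the eight payload bytes.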

You can go the other way:
```rust,ignore
// Dynamic sized variant
impl MyRecord<[::std::os::raw::c_char]> {
    pub fn fixed(&self) -> (&MyRecord<[::std::os::raw::c_char; 0]>, usize) { ... }
    pub fn fixed_mut(&mut self) -> (&mut MyRecord<[::std::os::raw::c_char; 0]>, usize) { ... }
    pub fn layout(len: usize) -> std::alloc::Layout { ... }
}
```
The `fixed` and `fixed_mut` methods take the DST variant of the structure and
return the sized variant, along with the number of elements after it. They are
completely safe because all the information needed is part of the fat `&self`
reference.
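
To see why no extra input is needed, here is a minimal stable-Rust sketch of a `fixed`-style method on a hand-written struct. `Record` and `demo_fixed` are hypothetical, and the body is an assumption about how such a method could be written, not the code bindgen generates.

```rust
/// Hand-written struct with the same shape as the generated one (hypothetical).
#[repr(C)]
pub struct Record<FAM: ?Sized = [u8; 0]> {
    pub seq: u32,
    pub len: usize,
    pub payload: FAM,
}

impl Record<[u8]> {
    /// Splits a DST reference into the sized prefix and the element count.
    /// Safe to call: the count is read from the fat reference itself.
    pub fn fixed(&self) -> (&Record<[u8; 0]>, usize) {
        let len = self.payload.len();
        // Casting a fat pointer to a thin one simply drops the length.
        let thin = self as *const Self as *const Record<[u8; 0]>;
        // SAFETY: with `repr(C)` the prefix fields sit at the same offsets in
        // both instantiations, and the prefix lies entirely within `self`.
        (unsafe { &*thin }, len)
    }
}

/// Returns the `seq` field seen through the prefix, and the recovered length.
pub fn demo_fixed() -> (u32, usize) {
    let backed: Record<[u8; 3]> = Record { seq: 42, len: 3, payload: [9, 9, 9] };
    let dst: &Record<[u8]> = &backed; // unsizing coercion, for the demo only
    let (head, len) = dst.fixed();
    (head.seq, len)
}
```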

The `layout` function takes a length and returns the `Layout` - that is, size
and alignment - so that you can allocate memory for the structure (for example,
using `malloc` so you can pass it to a C function).
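
A comparable size-and-alignment computation can be sketched on stable Rust with `std::alloc::Layout`. The `Record` struct and `record_layout` function below are hypothetical stand-ins, and the approach assumes the payload starts exactly where the prefix ends, which holds for this struct but not for every C layout.

```rust
use std::alloc::{Layout, LayoutError};

/// Hand-written struct with the same shape as the generated one (hypothetical).
#[repr(C)]
pub struct Record<FAM: ?Sized = [u8; 0]> {
    pub seq: u32,
    pub len: usize,
    pub payload: FAM,
}

/// Layout of a record followed by `len` payload bytes (hypothetical helper).
pub fn record_layout(len: usize) -> Result<Layout, LayoutError> {
    let prefix = Layout::new::<Record<[u8; 0]>>();
    let tail = Layout::array::<u8>(len)?;
    // Assumption: the payload begins right at the end of the prefix, so
    // appending the tail layout to the prefix layout gives the right offset.
    let (combined, _tail_offset) = prefix.extend(tail)?;
    // Round the size up to the struct's alignment, as `repr(C)` does.
    Ok(combined.pad_to_align())
}
```

The resulting `Layout` can be passed to `std::alloc::alloc`, or its `size()` can be handed to `malloc` when the memory is to be freed by C code.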

Unfortunately the language features needed to support these methods are still unstable:
- [ptr_metadata](https://doc.rust-lang.org/beta/unstable-book/library-features/ptr-metadata.html),
  which enables all the fixed<->DST conversions, and
- [layout_for_ptr](https://doc.rust-lang.org/beta/unstable-book/library-features/layout-for-ptr.html),
  which allows the `layout` method

As a result, if you don't specify `--rust-target nightly` you'll just get the
bare type definitions, but no real way to use them. It's often convenient to add
the
```bash
--raw-line '#![feature(ptr_metadata,layout_for_ptr)]'
```
option if you're generating Rust as a stand-alone crate. Otherwise you'll need
to add the feature line to your containing crate.
