|
| 1 | +# Using C structures with Flexible Array Members |
| 2 | + |
| 3 | +Since time immemorial, C programmers have been using what was called "the struct |
| 4 | +hack". This is a technique for packing a fixed-size structure and a |
| 5 | +variable-sized tail within the same memory allocation. Typically this looks |
| 6 | +like: |
| 7 | + |
| 8 | +```c |
| 9 | +struct MyRecord { |
| 10 | + time_t timestamp; |
| 11 | + unsigned seq; |
| 12 | + size_t len; |
| 13 | + char payload[0]; |
| 14 | +}; |
| 15 | +``` |
| 16 | + |
| 17 | +Because this is so useful, it was standardized in C99 as "flexible array |
| 18 | +members", using almost identical syntax: |
| 19 | +```c |
| 20 | +struct MyRecord { |
| 21 | + time_t timestamp; |
| 22 | + unsigned seq; |
| 23 | + size_t len; |
| 24 | + char payload[]; // NOTE: empty [] |
| 25 | +}; |
| 26 | +``` |
| 27 | + |
| 28 | +Bindgen supports these structures in two different ways. |
| 29 | + |
| 30 | +## `__IncompleteArrayField` |
| 31 | + |
| 32 | +By default, bindgen will generate the corresponding Rust structure: |
| 33 | +```rust,ignore |
| 34 | +#[repr(C)] |
| 35 | +struct MyRecord { |
| 36 | + pub timestamp: time_t, |
| 37 | + pub seq: ::std::os::raw::c_uint, |
| 38 | + pub len: usize, |
| 39 | + pub payload: __IncompleteArrayField<::std::os::raw::c_char>, |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +The `__IncompleteArrayField` type is zero-sized, so this structure represents |
| 44 | +the prefix without any trailing data. In order to access that data, it provides |
| 45 | +the `as_slice` unsafe method: |
| 46 | +```rust,ignore |
| 47 | + // SAFETY: there's at least `len` bytes allocated and initialized after `myrecord` |
| 48 | + let payload = unsafe { myrecord.payload.as_slice(myrecord.len) }; |
| 49 | +``` |
| 50 | +There's also `as_mut_slice`, which returns a mutable slice under the same safety requirements. |
| 51 | + |
| 52 | +These are `unsafe` simply because it's up to you to provide the right length (in |
| 53 | +elements of whatever type `payload` is) as there's no way for Rust or Bindgen to |
| 54 | +know. In this example, the length is a very straightforward `len` field in the |
| 55 | +structure, but it could be encoded in any number of ways within the structure, |
| 56 | +or come from somewhere else entirely. |
| 57 | + |
| 58 | +One big caveat with this technique is that `std::mem::size_of` (or |
| 59 | +`size_of_val`) will *only* include the size of the prefix structure. If you're |
| 60 | +working out how much storage the whole structure is using, you'll need to add |
| 61 | +the suffix yourself. |
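
As a rough illustration of that bookkeeping, the sketch below uses a hand-written stand-in for the generated struct. It assumes `i64` in place of `time_t` and a `[c_char; 0]` field in place of `__IncompleteArrayField` (both are zero-sized), and `total_size` is a hypothetical helper, not something bindgen emits:

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_uint};

// Hand-written stand-in for the bindgen output, for illustration only.
#[allow(dead_code)]
#[repr(C)]
struct MyRecord {
    timestamp: i64, // assumption: stands in for `time_t`
    seq: c_uint,
    len: usize,
    payload: [c_char; 0], // zero-sized, like `__IncompleteArrayField<c_char>`
}

/// Hypothetical helper: bytes occupied by the prefix plus `payload_len` elements.
fn total_size(payload_len: usize) -> usize {
    size_of::<MyRecord>() + payload_len * size_of::<c_char>()
}

fn main() {
    println!("prefix only: {} bytes", size_of::<MyRecord>());
    println!("with a 100-element payload: {} bytes", total_size(100));
}
```

The exact numbers depend on the target's sizes and alignments, which is why the sketch prints them rather than hard-coding them.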
| 62 | + |
| 63 | +## Using Dynamically Sized Types |
| 64 | + |
| 65 | +If you invoke bindgen with the `--flexarray-dst` option, it will generate |
| 66 | +something not quite like this: |
| 67 | + |
| 68 | +```rust,ignore |
| 69 | +#[repr(C)] |
| 70 | +struct MyRecord { |
| 71 | + pub timestamp: time_t, |
| 72 | + pub seq: ::std::os::raw::c_uint, |
| 73 | + pub len: usize, |
| 74 | + pub payload: [::std::os::raw::c_char], |
| 75 | +} |
| 76 | +``` |
| 77 | +Rust has a set of types which are almost exact analogs of these flexible array |
| 78 | +member types: Dynamically Sized Types ("DSTs"). The structure above is one. |
| 79 | + |
| 80 | +This looks almost identical to a normal Rust structure, except that you'll note |
| 81 | +the type of the `payload` field is a bare slice `[...]` rather than the usual |
| 82 | +reference to a slice `&[...]`. |
| 83 | + |
| 84 | +That `payload: [c_char]` is telling Rust that it can't directly know the total |
| 85 | +size of this structure - the `payload` field takes an amount of space that's |
| 86 | +determined at runtime. This means you can't directly use values of this type, |
| 87 | +only references: `&MyRecord`. |
| 88 | + |
| 89 | +In practice, this is very awkward. So instead, bindgen generates: |
| 90 | +```rust,ignore |
| 91 | +#[repr(C)] |
| 92 | +struct MyRecord<FAM: ?Sized = [::std::os::raw::c_char; 0]> { |
| 93 | + pub timestamp: time_t, |
| 94 | + pub seq: ::std::os::raw::c_uint, |
| 95 | + pub len: usize, |
| 96 | + pub payload: FAM, |
| 97 | +} |
| 98 | +``` |
| 99 | + |
| 100 | +That is: |
| 101 | +1. a type parameter `FAM` which represents the type of the `payload` field, |
| 102 | +2. it's `?Sized`, meaning it can be unsized (i.e., a DST), and |
| 103 | +3. it has the default type of `[c_char; 0]`, that is, a zero-sized array of characters. |
| 104 | + |
| 105 | +This means that referencing plain `MyRecord` will be exactly like `MyRecord` |
| 106 | +with `__IncompleteArrayField`: it is a fixed-sized structure which you can |
| 107 | +manipulate like a normal Rust value. |
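
To make the role of the default parameter concrete, here is a small sketch using a hand-written copy of the struct (with `i64` assumed in place of `time_t`). The helper `pointer_widths` is hypothetical. It shows that plain `MyRecord` can be held by value, and that references to the two variants differ in width:

```rust
use std::mem::{size_of, size_of_val};
use std::os::raw::{c_char, c_uint};

#[allow(dead_code)]
#[repr(C)]
struct MyRecord<FAM: ?Sized = [c_char; 0]> {
    timestamp: i64, // assumption: stands in for `time_t`
    seq: c_uint,
    len: usize,
    payload: FAM,
}

/// Hypothetical helper: widths of a reference to the sized and DST variants.
fn pointer_widths() -> (usize, usize) {
    (size_of::<&MyRecord>(), size_of::<&MyRecord<[c_char]>>())
}

fn main() {
    // Plain `MyRecord` means `MyRecord<[c_char; 0]>`: sized, usable by value.
    let fixed: MyRecord = MyRecord { timestamp: 0, seq: 1, len: 0, payload: [] };
    assert_eq!(size_of_val(&fixed), size_of::<MyRecord>());

    // The DST reference carries an element count next to the address.
    let (thin, fat) = pointer_widths();
    assert_eq!(thin, size_of::<usize>());
    assert_eq!(fat, 2 * size_of::<usize>());
}
```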
| 108 | + |
| 109 | +But how do you get to the DST part? |
| 110 | + |
| 111 | +Bindgen will also implement a set of helper methods for this: |
| 112 | + |
| 113 | +```rust,ignore |
| 114 | +// Static sized variant |
| 115 | +impl MyRecord<[::std::os::raw::c_char; 0]> { |
| 116 | + pub unsafe fn flex_ref(&self, len: usize) -> &MyRecord<[::std::os::raw::c_char]> { ... } |
| 117 | + pub unsafe fn flex_mut_ref(&mut self, len: usize) -> &mut MyRecord<[::std::os::raw::c_char]> { ... } |
| 118 | + // And some raw pointer variants |
| 119 | +} |
| 120 | +``` |
| 121 | +These will take a sized `MyRecord<[c_char; 0]>` and a length in elements, and |
| 122 | +return a reference to a DST `MyRecord<[c_char]>` where the `payload` field is a |
| 123 | +fully usable slice of `len` characters. |
| 124 | + |
| 125 | +The magic here is that the reference is a fat pointer, which encodes not only |
| 126 | +the address but also the dynamic size of the final field, just as a reference |
| 127 | +to a slice does. This means that you get fully bounds-checked access to the |
| 128 | +`payload` field like any other Rust slice. |
| 129 | + |
| 130 | +It also means that doing `mem::size_of_val(myrecord)` will return the *complete* |
| 131 | +size of this structure, including the suffix. |
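
The following sketch illustrates both properties on stable Rust. It does not call the generated `flex_ref`. Instead it assumes a hand-written copy of the struct (with `i64` in place of `time_t`) and obtains the DST reference by unsizing a stack value whose payload length is known at compile time. `describe` is a hypothetical helper:

```rust
use std::mem::{size_of, size_of_val};
use std::os::raw::{c_char, c_uint};

#[allow(dead_code)]
#[repr(C)]
struct MyRecord<FAM: ?Sized = [c_char; 0]> {
    timestamp: i64, // assumption: stands in for `time_t`
    seq: c_uint,
    len: usize,
    payload: FAM,
}

/// Reports (payload length, total size, prefix size) for a DST reference.
fn describe(rec: &MyRecord<[c_char]>) -> (usize, usize, usize) {
    (rec.payload.len(), size_of_val(rec), size_of::<MyRecord>())
}

fn main() {
    // Stand-in for `flex_ref`: `&MyRecord<[c_char; 16]>` unsizes to
    // `&MyRecord<[c_char]>`, and the length 16 moves into the fat reference.
    let sized = MyRecord { timestamp: 0, seq: 7, len: 16, payload: [0 as c_char; 16] };
    let rec: &MyRecord<[c_char]> = &sized;

    let (len, total, prefix) = describe(rec);
    assert_eq!(len, 16);
    assert!(total >= prefix + 16); // the suffix is counted
    assert!(rec.payload.get(16).is_none()); // out-of-range access is caught
}
```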
| 132 | + |
| 133 | +You can go the other way: |
| 134 | +```rust,ignore |
| 135 | +// Dynamic sized variant |
| 136 | +impl MyRecord<[::std::os::raw::c_char]> { |
| 137 | + pub fn fixed(&self) -> (&MyRecord<[::std::os::raw::c_char; 0]>, usize) { ... } |
| 138 | + pub fn fixed_mut(&mut self) -> (&mut MyRecord<[::std::os::raw::c_char; 0]>, usize) { ... } |
| 139 | + pub fn layout(len: usize) -> std::alloc::Layout { ... } |
| 140 | +} |
| 141 | +``` |
| 142 | +`fixed` and `fixed_mut` take the DST variant of the structure and return the |
| 143 | +sized variant, along with the number of elements that follow it. These are |
| 144 | +completely safe because all the information needed is part of the fat `&self` reference. |
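
To show why no extra information is needed, here is a hypothetical stable-Rust analog of `fixed`, written as a free function over a hand-written copy of the struct (with `i64` assumed in place of `time_t`). It is a sketch of the idea rather than the code bindgen generates, which relies on `ptr_metadata`:

```rust
use std::os::raw::{c_char, c_uint};

#[allow(dead_code)]
#[repr(C)]
struct MyRecord<FAM: ?Sized = [c_char; 0]> {
    timestamp: i64, // assumption: stands in for `time_t`
    seq: c_uint,
    len: usize,
    payload: FAM,
}

/// Hypothetical analog of the generated `fixed` method.
fn fixed(rec: &MyRecord<[c_char]>) -> (&MyRecord, usize) {
    // The element count lives in the fat reference itself.
    let len = rec.payload.len();
    // Casting a fat pointer to a thin one drops the length and keeps the address.
    let thin = rec as *const MyRecord<[c_char]> as *const MyRecord;
    // SAFETY: both variants are `#[repr(C)]` with identical leading fields, and
    // the sized variant is no larger than the DST it was derived from.
    (unsafe { &*thin }, len)
}

fn main() {
    let sized = MyRecord { timestamp: 0, seq: 42, len: 3, payload: [1 as c_char, 2, 3] };
    let dst: &MyRecord<[c_char]> = &sized;
    let (prefix, len) = fixed(dst);
    assert_eq!(len, 3);
    assert_eq!(prefix.seq, 42);
}
```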
| 145 | + |
| 146 | +The `layout` function takes a length and returns the `Layout` - that is, size |
| 147 | +and alignment, so that you can allocate memory for the structure (for example, |
| 148 | +using `malloc` so you can pass it to a C function). |
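
On stable Rust, a comparable layout can be computed by hand. The sketch below assumes a hand-written copy of the struct (with `i64` in place of `time_t`) and uses Rust's global allocator rather than `malloc`. `record_layout` and `roundtrip` are hypothetical names, and the fat pointer is built with a slice-pointer cast instead of the `ptr_metadata` API that the generated code uses:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::mem::size_of_val;
use std::os::raw::{c_char, c_uint};
use std::ptr;

#[allow(dead_code)]
#[repr(C)]
struct MyRecord<FAM: ?Sized = [c_char; 0]> {
    timestamp: i64, // assumption: stands in for `time_t`
    seq: c_uint,
    len: usize,
    payload: FAM,
}

/// Hypothetical stand-in for the generated `layout`: the prefix followed by
/// `len` elements, padded to the struct's alignment.
fn record_layout(len: usize) -> Layout {
    let tail = Layout::array::<c_char>(len).expect("payload too large");
    let (layout, _payload_offset) = Layout::new::<MyRecord>()
        .extend(tail)
        .expect("record too large");
    layout.pad_to_align()
}

/// Allocates a zeroed record with `len` payload elements, writes to it through
/// a DST reference, and returns the size Rust reports for that reference.
fn roundtrip(len: usize) -> usize {
    let layout = record_layout(len);
    unsafe {
        let base = alloc_zeroed(layout);
        assert!(!base.is_null(), "allocation failed");
        // A slice pointer of `len` elements, recast to the DST struct pointer.
        let fat = ptr::slice_from_raw_parts_mut(base as *mut c_char, len)
            as *mut MyRecord<[c_char]>;
        // SAFETY: the allocation is zeroed, suitably aligned, and at least as
        // large as the DST with `len` trailing elements.
        let rec = &mut *fat;
        rec.len = len;
        if let Some(last) = rec.payload.last_mut() {
            *last = b'!' as c_char;
        }
        let size = size_of_val(rec);
        dealloc(base, layout);
        size
    }
}

fn main() {
    let layout = record_layout(10);
    println!("{} bytes, aligned to {}", layout.size(), layout.align());
    assert!(roundtrip(10) <= layout.size());
}
```

When the memory is handed to C, the same size and alignment would be passed to the C allocator instead.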
| 149 | + |
| 150 | +Unfortunately the language features needed to support these methods are still unstable: |
| 151 | +- [ptr_metadata](https://doc.rust-lang.org/beta/unstable-book/library-features/ptr-metadata.html), |
| 152 | + which enables all the fixed<->DST conversions, and |
| 153 | +- [layout_for_ptr](https://doc.rust-lang.org/beta/unstable-book/library-features/layout-for-ptr.html), |
| 154 | +  which allows the `layout` method. |
| 155 | + |
| 156 | +As a result, if you don't specify `--rust-target nightly` you'll just get the |
| 157 | +bare type definitions, but no real way to use them. It's often convenient to add |
| 158 | +the |
| 159 | +```bash |
| 160 | +--raw-line '#![feature(ptr_metadata,layout_for_ptr)]' |
| 161 | +``` |
| 162 | +option if you're generating Rust as a stand-alone crate. Otherwise you'll need |
| 163 | +to add the feature line to your containing crate. |