examples/export: Replace unsound `to_padded_byte_vector()` implementation with `bytemuck`. #444

kpreid · 2025-01-12T22:07:34Z

The function to_padded_byte_vector() is unsound because:

It accepts an arbitrary Vec<T> without checking that T contains no padding, which is UB to read in any way including by reinterpreting as u8s.
It produces a Vec which thinks it has a different alignment than the allocation was actually created with.

To fix these problems, this change:

Uses bytemuck to check the no-padding condition.
Creates a new Vec instead of trying to reuse the existing one. (Conditional reuse would be possible, but more complex.)

An alternative to using bytemuck would be to make to_padded_byte_vector() an unsafe fn (or to accept Vertex only instead of T). However, I think it is valuable to demonstrate how to do this conversion using safe tools, to encourage people to use safe code instead of writing unsafe code without fully understanding the requirements.

kpreid · 2025-01-12T23:11:54Z

The CI failure seems to be in an unrelated import test.

alteous · 2025-01-13T11:28:49Z

I'm surprised this is unsound but I trust your intuition. I'd like to understand better:

It accepts an arbitrary Vec<T> without checking that T contains no padding, which is UB to read in any way including by reinterpreting as u8s.

Why is this undefined behaviour? Padding bytes between Ts would be acceptable in the output.

It produces a Vec which thinks it has a different alignment than the allocation was actually created with.

This is an interesting point. In C, I am used to expecting malloc to produce a region that is suitably aligned for any type. Rust's Allocator trait docs say the "block must be allocated with the same alignment as layout.align()", which I think is the problem you're referring to because align_of::<u8>() ≤ align_of::<T>() in general and this could be a problem for reallocation and deallocation.

…tion with `bytemuck`. The function `to_padded_byte_vector()` is unsound because: * It accepts an arbitrary `Vec<T>` without checking that `T` contains no padding, which is UB to read in any way including by reinterpreting as `u8`s. * It produces a `Vec` which thinks it has a different alignment than the allocation was actually created with. To fix these problems, this change: * Uses `bytemuck` to check the no-padding condition. * Creates a new `Vec` instead of trying to reuse the existing one. (Conditional reuse would be possible, but more complex.) An alternative to `bytemuck` would be to make `to_padded_byte_vector()` an `unsafe fn` (or to accept `Vertex` only instead of `T`). However, I think it is valuable to demonstrate how to do this conversion using safe tools, to encourage people to use safe code instead of writing unsafe code without fully understanding the requirements.

kpreid · 2025-01-13T18:19:55Z

Why is this undefined behaviour?

Excerpted from https://doc.rust-lang.org/reference/behavior-considered-undefined.html:

Rust code is incorrect if it exhibits any of the behaviors in the following list.

Producing an invalid value.

... Whether a value is valid depends on the type:

An integer (i*/u*), floating point value (f*), or raw pointer must be initialized, i.e., must not be obtained from uninitialized memory.

Reasons why the design is such that this is undefined behavior include that the compiler is permitted to do things like

transform a given set of reads of a place, that is not being mutated between any of them, into a different number and timing of reads, and
not actually store the bytes corresponding to padding (e.g. if the struct is entirely stored in registers, or was copied field-by-field to a different location).

Both of these transformations are critical to being able to produce efficient code, and just these two applied together could mean that a value that was read from uninitialized memory appears to have an unstable, changing value, because it ended up being read multiple times from different pieces of uninitialized memory.

Of course, it’s unlikely that any of these specific bad effects would apply to a simple bulk copy as this code is doing, but there isn’t and can’t be an exception in the rules of UB for “that would be silly”. Rigor is necessary to be able to have sound, highly optimizing compilers.

It’s also permitted for the hardware to trap on reads of uninitialized memory (though in most cases today, this only happens at page granularity).

Padding bytes between Ts would be acceptable in the output.

Separately from UB considerations, I disagree that they are acceptable. They may not affect the behavior of a glTF loader, but there are many other reasons to not to write them:

Copying the contents of uninitialized memory to external destinations can lead to leaking secrets from previous uses of the same memory (as in the famous bug Heartbleed).
In use cases like asset pre-processing, it’s useful for the process to be deterministic, i.e. the output is the same if the input is the same. Padding bytes would not be reliably the same.
Including arbitrary bytes makes the data less compressible.

It’s fine to include guaranteed zero bytes, of course, but not uninitialized bytes.

I am used to expecting malloc to produce a region that is suitably aligned for any type.

Indeed that is a difference, but the key difference here is that Rust deallocation has to be given the layout information and the allocator may rely on it being accurate, whereas free() is obligated to be able to work given only a pointer.

Rust's Allocator trait docs say the "block must be allocated with the same alignment as layout.align()", which I think is the problem you're referring to because align_of::<u8>() ≤ align_of::<T>() in general and this could be a problem for reallocation and deallocation.

This is true, but not a good way to think about writing unsafe code. When you call an unsafe function, you must obey all of the safety requirements from that specific function’s documentation. Those in Vec::from_raw_parts() include:

T needs to have the same alignment as what ptr was allocated with.

This rule was violated, so no further analysis is required to conclude that the call is unsound.

kpreid force-pushed the unsafe branch from fb04347 to 24bcd54 Compare January 13, 2025 18:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples/export: Replace unsound `to_padded_byte_vector()` implementation with `bytemuck`. #444

examples/export: Replace unsound `to_padded_byte_vector()` implementation with `bytemuck`. #444

kpreid commented Jan 12, 2025

kpreid commented Jan 12, 2025

alteous commented Jan 13, 2025

kpreid commented Jan 13, 2025 •

edited

Loading

examples/export: Replace unsound to_padded_byte_vector() implementation with bytemuck. #444

Are you sure you want to change the base?

examples/export: Replace unsound to_padded_byte_vector() implementation with bytemuck. #444

Conversation

kpreid commented Jan 12, 2025

kpreid commented Jan 12, 2025

alteous commented Jan 13, 2025

kpreid commented Jan 13, 2025 • edited Loading

examples/export: Replace unsound `to_padded_byte_vector()` implementation with `bytemuck`. #444

examples/export: Replace unsound `to_padded_byte_vector()` implementation with `bytemuck`. #444

kpreid commented Jan 13, 2025 •

edited

Loading