Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support TryFromBytes - conditional conversion analogous to FromBytes #5

Closed
23 of 26 tasks
Tracked by #671
joshlf opened this issue Sep 19, 2022 · 8 comments
Closed
23 of 26 tasks
Tracked by #671
Labels
compatibility-nonbreaking Changes that are (likely to be) non-breaking

Comments

@joshlf
Copy link
Member

joshlf commented Sep 19, 2022

Co-authored with @jswrenn.

Overview

Add a TryFromBytes trait, which supports byte-to-type conversions for non-FromBytes types by performing runtime validation. Add a custom derive which generates this validation code automatically.

Many thanks to @kupiakos and @djkoloski for providing invaluable feedback and input on this design.

Progress

Motivation

Many use cases involve types whose layout is well-defined, but which cannot implement FromBytes because there exist bit patterns which are invalid (either they are unsound in terms of language semantics or they are unsafe in the sense of violating a library invariant).

Consider, for example, parsing an RPC message format. It would be desirable for performance reasons to be able to read a message into local memory, validate its structure, and if validation succeeds, treat that memory as containing a parsed message rather than needing to copy the message in order to transform it into a native Rust representation.

Here's a simple, hypothetical example of an RPC to request log messages from a process:

/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

/// Level of log messages requested from `RequestLogs`.
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

None of these types can be FromBytes. For LogLevel, only the u8 values 0 through 4 correspond to enum variants, and constructing a LogLevel from any other u8 would be unsound. For LogTime, any sequence of the appropriate number of bytes would constitute a valid instance of LogTime from Rust's perspective - it would not cause unsoundness - but some such sequences would violate the invariant that the nsecs field is in the range [0, 10^9).

While these types can't be FromBytes, we'd still like to be able to conditionally reinterpret a sequence of bytes as a RequestLogsArgs - it's just that we need to perform runtime validation first. Ideally, we'd be able to write code like:

/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[derive(TryFromBytes)]
#[repr(C)]
struct RequestLogsArgs {
    max_stats: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "is_valid")]
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

impl LogTime {
    fn is_valid(&self) -> bool {
        self.nsecs < 1_000_000_000
    }
}

/// Level of log messages requested from `RequestLogs`.
#[derive(TryFromBytes)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

The TryFromBytes trait - the subject of this design - provides the ability to fallibly convert a byte sequence to a type, performing validation at runtime. At a minimum, the validation code simply ensures soundness - for example, in the case of LogLevel, validating that byte values are in the range [0, 4]. The custom derive also supports user-defined validation like the LogTime::is_valid method (note the validator annotation on LogTime), which can be used to enforce safety invariants that go above and beyond soundness.

Given these derives of TryFromBytes, an implementation of this RPC could be as simple as:

fn serve_request_logs_rpc<F: FnMut(&RequestLogsArgs)>(server: &mut RpcServer, f: F) -> Result<()> {
    loop {
        let bytes = [0u8; mem::size_of::<RequestLogsArgs>()];
        server.read_request(&mut bytes[..])?;
        let args = RequestLogsArgs::try_from_bytes(&bytes[..]).ok_or(ParseError)?;
        f(args);
    }
}

The design proposed in this issue implements this API.

Design

TODO

This design builds on the following features:

/// A value which might or might not constitute a valid instance of `T`.
// Builds on the custom MaybeUninit type described in #29
pub struct MaybeValid<T: AsMaybeUninit + ?Sized>(MaybeUninit<T>);

// Allows us to use the `project!` macro for field projection (proposed in #196)
unsafe impl<T, F> Projectable<F, AlignedByteArray<F>> for AlignedByteArray<T> {
    type Inner = T;
}

impl<T> MaybeValid<T> {
    /// Converts this `MaybeValid<T>` to a `T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid(self) -> T { ... }

    /// Converts this `&MaybeValid<T>` to a `&T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid_ref(&self) -> &T { ... }

    /// Converts this `&mut MaybeValid<T>` to a `&mut T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub unsafe fn assume_valid_mut(&mut self) -> &mut T { ... }
}

/// # Safety
///
/// `is_bit_valid` is correct. If not, can cause UB.
pub unsafe trait TryFromBytes {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool;

    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        let maybe_valid = Ref::<_, MaybeValid<T>>::new(bytes)?.into_ref();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_from_mut(bytes: &mut [u8]) -> Option<&mut Self>
    where
        Self: AsBytes + Sized,
    {
        let maybe_valid = Ref::<_, MaybeValid<T>>::new(bytes)?.into_mut();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_read_from(bytes: &[u8]) -> Option<Self>
    where
        Self: Sized
    {
        let maybe_valid = <MaybeValid<T> as FromBytes>::read_from(bytes)?;
        if Self::is_bit_valid(&maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid`.
            Some(unsafe { maybe_valid.assume_valid() })
        } else {
            None
        }
    }
}

Here's an example usage:

/// A type without any safety invariants.
#[derive(TryFromBytes)]
#[repr(C)]
struct MySimpleType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MySimpleType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        TryFromBytes::is_bit_valid(b)
    }
}

/// A type with invariants encoded using `validate`.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "validate")]
#[repr(C)]
struct MyComplexType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MyComplexType {
    fn is_bit_valid(bytes: &AlignedByteArray<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        if !TryFromBytes::is_bit_valid(b) { return false; }
        // If there's no interior mutability, then we know this is sound because of preceding
        // validation. TODO: What to do about interior mutability?
        let slf: &MyComplexType = ...;
        MyComplexType::validate(slf)
    }
}

impl MyComplexType {
    fn validate(slf: &MyComplexType) -> bool { ... }
}

Unions

See for a discussion of how to support unions in TryFromBytes: #696

Relationship with other traits

There are obvious relationships between TryFromBytes and the existing FromZeroes and FromBytes traits:

  • If a type is FromZeroes, then it should probably be TryFromBytes (at a minimum, we must know something about the type's layout and bit validity to determine that it is genuinely FromZeroes)
    • This implies that we should change FromZeroes to be FromZeroes: TryFromBytes
  • If a type is FromBytes, then it is trivially TryFromBytes (where is_bit_valid unconditionally returns true)
    • This implies that we should provide a blanket impl impl<T: FromBytes> TryFromBytes for T

Unfortunately, neither of these are possible today.

FromZeroes: TryFromBytes

The reason this bound doesn't work has to do with unsized types. As described in the previous section, working with unsized types is difficult. Luckily for FromZeroes, it doesn't have to do anything with the types it's implemented for - it's just a marker trait. It can happily represent a claim about the bit validity of a type even if that type isn't constructible in practice (over time, FromZeroes will become more useful as more unsized types become constructible). By contrast, TryFromBytes is only useful if we can emit validation code (namely, is_bit_valid). For that reason, we require that TryFromBytes: AsMaybeUninit since that bound is required in order to support the MaybeValid type required by is_bit_valid.

This means that we have two options if we want FromZeroes: TryFromBytes:

  • We can keep TryFromBytes: AsMaybeUninit. As a result, some types which are FromZeroes today can no longer be FromZeroes, and some blanket impls of FromZeroes would require more complex bounds (e.g., today we write impl<T: FromZeroes> FromZeroes for Wrapping<T>; under this system, we'd need to write impl<T: FromZeroes> FromZeroes for Wrapping<T> where <T as AsMaybeUninit>::MaybeUninit: Sized, or alternatively we'd need to write one impl for T and a different one for [T]).
  • We could move the AsMaybeUninit bound out of definition of TryFromBytes and into is_bit_valid (and callers). As a result, we can keep existing impls of FromZeroes, but now T: TryFromBytes is essentially useless - to do anything useful, you need to specify T: TryFromBytes + AsMaybeUninit.

Neither option seems preferable to just omitting FromZeroes: TryFromBytes. Callers who require both can simply write T: FromZeroes + TryFromBytes.

(Note that the same points apply if we consider FromBytes: TryFromBytes)

impl<T: FromBytes> TryFromBytes for T

This conflicts with other blanket impls which we need for completeness:

  • impl<T: TryFromBytes> TryFromBytes for [T]
  • impl<const N: usize, T: TryFromBytes> TryFromBytes for [T; N]

As a result, we have to leave TryFromBytes and FromBytes as orthogonal. We may want to make it so that derive(FromBytes) automatically emits an impl of TryFromBytes, although in the general case that may require custom DST support.

Open questions

  • Is there any way to recover the blanket impl of TryFromBytes for T: FromBytes? Unlike FromZeroes: TryFromBytes, where you may need to perform runtime validation, if you know that T: FromBytes, then in principle you know that is_bit_valid can unconditionally return true without inspecting its argument, and so in principle it shouldn't matter whether you can construct a MaybeValid<Self>. Is there some way that we could allow FromBytes types to specify <Self as AsMaybeUninit>::MaybeUninit = () or similar in order to bypass the "only sized types or slices can implement AsMaybeUninit" problem?
    • One approach is to wait until a KnownLayout trait lands. There's a good chance that, under that design, we'd end up with FromZeroes: KnownLayout. If KnownLayout: AsMaybeUninit (or just absorbs the current definition of AsMaybeUninit into itself), it'd solve this problem since all zerocopy traits would imply support for MaybeValid.
  • In the first version of this feature, could we relax the Self: Sized bounds on try_from_ref and try_from_mut (without needing full custom-DST support)?
  • Should derive(FromBytes) emit an impl of TryFromBytes? What about custom DSTs?
  • What should the behavior for unions be? Should it validate that at least one variant is valid, or that all variants are valid? (This hinges somewhat on the outcome of rust-lang/unsafe-code-guidelines#438.)
  • What bounds should we place on T when implementing TryFromBytes for Unalign<T> (#320)?

Future directions

  • In this design, we ban interior mutability entirely. For references, this is unavoidable - e.g., if we were to allow types containing UnsafeCell in try_from_ref, then the user could obtain an &UnsafeCell and a &[u8] view of the same memory, which is unsound (it's unsound to even exist under Stacked Borrows, and unsound to expose to safe code in all cases). For values (i.e., try_read_from), we'd like to be able to support this - as long as we have some way of performing validation, it should be fine to return an UnsafeCell by value even if its bytes were copied from a &[u8]. Actually supporting this in practice is complicated for a number of reasons, but perhaps a future extension could support it. Reasons it's complicated:
    • is_bit_valid operates on a NonNull<Self>, so interior mutability isn't inherently a problem. However, it needs to be able to call a user's custom validator, which instead operates on a &Self, which is a problem.
    • Even if we could solve the previous problem somehow, we'd need to have is_bit_valid require that it's argument not be either experiencing interior mutation or, under Stacked Borrows, contain any UnsafeCells at all. When the NonNull<Self> is synthesized from a &[u8], this isn't a problem, but if in the future we want to support type-to-type conditional transmutation, it might be a problem. If, in the future, merely containing UnsafeCells is fine, then we could potentially design a wrapper type which "disables" interior mutation and supports field projection. This might allow us to solve this problem.
  • Extend TryFromBytes to support validation context #590

Prior art

The bytemuck crate defines a CheckedBitPattern trait which serves a similar role to the proposed TryFromBytes.

Unlike TryFromBytes, CheckedBitPattern introduces a separate associated Bits type which is a type with the same layout as Self except that all bit patterns are valid. This serves the same role as MaybeValid<Self> in our design. One advantage for the Bits type is that it may be more ergonomic to write validation code for it, which is important for manual implementations of CheckedBitPattern. However, our design expects that manual implementations of TryFromBytes will be very rare. Since CheckedBitPattern's derive doesn't support custom validation, any type with safety invariants would need a manual implementation. By contrast, the TryFromBytes derive's support for a custom validation function means that, from a completeness standpoint, it should never be necessary to implement TryFromBytes manually. The only case in which a manual implementation might be warranted would be for performance reasons.

@joshlf joshlf added the compatibility-nonbreaking Changes that are (likely to be) non-breaking label Aug 12, 2023
@joshlf joshlf changed the title Support conditional conversion analogous to FromBytes Support TryFromBytes - conditional conversion analogous to FromBytes Aug 14, 2023
joshlf added a commit that referenced this issue Aug 16, 2023
TODO: How to support `project!` when `Projectable` is implemented
multiple times for `ByteArray`, `Align`, and `AlignedByteArray`?
If we only emit an impl for `AlignedByteArray`, everything is fine,
but once we emit the other two, type inference fails.

Makes progress on #5
joshlf added a commit that referenced this issue Aug 17, 2023
TODO: How to support `project!` when `Projectable` is implemented
multiple times for `ByteArray`, `Align`, and `AlignedByteArray`?
If we only emit an impl for `AlignedByteArray`, everything is fine,
but once we emit the other two, type inference fails.

Makes progress on #5
joshlf added a commit that referenced this issue Aug 17, 2023
TODO: How to support `project!` when `Projectable` is implemented
multiple times for `ByteArray`, `Align`, and `AlignedByteArray`?
If we only emit an impl for `AlignedByteArray`, everything is fine,
but once we emit the other two, type inference fails.

Makes progress on #5
joshlf added a commit that referenced this issue Aug 17, 2023
TODO: How to support `project!` when `Projectable` is implemented
multiple times for `ByteArray`, `Align`, and `AlignedByteArray`?
If we only emit an impl for `AlignedByteArray`, everything is fine,
but once we emit the other two, type inference fails.

Makes progress on #5
joshlf added a commit that referenced this issue Aug 17, 2023
TODO: How to support `project!` when `Projectable` is implemented
multiple times for `ByteArray`, `Align`, and `AlignedByteArray`?
If we only emit an impl for `AlignedByteArray`, everything is fine,
but once we emit the other two, type inference fails.

Makes progress on #5
joshlf added a commit that referenced this issue Aug 17, 2023
TODO: How to support `project!` when `Projectable` is implemented
multiple times for `ByteArray`, `Align`, and `AlignedByteArray`?
If we only emit an impl for `AlignedByteArray`, everything is fine,
but once we emit the other two, type inference fails.

Makes progress on #5
@joshlf joshlf mentioned this issue Aug 20, 2023
joshlf added a commit that referenced this issue Aug 21, 2023
Makes progress on #5
joshlf added a commit that referenced this issue Aug 21, 2023
Makes progress on #5
joshlf added a commit that referenced this issue Aug 21, 2023
Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Should we require `FromBytes: FromZeroes + TryFromBytes` or
  `FromZeroes: TryFromBytes`? We obviously only need one of these, but
  which one?

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Should we require `FromBytes: FromZeroes + TryFromBytes` or
  `FromZeroes: TryFromBytes`? We obviously only need one of these, but
  which one?

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Should we require `FromBytes: FromZeroes + TryFromBytes` or
  `FromZeroes: TryFromBytes`? We obviously only need one of these, but
  which one?

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Should we require `FromBytes: FromZeroes + TryFromBytes` or
  `FromZeroes: TryFromBytes`? We obviously only need one of these, but
  which one?

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Should we require `FromBytes: FromZeroes + TryFromBytes` or
  `FromZeroes: TryFromBytes`? We obviously only need one of these, but
  which one?

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 22, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 23, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 23, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 23, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 23, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
joshlf added a commit that referenced this issue Aug 23, 2023
TODO:
- Finish writing safety comments
- Finish writing documentation
- Add tests

Makes progress on #5
@joshlf joshlf added the blocking-next-release This issue should be resolved before we release on crates.io label May 30, 2024
@joshlf
Copy link
Member Author

joshlf commented Jul 1, 2024

@kupiakos I discussed this with @jswrenn and @djkoloski and decided to handle this as part of #1289 (comment), when we're doing more work to consider the broader design space. We likely won't support this for 0.8, but may support it for 0.9 depending on how our design for #1289 goes.

@joshlf joshlf removed the blocking-next-release This issue should be resolved before we release on crates.io label Jul 1, 2024
jswrenn added a commit that referenced this issue Sep 8, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
jswrenn added a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
github-merge-queue bot pushed a commit that referenced this issue Sep 9, 2024
Makes progress towards #5.
joshlf added a commit that referenced this issue Sep 24, 2024
joshlf added a commit that referenced this issue Oct 1, 2024
When deriving `FromBytes` on a type with no generic parameters, the
implied `TryFromBytes` derive's `is_bit_valid` impl is generated as
always returning `true`. This is faster to codegen, is faster to
compile, and is friendlier on the optimizer.

Makes progress on #5
joshlf added a commit that referenced this issue Oct 2, 2024
When deriving `FromBytes` on a type with no generic parameters, the
implied `TryFromBytes` derive's `is_bit_valid` impl is generated as
always returning `true`. This is faster to codegen, is faster to
compile, and is friendlier on the optimizer.

Makes progress on #5
joshlf added a commit that referenced this issue Oct 2, 2024
When deriving `FromBytes` on a type with no generic parameters, the
implied `TryFromBytes` derive's `is_bit_valid` impl is generated as
always returning `true`. This is faster to codegen, is faster to
compile, and is friendlier on the optimizer.

Makes progress on #5
joshlf added a commit that referenced this issue Oct 2, 2024
When deriving `FromBytes` on a type with no generic parameters, the
implied `TryFromBytes` derive's `is_bit_valid` impl is generated as
always returning `true`. This is faster to codegen, is faster to
compile, and is friendlier on the optimizer.

Makes progress on #5
github-merge-queue bot pushed a commit that referenced this issue Oct 2, 2024
When deriving `FromBytes` on a type with no generic parameters, the
implied `TryFromBytes` derive's `is_bit_valid` impl is generated as
always returning `true`. This is faster to codegen, is faster to
compile, and is friendlier on the optimizer.

Makes progress on #5
@joshlf joshlf closed this as completed Oct 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compatibility-nonbreaking Changes that are (likely to be) non-breaking
Projects
None yet
Development

No branches or pull requests

4 participants