[wasi-common] Re-design of generated types, for safer use and more complete code generation #734

Closed
pchickey opened this issue Dec 18, 2019 · 2 comments
Labels
wasi:impl Issues pertaining to WASI implementation in Wasmtime

Comments

@pchickey
Contributor

pchickey commented Dec 18, 2019

wig presently generates wasi-common types from witx descriptions of the standard. This is a great start, but there are a handful of ways in which the design of wasi-common is difficult to automatically generate more code for. I want to generate as much of the boilerplate for each hostcall as I can, but before we get there, we need to sort out some difficulties with datatypes.

This proposal seeks to redesign how we manage 1. validating and chasing pointers into guest memory, ensuring that all references into guest memory are safe, and 2. validating enum and flag values.

We currently depend on some hand-written enc_/dec_ functions to serialize and deserialize some types to and from guest memory. I want to replace those with automatically generated implementations without losing any of the zero-copy optimizations we currently have.

Generated Types

There are two purposes for generated types:

  • Using a value that resides in guest memory. This includes reading a value
    from the guest memory (via a pointer), or writing a value into guest memory.
    These should only be used in the host calls, and not by any business logic,
    except for logic that determines if a value has a valid encoding (memory
    out-of-bounds, memory alignment, or value out-of-range).

    • struct GuestMemory<'a> encapsulates a linear memory and ensures all
      borrows of that memory are safe Rust: there can be either multiple immutable
      borrows of the same memory or a single mutable borrow, never both at once.
      It is constructed from a *mut u8 pointer to the start of linear memory and a
      usize length. The base pointer for linear memory must be pointer-aligned,
      which simplifies alignment checks for pointers to structs in this memory;
      this requirement can be relaxed in the future. The lifetime parameter
      ensures that use of the memory doesn't outlive the host call.

      • The biggest problem that GuestMemory needs to solve is to track
        all of the mutable and immutable borrows at run-time to make sure they
        do not cause UB. It will keep a map of the immutable and mutable
        borrows, and use the drop impls on GuestPtr and friends to remove
        those borrows from the map.
    • MemoryError describes an out-of-bounds memory access or misaligned pointer.

    • struct GuestPtr<'a, T> is a pointer to a read-only T in guest memory.
      Constructed from GuestMemory<'a>::ptr(&self, ptr: i32) -> Result<GuestPtr<'a, T>, MemoryError>

    • struct GuestPtrMut<'a, T> is a pointer to a writable T in guest memory.
      Constructed from GuestMemory<'a>::ptr_mut(&self, ptr: i32) -> Result<GuestPtrMut<'a, T>, MemoryError>

    • struct GuestArray<'a, T> is an array of read-only T in guest memory.
      Constructed from GuestMemory<'a>::array(&self, ptr: i32, len: i32) -> Result<GuestArray<'a, T>, MemoryError>

    • All T above have a GuestValue trait constraint. This trait describes
      the memory layout of the value.

    • Construction of the above is only possible if they point to a valid guest
      memory location.

    • GuestPtr and GuestPtrMut each have a fn offset(&self, elems: i32) -> Result<GuestPtr{Mut}<'a, T>, MemoryError> for accessing subsequent
      elements. This allows them to be used like an array, in types like iovec
      and ciovec.

    • ReprError describes an invalid representation of a type in guest memory.

    • impl GuestPtr<'a, T> { pub fn read(&self) -> Result<T, ReprError> }
      dereferences a pointer. As part of dereferencing, it validates the pointee:

      • It checks that enum and flag values are in bounds, recursively
        for each member of a struct. It won't do any validation on a union - you
        have to pick a variant in order to validate it.
    • impl GuestPtrMut<'a, T> { pub fn write(&self, T) } writes a T: GuestValue
      into the memory it references.

    • The T: GuestValue for a witx enum should be the host enum $Typename. The
      GuestValue impl will be used to validate and decode memory into this
      owned type. This decoding can be done inside GuestPtr::read, or on a bare
      integer of the enum's repr size.

    • The T: GuestValue for a witx flags should be the host struct $Typename.
      As above, GuestValue does the validation into an owned type.

    • The T: GuestValue for a witx struct has layout defined by witx. The generated
      struct depends on whether the struct recursively contains any unions, pointers,
      or arrays:

      • If it does contain a guest memory reference (let's call these complex GuestValues until I think of a better name), it is a mod guest { struct $Typename }. Rather than having public members, the struct has methods
        corresponding to each field, with a constructor function to create complex GuestValues.

        • All fields have a
          fn $fieldname_{ref,mut}(&self) -> Result<GuestPtr{Mut}<T>, MemoryError>
          accessor to take references of fields
        • Fields containing flat values are fn $fieldname(&self) -> Result<T, ReprError>
        • Fields which are a (mut) pointer are fn $fieldname(&self) -> Result<GuestPtr{Mut}<'a, T>, MemoryError>. Same for arrays.
        • Fields which are a union are fn $fieldname(&self) -> T
      • If it does not (let's call these flat GuestValues), then additionally derive ToOwned
        so it can be used without ceremony by the host

    • The T: GuestValue for a witx union is an opaque struct with a method
      for each variant: fn $variantname(&self) -> Result<T, ReprError>
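
The run-time borrow tracking and validating read described above could look roughly like the sketch below. It is only an illustration under several assumptions, not the proposed implementation: it wraps a safe &mut [u8] rather than a raw *mut u8, takes offsets as u32 rather than i32, keeps borrows in a Vec rather than a map, and the names Mode, Guard, Borrow, and MemoryError::BorrowConflict are hypothetical.

```rust
use std::cell::{Cell, RefCell};
use std::marker::PhantomData;

#[derive(Debug, PartialEq)]
pub enum MemoryError { OutOfBounds, Misaligned, BorrowConflict }
#[derive(Debug, PartialEq)]
pub enum ReprError { InvalidEnumValue(u32) }

/// Describes the guest memory layout of a value and how to validate/decode it.
pub trait GuestValue: Sized {
    const SIZE: u32;
    const ALIGN: u32;
    fn decode(bytes: &[u8]) -> Result<Self, ReprError>;
    fn encode(&self) -> Vec<u8>;
}

/// Hypothetical enum with a 4-byte repr, standing in for a witx enum.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Mode { Read = 0, Write = 1 }

impl GuestValue for Mode {
    const SIZE: u32 = 4;
    const ALIGN: u32 = 4;
    fn decode(b: &[u8]) -> Result<Self, ReprError> {
        match u32::from_le_bytes([b[0], b[1], b[2], b[3]]) {
            0 => Ok(Mode::Read),
            1 => Ok(Mode::Write),
            n => Err(ReprError::InvalidEnumValue(n)),
        }
    }
    fn encode(&self) -> Vec<u8> { (*self as u32).to_le_bytes().to_vec() }
}

struct Borrow { id: usize, start: u32, end: u32, mutable: bool }
pub struct GuestMemory<'a> {
    cells: &'a [Cell<u8>],
    borrows: RefCell<Vec<Borrow>>,
    next_id: Cell<usize>,
}

/// Removes its entry from the borrow list when dropped.
struct Guard<'m> { mem: &'m GuestMemory<'m>, id: usize, start: usize }

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.mem.borrows.borrow_mut().retain(|b| b.id != self.id);
    }
}

pub struct GuestPtr<'m, T> { guard: Guard<'m>, _ty: PhantomData<T> }
pub struct GuestPtrMut<'m, T> { guard: Guard<'m>, _ty: PhantomData<T> }

impl<'a> GuestMemory<'a> {
    pub fn new(bytes: &'a mut [u8]) -> Self {
        let cells = Cell::from_mut(bytes).as_slice_of_cells();
        GuestMemory { cells, borrows: RefCell::new(Vec::new()), next_id: Cell::new(0) }
    }

    fn borrow<T: GuestValue>(&self, ptr: u32, mutable: bool) -> Result<Guard<'_>, MemoryError> {
        let end = ptr.checked_add(T::SIZE).ok_or(MemoryError::OutOfBounds)?;
        if end as usize > self.cells.len() { return Err(MemoryError::OutOfBounds); }
        // Checking the offset is enough because the base is assumed to be aligned.
        if ptr % T::ALIGN != 0 { return Err(MemoryError::Misaligned); }
        let mut borrows = self.borrows.borrow_mut();
        // Overlapping regions conflict unless both borrows are immutable.
        if borrows.iter().any(|b| (mutable || b.mutable) && ptr < b.end && b.start < end) {
            return Err(MemoryError::BorrowConflict);
        }
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        borrows.push(Borrow { id, start: ptr, end, mutable });
        Ok(Guard { mem: self, id, start: ptr as usize })
    }

    pub fn ptr<T: GuestValue>(&self, ptr: u32) -> Result<GuestPtr<'_, T>, MemoryError> {
        Ok(GuestPtr { guard: self.borrow::<T>(ptr, false)?, _ty: PhantomData })
    }
    pub fn ptr_mut<T: GuestValue>(&self, ptr: u32) -> Result<GuestPtrMut<'_, T>, MemoryError> {
        Ok(GuestPtrMut { guard: self.borrow::<T>(ptr, true)?, _ty: PhantomData })
    }
}

impl<T: GuestValue> GuestPtr<'_, T> {
    /// Copies the bytes out of guest memory, then validates and decodes them.
    pub fn read(&self) -> Result<T, ReprError> {
        let cells = &self.guard.mem.cells[self.guard.start..][..T::SIZE as usize];
        T::decode(&cells.iter().map(|c| c.get()).collect::<Vec<u8>>())
    }
}

impl<T: GuestValue> GuestPtrMut<'_, T> {
    pub fn write(&self, value: T) {
        let cells = &self.guard.mem.cells[self.guard.start..];
        for (cell, byte) in cells.iter().zip(value.encode()) { cell.set(byte); }
    }
}
```

In this sketch, requesting a GuestPtr while an overlapping GuestPtrMut is still alive fails with BorrowConflict, and the conflict clears as soon as the mutable pointer is dropped. Copying bytes out in read gives up the zero-copy goal for the sake of staying in safe Rust; a real implementation would presumably borrow directly from linear memory.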

  • Using a value in the Rust implementation of wasi-common. This should be an
    idiomatic, wholly owned Rust value. It should look like a Rust type: use modules
    for namespacing instead of repeated prefixes, use CamelCase not
    __SHOUTING_SNAKE_CASE.

    • A witx struct should be a rust struct
    • A witx union should be a rust enum
    • A witx enum should be a rust enum
    • A witx flags should be a rust struct $Typename { val: repr_type }, and a
      rust enum $TypenameFlag { flag_variants... }, and Typename should have
      setter and getter methods in terms of $TypenameFlag.
    • A witx array should be a rust Vec
    • A witx (const,) pointer should be a Box<T> where T is the owned rust value.
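
As a rough illustration of the owned flags shape described above (a struct wrapping the repr type, plus an enum of individual flags with getters and setters), a generated type might look something like this. Perms, PermsFlag, the u16 repr, the bit assignments, and from_raw are all made up for the example and do not correspond to any actual witx type.

```rust
/// Hypothetical individual flags; each variant's discriminant is its bit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermsFlag {
    Read = 1 << 0,
    Write = 1 << 1,
    Append = 1 << 2,
}

/// Hypothetical owned flags value: struct $Typename { val: repr_type }.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Perms {
    val: u16,
}

impl Perms {
    /// Every bit that corresponds to a defined flag.
    const ALL: u16 =
        PermsFlag::Read as u16 | PermsFlag::Write as u16 | PermsFlag::Append as u16;

    /// Validates a raw guest value: any bit outside ALL is rejected.
    pub fn from_raw(val: u16) -> Option<Perms> {
        if (val & !Self::ALL) == 0 { Some(Perms { val }) } else { None }
    }

    /// Getter in terms of the flag enum.
    pub fn get(&self, flag: PermsFlag) -> bool {
        (self.val & flag as u16) != 0
    }

    /// Setter in terms of the flag enum.
    pub fn set(&mut self, flag: PermsFlag, on: bool) {
        if on {
            self.val |= flag as u16;
        } else {
            self.val &= !(flag as u16);
        }
    }

    /// The raw representation, e.g. for writing back into guest memory.
    pub fn raw(&self) -> u16 {
        self.val
    }
}
```

Keeping val private means the only ways to obtain a Perms are Default, the setters, or the validating from_raw, so business logic never sees a value with undefined bits set.
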
@kubkon kubkon added the wasi:impl Issues pertaining to WASI implementation in Wasmtime label Jan 3, 2020
@kubkon
Member

kubkon commented Jan 17, 2020

@pchickey and @sunfishcode, I wanted to sync up with you guys and let you know I've now started looking at this issue with the intermediate results stored in the kubkon/wiggle repo. I've decided to work on this out-of-tree since it's simply easier this way (less legacy code to battle against at least in the early stages of prototyping).

In the short time I've looked at this, I've had an initial stab at tweaking value generation according to your guidelines, Pat. There are rough edges left of course, but I reckon it's already a good start for a fruitful discussion.

Anyhow, you're both added as collaborators, have a look and let me know what you reckon.

@pchickey
Contributor Author

pchickey commented Jun 16, 2021

wiggle has been done for a while now! it's neat to go back and remember the bad old days
