Handling a composed component that exchanges strings #143

Open
Finfalter opened this issue Apr 11, 2023 · 3 comments

@Finfalter

As discussed in Zulip > general > composing components, wasmtime-py doesn't yet support handling a composed component that exchanges strings. Since I would really like to use this feature in one of my projects, I am raising this issue as a feature request. A minimal example of what is expected, together with an illustration of the error that is raised, can be found here.

For illustration: trying to compose component1 and component2 reflecting the following two interfaces
Interface of component1

interface exports {
  greet: func(s: string) -> string
}

default world greetworld {
  export greeting: self.exports
}

Interface of component2

interface imports {
  greet: func(s: string) -> string
}

interface exports {
  greet: func(s: string) -> string
}

default world bettergreetworld {
  import greeting: self.imports
  export exports: self.exports
}

yields the following error

#[..]
Caused by:
    wasm trap: wasm `unreachable` instruction executed
@alexcrichton
Member

Thanks for the report! I won't personally have the chance to get to this for a bit, so I'm going to write down some notes here. This shouldn't be the trickiest thing in the world if someone is feeling particularly intrepid to take this on, but it's also not necessarily a great first task either. I can try to help out along the way with questions if someone's interested though!

What's happening here is that this assertion is being tripped. This construct is indicating that a core wasm function needs to be synthesized to transcode strings from one component to another. This involves reading the string from one linear memory, validating its encoding, and then reencoding it into a destination linear memory. The specifics of this operation are well-defined but subtle as well because the encodings on both halves may be different, for example utf-8 and utf-16.
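To make the read, validate, reencode sequence concrete, here is a minimal sketch of a utf-8 to utf-16 transcode between two buffers. Plain bytearray objects stand in for the two linear memories, and the function name and parameters are hypothetical; it ignores details such as alignment and how the destination buffer gets allocated, so it is an illustration rather than what wasmtime actually does.

```python
def transcode_utf8_to_utf16(src_mem: bytearray, src_ptr: int, src_len: int,
                            dst_mem: bytearray, dst_ptr: int) -> int:
    """Read a utf-8 string from src_mem and store it as utf-16 (little endian) in dst_mem.

    Returns the number of 16-bit code units written.
    """
    if src_ptr < 0 or src_len < 0 or src_ptr + src_len > len(src_mem):
        raise IndexError("source string is out of bounds")
    raw = bytes(src_mem[src_ptr:src_ptr + src_len])

    # Strict decoding doubles as validation: malformed input raises UnicodeDecodeError.
    text = raw.decode("utf-8")

    encoded = text.encode("utf-16-le")
    if dst_ptr < 0 or dst_ptr + len(encoded) > len(dst_mem):
        raise IndexError("destination buffer is out of bounds")

    # Same-length slice assignment, so the destination memory keeps its size.
    dst_mem[dst_ptr:dst_ptr + len(encoded)] = encoded
    return len(encoded) // 2
```

The source length is counted in bytes while the result is counted in 16-bit code units, which is one reason the different-encoding cases are more subtle than a plain copy.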

The GlobalInitializer comes from here in wasmtime and the Transcoder struct looks like this:

pub struct Transcoder {
    /// The index of the transcoder being defined and initialized.
    ///
    /// This indicates which `VMCallerCheckedFuncRef` slot is written to in a
    /// `VMComponentContext`.
    pub index: RuntimeTranscoderIndex,
    /// The transcoding operation being performed.
    pub op: Transcode,
    /// The linear memory that the string is being read from.
    pub from: RuntimeMemoryIndex,
    /// Whether or not the source linear memory is 64-bit or not.
    pub from64: bool,
    /// The linear memory that the string is being written to.
    pub to: RuntimeMemoryIndex,
    /// Whether or not the destination linear memory is 64-bit or not.
    pub to64: bool,
    /// The wasm signature of the cranelift-generated trampoline.
    pub signature: SignatureIndex,
}

This Transcoder structure represents a Python function that needs to be generated. The Python function would be named something like _transcoder_i where i is the index field. The op field looks like this:

pub enum Transcode {
    Copy(FixedEncoding),
    Latin1ToUtf16,
    Latin1ToUtf8,
    Utf16ToCompactProbablyUtf16,
    Utf16ToCompactUtf16,
    Utf16ToLatin1,
    Utf16ToUtf8,
    Utf8ToCompactUtf16,
    Utf8ToLatin1,
    Utf8ToUtf16,
}

pub enum FixedEncoding {
    Utf8,
    Utf16,
    Latin1,
}

which describes the transcoding operation being performed. Each variant here requires a different Python function to implement it. At a high level all these algorithms are defined in this document and represent sort of a fused load_string and store_string function. Before I go too much more into this though the other fields of Transcoder are:

  • from and to - the linear memories that are being read from and written to. The indexes here are created by previous GlobalInitializer::ExtractMemory items so in the generated code the linear memory can be referenced as self._core_memory{i}. Interacting with these objects should use the standard wasmtime.Memory APIs from Python (or add more APIs to that as necessary).
  • from64 and to64 - these can be asserted as false for now. This implies memory64 support but memory64 isn't well supported with the component model, so asserting that as false is basically a TODO item to flesh out if someone hits it later (like this issue!)
  • signature probably isn't necessary, but it's the core wasm signature of the function if necessary.
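As a rough illustration of how these fields could drive code generation, the sketch below emits the opening lines of a generated function. The Transcoder dataclass is a hypothetical Python mirror of the Rust struct (the signature field is left out), emit_transcoder_prelude is an invented helper, and the parameter list is the one a utf8-to-utf8 copy would take; other ops would need different parameters.

```python
from dataclasses import dataclass


@dataclass
class Transcoder:
    """Hypothetical Python mirror of the Rust `Transcoder` struct."""
    index: int
    op: str      # e.g. "Copy(Utf8)"; the string spelling is an assumption
    from_: int   # `from` is a reserved word in Python, hence the underscore
    from64: bool
    to: int
    to64: bool


def emit_transcoder_prelude(t: Transcoder) -> str:
    """Return the first lines of the generated Python function as source text."""
    # memory64 is left as a TODO: refuse it loudly instead of generating wrong code.
    if t.from64 or t.to64:
        raise NotImplementedError("memory64 transcoders are not supported yet")
    lines = [
        f"def _transcoder_{t.index}(caller, from_ptr, from_len, to_ptr):",
        f"    src = self._core_memory{t.from_}",
        f"    dst = self._core_memory{t.to}",
    ]
    return "\n".join(lines)
```

Raising on from64 or to64 plays the role of the assertion mentioned above, and the self._core_memory{i} names follow the convention described for ExtractMemory.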

So the main meat of this is the op field, i.e. the transcoding operation. The two linear memories then describe where to read data and where to write data. Each op can have its own signature; not all transcoders share the same signature. A description of the signature of each transcoding operation can be found here and the Rust implementation of all transcoders can be found in this file.

That's all somewhat abstract, though, and the full power here isn't necessarily required. For example the above component probably only needs utf8-to-utf8 which is relatively simple compared to other encodings. I'll go through that in a bit more detail, and the other op variants can be left as unimplemented!() for now too.
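One way to mirror the "implement one op, leave the rest unimplemented" approach in Python is a small lookup table. This is only a sketch: the string keys, the copy_utf8 helper and the lookup_transcode_op function are all hypothetical names, and the table deliberately does not fix a single signature since each op can have its own.

```python
from typing import Callable, Dict


def copy_utf8(data: bytes) -> bytes:
    """Validate that `data` is well-formed utf-8 and return it unchanged."""
    data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return data


# Only the op needed for utf8-to-utf8 is registered for now.
TRANSCODE_OPS: Dict[str, Callable[..., object]] = {
    "Copy(Utf8)": copy_utf8,
}


def lookup_transcode_op(op: str) -> Callable[..., object]:
    """Return the implementation for `op`, or fail like Rust's unimplemented!()."""
    try:
        return TRANSCODE_OPS[op]
    except KeyError:
        raise NotImplementedError(f"transcode op {op!r} is not implemented yet") from None
```

Hitting an unregistered variant such as "Utf8ToUtf16" then fails at generation time with a clear message instead of producing a broken binding.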

For utf8-to-utf8 the original source string is validated as correctly encoded and then it's memcpy'd to the destination string. The Rust implementation is here which is called from a bit of macro-soup to handle fiddly bits but the general signature is here and here (as the op will be Copy(Utf8)).

This means that the Python host function will look something like:

import wasmtime
from wasmtime import Func, FuncType

def transcoderN_impl(caller: wasmtime.Caller, from_ptr: int, from_len: int, to_ptr: int) -> None:
    # `from` is a reserved word in Python, so the memories are named `src` and `dst`.
    src: wasmtime.Memory = self._core_memoryA
    dst: wasmtime.Memory = self._core_memoryB

    from_bytes = src.read(caller, from_ptr, from_ptr + from_len)
    # Strict decoding raises UnicodeDecodeError if `from_bytes` is not valid utf-8.
    bytes(from_bytes).decode("utf-8")
    dst.write(caller, from_bytes, to_ptr)

transcoderN_ty = FuncType(...)
transcoderN = Func(store, transcoderN_ty, transcoderN_impl, access_caller=True)

And that... might be the majority of it? Worth testing for sure!
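For testing the copy logic without a wasm module, something like the following could work. FakeMemory is a hypothetical stand-in whose read and write methods only mimic the argument order used in the sketch above; it is not the wasmtime API, and make_utf8_copy is an invented helper name.

```python
class FakeMemory:
    """Hypothetical stand-in for a linear memory, backed by a bytearray."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def read(self, store, start: int, stop: int) -> bytearray:
        if start < 0 or start > stop or stop > len(self.data):
            raise IndexError("read out of bounds")
        return self.data[start:stop]

    def write(self, store, value: bytes, start: int) -> int:
        end = start + len(value)
        if start < 0 or end > len(self.data):
            raise IndexError("write out of bounds")
        self.data[start:end] = value
        return len(value)


def make_utf8_copy(src: FakeMemory, dst: FakeMemory):
    """Build a utf8-to-utf8 transcoder closed over a source and destination memory."""
    def transcoder(caller, from_ptr: int, from_len: int, to_ptr: int) -> None:
        raw = src.read(caller, from_ptr, from_ptr + from_len)
        # Validate before copying; malformed utf-8 raises UnicodeDecodeError.
        bytes(raw).decode("utf-8")
        dst.write(caller, raw, to_ptr)
    return transcoder


# Usage: copy a greeting from one fake memory to another.
src_mem, dst_mem = FakeMemory(64), FakeMemory(64)
message = "grüß dich".encode("utf-8")
src_mem.write(None, message, 8)
make_utf8_copy(src_mem, dst_mem)(None, 8, len(message), 16)
```

Since the validation happens before the write, a malformed source string leaves the destination memory untouched.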

The final bit to fill out will be this one, which is implemented similarly to the Lowering branch, as transcoder{i} is the Func you're accessing (or at least I'm pretty sure).

That's hopefully enough for someone who's interested to get started, but I can answer more questions as well!

@Finfalter
Author

"some notes" sounds like a quantum of understatement... thank you for this elaborate sketch! Sooo, I can see the metaphorical double-shampooed, unrolled red carpet right in front of me. Since this feature is definitely on my wishlist, I will have a look (starting in a couple of days). I guess it will take a while to complete. If you do not mind, I will use this channel in case of further questions.

@zifeo
Contributor

zifeo commented Apr 22, 2023

@Finfalter I am also interested in this, let me know if I can help.
