Introduce Host Modules #482

Open
evacchi opened this issue Aug 21, 2024 · 2 comments

@evacchi
Collaborator

evacchi commented Aug 21, 2024

Every Wasm module can require some items, such as functions, at instantiation time.
These are called imports, and every import is labeled by "a two-level name space, consisting of a
module name and a name for the entity".

The spec also briefly defines the concept of a Host Function as:

a function expressed outside WebAssembly but passed to a module as an import.

Because imports are always qualified, and because related functions generally
belong to the same module, they usually share the same module name,
and they might even share some form of state.
For this reason, wazero has introduced the concept of a Host Module.

For instance, suppose that a module need_add.wasm imports a function env.add:

(module (import "env" "add" (func)))

Instead of declaring the function <env, add>, users define a host module
env and then declare all of the functions that belong to that module, including add.

wazero's Host Module is nothing more than a bundle of related functions that, for convenience,
are defined together. On a surface level, this is a convenience for end users, who will
declare modules in a more concise way; but in reality, bundling related functions
into a host module simplifies management within the engine, and ultimately improves
lifecycle control.

This is a proposal to introduce a similar construct in Chicory.

Benefits for End Users

End users will be able to define a collection of related host functions more concisely.

For instance, the signature for the HostFunction constructor is currently:

    public HostFunction(
            WasmFunctionHandle handle,
            String moduleName,
            String fieldName,
            List<ValueType> paramTypes,
            List<ValueType> returnTypes) {
        this.handle = handle;
        this.moduleName = moduleName;
        this.fieldName = fieldName;
        this.paramTypes = paramTypes;
        this.returnTypes = returnTypes;
    }

We could provide a HostModule.Builder in the same fashion as wazero's:

	_, err := r.NewHostModuleBuilder("env").
		NewFunctionBuilder().
		WithFunc(func(v uint32) {
			fmt.Println("log_i32 >>", v)
		}).
		Export("log_i32").
		NewFunctionBuilder().
		WithFunc(func() uint32 {
			if envYear, err := strconv.ParseUint(os.Getenv("CURRENT_YEAR"), 10, 64); err == nil {
				return uint32(envYear) // Allow env-override to prevent annual test maintenance!
			}
			return uint32(time.Now().Year())
		}).
		Export("current_year").
		Instantiate(ctx)
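
A Java counterpart to the Go builder above might look like the following minimal sketch. Every name in it (HostModule, Builder, withExport, Handle, envModule) is an assumption made for illustration and is not Chicory's existing API; function bodies are simplified to "i32 array in, i32 array out", and type declarations for params and returns are left out.

```java
import java.time.Year;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Chicory as written here.
final class HostModuleBuilderSketch {

    /** Stand-in for a host function body: i32 arguments in, i32 results out. */
    interface Handle {
        int[] apply(int... args);
    }

    /** A named bundle of host functions, keyed by export name. */
    static final class HostModule {
        final String name;
        final Map<String, Handle> exports;

        private HostModule(String name, Map<String, Handle> exports) {
            this.name = name;
            this.exports = exports;
        }

        static Builder builder(String name) {
            return new Builder(name);
        }
    }

    static final class Builder {
        private final String name;
        private final Map<String, Handle> exports = new LinkedHashMap<>();

        private Builder(String name) {
            this.name = name;
        }

        /** Registers one function; the module name is stated once, on the builder. */
        Builder withExport(String fieldName, Handle handle) {
            exports.put(fieldName, handle);
            return this;
        }

        HostModule build() {
            return new HostModule(name, Collections.unmodifiableMap(new LinkedHashMap<>(exports)));
        }
    }

    /** Mirrors the two functions of the Go example; log_i32 records into the given list. */
    static HostModule envModule(List<Integer> sink) {
        return HostModule.builder("env")
                .withExport("log_i32", args -> {
                    sink.add(args[0]);
                    return new int[0];
                })
                .withExport("current_year", args -> new int[] {Year.now().getValue()})
                .build();
    }

    public static void main(String[] args) {
        List<Integer> logged = new ArrayList<>();
        HostModule env = envModule(logged);
        env.exports.get("log_i32").apply(42);
        System.out.println(env.name + " exports " + env.exports.keySet() + ", logged " + logged);
    }
}
```

Compared with the wazero builder, this flattens the nested function builder into a single withExport call per function, which keeps the chain shallow.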

Eventually, we could even provide a way for end-users to define a module as a simple class, and derive
a host module (and hence, host functions) automatically. For instance (strawman syntax):

@HostModule("env")
public class EnvModule {
    @WasmExport("log_i32")
    public void logI32(int value) {
        ...
    }

    @WasmExport("current_year")
    public int currentYear() {
        ...
    }
}

which would internally translate to something like:

    HostModule.builder(EnvModule.class)
        .withExport("log_i32", EnvModule::logI32, (...params...), (...returns...))
        .withExport("current_year", EnvModule::currentYear, (...params...), (...returns...))
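
One way the class-based derivation could work is plain runtime reflection. The sketch below is only an illustration under stated assumptions: the @HostModule and @WasmExport annotations are declared locally as hypothetical stand-ins, the helper names (moduleName, exports, call) are invented, and log_i32 is given the one-argument, no-result shape of the Go example. A real implementation might prefer an annotation processor over reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.time.Year;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the annotations and helpers below are not part of Chicory.
final class AnnotatedHostModuleSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface HostModule {
        String value();
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface WasmExport {
        String value();
    }

    @HostModule("env")
    static final class EnvModule {
        int lastLogged; // state attached to the module instance

        @WasmExport("log_i32")
        public void logI32(int value) {
            lastLogged = value;
        }

        @WasmExport("current_year")
        public int currentYear() {
            return Year.now().getValue();
        }
    }

    /** Reads the module name from the class-level annotation. */
    static String moduleName(Class<?> type) {
        HostModule annotation = type.getAnnotation(HostModule.class);
        if (annotation == null) {
            throw new IllegalArgumentException(type + " is not annotated with @HostModule");
        }
        return annotation.value();
    }

    /** Collects every annotated method, keyed by its export name. */
    static Map<String, Method> exports(Class<?> type) {
        Map<String, Method> exports = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            WasmExport export = method.getAnnotation(WasmExport.class);
            if (export != null) {
                exports.put(export.value(), method);
            }
        }
        return exports;
    }

    /** Invokes an export by name on a module instance. */
    static Object call(Object instance, String exportName, Object... args) {
        Method method = exports(instance.getClass()).get(exportName);
        if (method == null) {
            throw new IllegalArgumentException("no such export: " + exportName);
        }
        try {
            return method.invoke(instance, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EnvModule env = new EnvModule();
        call(env, "log_i32", 7);
        System.out.println(moduleName(EnvModule.class) + " exports "
                + exports(EnvModule.class).keySet() + ", lastLogged=" + env.lastLogged);
    }
}
```

Because the exports are instance methods, any state they need lives in fields of the module object rather than in captured closures.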

Benefits for the Engine

Working with the higher-level concept of a host module, instead of the lower-level concept of
freestanding host functions, makes it possible to treat the lifecycle of such host functions
similarly to the lifecycle of a module.

For instance, host functions are able to close over their environment, but there is no unified
way to initialize their state, or an explicit way to release resources they might indirectly refer to.

For instance, imagine a host function closing over a file handle. As long as the host function is kept around,
directly or indirectly (for instance, because of some transitive import directive), that resource
will be held onto.

By introducing host modules, we can uniformly control the lifecycle of both Wasm modules
and host modules.
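
A minimal sketch of what uniform lifecycle control could look like, assuming a hypothetical ModuleInstance contract shared by host modules and Wasm modules. None of the types below (ModuleInstance, Engine, FileHostModule, WasmModuleStub, Resource) are Chicory's; Resource stands in for something like a file handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types only illustrate the lifecycle idea.
final class HostModuleLifecycleSketch {

    /** Assumed common contract for anything the engine instantiates. */
    interface ModuleInstance extends AutoCloseable {
        String name();

        @Override
        void close(); // narrowed to throw no checked exception
    }

    /** Stand-in for a resource such as a file handle. */
    static final class Resource implements AutoCloseable {
        boolean open = true;

        @Override
        public void close() {
            open = false;
        }
    }

    /** A host module that owns the state its functions would otherwise close over. */
    static final class FileHostModule implements ModuleInstance {
        final Resource handle = new Resource();

        @Override
        public String name() {
            return "env";
        }

        /** A host function backed by the owned resource. */
        int read() {
            if (!handle.open) {
                throw new IllegalStateException("module 'env' is closed");
            }
            return 42;
        }

        @Override
        public void close() {
            handle.close(); // explicit release point for the resource
        }
    }

    /** Placeholder for an instantiated Wasm module. */
    static final class WasmModuleStub implements ModuleInstance {
        boolean closed;

        @Override
        public String name() {
            return "need_add";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Tracks every instance and tears them down in reverse instantiation order. */
    static final class Engine implements AutoCloseable {
        private final List<ModuleInstance> instances = new ArrayList<>();

        <T extends ModuleInstance> T instantiate(T module) {
            instances.add(module);
            return module;
        }

        @Override
        public void close() {
            for (int i = instances.size() - 1; i >= 0; i--) {
                instances.get(i).close();
            }
            instances.clear();
        }
    }

    public static void main(String[] args) {
        FileHostModule host;
        try (Engine engine = new Engine()) {
            host = engine.instantiate(new FileHostModule());
            engine.instantiate(new WasmModuleStub());
            System.out.println("read while open: " + host.read());
        }
        System.out.println("resource open after engine close: " + host.handle.open);
    }
}
```

Closing in reverse order is a design choice made here so that importers are torn down before the modules they depend on.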

This will become increasingly important, because it is common to cross-link modules
automatically through the import/export system. Unifying Wasm modules with host modules will simplify
introducing a form of automated linking (to be discussed in a separate issue), because it will also
make the initialization and destruction of both Wasm modules and host functions explicit.

For instance, consider this example from the wazero documentation:

The module need_add.wasm we introduced at the beginning imports a function env.add
and exports a function use_add, which calls env.add:

(module ;; need_add
    (import "env" "add" (func))
    (export "use_add" (func))
)

Users define a host module and load it together with need_add.wasm. When both modules are instantiated,
wazero ensures that a module named env is available and that it provides an add function. It is irrelevant
whether the function is a host function or is defined by another Wasm module, as long as the
name and signature can be resolved successfully:

                                                        func add(foo, bar int32) int32 {
                                                            return foo + bar
                                                        }         |
                                                                  |
                                                                  | implements
                                host module                       v
+---------+                +------------------+          +-----------------+
| Runtime | -------------> | (module: myhost) | -------> | (function: add) |
+---------+  ^             +------------------+  export  +-----------------+
    \       /                                                       /
     \instantiate                                                  /
      \   /                                                       /
       \ v                                                       /
        \                                                       /
         \                                                     / imported
          \ (import "myhost" "add" (func))                    /
           \                                                 /
            \                                   +-----------/------+
             \                                  |          v       |
              \                                 |   (myhost.add)   |
               v                                |        ^         |
                +--------------------+          |        | call    |
                | (module: need_add) |--------->| (export:use_add) <----- Exported
                +--------------------+          |                  |
                                                +------------------+
                                            functions in need_add's sandbox

  • By unifying host modules and Wasm modules, it will no longer be necessary to bridge explicitly between host functions and Wasm functions, as they will be treated uniformly by the engine.
  • By providing a unified concept of module, both kinds of modules can be instantiated and destroyed in a similar way
    when they are no longer needed.
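
The resolution rule described above, where an import is satisfied by whichever registered module exports the requested name, can be sketched as follows. This is an illustration only: Linkable, Store, module, hostEnv and needAdd are hypothetical names, functions are reduced to (i32, i32) -> i32, and signature checking is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: host modules and Wasm modules behind one interface.
final class ImportResolutionSketch {

    /** Anything that exports functions by name, host-defined or Wasm-defined. */
    interface Linkable {
        String name();

        IntBinaryOperator export(String field); // null when the field is absent
    }

    static Linkable module(String name, Map<String, IntBinaryOperator> functions) {
        return new Linkable() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public IntBinaryOperator export(String field) {
                return functions.get(field);
            }
        };
    }

    /** Registry that resolves <module, name> pairs without caring who defined them. */
    static final class Store {
        private final Map<String, Linkable> modules = new HashMap<>();

        void register(Linkable module) {
            modules.put(module.name(), module);
        }

        IntBinaryOperator resolve(String moduleName, String field) {
            Linkable module = modules.get(moduleName);
            if (module == null) {
                throw new IllegalStateException("unresolved module: " + moduleName);
            }
            IntBinaryOperator function = module.export(field);
            if (function == null) {
                throw new IllegalStateException("unresolved import: " + moduleName + "." + field);
            }
            return function;
        }
    }

    /** A host module named env that provides add. */
    static Linkable hostEnv() {
        IntBinaryOperator add = (a, b) -> a + b;
        return module("env", Map.of("add", add));
    }

    /** Stand-in for need_add: imports env.add at instantiation and exports use_add. */
    static Linkable needAdd(Store store) {
        IntBinaryOperator add = store.resolve("env", "add");
        return module("need_add", Map.of("use_add", add));
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.register(hostEnv());
        Linkable needAdd = needAdd(store);
        store.register(needAdd);
        System.out.println(needAdd.export("use_add").applyAsInt(2, 3));
    }
}
```

Replacing hostEnv() with a Linkable backed by a Wasm instance would not change needAdd at all, which is the point of treating both kinds of module uniformly.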
@bhelx
Contributor

bhelx commented Aug 21, 2024

This looks good to me and I support this. A few comments:

We could provide a HostModule.Builder in the same fashion as wazero's:

I think the way wazero does it is perhaps too granular for my taste. Something about a deeply nested builder feels to me like it would be hard to write without copy-pasting an example. I'd prefer the "internal" option you suggested. Though if you think Java users are used to building deeply nested objects like that, I think that's okay. I think the class option is ideal too, especially as it gives you the ability to attach some state to the instance.

@evacchi
Collaborator Author

evacchi commented Aug 21, 2024

oh yes, the Go version is just for reference.

As I was porting over WasiPreview1 as an example, I am realizing that maybe the class-based option will be needed in some fashion, especially because we probably want to be able to instantiate with some state, and then close() to clean up that state.

The builder will be mostly used for one-offs 🤔
EDIT: or not? we probably want to keep the pair Module/Instance... 🤔
