Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm: run wasmtime on the reactor threads #14046

Merged
merged 16 commits into from
Oct 11, 2023

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Oct 9, 2023

This patch set moves our usage of Wasmtime over to using the async API for invoking the VM.

The async API is well documented here. In short, Wasm is executed on a seperate stack
(as well as our host functions) and the VM can swap back to the callers stack to "yield"
execution back to the caller. This will allow us to yield control back to the scheduler
so we don't go over time budget. This yielding behavior is not yet implemented, this only
sets us up so that we can do this, and it's only a matter of configuration and tuning
the amount of fuel we give the VM.

Moving on the reactor has greatly improved throughput of transforms based on my benchmarking
(3x more!), and will allow us to change our ABI so that we can use upstream golang in addition
to tinygo. More details on the new ABI will come in another patch set.

Some nuts + bolts/reviewer notes: The async API uses basically an entirely seperate set
of APIs - to make sure tests pass in every commit I just temporarily fork the implementation
to another file that isn't built then merge it back in. So individual commits may not be valid,
but are more broken up to hopefully aid the review process.

The set of APIs used here are very new to wasmtime, and if you're curious
about under the hood how some of these work, these PRs might be helpful
to review:

bytecodealliance/wasmtime#7140
bytecodealliance/wasmtime#7106

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

To pull in a fix for a header being typed incorrectly.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Upcoming commits are going to move wasmtime calls onto the reactor,
which means our for_each_record helpers needs to be async. We still need
the sync version as the current version runs on an alien thread.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Fork the wasmtime code into another file so that the following patches
can incrementally change the impleemntation into logical pieces for
reviewers.

Wasmtime's async support requires an entire seperate set of APIs to be
used, otherwise there are runtime errors, forking the implementation
then merging it back in allows for functionality changes to be split up
without needing to figure out how to make everything compile and pass
tests at each commit.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
The working model here is that wasmtime allocates a seperate stack to
execute Wasm on. That stack can be yielded to and will allow us to run
this on the reactor directly. Our host functions are also executed this
new stack, so now our host functions will only have 64 KiB to execute
on. We should add some test/debug support for ensuring that our host
functions do not use more stack memory than that. If more stack memory
is required, the option of making the host function async is possible,
as then the continuations will be executed on "normal" reactor thread
stacks, otherwise we'll have to consider a seperate memory pool for
stacks, as 128KiB is about the most we want to allocate at once in
Redpanda.

As another note these stacks are current mmap'd but there will be work
done in Wasmtime to plugin a seperate allocator in the runtime. That
will be a followup change to plug that in once it's ready.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
We're going to use wasmtime's async function support this so cleanup the
host module registration by removing the alien 👾

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Just a refactor to move the down so the engine is in scope.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Async host functions are implemented via polling similar to Rust's
Future type. We implement this via a "done" boolean that is threaded
around and we mark it as true when the future completes. Wasmtime
ensures that all the arguments to the function will stay alive until our
polling function returns true.

There will be a followup where we will register the future with the
engine to not poll the status until future has completed.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This moves all code compilation to a single place. The prior
implementation compiled all our native code, and then for each instance
created small trampoline functions had to be compile to glue the module
and it's host functions together. This allows us to make creating
instances quicker (no alien threads) and simpler.

Additionally, this commit removes the alien thread usage from the
engine. This is currently using the blocking APIs but followup commits
will use the async Wasmtime APIs.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
When we start an engine we now need to create an instance from the
preinitialized instance. The reasoning for doing this is that we support
restarting an engine by stopping and starting it and the expectation is
that clears the memory for the instance. The only way to truly reset a
store (the holder of memory for an instance) is by deleting it and
creating a new one.

We instantiate a preinitialized module using wasmtime's async API, which
any call into the Wasm VM returns a future that each time the future is
polled executes some work on the VM. Right now we have not (yet) configured
the yielding behavior so yielding can only happen if an asynchronous
host function is called.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Use wasmtime's async API to call the _start entry point to main for WASI.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj rockwotj requested review from BenPope and a team as code owners October 9, 2023 17:05
@rockwotj rockwotj requested review from savex and removed request for a team October 9, 2023 17:05
This uses wasmtime's async APIs to invoke the transform function that we
expect to have exported as part of our ABI contract.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj
Copy link
Contributor Author

/ci-repeat

Registering the futures of an async host function with the engine allows
for not polling the wasm VM when we already know it's suspended because
something in schema registry, etc is happening on another core.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Using a 1:1 alien thread pool is wasteful in terms of memory (I believe
by default threads have ~8 MB stacks in linux?), so only use a single
thread. Possibly in the future we can temporarily spin up more threads
if we need to compile many modules (like on startup for instance).

Also we need to unblock wasmtime's signals on the reactor threads now
that we're running wasm directly on the reactor thread.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
See the comment in the code, but seastar::thread's usage of swapcontext
can re-block signals, so we need to unblock those in the tests when they
run in debug mode.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This switches our implementation of wasm::engine to use wasmtime's async
implementation from the sync API.

This commit is essentially `mv wasmtime_async.cc wasmtime.cc`

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This was only used on an alien thread when wasm was executed in a
blocking context, but now we're using the async version so it can be
removed.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj
Copy link
Contributor Author

Force push: Folded test fix back into commit history so that all commits pass tests

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@rockwotj
Copy link
Contributor Author

CI Failures: #7148, #13997, #14053

Comment on lines 513 to 531
ssx::background
= std::move(host_future_result)
.then_wrapped(
[status, trap_ret, mem = std::move(mem)](ss::future<> fut) {
if (fut.failed()) {
auto msg = ss::format(
"Failure executing host function: {}",
fut.get_exception());
*trap_ret = wasmtime_trap_new(msg.data(), msg.size());
}
*status = async_call_done::yes;
});

continuation->env = status;
continuation->finalizer = [](void* env) {
// NOLINTNEXTLINE(*owning-memory)
delete static_cast<async_call_done*>(env);
};
continuation->callback = [](void* env) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. so the caller that passes in the continuation will use the continuation to wait on the result of the backgrounded future to occur (or the trap_ret to be set)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the actual implementation is here:
https://github.com/bytecodealliance/wasmtime/pull/7106/files#diff-609e139b86cac725103dd2d38ce4d0018e0eb41a56278255217354e523b784bdR117-R136

The .await is a fancy way of waiting until true is returned here, and stuff on the stack is kept alive, similar to a coroutine in C++, see this: https://github.com/bytecodealliance/wasmtime/pull/7106/files#diff-609e139b86cac725103dd2d38ce4d0018e0eb41a56278255217354e523b784bdR70-R81

But maybe my answer to your other question sheds light on whats happening under the covers here. It's really cool stuff!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks. it is very cool.

continuation.await;

does this correspond to some thread managed by wasmtime?

Copy link
Member

@dotnwat dotnwat Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind i think it is answered in the next commen ton this pr related to poll

Comment on lines 334 to 335
while (!wasmtime_call_future_poll(fut.get())) {
co_await ss::coroutine::maybe_yield();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's on the other side of this, like work on another stack or something? is this call driving execution of whatever is on the other side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#asynchronous-wasm

The poll method of the futures returned by Wasmtime will perform the actual work of calling the WebAssembly. Wasmtime won’t manage its own thread pools or similar, that’s left up to the embedder.
To implement futures in a way that WebAssembly sees asynchronous host functions as synchronous, all async Wasmtime futures will execute on a separately allocated native stack from the thread otherwise executing Wasmtime. This separate native stack can then be switched to and from. Using this whenever an async host function returns a future that resolves to Pending we switch away from the temporary stack back to the main stack and propagate the Pending status.

Basically calling poll here switches to the "Wasm" stack and runs the VM. Host functions that are async end up being propagated to poll and the continuation we return in async host functions is the bit that checks if a poll is able to go back to executing Wasm within the VM. This is why we need to pass back the future from continutation otherwise we'll busy loop asking if the future is done, instead of letting seastar reschedule this fiber when the future is done (at that point the continuation will return true and go back to running the VM).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@piyushredpanda piyushredpanda merged commit 8d57443 into redpanda-data:dev Oct 11, 2023
22 of 25 checks passed
@rockwotj rockwotj deleted the async-wasm branch October 11, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants