Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial WebAssembly subsystem #12666

Merged
merged 10 commits into from
Aug 17, 2023

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Aug 8, 2023

Add the initial Wasm engine to Redpanda.

Right now we are using WasmEdge as the underlying engine but the engine that is used is abstracted away behind some interfaces so that we can easily extend it to other runtimes. I have a branch that shows this interface works fine for wasmtime as well.

Generally this component exists in isolation from the rest of the system, but this new subsystem will eventually be consumed by the transform subsystem.

Major pieces:

  • ffi: A set of classes/functions that is used to abstract away exposing host functions into a Wasm engine
  • wasi: A (mostly stub) implementation of the WASI standard. This allows user code to write to the transform subsystem logs, and access environment variables
  • wasmedge: The actual implementation of our wasm engine abstraction using wasmedge
  • transform_module: An implementation of the ABI contract, it includes host functions that we expose into the Wasm module as well as helpers for calling into the runtime correctly
  • schema_registry: A wrapper around the pandaproxy::schema_registry subsystem for usage within our host module
  • schema_registry_module: A Wasm module that exposes schema registry to user transforms

The guest side of the ABI is implemented in #12322, and there is a video overview of how the ABI contract works here.

Wasm powered Data Transforms will be available in the community edition of Redpanda hence this code all being written under the BSL

NOTE: Right now I am checking in the .wasm test fixtures, but after #12322 goes in I will hook that into the build system and enable more tests. I hooked up the tests into the build system.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@rockwotj rockwotj requested review from BenPope and a team as code owners August 8, 2023 18:42
@rockwotj rockwotj requested review from andrewhsu and removed request for a team August 8, 2023 18:42
@rockwotj rockwotj mentioned this pull request Aug 8, 2023
7 tasks
@rockwotj rockwotj mentioned this pull request Aug 8, 2023
7 tasks
@rockwotj rockwotj force-pushed the rockwood/initial-wasm-engine branch from 0689bae to 673bdd0 Compare August 8, 2023 18:49
@rockwotj rockwotj force-pushed the rockwood/initial-wasm-engine branch from 673bdd0 to 1abf895 Compare August 8, 2023 19:55
Comment on lines 14 to 27
#include "seastarx.h"
#include "ssx/future-util.h"
#include "utils/mutex.h"

#include <seastar/core/abort_source.hh>
#include <seastar/core/future-util.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/thread.hh>
#include <seastar/util/noncopyable_function.hh>

#include <exception>
#include <type_traits>
#include <utility>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: clean up includes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks done.

// Each transform revision has a UUID, which is the key for the wasm_binary
// topic and can be used to uniquely identify this revision.
uuid_t uuid;
// The offset of the commmitted WASM source in the wasm_binary topic.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks fixed.

vlog(
wasm_log.trace,
"Transforming batch: {}",
batch.header().record_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can log the whole header here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly 😄

std::vector<ss::sstring> args{_meta.name()};
absl::flat_hash_map<ss::sstring, ss::sstring> env = _meta.environment;
env.emplace("REDPANDA_INPUT_TOPIC", _meta.input_topic.tp());
env.emplace("REDPANDA_OUTPUT_TOPIC", _meta.output_topics.begin()->tp());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do i understand that correctly that for now we support only one output topic ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Added a comment here.

args, env, _logger);
register_wasi_module(wasi_module.get(), wasmedge_wasi_module);

(void)_worker;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line seems like not needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks removed, we'll add compilation on an alien thread to later.

// Per transform probe
class transform_probe {
public:
using hist_t = log_hist_public;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this histogram light enough to be used per transform ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(log_hist_public) == 160 so I think so? If we have 1000 different transforms on a shard for 1000 partitions then we get 160kB which feels small enough. Happy to just switch this to a counter and have a stat for CPU utilization instead?

@BenPope BenPope self-requested a review August 9, 2023 12:31
@@ -324,3 +324,18 @@ ExternalProject_Add(protobuf
-Dprotobuf_BUILD_TESTS=OFF
${common_cmake_args}
)

ExternalProject_Add(WasmEdge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done - also removed v8

@BenPope BenPope self-requested a review August 9, 2023 13:07
@rockwotj rockwotj force-pushed the rockwood/initial-wasm-engine branch from 1abf895 to 1bdd788 Compare August 9, 2023 20:00
@rockwotj rockwotj force-pushed the rockwood/initial-wasm-engine branch from 1bdd788 to a910092 Compare August 9, 2023 20:01
@rockwotj rockwotj requested a review from mmaslankaprv August 9, 2023 20:02
@rockwotj rockwotj force-pushed the rockwood/initial-wasm-engine branch 3 times, most recently from 4460b5b to 059366b Compare August 10, 2023 02:54
@rockwotj
Copy link
Contributor Author

Force push: Added named constants to redpanda Wasm module error numbers

@dotnwat
Copy link
Member

dotnwat commented Aug 16, 2023

Maybe it's worth building a malformed guest so we can verify the error paths are correctly handled? I think that should come in a followup.

great idea, and follow is reasonable

This will be used in the Wasm engine integration and is a useful
abstraction for fibers to interact with a running seastar thread.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Right now util depends on json, so this prevents models from depending
on util

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj rockwotj force-pushed the rockwood/initial-wasm-engine branch from d3d4f61 to c9d1d7b Compare August 16, 2023 17:37
@rockwotj
Copy link
Contributor Author

rockwotj commented Aug 16, 2023

Force push: Added tests for FFI helpers, removed shared library cycle with json -> model -> utils.

dotnwat
dotnwat previously approved these changes Aug 16, 2023
This commit defines a ABI for wasm modules to perform transforms within
redpanda. There are also helpers for custom host bindings that are
defined in an engine agnostic way. See redpanda-data#12322 for the corresponding guest
side bindings as well as a video overview of the ABI contract.

Additionally, a WASI module is defined, which supports a subset of the
WASI standard. Namely, environment variables and writing to std{out,err}
is supported.

There are currently bindings to both WasmEdge and wasmtime, although the
wasmtime integration is mostly experimental and can easily crash the
redpanda process (see scylladb/scylladb#14163).

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
tinygo doesn't yet use wasm/wasi1p build tags like upstream go, so just
switch to using tinygo completely as the build tag.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Now that we're building assets as part of CMake

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj
Copy link
Contributor Author

Force pushed to fix a test that changed after the seastar upgrade

@dotnwat dotnwat merged commit 01a317b into redpanda-data:dev Aug 17, 2023
@rockwotj rockwotj deleted the rockwood/initial-wasm-engine branch August 28, 2023 19:11
rockwotj added a commit to rockwotj/redpanda that referenced this pull request Aug 30, 2023
These functions where already defined in redpanda-data#12666 and their encoding.

We also add a stub version of the ABI for IDEs that don't use the tinygo
build tags (and we can build/test this package in standard go).

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
rockwotj added a commit to rockwotj/redpanda that referenced this pull request Sep 5, 2023
These functions where already defined in redpanda-data#12666 and their encoding.

We also add a stub version of the ABI for IDEs that don't use the tinygo
build tags (and we can build/test this package in standard go).

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
rockwotj added a commit to rockwotj/redpanda that referenced this pull request Sep 7, 2023
These functions where already defined in redpanda-data#12666 and their encoding.

We also add a stub version of the ABI for IDEs that don't use the tinygo
build tags (and we can build/test this package in standard go).

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants