Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom transforms design investigation #1071

Closed
rukai opened this issue Mar 9, 2023 · 2 comments · Fixed by #1160
Closed

Custom transforms design investigation #1071

rukai opened this issue Mar 9, 2023 · 2 comments · Fixed by #1160

Comments

@rukai
Copy link
Member

rukai commented Mar 9, 2023

We have long held the idea that shotover needs some kind of user writable custom transforms.
But we've never known what it should actually look like.

Some of the possibilities are:

  • dynamically linked libraries (a plugin system)
  • wasm transforms
  • some kind of embedded scripting language (lua?, rune?)
  • statically linked libraries (the user compiles their custom transform directly into shotover)

Hard requirements

  • Plugins must be at least writable in rust
  • Plugins must have performance comparable to internal transforms

Dynamically linked (C ABI) or wasm

I've joined these two approaches into one section because they largely force shotovers design into the same shape.
The one difference is that we would need Dynamically linking to get the OS access required for the kafka IO accelerator project.

For rust to rust FFI over we should use this crate: https://github.com/rodrimati1992/abi_stable_crates/
As I understand it, going lower level than this would just result in us reimplementing similar abstractions.

I found the readme really confusing though, so I suggest taking a look at this example to better understand what using this library looks like.
Key points to note:

  • Lots of boilerplate with repr(c), externs, #[derive(.)]. And even still I'm not sure how much type safety we actually get to prevent us from hitting UB? If you say load an outdated or wrong library that is definitely UB.
  • No access to stdlib types, need to use reimplemented Strings, Vec, Option etc.

This is very similar to the restrictions of working with wasm.

The core problem when working with these restrictions is figuring out how we are going to pass messages into the plugin.
We have complex AST's for each protocol and the AST relies on stdlib types which cant go across the dynamic lib boundary.
I think the simplest solution is to minimize the surface area of our exposed API by taking the Message struct and removing the AST and exposing the message as pure raw bytes.
Then we separately expose our Frame struct and all its parsers and encoders as a separate crate which the plugin can statically link to.
The plugin can then take the C ABI message object extract the raw bytes, perform its own parsing and interact with the complex ast through a nice statically linked rusty API.
The plugin will be expected to always reencode any changes to the frame it wants to make permanent before passing the message on.

This has a large performance impact as each call to the transform requires the message to be reparsed and reencoded.
If the topology.yaml contains only 2 transform (the plugin + the sink) then the cost shouldnt be too bad as the sink and sources tend be good about avoiding unnecessarily parsing the messages.
But as soon as we introduce multiple transforms or even worse multiple plugin transforms then we will take a large hit.
This approach will also increase the size of each plugin on disk (and cpu cache usage!) due to duplicating all of our parsing logic across plugins.

Another fun unsolved problem is hooking up tokio tracing from the plugin to shotover.

Dynamically linked (rust ABI)

The rust abi is unstable, so shotover and a plugin compiled on different versions of rustc are incompatible.
This approach would help eliminate some of the problems with C ABI dynamic linking or wasm.
It is however not clear to me how many problems would be solved by this as using the rust ABI for dynamic linking is very untrodden ground, I cant find any examples of people actually using it!
I do at least know that we will ultimately have to go through rusts FFI in some way to access the library.
The fact that no one uses this means there could also be some scary issues waiting for us that no one has hit yet.

The core issue with the rust ABI is its incompatible across rustc releases.
This means that each shotover release will need its plugins to be compiled with the same rustc version.

Keeping shotover stuck on a single rustc forever is absolutely not an option as many dependencies rely on features from the latest releases.

A possible solution is expecting plugins to be recompiled for every shotover release.
This roughly matches up since shotover wont release as often as rustc and we always immediately bump our rustc version.
Plugins could then be exposed over HTTP at an address like <plugin_name>-<plugin_version>-<shotover_version>.so allowing shotover to fetch the compatible version itself.
This has a multiplicative blow up issue of having to compile every release of the plugin for every release of shotover, not to mention having to compile new binaries for every plugin version whenever a new shotover version is released.
And I think at this point we have really defeated the entire point of a plugin.
We SHOULD be able to update plugins and shotover independently.

scripting

I am going to ignore scripting for now as it is a hard requirement that we have a high performance solution.
Theoretically we could introduce scripting at a later date for low performance needs but it would require a similar message refactor as described in the Dynamaically linked (C ABI) or wasm section.

Statically linked

In order to statically link the transform into shotover we would create a crate that compiles shotover and the plugin.

Our main.rs

use shotover_proxy::runner::Runner;

/// this module contains a transform defined in the same way as internal transform are
mod redis_get_rewrite;

fn main() {
    Runner::new().run_block()
}

The built shotover can then use the statically linked transform just as it would any internal transform.
Here is a yaml making use of the statically linked RedisGetRewrite transform.

---
sources:
  redis_prod:
    Redis:
      listen_addr: "127.0.0.1:6379"
chain_config:
  redis_chain:
    - RedisGetRewrite:
        result: "Rewritten value"
    - RedisSinkSingle:
        remote_address: "127.0.0.1:1111"
        connect_timeout_ms: 3000
source_to_chain_mapping:
  redis_prod: redis_chain

I've already proved this approach works, the only remaining steps would be:

  • landing: Make TransformConfig into a trait #1064
  • adding a Custom(Box<dyn TransformTrait>) variant to the Transforms enum, that way we allow for custom transforms without adding the dynamic dispatch hit to our internal transforms.
  • tweaking the shotover_proxy::runner::Runner API that we expose to users.
  • doing a pass over what types and methods we expose in our public API, we want to be a lot more conservative to avoid future breaking changes.

When you consider modern deployment practices of immutable infrastructure, this approach shouldn't introduce much friction over a plugin approach.

  • Rust is super easy to build with just a cargo build and the rust version can be pinned in the rust-toolchain.toml
  • One cargo update PR will pull in security patches for both shotover and the custom transforms

sharing statically linked custom transforms

transforms can exist in their own crate and be uploaded to crates.io
This means crates.io can act as a plugin registry.

The user sets a Cargo.toml with just:

shotover_proxy = "1.0"
foo_transform = "2.0"
bar_transform = "0.1"

then they can just set their main.rs to:

use shotover_proxy::runner::Runner;

/// Need to import the configs to include them in the binary and allow typetag to do its magic
use foo_transform::FooTransformConfig;
use bar_transform::BarTransformConfig;

fn main() {
    Runner::new().run_block()
}

Far beyond the MVP we could take advantage of this by building a frontend around this process.
There are many ways to go about it some ideas:

  • build a webapp that displays shotover custom tranform crates by querying crates.io for dependencies of shotover and/or crates matching a specific tag
  • CLI tool that takes a shotover version number and topology.yaml and spits out a custom shotover binary with any custom transforms built into it.
    • this shouldnt be hard to implement, just need to:
      • find all the transforms listed in the topology.yaml that are not known internal ones
      • query crates.io for these transforms
      • find the latest release of each transform that depends on the specified shotover version
      • generate a crate that uses each transform
      • then compile in release and move the binary to the local directory.

cargo also supports private registries enabling us to run our own registry if desired.

Version mismatch handling

If the user has the following setup:

shotover_proxy = "2.0"
bar_transform = "0.1" # depends on shotover 1.0

Then the bar_transform will just be silently unused, and the user will only find out when they try to load a topology.yaml using the plugin and shotover complains that the transform does not exist.

This is not ideal and we can fix by setting the links field.
While its designed for specifying links to non rust dependencies, setting it will not cause any problems and will enforce at compile time that only one version of shotover is in the dependency tree.

Possibly the error message would be a bit confusing still and we may be able to get a better error message with an alternate approach:
Have a macro generate the Config enum, then have shotover_proxy::Runner be generic over the enum, that way we'll get a type error if there is a version mismatch, and the API still remains reasonably clean.
As a bonus it will also make including custom transforms less magic, in rust it feels very strange that having or not having a use statement alone can influence runtime behaviour.

@benbromhead
Copy link
Member

benbromhead commented Mar 20, 2023

Thanks for your patience on this.

I think given the current state of things we should probably just keep to the static linking path as the main way to do things (as loath as I am to punt the plugin concept further down the road).

  • Let's put together a bit of documentation about using a cargo.toml as a lightweight plugin system (essentially what you wrote above). We can look into some tooling around this when more folk use it.
  • If we have some time I would still be super interested to see an experimental WASM plugin transform... as slow as it might be. This would also have a nice harmony with introducing WebSocket support for a number of protocols we support, but this is not high priority right now.
  • If a transform really needs to access a shared dynamic library, we can just leave that as an exercise for the transform author, but shotover won't be responsible for supporting that dynamically linked library as a "plugin" and the transform will just either be in tree... or as a seperate rust lib using the method you pointed to above.

One question that is not on the critical path to this discussion:

  • What are your thoughts on the ABI safe primitives provided in https://docs.rs/abi_stable/0.11.1/abi_stable/std_types/index.html ? I really don't want to stray away from the stdlib on these given the fundamental nature of them, but if there was some way to leverage these to minimise parsing on the plugin side (e.g. if these can convert between stdlib types for "free").

@rukai
Copy link
Member Author

rukai commented Mar 20, 2023

Sounds good to me, thanks for reviewing.

While both of these are many years off, the rust team is exploring what a stable ABI could look like and wasm is working towards a more typed ABI between host and plugin so one day one of those could become a possibility for us.

What are your thoughts on the ABI safe primitives provided in https://docs.rs/abi_stable/0.11.1/abi_stable/std_types/index.html ? I really don't want to stray away from the stdlib on these given the fundamental nature of them, but if there was some way to leverage these to minimise parsing on the plugin side (e.g. if these can convert between stdlib types for "free").

You can see for example the into_vec method here: https://docs.rs/abi_stable/0.11.1/abi_stable/std_types/struct.RVec.html#method.into_vec
It is "free" to convert to a vec within the plugin that created it, but as soon as it goes from host -> plugin or plugin -> host or even plugin1 -> plugin2 it needs to allocate to perform the conversion.
This is not to mention the development cost of recreating the large tree of structs that are provided by cassandra-protocol, kafka-protocol within these abi_stable::std_types types and then implementing the logic to perform the conversion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants