Make Vector more scriptable #1041
Comments
This all looks fantastic! One question:
This means pretty much we have to do all the batching in Rust then invoke the handler with the batch. Does this mean it may be harder to do streaming-based sinks with scripting?
The batch size can be set to 1, or a special option can be added to send individual events instead of arrays of events.
I think this might benefit from some coordination with #1328. Ideally, we want as much of the user-written code as possible to operate on topology components instead of individual events. Operations on entire streams of events can be called vector operations, while operations on individual events can be called scalar operations. In most cases users would be advised to use the vector operations, resorting to the scalar ones only in very specific cases. This would keep most of the flexibility while maintaining native Rust performance, which is not achievable from any scripting transform.

Most of our components would then turn into functions. This means that while we don't have advanced scripting, adding new components only benefits us, as they could be nicely wrapped into highly performant functions later. For example, there could be a function representing

With this approach in mind, I think it might be beneficial not to rush into implementing as many "scalar" APIs as possible, but rather to focus on adding high-quality components that can later be represented as "vectorized" functions acting on the streams.
I absolutely love this.
Big 👍 from me. I wouldn't mind seeing more concrete examples of this to make sure I'm understanding this correctly.
I was thinking about something like this:
Noting that this will be covered via our WASM roadmap to extend Vector's components. The ideas here are good inspiration for the API we expose.
Closing this since we have the
In this issue I want to discuss, at a high level, adding scripting APIs to Vector. It might not be the top priority at the moment, but I'm creating this issue now to give us enough time to think about it and discuss it.
Introduction
Goals and scope of this issue
The goal is to define what ideal APIs should look like, even if they cannot be easily implemented at the moment. This would give us the whole picture in advance and ensure that separate scripting-related features play together well as we implement them.
In the text below JavaScript is used as the scripting language because I'm familiar with it and it can be implemented on top of the QuickJS engine (see #721). Most of the ideas described here can be translated to some other scripting language, for example Lua, if we find out that it fits better.
Intended usage of scripting
I want to highlight that scripting is intended to be used for non-standard things and can't replace native sources/transforms/sinks because a scripting language will always be an order of magnitude slower than native Rust code. However, it still could be indispensable to users who need to do something custom.
Additionally, if we actually arrive at the point where scripting is flexible enough, it would be possible to prototype some new features as scripts before writing an actual high-performance, user-friendly implementation in Rust.
Scriptable components
Overview
We can have all three possible types of scriptable components: sources, transforms, and sinks.
Below are examples of how they can be used.
Config structure
Each of the components needs some kind of handler function which implements the logic. The config could either load the source from a file, where the path is by default relative to the path of the config file rather than to the current working directory, so that running `vector --config <config path>` works with any working directory, or contain the inlined source as a string:
I also find it reasonable to allow anonymous handlers for simple cases, where the entire source contains (and evaluates to) a single function. In that case the `handler` field can be omitted from the config.
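To make the named versus anonymous handler distinction concrete, here is a minimal sketch of how a handler could be resolved from a script source. It runs on Node's `vm` module rather than on QuickJS, and `resolveHandler` is a hypothetical helper invented for illustration, not part of Vector.

```javascript
const vm = require("node:vm");

// Hypothetical helper: evaluates a script and returns its handler function.
// `handlerName` mirrors the `handler` config field; when it is omitted, the
// script is expected to evaluate to a single (anonymous) function.
function resolveHandler(source, handlerName) {
  const context = vm.createContext({});
  const result = vm.runInContext(source, context);
  // Top-level function declarations become properties of the context object.
  const candidate = handlerName ? context[handlerName] : result;
  if (typeof candidate !== "function") {
    throw new TypeError(
      handlerName
        ? `script does not define a function named "${handlerName}"`
        : "script does not evaluate to a function"
    );
  }
  return candidate;
}

// Named handler: the config names the function to call.
const named = resolveHandler(
  "function handler(event) { event.seen = true; return event; }",
  "handler"
);

// Anonymous handler: the whole source evaluates to one function.
const anonymous = resolveHandler("(event) => ({ ...event, seen: true })");
```

Either way the caller ends up with a plain function, so the rest of the pipeline would not need to know which form the config used.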
Sources
The general idea is similar to #992, but instead of a shell command, a JavaScript function is periodically executed and generates either an actual event or a promise that resolves to an event.
Just a counter with state, can be used for tests:
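A minimal sketch of such a counter, assuming the script's top-level state persists between invocations and that an event can be represented as a plain object. The field names `counter` and `message` are assumptions made for illustration.

```javascript
// State lives in the script's top-level scope and persists between calls.
let counter = 0;

// Hypothetical handler, assumed to be invoked periodically by Vector.
// Each call produces exactly one event.
function handler() {
  counter += 1;
  return { counter, message: `event #${counter}` };
}
```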
HTTP API reader with fetch API:
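A sketch of what an HTTP-polling source could look like, assuming a standard `fetch` is available to scripts. The URL, the event shape, and the `makeHttpSource` wrapper are hypothetical; the wrapper exists only so that the fetch implementation can be swapped out when trying the sketch without network access.

```javascript
// Hypothetical endpoint, used only for illustration.
const STATUS_URL = "https://example.com/api/status";

// Builds a source handler. `fetchImpl` is injectable so the sketch can be
// exercised with a stub instead of a real network call.
function makeHttpSource(fetchImpl = globalThis.fetch) {
  return async function handler() {
    const response = await fetchImpl(STATUS_URL);
    if (!response.ok) {
      return null; // emit no event when the request fails
    }
    const body = await response.json();
    // The returned promise resolves to a single event.
    return { source_url: STATUS_URL, ...body };
  };
}
```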
File reader. Here a decision has to be made on filesystem APIs, as they are not standard. This example uses Node-style `readFileSync`, while QuickJS provides more low-level filesystem functions. In particular, it can be used to read values from the `/sys` or `/proc` filesystems:

It should also be possible to return multiple events at once by returning an array, or no events by returning `null`.
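The file reader and the array or `null` return values can be sketched together. This version uses Node's `fs.readFileSync`; the `makeFileSource` wrapper and the `message` field are assumptions made for illustration, and a QuickJS-based implementation would need different filesystem calls.

```javascript
const fs = require("node:fs");

// Hypothetical factory: returns a source handler that reads `path` on each
// invocation, for example a file under /proc or /sys.
function makeFileSource(path) {
  return function handler() {
    const lines = fs
      .readFileSync(path, "utf8")
      .split("\n")
      .filter((line) => line.length > 0);
    if (lines.length === 0) {
      return null; // nothing to report: emit no events
    }
    // One event per non-empty line, returned as an array.
    return lines.map((line) => ({ message: line }));
  };
}
```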
Transforms
Scripted transforms are a generalization of #721 with support for promises in addition to normal events. Promise support is mostly needed for the fetch API, which can be used as described below. The promise can resolve to a single event, an array of events, or `null`. Returning an actual event (or an array, or `null`) instead of a promise needs to be supported too.

Examples:
Currency rate conversion:
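A sketch of a promise-returning transform for this case. The rates endpoint, its response shape (`{ "rates": { "EUR": 0.9 } }`, meaning units of currency per 1 USD), and the event fields `amount`, `currency`, and `amount_usd` are all assumptions made for illustration.

```javascript
// Hypothetical rates endpoint; assumed to return { rates: { EUR: 0.9, ... } }
// where each rate is the amount of that currency per 1 USD.
const RATES_URL = "https://example.com/api/rates?base=USD";

// `fetchImpl` is injectable so the sketch can run against a stub.
function makeCurrencyTransform(fetchImpl = globalThis.fetch) {
  return async function handler(event) {
    const response = await fetchImpl(RATES_URL);
    if (!response.ok) {
      return event; // pass the event through unchanged
    }
    const { rates } = await response.json();
    const rate = rates[event.currency];
    if (typeof rate !== "number") {
      return null; // unknown currency: drop the event
    }
    return { ...event, amount_usd: event.amount / rate };
  };
}
```

Fetching the rates on every event keeps the sketch short; a real script would probably cache them in script-level state.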
Merging events from different sources. For example, there could be two sources, one of which produces events containing current room temperature and another containing current atmospheric pressure. They can be combined using the following config:
with script source looking like
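A minimal sketch of such a merging handler, assuming events from the two sources carry numeric `temperature` and `pressure` fields respectively (these field names are assumptions). It remembers the latest value from each source and emits a combined event once both are known.

```javascript
// Latest values seen from each source, kept in script-level state.
const latest = { temperature: null, pressure: null };

// Hypothetical transform handler receiving events from both sources.
function handler(event) {
  if (typeof event.temperature === "number") {
    latest.temperature = event.temperature;
  }
  if (typeof event.pressure === "number") {
    latest.pressure = event.pressure;
  }
  if (latest.temperature === null || latest.pressure === null) {
    return null; // still waiting for the other source
  }
  return { temperature: latest.temperature, pressure: latest.pressure };
}
```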
Sinks
While scriptable sinks seem less important to me than sources and transforms, I think the main use cases not covered by native sinks are making multi-step HTTP requests, writing to the filesystem using complex logic, or invoking command-line programs with arguments combined using complex logic. The handler should take an `events` array containing a batch of events and process them either synchronously or asynchronously.

Example:
Send HTTP requests requiring temporary authentication tokens:
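A sketch of a two-step sink along these lines: it first requests a temporary token, then posts the batch using it. Both URLs, the `{ token }` response shape, and the `makeAuthenticatedSink` wrapper are hypothetical and chosen only for illustration.

```javascript
// Hypothetical endpoints, used only for illustration.
const AUTH_URL = "https://example.com/auth/token";
const INGEST_URL = "https://example.com/ingest";

// `fetchImpl` is injectable so the sketch can run against a stub.
function makeAuthenticatedSink(fetchImpl = globalThis.fetch) {
  return async function handler(events) {
    // Step 1: obtain a temporary token.
    const authResponse = await fetchImpl(AUTH_URL, { method: "POST" });
    if (!authResponse.ok) {
      throw new Error(`auth failed with status ${authResponse.status}`);
    }
    const { token } = await authResponse.json();

    // Step 2: send the whole batch with the token attached.
    const response = await fetchImpl(INGEST_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(events),
    });
    if (!response.ok) {
      throw new Error(`ingest failed with status ${response.status}`);
    }
  };
}
```

Throwing on failure is one possible way to signal that the batch was not delivered; how Vector would react to a rejected promise (retry, drop, log) is left open here.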
Proposed API groups to be implemented
Based on the examples and use cases listed above, I think the following APIs would be useful to provide to user scripts:
`system`/`popen`)

Questions