chore: add VRL VM RFC #10011
Conversation
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
✔️ Deploy Preview for vector-project canceled.
🔨 Explore the source changes: 24cfba0
🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/61941d1f854d52000a5a5130
Awesome. I'm excited to see this progress. I think the RFC is a good start, but there are some parts still missing. I left some comments that hopefully help flesh this RFC out a bit more.
rfcs/2021-11-12-9811-vrl-vm.md (Outdated)
> The instructions field is a `Vec` of `OpCode` cast to a usize. The reason for the cast is because not all instructions are `OpCode`. For example the instructions `[.., Constant, 12, ..]` when evaluated will load the constant stored in the `values` `Vec` that is found in position 12 onto the stack.
In the vein of what we discussed during our get-together, I'd like to get as much safety as possible through Rust's type system, including introducing wrapper types to avoid misusing the VM API.
For example, for instructions, we could/should codify the invariants a bit more in the type system using something like:
```rust
enum Instruction {
    Opcode(Opcode),
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```
There are more examples like that, we don't have to have consensus on all of them in this RFC, but we should have a section that explicitly mentions this as a requirement to successfully implementing the RFC, we can then iterate on the exact shape of those APIs in the implementation PRs.
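To make the invariant concrete, here is a minimal, self-contained sketch of the wrapper-type idea (names like `TargetIndex` and the `String` stand-in for `Literal` are illustrative, not the final VM API): each store gets its own index newtype, so an index into `values` cannot accidentally be used against another store.

```rust
// Each backing store gets its own index newtype; the type system then
// prevents mixing an index for one store with another store's accessor.
#[derive(Debug, Clone, Copy)]
pub struct LiteralIndex(usize);

#[derive(Debug, Clone, Copy)]
#[allow(dead_code)]
pub struct TargetIndex(usize); // hypothetical second store's index

pub struct Vm {
    values: Vec<String>, // stand-in for the real `Literal` type
}

impl Vm {
    // Only a `LiteralIndex` can reach the `values` store; passing a
    // `TargetIndex` here is a compile error.
    fn literal(&self, idx: LiteralIndex) -> &str {
        &self.values[idx.0]
    }
}

fn main() {
    let vm = Vm { values: vec!["hello".to_string()] };
    println!("{}", vm.literal(LiteralIndex(0)));
}
```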
We could also keep the index in the OpCode itself like:
```rust
enum OpCode {
    Constant(usize),
    Jump(usize),
    // ...
}
```

This gives us even more type safety, and `size_of::<OpCode>` doesn't get any bigger. I've added this to the alternatives.
> We could also keep the index in the OpCode itself like:

I don't see how it gives more type safety, given that the index for one opcode can point to a different store than the index for another opcode. With a wrapper type, you can ensure that the index for one store cannot be mixed with the index into another store.
Also, it might be worth looking into `repr(C)` and `NonZeroUsize` and the like, which allow the Rust compiler to add optimizations during compilation to reduce the word count of these wrapper types. I'm not that familiar with how this works, but for example `Option<NonZeroUsize>` is equal in size to `usize` because the compiler can use the forbidden `0` value of the `usize` to encode whether the value is a `Some` or a `None`.
But, I don't think we need to sweat all those minor details in this RFC, I think the goal should be to get a nice performance improvement while still having as much type safety as possible. Any additional enhancements that reduce that type-safety can be filed as separate issues to discuss separately and implement once there's consensus on the cost vs. benefit trade-off.
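The niche optimization mentioned for `NonZeroUsize` can be checked directly with `std::mem::size_of` — the `Option` wrapper adds no extra space:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

fn main() {
    // The compiler uses the forbidden zero value of `NonZeroUsize` as the
    // `None` niche, so `Option<NonZeroUsize>` costs no extra space.
    assert_eq!(size_of::<Option<NonZeroUsize>>(), size_of::<usize>());
    println!("Option<NonZeroUsize>: {} bytes", size_of::<Option<NonZeroUsize>>());
}
```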
The safety in the second proposal is in the access to the index. In the first proposal, an opcode that requires a literal requires fetching the next index in the stream, checking if it is in fact a literal, and then accessing the value. In the second case, the literal is just there. We can newtype wrap the literal in the opcode the same way.
I agree, in any case, that this RFC and initial implementation should describe the simplest MVP to prove the promise of some performance gain and then work in optimizations, since we aren't committing ourselves to any particular external representation or interoperability.
```rust
pub struct Vm {
    instructions: Vec<usize>,
    values: Vec<Literal>,
    targets: Vec<Variable>,
```
Will these only be internal targets, or also event (external) targets?
If the former, I think it should be sufficient to store `Vec<Ident>`; if the latter, we need to look at the existing `Target` type which covers all potential targets.
We will need to represent both, but there's no reason why it couldn't be split up into:
```rust
internal_targets: Vec<Ident>,
external_targets: Vec<LookupBuf>,
```
rfcs/2021-11-12-9811-vrl-vm.md (Outdated)
> # RFC 9811 - 2021-11-12 - VRL bytecode VM
>
> This RFC proposes implementing a bytecode VM. VRL will be compiled to the bytecode and executed by this VM, with the aim to significantly improve the performance of
We should add a section to this RFC discussing (and proving) the performance gains we've seen from the early spike, as those gains are what warrants the effort needed to implement this RFC.
> However, VRL is a very simple language, which does also mean the VM will be simple.
VRL is also still evolving. We're adding iteration soon, and there's no way to know what else we might introduce in the future, so this is somewhat of a moot point, I think.
I imagine that even with more features added, the goal of the language will still be to keep things simple.
However, it is true that iteration and in particular lexical scoping will increase the complexity. This will also change the planned approach to iteration, so needs some thought.
It's not obvious that these complexities in the language necessarily increase the complexity of the VM, just like the complexities in Rust don't necessarily require a more complex CPU instruction set. There will be some new requirements, but I would expect them to be relatively small.
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
I would strongly encourage looking into dynamically sized bytecode. This would require the instructions to be a pure `Vec<u8>`. The main benefit of doing this is you get to make the most common opcodes exactly 1 byte. With an enum, the smallest an opcode could feasibly be is 4 bytes, and it could easily be larger. That could make a massive difference in caching. It also allows you to encode inputs directly into the bytecode, so you don't have extra indirection from a separate values `Vec`.
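A quick illustration of the size argument (the enum shape here is a made-up stand-in, not the RFC's actual `OpCode`): an enum variant carrying a `usize` operand pads the whole opcode up to the operand's alignment, while a raw byte stream spends exactly one byte on common opcodes.

```rust
use std::mem::size_of;

// Illustrative only: a discriminant plus an 8-byte operand, padded to
// alignment, makes every opcode 16 bytes on a typical 64-bit target.
#[allow(dead_code)]
enum EnumOpCode {
    Constant(usize),
    Jump(usize),
    Pop,
}

fn main() {
    println!("enum opcode: {} bytes", size_of::<EnumOpCode>());
    // A byte-stream opcode is just a `u8`.
    println!("byte opcode: {} bytes", size_of::<u8>());
}
```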
> risks involved in running VRL with a VM and the measures we will take to mitigate those risks.
>
> ### Out of scope
It came to mind that I think it would be good to add "user-facing breaking changes" as out of scope for this RFC. That is, whatever implementation we end up with, there shouldn't be any difference to the user, other than a faster runtime.
If we do want to do optimizations that require a breaking change, I suggest pulling those out into a separate issue, and discuss them separately, so that we don't block any work listed in this RFC.
The lazy evaluation aspect of parameters to function calls could be considered a potentially breaking change, and this proposal does change that behavior, so I'm not sure it is strictly out of scope. It should be out of scope, but I don't know if we can maintain the current lazy evaluation without much higher complexity.
Yes! It is very much the plan, as part of the testing, to be able to run the current implementation alongside the VM to make sure the output doesn't change in any way.
Yes, to add to this, there should be no user-facing changes other than that VRL is faster. I've talked with Stephen offline a bit about this but I'd like to see us have property tests that assert the equivalent operation of the tree walker and the enum VM. That'll allow us to be sure any future optimization passes don't introduce gross regressions.
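A hand-rolled sketch of that side-by-side check, self-contained for illustration: both evaluators here operate on a tiny arithmetic AST (the real setup would compare VRL's tree walker against the bytecode VM, and would generate programs with a property-testing crate such as proptest rather than the toy generator below).

```rust
// Toy AST standing in for a VRL program.
#[derive(Clone)]
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
}

// "Tree walker": direct recursive evaluation.
fn eval_tree(e: &Expr) -> i64 {
    match e {
        Expr::Int(n) => *n,
        Expr::Add(a, b) => eval_tree(a) + eval_tree(b),
    }
}

// "VM": compile to a flat postfix form (Some(n) = push constant,
// None = add opcode) and run it on a stack.
fn eval_vm(e: &Expr) -> i64 {
    fn compile(e: &Expr, out: &mut Vec<Option<i64>>) {
        match e {
            Expr::Int(n) => out.push(Some(*n)),
            Expr::Add(a, b) => {
                compile(a, out);
                compile(b, out);
                out.push(None);
            }
        }
    }
    let mut code = Vec::new();
    compile(e, &mut code);
    let mut stack = Vec::new();
    for op in code {
        match op {
            Some(n) => stack.push(n),
            None => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // Cheap pseudo-random program generation (a real setup would use proptest).
    let mut seed: u64 = 42;
    for _ in 0..100 {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let a = (seed >> 33) as i64 % 1000;
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let b = (seed >> 33) as i64 % 1000;
        let prog = Expr::Add(Box::new(Expr::Int(a)), Box::new(Expr::Int(b)));
        assert_eq!(eval_tree(&prog), eval_vm(&prog));
    }
    println!("tree walker and VM agree on 100 generated programs");
}
```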
> The `ArgumentList` parameter that is passed into `call` will have access to the parameter stack and the parameter list exposed by the `Function`. This can use these to return the appropriate values for the `required`, `optional` etc. functions.
I'd love to see a stripped-down example of one of the existing function implementations (e.g. `upcase`), to get a better sense of how this would work, and if this pattern fits all existing use-cases.
> - Currently each stdlib function is responsible for evaluating their own parameters. This allows parameters to be lazily evaluated. With the Vm, the parameters will need to be evaluated up front and the stack passed into the function. This could impact performance.
This could potentially be a big downside, given that almost all VRL programs are function-call-heavy.
I'd love to see this fleshed out a bit more, with some real-world examples of functions that would most likely be impacted by this, and how that impact changes their behaviour.
> The issues with using WASM are that it would require data to be serialized in order to move between Vector and WASM. This would incur a significant performance penalty.
I mentioned this in our meeting, but this isn't actually accurate. Wasm allows both the host and the Wasm module to mutably access the same chunk of memory.
See, for example, the `wasmer` crate.
It's true, though, that we'll need wrappers to represent our types, but there's also a lot of ongoing work with Wasm interface types to make this simpler.
Still, I agree with the conclusion that this is a step too far for a first iteration, but I want to avoid that conclusion being based on the wrong assumptions.
> we wouldn't be able to dedupe the constant, so for example, if the code uses the same string twice, it would be represented twice in the bytecode.
>
> #### Bitmask the OpCode data
This is an interesting proposal, and worth exploring in the future, but my initial preference would be to leverage Rust's types and type-safety as much as possible. The suggestion here isn't necessarily "unsafe", but it adds another point where we can mess up (and requires a closed-down API to avoid others from doing the same).
I would probably suggest moving this (and "Include value in the OpCode") out of the "alternatives" section, and into "future work", as I don't think it's either this or the other, but rather we can start safer but potentially a bit slower, and then move towards more "unsafe" solutions that give us an extra few percentages of performance gains.
I'm in the opposite camp on this. I agree it might increase the performance gain, but I'd rather take it step-by-step, and start with a safer (and easier to grok) implementation that leverages Rust's types/type-safety as much as possible. All other optimizations shouldn't require an extensive rewrite in the future, and can thus be split up into issues that we can tackle as the need arises, similar to how we started with a simple tree-based runtime implementation in the first version because it was "fast enough" for most use-cases at the time, and meant it was easier for us to iterate on the language in the beginning stages.

We aren't at the "the language is done" stage yet (heck, we have a bunch of breaking changes we're likely still to do before we release 1.0), so while I concur that we need this change to get us closer to the ideal performance, we still need to strike a balance. For me that balance lies at going with a VM as described in this RFC, but leaving some performance gains on the table for later, in favour of making it as easy/safe as possible to use/iterate on.
I plan to benchmark both approaches to see how significant the difference is. We can decide from there.
```rust
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```
Do we expect this index to ever be larger than `u32`? Even `u16` is likely reasonable for most use cases, but we'll probably eventually run into problems with that limit.
I imagine for the vast majority of programs `u8` would be sufficient; the remainder would never exceed `u16`. This is the appeal of the dynamically sized bytecode VM: we could stick with `u8` for most programs and only stretch out over two bytes in the rare situations it's necessary.
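For illustration, a minimal sketch of reading a multi-byte operand out of a `Vec<u8>`-style instruction stream (the opcode value and encoding are made up, not the RFC's): a one-byte opcode followed by a little-endian `u16` index.

```rust
// Hypothetical opcode value for "load constant".
const OP_CONSTANT: u8 = 0x01;

// Decode a little-endian u16 operand starting at `ip`.
fn read_u16(code: &[u8], ip: usize) -> u16 {
    u16::from_le_bytes([code[ip], code[ip + 1]])
}

fn main() {
    // "load constant 300" — an index that doesn't fit in one byte.
    let code = [OP_CONSTANT, 44, 1]; // 44 + 1 * 256 = 300
    let mut ip = 0;
    match code[ip] {
        OP_CONSTANT => {
            let idx = read_u16(&code, ip + 1);
            ip += 3; // 1 opcode byte + 2 operand bytes
            println!("load constant at index {}", idx);
        }
        _ => unreachable!(),
    }
    assert_eq!(ip, 3);
}
```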
I'll update this to `u16` for now.
It's worth measuring whether dynamic sizing has an effect on runtime speed versus complexity, but I'd say the initial goal should be to crank a VM out with the simplest possible implementation, whether that's fixed-size representations or dynamic. That said, I would tend to avoid `usize` since it's a type that changes size depending on machine architecture.
> Parameters are optional and may not be specified. However, to avoid passing parameters to the wrong function, an OpCode still needs to be emitted to move a placeholder value - `None` - to the parameter stack.
Given this, you might find value in having a specific opcode to push a `None` onto the stack.
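A minimal sketch of that idea (opcode names and shapes are illustrative): a dedicated `PushNone` opcode pushes the placeholder directly, so unspecified optional parameters never touch the constants table.

```rust
#[allow(dead_code)]
enum OpCode {
    PushNone,
    PushConstant(usize),
}

// Execute a parameter-pushing sequence; `PushNone` needs no constants slot.
fn push_parameters(ops: &[OpCode], constants: &[&str]) -> Vec<Option<String>> {
    let mut stack = Vec::new();
    for op in ops {
        match op {
            OpCode::PushNone => stack.push(None),
            OpCode::PushConstant(idx) => stack.push(Some(constants[*idx].to_string())),
        }
    }
    stack
}

fn main() {
    let constants = ["hello"];
    let stack = push_parameters(&[OpCode::PushConstant(0), OpCode::PushNone], &constants);
    assert_eq!(stack, vec![Some("hello".to_string()), None]);
    println!("{:?}", stack);
}
```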
> Check each instruction to ensure:
>
> - each Jump Opcode jumps to a valid location in the bytecode.
Currently, given that there are no loops, also check that each jump is a forward jump. I understand that may be short lived, though.
Theoretically we could create different OpCodes for forward and backward jumps which would still allow for this check.
I would even go so far as to say VRL design should be kept so that we can maintain this check.
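The check being discussed can be sketched in a few lines (the `OpCode` shape is illustrative): every jump target must be in bounds, and, while the language has no loops, strictly forward.

```rust
#[allow(dead_code)]
enum OpCode {
    Jump(usize),
    Pop,
}

// Validate the bytecode before execution: no out-of-bounds or backward jumps.
fn validate(instructions: &[OpCode]) -> Result<(), String> {
    for (ip, op) in instructions.iter().enumerate() {
        if let OpCode::Jump(target) = op {
            if *target >= instructions.len() {
                return Err(format!("jump at {} to invalid location {}", ip, target));
            }
            if *target <= ip {
                return Err(format!("jump at {} to {} is not a forward jump", ip, target));
            }
        }
    }
    Ok(())
}

fn main() {
    let ok = [OpCode::Jump(2), OpCode::Pop, OpCode::Pop];
    assert!(validate(&ok).is_ok());

    let backward = [OpCode::Pop, OpCode::Jump(0), OpCode::Pop];
    assert!(validate(&backward).is_err());

    println!("jump validation passed");
}
```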
> It also means we do not need to store a separate list of constants. On the plus side, this is one less lookup at runtime to load the constant. On the down side, we wouldn't be able to dedupe the constant, so for example, if the code uses the same string twice, it would be represented twice in the bytecode.
Worse, since the values are boxed, you have now introduced a non-linear indirection to get at the value, which is exactly the kind of memory accesses this RFC is trying to avoid.
To be honest, as a POC/spike, I'm good with what's laid out here.
A lot of the minutiae has been gone over here, but I'm personally more interested in the testing framework (side-by-side VM/AST execution to validate results, property testing, etc) as a way to ensure that the design is meeting the level of functionality and quality of the current AST-based implementation.
We can always tweak and play with opcode sizes, segmented stacks, peephole optimizations, and all sorts of other tricks down the road.
Great work so far.
> - We lose some safety that we get from the Rust compiler. There will need to be significant fuzz testing to ensure that the code runs correctly under all circumstances.
The core team will help you get the CI infra in place for this and spec out the fuzz tests.
> Current `stdlib` functions are composed of a combination of `compile` and `resolve` functions. These functions will need to be combined into a single function `call`.
One thing to be careful of here is that we don't lose the ability to do work at compile time that can be reused across calls (e.g. pre-compiling regex). In general it'd be nice for functions to be defined directly in terms of the types they operate on, and have the validation and preparation of those values be a separate step that we're able to omit when we have sufficient information to know they're not needed.
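One way to keep that property when `compile` and `resolve` merge into `call` is to have the compiled program own the precomputed state. A self-contained sketch (all names are hypothetical, and a pre-lowercased needle stands in for a compiled regex to avoid external crates):

```rust
// Hypothetical stdlib function: the "compiled" form owns work done once,
// at VRL compile time, and reuses it on every call.
struct ContainsFn {
    needle_lower: String, // computed once, analogous to a pre-compiled regex
}

impl ContainsFn {
    fn compile(needle: &str) -> Self {
        ContainsFn { needle_lower: needle.to_lowercase() }
    }

    // Per-event work only: the needle is never re-processed per call.
    fn call(&self, haystack: &str) -> bool {
        haystack.to_lowercase().contains(&self.needle_lower)
    }
}

fn main() {
    let f = ContainsFn::compile("ERROR");
    assert!(f.call("an Error occurred"));
    assert!(!f.call("all good"));
    println!("precomputed state reused across calls");
}
```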
While I'm not totally on board with some of the minutia of the approach outlined here, I agree with @tobz that what happens from here (testing framework, validation of results, etc) is more important than bike shedding this document much further.
@StephenWakely it looks like this is good to merge?
Ref #9811
Readable version
Signed-off-by: Stephen Wakely fungus.humungus@gmail.com