
chore: add VRL VM RFC #10011

Merged
merged 5 commits into from
Dec 8, 2021

Conversation

StephenWakely
Contributor

@StephenWakely StephenWakely commented Nov 12, 2021

Ref #9811

Readable version

Signed-off-by: Stephen Wakely fungus.humungus@gmail.com

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>

Contributor

@JeanMertz JeanMertz left a comment

Awesome. I'm excited to see this progress. I think the RFC is a good start, but there are some parts still missing. I left some comments that hopefully help flesh this RFC out a bit more.

Comment on lines 80 to 83
The instructions field is a `Vec` of `OpCode` cast to a `usize`. The cast is
necessary because not all instructions are `OpCode`s. For example, when the
instructions `[.., Constant, 12, ..]` are evaluated, the constant stored at
position 12 of the `values` `Vec` is loaded onto the stack.
Contributor

In the vein of what we discussed during our get-together, I'd like to get as much safety as possible through Rust's type system, including introducing wrapper types to avoid misusing the VM API.

For example, for instructions, we could/should codify the invariants a bit more in the type system using something like:

```rust
enum Instruction {
    Opcode(Opcode),
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```

There are more examples like that, we don't have to have consensus on all of them in this RFC, but we should have a section that explicitly mentions this as a requirement to successfully implementing the RFC, we can then iterate on the exact shape of those APIs in the implementation PRs.
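To make the intent concrete, here is a minimal sketch of the wrapper-type approach. The struct fields and accessor names are illustrative stand-ins, not the actual VM API:

```rust
// Sketch (hypothetical API): newtype indices make it a compile-time
// error to index one store with an index meant for another store.
pub struct LiteralIndex(usize);
pub struct TargetIndex(usize);

pub struct Vm {
    values: Vec<i64>,      // stand-in for the literal store
    targets: Vec<String>,  // stand-in for the target store
}

impl Vm {
    // Each store can only be indexed by its own wrapper type.
    pub fn literal(&self, idx: LiteralIndex) -> &i64 {
        &self.values[idx.0]
    }

    pub fn target(&self, idx: TargetIndex) -> &String {
        &self.targets[idx.0]
    }
}

fn main() {
    let vm = Vm {
        values: vec![42],
        targets: vec!["foo".to_string()],
    };
    assert_eq!(*vm.literal(LiteralIndex(0)), 42);
    // vm.literal(TargetIndex(0)); // would not compile: mismatched types
    println!("{}", vm.target(TargetIndex(0)));
}
```

With bare `usize` indices, the commented-out mix-up would compile and silently read the wrong store; with wrapper types it is rejected by the type checker.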

Contributor Author

We could also keep the index in the OpCode itself like:

```rust
enum OpCode {
    Constant(usize),
    Jump(usize),
    ...
}
```

This gives us even more type safety and `size_of::<OpCode>()` doesn't get any bigger. I've added this to the alternatives.

Contributor

@JeanMertz JeanMertz Nov 17, 2021

We could also keep the index in the OpCode itself like:

I don't see how it gives more type safety, given that the index for one opcode can point to a different store than the index for another opcode. With a wrapper type, you can ensure that the index for one store cannot be mixed with the index into another store.

Also, it might be worth looking into repr(C) and NonZeroUsize and the like, which allow the Rust compiler to apply optimizations during compilation that reduce the size of these wrapper types. I'm not that familiar with how this works, but for example Option<NonZeroUsize> is equal in size to usize because the compiler can use the forbidden 0 value of the usize to encode whether the value is a Some or None.

But, I don't think we need to sweat all those minor details in this RFC, I think the goal should be to get a nice performance improvement while still having as much type safety as possible. Any additional enhancements that reduce that type-safety can be filed as separate issues to discuss separately and implement once there's consensus on the cost vs. benefit trade-off.
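The niche optimization mentioned above is easy to verify with the standard library alone:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

fn main() {
    // The compiler uses the forbidden 0 value of NonZeroUsize as the
    // None discriminant, so the Option adds no extra space.
    assert_eq!(size_of::<Option<NonZeroUsize>>(), size_of::<usize>());

    // A plain Option<usize> has no niche, so it needs extra space for
    // the discriminant (padded to alignment).
    assert!(size_of::<Option<usize>>() > size_of::<usize>());

    println!("ok");
}
```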

Member

The safety in the second proposal is in the access to the index. In the first proposal, an opcode that requires a literal requires fetching the next index in the stream, checking if it is in fact a literal, and then accessing the value. In the second case, the literal is just there. We can newtype wrap the literal in the opcode the same way.

I agree, in any case, that this RFC and initial implementation should describe the simplest MVP to prove the promise of some performance gain and then work in optimizations, since we aren't committing ourselves to any particular external representation or interoperability.
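The access-safety point above can be sketched as a dispatch loop where the operand lives inside the opcode. The opcodes here are hypothetical toys, not the VRL instruction set:

```rust
// Sketch of dispatch with the operand embedded in the opcode
// (hypothetical opcodes, not VRL's).
enum OpCode {
    Constant(usize), // index into the constants table
    Add,
    Return,
}

fn run(program: &[OpCode], constants: &[i64]) -> Option<i64> {
    let mut stack = Vec::new();
    for op in program {
        match op {
            // The index is "just there": no second fetch from the
            // instruction stream, and no runtime check that the next
            // slot really is an index rather than another opcode.
            OpCode::Constant(idx) => stack.push(constants[*idx]),
            OpCode::Add => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a + b);
            }
            OpCode::Return => return stack.pop(),
        }
    }
    None
}

fn main() {
    let program = [
        OpCode::Constant(0),
        OpCode::Constant(1),
        OpCode::Add,
        OpCode::Return,
    ];
    assert_eq!(run(&program, &[2, 3]), Some(5));
}
```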

```rust
pub struct Vm {
    instructions: Vec<usize>,
    values: Vec<Literal>,
    targets: Vec<Variable>,
```
Contributor

Will these only be internal targets, or also event (external) targets?

If the former, I think it should be sufficient to store Vec<Ident>; if the latter, we need to look at the existing Target type, which covers all potential targets.

Contributor Author

We will need to represent both, but there's no reason why it couldn't be split up into:

```rust
internal_targets: Vec<Ident>,
external_targets: Vec<LookupBuf>,
```

# RFC 9811 - 2021-11-12 - VRL bytecode VM

This RFC proposes implementing a bytecode VM. VRL will be compiled to the bytecode
and executed by this VM, with the aim to significantly improve the performance of
Contributor

We should add a section to this RFC discussing (and proving) the performance gains we've seen from the early spike, as those gains are what warrants the effort needed to implement this RFC.

Comment on lines +141 to +142
However, VRL is a very simple language, which does also mean the VM will be
simple.
Contributor

VRL is also still evolving. We're adding iteration soon, and there's no way to know what else we might introduce in the future, so this is somewhat of a moot point, I think.

Contributor Author

I imagine that even with more features added, the goal of the language will still be to keep things simple.

However, it is true that iteration and in particular lexical scoping will increase the complexity. This will also change the planned approach to iteration, so needs some thought.

Member

It's not obvious that these complexities in the language necessarily increase the complexity of the VM, just like the complexities in Rust don't necessarily require a more complex CPU instruction set. There will be some new requirements, but I would expect them to be relatively small.

Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
@fuchsnj
Member

fuchsnj commented Nov 16, 2021

I would strongly encourage looking into dynamically sized bytecode. This would require the instructions to be a pure Vec<u8>, which means you do lose some of the typing that an enum gives you, but it is almost certainly going to be more performant.

The main benefit of doing this is you get to make the most common opcodes exactly 1 byte. With an enum, the smallest an opcode could feasibly be is 4 bytes, and it could easily be larger. That could make a massive difference in caching. It also allows you to encode inputs directly into the bytecode, so you don't have extra indirection from a separate constant list. (although you might still consider a separate list for larger inputs such as Strings)
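The shape of such a dynamically sized bytecode can be sketched as follows. The encoding and opcode names are hypothetical, and the `_` arm shows the safety cost fuchsnj alludes to (any byte must be checked at runtime):

```rust
// Sketch of dynamically sized bytecode (hypothetical encoding): common
// opcodes are exactly one byte; operands are encoded inline.
const OP_PUSH_U8: u8 = 0x01; // followed by a 1-byte immediate
const OP_ADD: u8 = 0x02;
const OP_RET: u8 = 0x03;

fn run(code: &[u8]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        match code[pc] {
            OP_PUSH_U8 => {
                // The input is encoded directly in the byte stream:
                // no indirection through a separate constants table.
                // (A real verifier would bounds-check pc + 1 first.)
                stack.push(i64::from(code[pc + 1]));
                pc += 2;
            }
            OP_ADD => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a + b);
                pc += 1;
            }
            OP_RET => return stack.pop(),
            // Unlike an enum, any byte can appear here, so invalid
            // opcodes must be handled at runtime.
            _ => return None,
        }
    }
    None
}

fn main() {
    let code = [OP_PUSH_U8, 2, OP_PUSH_U8, 3, OP_ADD, OP_RET];
    assert_eq!(run(&code), Some(5));
}
```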

@StephenWakely StephenWakely changed the title chore: add VRL bytecode VM RFC chore: add VRL enum VM (VenuM) RFC Nov 16, 2021
risks involved in running VRL with a VM and the measures we will take to
mitigate those risks.

### Out of scope
Contributor

It came to mind that I think it would be good to add "user-facing breaking changes" as out of scope for this RFC. That is, whatever implementation we end up with, there shouldn't be any difference to the user, other than a faster runtime.

If we do want to do optimizations that require a breaking change, I suggest pulling those out into a separate issue, and discuss them separately, so that we don't block any work listed in this RFC.

Member

The lazy evaluation aspect of parameters to function calls could be considered a potentially breaking change, and this proposal does change that behavior, so I'm not sure it is strictly out of scope. It should be out of scope, but I don't know if we can maintain the current lazy evaluation without much higher complexity.

Contributor Author

Yes! It is very much the plan as a part of the testing to be able to run the current implementation alongside the VM to make sure the output doesn't change in any way.

Contributor

Yes, to add to this, there should be no user-facing changes other than VRL being faster. I've talked with Stephen offline a bit about this, but I'd like to see us have property tests that assert the equivalent operation of the tree walker and the enum VM. That'll allow us to be sure any future optimization passes don't introduce gross regressions.
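A differential property test of this kind can be sketched as below. The expression type, both evaluators, and the generator are toy stand-ins (not VRL or a real property-testing crate such as proptest), but the structure — generate random programs, run both implementations, assert identical output — is the same:

```rust
// Sketch: run the same program through a tree walker and a VM and
// assert identical results. All types here are toy stand-ins.
#[derive(Clone)]
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Reference implementation: walk the tree.
fn eval_tree(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(a, b) => eval_tree(a) + eval_tree(b),
    }
}

// "VM" implementation: compile to a stack program, then run it.
fn eval_vm(e: &Expr) -> i64 {
    enum Op { Push(i64), Add }
    fn compile(e: &Expr, out: &mut Vec<Op>) {
        match e {
            Expr::Lit(n) => out.push(Op::Push(*n)),
            Expr::Add(a, b) => {
                compile(a, out);
                compile(b, out);
                out.push(Op::Add);
            }
        }
    }
    let mut ops = Vec::new();
    compile(e, &mut ops);
    let mut stack = Vec::new();
    for op in &ops {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
        }
    }
    stack.pop().unwrap()
}

// Tiny linear-congruential generator so the sketch is self-contained.
fn random_expr(seed: &mut u64, depth: u32) -> Expr {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
    if depth == 0 || *seed % 3 == 0 {
        Expr::Lit((*seed % 100) as i64)
    } else {
        Expr::Add(
            Box::new(random_expr(seed, depth - 1)),
            Box::new(random_expr(seed, depth - 1)),
        )
    }
}

fn main() {
    let mut seed = 42;
    for _ in 0..1000 {
        let e = random_expr(&mut seed, 4);
        assert_eq!(eval_tree(&e), eval_vm(&e));
    }
    println!("tree walker and VM agree");
}
```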

Comment on lines +160 to +163
The `ArgumentList` parameter that is passed into `call` will have access to the
parameter stack and the parameter list exposed by the `Function`. It can use
these to return the appropriate values for the `required`, `optional`, etc.
functions.
Contributor

I'd love to see a stripped-down example of one of the existing function implementations (f.e. upcase), to get a better sense of how this would work, and if this pattern fits all existing use-cases.
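A heavily stripped-down sketch of that shape might look as follows. The names (`ArgumentList`, `Value`, `required`) mirror VRL's, but the signatures are assumptions about the proposed VM model, not the actual stdlib API:

```rust
// Hypothetical sketch of `upcase` once `compile`/`resolve` are merged
// into a single `call` that reads pre-evaluated arguments from the VM.
#[derive(Debug, PartialEq)]
enum Value {
    Bytes(String),
}

struct ArgumentList {
    stack: Vec<Value>,
}

impl ArgumentList {
    // Pops the already-evaluated `value` parameter off the VM's
    // parameter stack; note this is eager, not lazy.
    fn required(&mut self, _name: &str) -> Value {
        self.stack.pop().expect("compiler guarantees the argument")
    }
}

fn upcase_call(mut args: ArgumentList) -> Result<Value, String> {
    let Value::Bytes(s) = args.required("value");
    Ok(Value::Bytes(s.to_uppercase()))
}

fn main() {
    let args = ArgumentList {
        stack: vec![Value::Bytes("hello".into())],
    };
    assert_eq!(upcase_call(args), Ok(Value::Bytes("HELLO".into())));
}
```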

- Currently each stdlib function is responsible for evaluating its own
parameters. This allows parameters to be lazily evaluated. With the Vm, the
parameters will need to be evaluated up front and the stack passed into the
function. This could impact performance.
Contributor

This could potentially be a big downside, given that almost all VRL programs are function-call-heavy.

I'd love to see this fleshed out a bit more, with some real-world examples of functions that would most likely be impacted by this, and how that impact changes their behaviour.

Comment on lines +262 to +264
The issues with using WASM are that it would require data to be serialized in
order to move between Vector and WASM. This would incur a significant
performance penalty.
Contributor

I mentioned this in our meeting, but this isn't actually accurate. Wasm allows both the host and the Wasm module to mutably access the same chunk of memory.

See, for example, the wasmer crate.

It's true, though, that we'll need wrappers to represent our types, but there's also a lot of ongoing work with Wasm interface types to make this simpler.

Still, I agree with the conclusion that this is a step too far for a first iteration, but I want to avoid that conclusion being based on the wrong assumptions.

we wouldn't be able to dedupe the constant, so for example, if the code uses the
same string twice, it would be represented twice in the bytecode.

#### Bitmask the OpCode data
Contributor

This is an interesting proposal, and worth exploring in the future, but my initial preference would be to leverage Rust's types and type-safety as much as possible. The suggestion here isn't necessarily "unsafe", but it adds another point where we can mess up (and requires a closed-down API to avoid others from doing the same).

I would probably suggest moving this (and "Include value in the OpCode") out of the "alternatives" section, and into "future work", as I don't think it's either this or the other, but rather we can start safer but potentially a bit slower, and then move towards more "unsafe" solutions that give us an extra few percentages of performance gains.

@JeanMertz
Contributor

JeanMertz commented Nov 17, 2021

I would strongly encourage looking into dynamically sized bytecode. This would require the instructions to be a pure Vec<u8>, which means you do lose some of the typing that an enum gives you, but it is almost certainly going to be more performant.

I'm in the opposite camp on this. I agree it might increase the performance gain, but I'd rather take it step-by-step, and start with a more safe (and easier to grok) implementation that leverages Rust's types/type-safety as much as possible.

All other optimizations shouldn't require an extensive rewrite in the future, and can thus be split up into issues that we can tackle as the need arises, similar to how we started with a simple tree-based runtime implementation in the first version because it was "fast enough" for most use-cases at the time, and meant it was easier for us to iterate on the language in the beginning stages.

We aren't at the "the language is done" stage yet (heck, we have a bunch of breaking changes we're likely still to do before we release 1.0), so while I concur that we need this change to get us closer to the ideal performance, we still need to strike a balance, and that balance for me lies at going with a VM as described in this RFC, but leave some performance gains on the table for later, in favour of making it as easy/safe as possible to use/iterate on.

@StephenWakely
Contributor Author

I'm in the opposite camp on this. I agree it might increase the performance gain, but I'd rather take it step-by-step, and start with a more safe (and easier to grok) implementation that leverages Rust's types/type-safety as much as possible.

I plan to benchmark both approaches to see how significant the difference is. We can decide from there.

```rust
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```
Member

Do we expect this index to ever be larger than u32? Even u16 is likely reasonable for most use cases, but we'll probably eventually run into problems with that limit.

Contributor Author

I imagine that for the vast majority of programs u8 would be sufficient; the remaining would never exceed u16.

This is the appeal of the dynamically sized bytecode VM. We could stick with u8 for most programs and only stretch out over two bytes in the rare situations where it's necessary.

Contributor Author

I'll update this to u16 for now.

Contributor

It's worth measuring whether dynamic sizing has an effect on runtime speed versus complexity, but I'd say the initial goal should be to crank a VM out with the simplest possible implementation, whether that's fixed-size representations or dynamic. That said, I would tend to avoid usize since it's a type that changes size depending on machine architecture.

Comment on lines +138 to +140
Parameters are optional and may not be specified. However, to avoid passing
parameters to the wrong function, an OpCode still needs to be emitted to move
a placeholder value - `None` - to the parameter stack.
Member

Given this, you might find value in having a specific opcode to push a None onto the stack.


Check each instruction to ensure:

- each Jump Opcode jumps to a valid location in the bytecode.
Member

Currently, given that there are no loops, also check that each jump is a forward jump. I understand that may be short lived, though.

Contributor Author

Theoretically we could create different OpCodes for forward and backward jumps which would still allow for this check.
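A verification pass enforcing both properties can be sketched like this (hypothetical opcodes; a forward-only rule holds while VRL has no loops):

```rust
// Sketch of a bytecode verification pass: every jump must land inside
// the program and, since VRL currently has no loops, must move forward.
enum OpCode {
    Constant(usize),
    Jump(usize), // absolute target instruction index
    Return,
}

fn verify(program: &[OpCode]) -> Result<(), String> {
    for (pc, op) in program.iter().enumerate() {
        if let OpCode::Jump(target) = op {
            if *target >= program.len() {
                return Err(format!("jump at {} leaves the bytecode", pc));
            }
            if *target <= pc {
                return Err(format!("backward jump at {}", pc));
            }
        }
    }
    Ok(())
}

fn main() {
    let ok = [OpCode::Constant(0), OpCode::Jump(2), OpCode::Return];
    assert!(verify(&ok).is_ok());

    let backward = [OpCode::Jump(0), OpCode::Return];
    assert!(verify(&backward).is_err());
}
```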

Contributor

I would even go so far as to say VRL design should be kept so that we can maintain this check.

Comment on lines +299 to +302
It also means we do not need to store a separate list of constants. On the plus
side, this is one less lookup at runtime to load the constant. On the down side,
we wouldn't be able to dedupe the constant, so for example, if the code uses the
same string twice, it would be represented twice in the bytecode.
Member

Worse, since the values are boxed, you have now introduced a non-linear indirection to get at the value, which is exactly the kind of memory accesses this RFC is trying to avoid.

@StephenWakely StephenWakely changed the title chore: add VRL enum VM (VenuM) RFC chore: add VRL VM RFC Nov 22, 2021
Contributor

@tobz tobz left a comment

To be honest, as a POC/spike, I'm good with what's laid out here.

A lot of the minutiae has been gone over here, but I'm personally more interested in the testing framework (side-by-side VM/AST execution to validate results, property testing, etc) as a way to ensure that the design is meeting the level of functionality and quality of the current AST-based implementation.

We can always tweak and play with opcode sizes, segmented stacks, peephole optimizations, and all sorts of other tricks down the road.

Great work so far.


Comment on lines +236 to +238
- We lose some safety that we get from the Rust compiler. There will need
to be significant fuzz testing to ensure that the code runs correctly under
all circumstances.
Contributor

The core team will help you get the CI infra in place for this and spec out the fuzz tests.

Comment on lines +156 to +158
Current `stdlib` functions are composed of a combination of `compile` and
`resolve` functions. These functions will need to be combined into a single
function, `call`.
Member

One thing to be careful of here is that we don't lose the ability to do work at compile time that can be reused across calls (e.g. pre-compiling regex). In general it'd be nice for functions to be defined directly in terms of the types they operate on, and have the validation and preparation of those values be a separate step that we're able to omit when we have sufficient information to know they're not needed.
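The concern above — keep a compile-time step whose result is reused across calls — can be sketched as follows. The type and method names are hypothetical, and a plain substring needle stands in for a pre-compiled regex:

```rust
// Sketch (hypothetical API): validation and preparation happen once at
// compile time; the per-call path only operates on values.
struct CompiledMatch {
    // Built once at compile time; a real implementation would hold a
    // compiled `regex::Regex` here instead of a plain needle.
    needle: String,
}

impl CompiledMatch {
    fn compile(pattern: &str) -> Result<Self, String> {
        // Validation happens exactly once, not on every call.
        if pattern.is_empty() {
            return Err("empty pattern".into());
        }
        Ok(CompiledMatch { needle: pattern.to_string() })
    }

    // The hot path reuses the prepared state; no re-validation or
    // re-compilation of the pattern per event.
    fn call(&self, haystack: &str) -> bool {
        haystack.contains(&self.needle)
    }
}

fn main() {
    let m = CompiledMatch::compile("foo").unwrap();
    assert!(m.call("foobar"));
    assert!(!m.call("bar"));
}
```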

Member

@bruceg bruceg left a comment

While I'm not totally on board with some of the minutiae of the approach outlined here, I agree with @tobz that what happens from here (testing framework, validation of results, etc.) is more important than bikeshedding this document much further.

@jszwedko
Member

jszwedko commented Dec 7, 2021

@StephenWakely it looks like this is good to merge?

@StephenWakely StephenWakely merged commit caebaf3 into master Dec 8, 2021
@StephenWakely StephenWakely deleted the stephen/vrl_bytecode_rfc branch December 8, 2021 09:50