chore: add VRL VM RFC #10011
Conversation
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
✔️ Deploy Preview for vector-project canceled.
🔨 Explore the source changes: 24cfba0
🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/61941d1f854d52000a5a5130
Awesome. I'm excited to see this progress. I think the RFC is a good start, but there are some parts still missing. I left some comments that hopefully help flesh this RFC out a bit more.
rfcs/2021-11-12-9811-vrl-vm.md (Outdated)
> The instructions field is a `Vec` of `OpCode` cast to a usize. The reason for the cast is because not all instructions are `OpCode`. For example the instructions `[.., Constant, 12, ..]` when evaluated will load the constant stored in the `values` `Vec` that is found in position 12 onto the stack.
In the vein of what we discussed during our get-together, I'd like to get as much safety as possible through Rust's type system, including introducing wrapper types to avoid misusing the VM API.
For example, for instructions, we could/should codify the invariants a bit more in the type system using something like:
```rust
enum Instruction {
    Opcode(Opcode),
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```
There are more examples like that, we don't have to have consensus on all of them in this RFC, but we should have a section that explicitly mentions this as a requirement to successfully implementing the RFC, we can then iterate on the exact shape of those APIs in the implementation PRs.
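To make the invariant concrete, here is a minimal, self-contained sketch of the wrapper-type idea (names like `TargetIndex` and the `String` stand-in for `Literal` are illustrative, not the final VM API): each store gets its own index newtype, so an index into `values` cannot accidentally be used against another store.

```rust
// Each backing store gets its own index newtype; the type system then
// prevents mixing an index for one store with another store's accessor.
#[derive(Debug, Clone, Copy)]
pub struct LiteralIndex(usize);

#[derive(Debug, Clone, Copy)]
#[allow(dead_code)]
pub struct TargetIndex(usize); // hypothetical second store's index

pub struct Vm {
    values: Vec<String>, // stand-in for the real `Literal` type
}

impl Vm {
    // Only a `LiteralIndex` can reach the `values` store; passing a
    // `TargetIndex` here is a compile error.
    fn literal(&self, idx: LiteralIndex) -> &str {
        &self.values[idx.0]
    }
}

fn main() {
    let vm = Vm { values: vec!["hello".to_string()] };
    println!("{}", vm.literal(LiteralIndex(0)));
}
```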
We could also keep the index in the OpCode itself like:
```rust
enum OpCode {
    Constant(usize),
    Jump(usize),
    // ...
}
```

This gives us even more type safety, and `size_of::<OpCode>` doesn't get any bigger. I've added this to the alternatives.
> We could also keep the index in the OpCode itself like:

I don't see how it gives more type safety, given that the index for one opcode can point to a different store than the index for another opcode. With a wrapper type, you can ensure that the index for one store cannot be mixed with the index into another store.
Also, it might be worth looking into `repr(C)` and `NonZeroUsize` and the like, which allow the Rust compiler to add optimizations during compilation to reduce the word count of these wrapper types. I'm not that familiar with how this works, but for example `Option<NonZeroUsize>` is equal in size to `usize` because the compiler can use the forbidden `0` value of the `usize` to encode whether the value is a `Some` or a `None`.
But, I don't think we need to sweat all those minor details in this RFC, I think the goal should be to get a nice performance improvement while still having as much type safety as possible. Any additional enhancements that reduce that type-safety can be filed as separate issues to discuss separately and implement once there's consensus on the cost vs. benefit trade-off.
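The niche optimization mentioned for `NonZeroUsize` can be checked directly with `std::mem::size_of` — the `Option` wrapper adds no extra space:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

fn main() {
    // The compiler uses the forbidden zero value of `NonZeroUsize` as the
    // `None` niche, so `Option<NonZeroUsize>` costs no extra space.
    assert_eq!(size_of::<Option<NonZeroUsize>>(), size_of::<usize>());
    println!("Option<NonZeroUsize>: {} bytes", size_of::<Option<NonZeroUsize>>());
}
```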
The safety in the second proposal is in the access to the index. In the first proposal, an opcode that requires a literal requires fetching the next index in the stream, checking if it is in fact a literal, and then accessing the value. In the second case, the literal is just there. We can newtype wrap the literal in the opcode the same way.
I agree, in any case, that this RFC and initial implementation should describe the simplest MVP to prove the promise of some performance gain and then work in optimizations, since we aren't committing ourselves to any particular external representation or interoperability.
```rust
pub struct Vm {
    instructions: Vec<usize>,
    values: Vec<Literal>,
    targets: Vec<Variable>,
```
Will these only be internal targets, or also event (external) targets?
If the former, I think it should be sufficient to store `Vec<Ident>`; if the latter, we need to look at the existing `Target` type which covers all potential targets.
We will need to represent both, but there's no reason why it couldn't be split up into:
```rust
internal_targets: Vec<Ident>,
external_targets: Vec<LookupBuf>,
```
rfcs/2021-11-12-9811-vrl-vm.md (Outdated)
> # RFC 9811 - 2021-11-12 - VRL bytecode VM
>
> This RFC proposes implementing a bytecode VM. VRL will be compiled to the bytecode and executed by this VM, with the aim to significantly improve the performance of
We should add a section to this RFC discussing (and proving) the performance gains we've seen from the early spike, as those gains are what warrants the effort needed to implement this RFC.
> However, VRL is a very simple language, which does also mean the VM will be simple.
VRL is also still evolving. We're adding iteration soon, and there's no way to know what else we might introduce in the future, so this is somewhat of a moot point, I think.
I imagine that even with more features added, the goal of the language will still be to keep things simple.
However, it is true that iteration and in particular lexical scoping will increase the complexity. This will also change the planned approach to iteration, so needs some thought.
It's not obvious that these complexities in the language necessarily increase the complexity of the VM, just like the complexities in Rust don't necessarily require a more complex CPU instruction set. There will be some new requirements, but I would expect them to be relatively small.
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
I would strongly encourage looking into dynamically sized bytecode. This would require the instructions to be a pure `Vec<u8>`. The main benefit of doing this is you get to make the most common opcodes exactly 1 byte. With an enum, the smallest an opcode could feasibly be is 4 bytes, and it could easily be larger. That could make a massive difference in caching. It also allows you to encode inputs directly into the bytecode, so you don't have extra indirection from a separate values `Vec`.
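A quick illustration of the size argument (the enum shape here is a made-up stand-in, not the RFC's actual `OpCode`): an enum variant carrying a `usize` operand pads the whole opcode up to the operand's alignment, while a raw byte stream spends exactly one byte on common opcodes.

```rust
use std::mem::size_of;

// Illustrative only: a discriminant plus an 8-byte operand, padded to
// alignment, makes every opcode 16 bytes on a typical 64-bit target.
#[allow(dead_code)]
enum EnumOpCode {
    Constant(usize),
    Jump(usize),
    Pop,
}

fn main() {
    println!("enum opcode: {} bytes", size_of::<EnumOpCode>());
    // A byte-stream opcode is just a `u8`.
    println!("byte opcode: {} bytes", size_of::<u8>());
}
```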
> risks involved in running VRL with a VM and the measures we will take to mitigate those risks.
>
> ### Out of scope
It came to mind that I think it would be good to add "user-facing breaking changes" as out of scope for this RFC. That is, whatever implementation we end up with, there shouldn't be any difference to the user, other than a faster runtime.
If we do want to do optimizations that require a breaking change, I suggest pulling those out into a separate issue, and discuss them separately, so that we don't block any work listed in this RFC.
The lazy evaluation aspect of parameters to function calls could be considered a potentially breaking change, and this proposal does change that behavior, so I'm not sure it is strictly out of scope. It should be out of scope, but I don't know if we can maintain the current lazy evaluation without much higher complexity.
Yes! It is very much the plan, as part of the testing, to be able to run the current implementation alongside the VM to make sure the output doesn't change in any way.
Yes, to add to this, there should be no user-facing changes other than that VRL is faster. I've talked with Stephen offline a bit about this but I'd like to see us have property tests that assert the equivalent operation of the tree walker and the enum VM. That'll allow us to be sure any future optimization passes don't introduce gross regressions.
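A hand-rolled sketch of that side-by-side check, self-contained for illustration: both evaluators here operate on a tiny arithmetic AST (the real setup would compare VRL's tree walker against the bytecode VM, and would generate programs with a property-testing crate such as proptest rather than the toy generator below).

```rust
// Toy AST standing in for a VRL program.
#[derive(Clone)]
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
}

// "Tree walker": direct recursive evaluation.
fn eval_tree(e: &Expr) -> i64 {
    match e {
        Expr::Int(n) => *n,
        Expr::Add(a, b) => eval_tree(a) + eval_tree(b),
    }
}

// "VM": compile to a flat postfix form (Some(n) = push constant,
// None = add opcode) and run it on a stack.
fn eval_vm(e: &Expr) -> i64 {
    fn compile(e: &Expr, out: &mut Vec<Option<i64>>) {
        match e {
            Expr::Int(n) => out.push(Some(*n)),
            Expr::Add(a, b) => {
                compile(a, out);
                compile(b, out);
                out.push(None);
            }
        }
    }
    let mut code = Vec::new();
    compile(e, &mut code);
    let mut stack = Vec::new();
    for op in code {
        match op {
            Some(n) => stack.push(n),
            None => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // Cheap pseudo-random program generation (a real setup would use proptest).
    let mut seed: u64 = 42;
    for _ in 0..100 {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let a = (seed >> 33) as i64 % 1000;
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
        let b = (seed >> 33) as i64 % 1000;
        let prog = Expr::Add(Box::new(Expr::Int(a)), Box::new(Expr::Int(b)));
        assert_eq!(eval_tree(&prog), eval_vm(&prog));
    }
    println!("tree walker and VM agree on 100 generated programs");
}
```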
> The `ArgumentList` parameter that is passed into `call` will have access to the parameter stack and the parameter list exposed by the `Function`. This can use these to return the appropriate values for the `required`, `optional` etc. functions.
I'd love to see a stripped-down example of one of the existing function implementations (e.g. `upcase`), to get a better sense of how this would work, and if this pattern fits all existing use-cases.
> - Currently each stdlib function is responsible for evaluating their own parameters. This allows parameters to be lazily evaluated. With the Vm, the parameters will need to be evaluated up front and the stack passed into the function. This could impact performance.
This could potentially be a big downside, given that almost all VRL programs are function-call-heavy.
I'd love to see this fleshed out a bit more, with some real-world examples of functions that would most likely be impacted by this, and how that impact changes their behaviour.
> The issues with using WASM are that it would require data to be serialized in order to move between Vector and WASM. This would incur a significant performance penalty.
I mentioned this in our meeting, but this isn't actually accurate. Wasm allows both the host and the Wasm module to mutably access the same chunk of memory.
See, for example, the `wasmer` crate.
It's true, though, that we'll need wrappers to represent our types, but there's also a lot of ongoing work with Wasm interface types to make this simpler.
Still, I agree with the conclusion that this is a step too far for a first iteration, but I want to avoid that conclusion being based on the wrong assumptions.
> we wouldn't be able to dedupe the constant, so for example, if the code uses the same string twice, it would be represented twice in the bytecode.
>
> #### Bitmask the OpCode data
This is an interesting proposal, and worth exploring in the future, but my initial preference would be to leverage Rust's types and type-safety as much as possible. The suggestion here isn't necessarily "unsafe", but it adds another point where we can mess up (and requires a closed-down API to avoid others from doing the same).
I would probably suggest moving this (and "Include value in the OpCode") out of the "alternatives" section, and into "future work", as I don't think it's either this or the other, but rather we can start safer but potentially a bit slower, and then move towards more "unsafe" solutions that give us an extra few percentages of performance gains.
I'm in the opposite camp on this. I agree it might increase the performance gain, but I'd rather take it step-by-step, and start with a safer (and easier to grok) implementation that leverages Rust's types/type-safety as much as possible. All other optimizations shouldn't require an extensive rewrite in the future, and can thus be split up into issues that we can tackle as the need arises, similar to how we started with a simple tree-based runtime implementation in the first version because it was "fast enough" for most use-cases at the time, and meant it was easier for us to iterate on the language in the beginning stages.

We aren't at the "the language is done" stage yet (heck, we have a bunch of breaking changes we're likely still to do before we release 1.0), so while I concur that we need this change to get us closer to the ideal performance, we still need to strike a balance. For me that balance lies at going with a VM as described in this RFC, but leaving some performance gains on the table for later, in favour of making it as easy/safe as possible to use/iterate on.
I plan to benchmark both approaches to see how significant the difference is. We can decide from there.
```rust
    Literal(LiteralIndex),
}

pub struct LiteralIndex(usize);
```
Do we expect this index to ever be larger than `u32`? Even `u16` is likely reasonable for most use cases, but we'll probably eventually run into problems with that limit.
I imagine for the vast majority of programs `u8` would be sufficient; the remainder would never exceed `u16`. This is the appeal of the dynamically sized bytecode VM: we could stick with `u8` for most programs and only stretch out over two bytes in the rare situations it's necessary.
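For illustration, a minimal sketch of reading a multi-byte operand out of a `Vec<u8>`-style instruction stream (the opcode value and encoding are made up, not the RFC's): a one-byte opcode followed by a little-endian `u16` index.

```rust
// Hypothetical opcode value for "load constant".
const OP_CONSTANT: u8 = 0x01;

// Decode a little-endian u16 operand starting at `ip`.
fn read_u16(code: &[u8], ip: usize) -> u16 {
    u16::from_le_bytes([code[ip], code[ip + 1]])
}

fn main() {
    // "load constant 300" — an index that doesn't fit in one byte.
    let code = [OP_CONSTANT, 44, 1]; // 44 + 1 * 256 = 300
    let mut ip = 0;
    match code[ip] {
        OP_CONSTANT => {
            let idx = read_u16(&code, ip + 1);
            ip += 3; // 1 opcode byte + 2 operand bytes
            println!("load constant at index {}", idx);
        }
        _ => unreachable!(),
    }
    assert_eq!(ip, 3);
}
```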
I'll update this to `u16` for now.
It's worth measuring whether dynamic sizing has an effect on runtime speed versus complexity, but I'd say the initial goal should be to crank a VM out with the simplest possible implementation, whether that's fixed-size representations or dynamic. That said, I would tend to avoid `usize` since it's a type that changes size depending on machine architecture.
> Parameters are optional and may not be specified. However, to avoid passing parameters to the wrong function, an OpCode still needs to be emitted to move a placeholder value - `None` - to the parameter stack.
Given this, you might find value in having a specific opcode to push a `None` onto the stack.
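A minimal sketch of that idea (opcode names and shapes are illustrative): a dedicated `PushNone` opcode pushes the placeholder directly, so unspecified optional parameters never touch the constants table.

```rust
#[allow(dead_code)]
enum OpCode {
    PushNone,
    PushConstant(usize),
}

// Execute a parameter-pushing sequence; `PushNone` needs no constants slot.
fn push_parameters(ops: &[OpCode], constants: &[&str]) -> Vec<Option<String>> {
    let mut stack = Vec::new();
    for op in ops {
        match op {
            OpCode::PushNone => stack.push(None),
            OpCode::PushConstant(idx) => stack.push(Some(constants[*idx].to_string())),
        }
    }
    stack
}

fn main() {
    let constants = ["hello"];
    let stack = push_parameters(&[OpCode::PushConstant(0), OpCode::PushNone], &constants);
    assert_eq!(stack, vec![Some("hello".to_string()), None]);
    println!("{:?}", stack);
}
```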
> Check each instruction to ensure:
>
> - each Jump Opcode jumps to a valid location in the bytecode.
Currently, given that there are no loops, also check that each jump is a forward jump. I understand that may be short lived, though.
Theoretically we could create different OpCodes for forward and backward jumps which would still allow for this check.
I would even go so far as to say VRL design should be kept so that we can maintain this check.
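The check being discussed can be sketched in a few lines (the `OpCode` shape is illustrative): every jump target must be in bounds, and, while the language has no loops, strictly forward.

```rust
#[allow(dead_code)]
enum OpCode {
    Jump(usize),
    Pop,
}

// Validate the bytecode before execution: no out-of-bounds or backward jumps.
fn validate(instructions: &[OpCode]) -> Result<(), String> {
    for (ip, op) in instructions.iter().enumerate() {
        if let OpCode::Jump(target) = op {
            if *target >= instructions.len() {
                return Err(format!("jump at {} to invalid location {}", ip, target));
            }
            if *target <= ip {
                return Err(format!("jump at {} to {} is not a forward jump", ip, target));
            }
        }
    }
    Ok(())
}

fn main() {
    let ok = [OpCode::Jump(2), OpCode::Pop, OpCode::Pop];
    assert!(validate(&ok).is_ok());

    let backward = [OpCode::Pop, OpCode::Jump(0), OpCode::Pop];
    assert!(validate(&backward).is_err());

    println!("jump validation passed");
}
```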
> It also means we do not need to store a separate list of constants. On the plus side, this is one less lookup at runtime to load the constant. On the down side, we wouldn't be able to dedupe the constant, so for example, if the code uses the same string twice, it would be represented twice in the bytecode.
Worse, since the values are boxed, you have now introduced a non-linear indirection to get at the value, which is exactly the kind of memory accesses this RFC is trying to avoid.
To be honest, as a POC/spike, I'm good with what's laid out here.
A lot of the minutiae has been gone over here, but I'm personally more interested in the testing framework (side-by-side VM/AST execution to validate results, property testing, etc) as a way to ensure that the design is meeting the level of functionality and quality of the current AST-based implementation.
We can always tweak and play with opcode sizes, segmented stacks, peephole optimizations, and all sorts of other tricks down the road.
Great work so far.
> - We lose some safety that we get from the Rust compiler. There will need to be significant fuzz testing to ensure that the code runs correctly under all circumstances.
The core team will help you get the CI infra in place for this and spec out the fuzz tests.
> Current `stdlib` functions are composed of a combination of `compile` and `resolve` functions. These functions will need to be combined into a single function `call`.
One thing to be careful of here is that we don't lose the ability to do work at compile time that can be reused across calls (e.g. pre-compiling regex). In general it'd be nice for functions to be defined directly in terms of the types they operate on, and have the validation and preparation of those values be a separate step that we're able to omit when we have sufficient information to know they're not needed.
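One way to keep that property when `compile` and `resolve` merge into `call` is to have the compiled program own the precomputed state. A self-contained sketch (all names are hypothetical, and a pre-lowercased needle stands in for a compiled regex to avoid external crates):

```rust
// Hypothetical stdlib function: the "compiled" form owns work done once,
// at VRL compile time, and reuses it on every call.
struct ContainsFn {
    needle_lower: String, // computed once, analogous to a pre-compiled regex
}

impl ContainsFn {
    fn compile(needle: &str) -> Self {
        ContainsFn { needle_lower: needle.to_lowercase() }
    }

    // Per-event work only: the needle is never re-processed per call.
    fn call(&self, haystack: &str) -> bool {
        haystack.to_lowercase().contains(&self.needle_lower)
    }
}

fn main() {
    let f = ContainsFn::compile("ERROR");
    assert!(f.call("an Error occurred"));
    assert!(!f.call("all good"));
    println!("precomputed state reused across calls");
}
```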
While I'm not totally on board with some of the minutia of the approach outlined here, I agree with @tobz that what happens from here (testing framework, validation of results, etc) is more important than bike shedding this document much further.
@StephenWakely it looks like this is good to merge?
Ref #9811
Readable version
Signed-off-by: Stephen Wakely fungus.humungus@gmail.com