
enhancement: Use new Lookup accessors on LogEvents and entire internal API. #5374

Closed
wants to merge 156 commits

Conversation


@Hoverbear Hoverbear commented Dec 3, 2020

Closes #2843
Closes #2845
Closes #4005

This is a ... chonky PR.

Most of it is trivial (and largely mechanized) changes to some basic interfaces around Event types, as well as a bunch of new code. Much of the old code remains in a deprecated state for remap-lang to fully adopt.

Notable developer facing changes

  • All Event, Value, and the new Lookup/LookupBuf types are now fully documented and tested, including several corner cases found during development.
  • From<String> and From<&str> are no longer implemented on Event.
    • The Event type was moved into shared and can no longer access the global log_schema. Yes, it was a lot of work.
  • LookupBuf, a new String analog which pre-parses lookup paths and allows rich remap-syntax for APIs, has been added.
  • Lookup, a new &str analog which pre-parses lookup paths and allows rich remap-syntax for APIs, has been added. It can be trivially used from LookupBuf, like a &str and a String.
  • The new LookupBuf type is deserializable, as well as serializable (like Lookup). This means Config structs can freely contain them and use them, without a conversion step.
  • The log_event! {} macro is much more prominent, use it, please!
  • Event and Value have (nearly) matching APIs around insert/remove etc. On Value these return a result, as the action is not valid on 'primitives'.
    • This was required to implement the recursive API the new Event API uses cleanly, and gives us a lot more options around working with things.
  • log_event.insert(String::from("foo.bar"), 1) will now insert at "foo.bar" the value 1, where prior it would insert "bar", set to 1, into "foo".
  • log_event.insert(LookupBuf::from_str("foo.bar")?, 1) will now insert "bar", set to 1, into "foo", as strings did prior.
  • The Event and Value types have been moved into shared to facilitate sharing between all our crates, making them more globally accessible. They are re-exported in the old places, so you shouldn't notice.
  • Look and LookSegment traits are present to enforce parity between types, but it's preferred that the types implement functions directly, to ensure users always have function access (even without the traits imported).
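A std-only sketch of the two insert behaviors above (the `Value`, `insert_flat`, and `insert_path` names here are illustrative stand-ins, not the real `shared` types):

```rust
use std::collections::BTreeMap;

// Toy value tree; the real Event/Value types live in `shared`,
// these are illustrative stand-ins only.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Map(BTreeMap<String, Value>),
}

// `insert(String, _)`: the whole string is one flat key.
fn insert_flat(map: &mut BTreeMap<String, Value>, key: &str, v: Value) {
    map.insert(key.to_string(), v);
}

// `insert(LookupBuf, _)`: the path is split on `.` and nested.
fn insert_path(map: &mut BTreeMap<String, Value>, path: &str, v: Value) {
    match path.split_once('.') {
        None => {
            map.insert(path.to_string(), v);
        }
        Some((head, rest)) => {
            let entry = map
                .entry(head.to_string())
                .or_insert_with(|| Value::Map(BTreeMap::new()));
            if let Value::Map(inner) = entry {
                insert_path(inner, rest, v);
            }
            // If a primitive is in the way, the real API returns an error.
        }
    }
}

fn main() {
    let mut flat = BTreeMap::new();
    insert_flat(&mut flat, "foo.bar", Value::Int(1));
    assert!(flat.contains_key("foo.bar")); // one literal "foo.bar" key

    let mut nested = BTreeMap::new();
    insert_path(&mut nested, "foo.bar", Value::Int(1));
    // "bar" => 1 lives inside the map stored at "foo".
    assert!(matches!(nested.get("foo"), Some(Value::Map(m)) if m.contains_key("bar")));
    println!("ok");
}
```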

Reviewer Notes

@JeanMertz and I agreed to make a conversion layer for this initial PR between Remap Paths and Lookups, this will be removed later.

I suggest treating this PR in three separate chunks:

  • Review the changes to shared. It will introduce to you the new Event type and the related API.
  • Help comb over the changes inside vector to ensure we are not reading any raw data as Lookup types.
  • Suggest some new test cases for lookups or the API in general.

TODOs

  • I want to do some more benching/profiling. I am unsure about the current Value::deserialize impl.
  • Typify the errors.
  • Resolve last remap<->lookup inconsistencies

Signed-off-by: Ana Hobden <operator@hoverbear.org>
@Hoverbear Hoverbear requested review from a team and leebenson and removed request for eeyun and a team April 8, 2021 14:35
@StephenWakely

I should note that currently this is pulling in the entire legacy Remap pest parser to parse the paths. This needs to be tidied up. It seems extravagant to have three different parsing libraries in a single project, so it would be useful if we could reuse some of the lalrpop code - or, since it is quite a simple parsing task, perhaps we could use nom, which may also provide some performance benefits.
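As a rough gauge of how small that parsing task is, here is a hedged, std-only sketch of a hand-rolled tokenizer for the common `a.b[0].c` shape (no quoting or coalescing; `Segment` and `parse_path` are illustrative names, not the real API):

```rust
// Hand-rolled path tokenizer sketch: splits `foo.bar[0].baz` into
// field and index segments. Purely illustrative of the task's size.
#[derive(Debug, PartialEq)]
enum Segment {
    Field(String),
    Index(usize),
}

fn parse_path(input: &str) -> Option<Vec<Segment>> {
    let mut segments = Vec::new();
    for part in input.split('.') {
        // Split `field[0][1]` into the field name and any trailing indexes.
        let (name, rest) = match part.find('[') {
            Some(i) => (&part[..i], &part[i..]),
            None => (part, ""),
        };
        if !name.is_empty() {
            segments.push(Segment::Field(name.to_string()));
        }
        for idx in rest.split('[').filter(|s| !s.is_empty()) {
            let idx = idx.strip_suffix(']')?;
            segments.push(Segment::Index(idx.parse().ok()?));
        }
    }
    Some(segments)
}

fn main() {
    let parsed = parse_path("foo.bar[0].baz").unwrap();
    // Yields: Field(foo), Field(bar), Index(0), Field(baz)
    assert_eq!(parsed.len(), 4);
    assert_eq!(parsed[2], Segment::Index(0));
    println!("{:?}", parsed);
}
```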

@blt

blt commented Apr 8, 2021

> I should note that currently this is pulling in the entire legacy Remap pest parser to parse the paths. This needs to be tidied up. It seems extravagant to have three different parsing libraries in a single project, so it would be useful if we could reuse some of the lalrpop code - or, since it is quite a simple parsing task, perhaps we could use nom, which may also provide some performance benefits.

This strikes me as something that would be best tackled in a follow-up issue, to avoid this PR ballooning any further.

@blt

blt commented Apr 8, 2021

Ah, hmm. #7014 collides with this change set as the metrics stuff is no longer in the top-level crate but shared.

@StephenWakely

> Ah, hmm. #7014 collides with this change set as the metrics stuff is no longer in the top-level crate but shared.

Yeah, a lot of the conflicts this gets are caused by the move to shared.

@blt

blt commented Apr 8, 2021

> Ah, hmm. #7014 collides with this change set as the metrics stuff is no longer in the top-level crate but shared.
>
> Yeah, a lot of the conflicts this gets are caused by the move to shared.

Indeed. This is a change I'd like to see unwound actually. My proposal for the core extraction is that core be the top-level crate and these are very much core-level concerns, to my mind.

@blt

blt commented Apr 8, 2021

Alright, well. I've reverted the merges that I've done. My hope was to bring #7014 in so I can compare the runtime of current master with this. Since I cannot easily merge with master I'm going to compare the tip of the PR with 6c0df56 instead.

@blt

blt commented Apr 8, 2021

Okay, benchmarking process started. Should have numbers in a few hours.

@blt

blt commented Apr 9, 2021

The results so far are not uniformly positive. I've made a comparison between 6c0df56 and cb12351 with this run script. The following gist has the vector configs referenced below.

Hyperfine

The first phase of this benchmark uses the hyperfine tool to compare vector runtime versions in a stable way. The results:

| |__  _   _ _ __   ___ _ __ / _(_)_ __   ___
| '_ \| | | | '_ \ / _ \ '__| |_| | '_ \ / _ \
| | | | |_| | |_) |  __/ |  |  _| | | | |  __/
|_| |_|\__, | .__/ \___|_|  |_| |_|_| |_|\___|
       |___/|_|
PROCESSING: remap-multihop.toml
Benchmark #1: zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-6c0df5614a10c4ef7d8c5e18d3bba289c6a7a0b5 -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-multihop.toml
  Time (mean ± σ):     16.608 s ±  0.359 s    [User: 75.169 s, System: 10.587 s]
  Range (min … max):   16.151 s … 17.099 s    10 runs

Benchmark #2: zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-cb12351ba2a039db1ea962ab75f990b3d149f37e -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-multihop.toml
  Time (mean ± σ):     13.808 s ±  0.403 s    [User: 71.355 s, System: 4.251 s]
  Range (min … max):   13.355 s … 14.671 s    10 runs

Summary
  'zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-cb12351ba2a039db1ea962ab75f990b3d149f37e -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-multihop.toml' ran
    1.20 ± 0.04 times faster than 'zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-6c0df5614a10c4ef7d8c5e18d3bba289c6a7a0b5 -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-multihop.toml'


PROCESSING: remap-nohop.toml
Benchmark #1: zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-6c0df5614a10c4ef7d8c5e18d3bba289c6a7a0b5 -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-nohop.toml
  Time (mean ± σ):     29.038 s ±  0.960 s    [User: 77.805 s, System: 10.876 s]
  Range (min … max):   27.844 s … 30.734 s    10 runs

Benchmark #2: zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-cb12351ba2a039db1ea962ab75f990b3d149f37e -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-nohop.toml
  Time (mean ± σ):     37.095 s ±  0.685 s    [User: 93.243 s, System: 12.653 s]
  Range (min … max):   36.222 s … 38.132 s    10 runs

Summary
  'zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-6c0df5614a10c4ef7d8c5e18d3bba289c6a7a0b5 -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-nohop.toml' ran
    1.28 ± 0.05 times faster than 'zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-cb12351ba2a039db1ea962ab75f990b3d149f37e -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/remap-nohop.toml'


PROCESSING: stdin_to_blackhole.toml
Benchmark #1: zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-6c0df5614a10c4ef7d8c5e18d3bba289c6a7a0b5 -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/stdin_to_blackhole.toml
  Time (mean ± σ):      6.745 s ±  0.162 s    [User: 12.504 s, System: 2.282 s]
  Range (min … max):    6.502 s …  6.943 s    10 runs

Benchmark #2: zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-cb12351ba2a039db1ea962ab75f990b3d149f37e -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/stdin_to_blackhole.toml
  Time (mean ± σ):      6.741 s ±  0.155 s    [User: 12.425 s, System: 2.235 s]
  Range (min … max):    6.477 s …  7.008 s    10 runs

Summary
  'zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-cb12351ba2a039db1ea962ab75f990b3d149f37e -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/stdin_to_blackhole.toml' ran
    1.00 ± 0.03 times faster than 'zcat baked.log.gz | /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/bin/vector-6c0df5614a10c4ef7d8c5e18d3bba289c6a7a0b5 -qq --config /home/blt/projects/com/datadog/timberio/vector-test-harness/cases/performance/klog_remap_to_blackhole/configs/stdin_to_blackhole.toml'

In the case of remap-multihop this PR is 20% +/- 4% faster than master. For remap-nohop master is 28% +/- 5% faster than this PR. For stdin-to-blackhole there is no change.

Summary

In two of the three cases tested this PR is dramatically different from master. Unfortunately, in one case it is dramatically slower. The case in which this PR is dramatically slower is the "nohop" variant of a remap pipeline, which is likely the manner in which users configure their vector (as we've instructed them to). At least, this "nohop" configuration is not obviously wrong, and the 28 +/- 5% difference is steep.

Perf / Flamegraph

The flamegraphs generated from perf are here. The overall shape of the graphs for each variant is similar -- good -- although the timings are different. However, since the structure of the code has changed so dramatically between this PR and master, it's quite hard, overall, to draw any conclusions here.

Summary

Inconclusive. Comparisons of this kind are best with small differences.

Massif

The massif results for each variant are available in this zip file: massif-results.zip The programs do not exhibit the leak fixed in #7014. In each variant across all configurations there is very little difference in the allocation pattern presented.

Summary

No difference between variants.

@blt

blt commented Apr 9, 2021

Looking at the data here I'm inclined to say that we should close this PR out. While it's clear that there are performance gains to be made, it's also clear that this PR causes a significant pessimization in a remap-heavy configuration, something we're pushing for. I consider it likely that this pessimization is fixable and that the approach taken here has legs, but this PR is too big. I do not have a hypothesis on where the pessimization is, and while it's possible to get set up to make a hypothesis, that in itself will be a non-trivial amount of work. As @FungusHumungus noted, keeping this PR up to date with the master branch is a daily feat. Further, but less concerning, we have not significantly reduced allocations in this PR relative to master, with regard to #6831. Saving allocations was part of this PR's goal set, I do believe. If we were to continue with this PR -- and I would like to acknowledge that a lot of work has gone into this PR -- I think we will need to:

The last point seems the hardest to achieve to me, though it's potentially addressed by the first. @FungusHumungus you've spent the most time wrestling with this code lately. Do you see the situation here differently than I do?

@StephenWakely

Those are interesting findings. So it seems that the PR is significantly faster, unless it has to deal with Remap. This suggests that the issue is in the interface between Remap and the Lookup API.

I would put my money on the culprit being here: https://github.com/timberio/vector/blob/new-accessors/lib/shared/src/lookup/lookup_buf/mod.rs#L107-L114

impl TryFrom<&vrl::Path> for LookupBuf {
    type Error = LookupError;
    fn try_from(target: &vrl::Path) -> Result<Self, Self::Error> {
        let path_string = target.to_string();
        trace!(path = %path_string, "Converting to LookupBuf.");
        LookupBuf::from_str(&path_string)
    }
}

Any time Remap has to access a path in the event it needs to convert the Remap path into a Lookup path. It does this by dumping it to a string and then reparsing it. The main objective behind this lookup was to reduce these parsing costs. However, when using Remap, it actually now increases the amount of parsing that occurs.

@JeanMertz

JeanMertz commented Apr 9, 2021

> While it's clear that there are performance gains to be made it's also clear that this PR causes a significant pessimization in a remap heavy configuration, something we're pushing for.

The one thing I would note here is that a few months ago — when both VRL and this work were in early development — it was decided to get this PR over the finish line without considering any performance regressions related to Vector <> VRL integration.

That is to say, a naïve implementation of From<vrl::Path> for LookupBuf and impl vrl::Target for Event was chosen to get this over the finish line as soon as possible.

The thought process behind that decision was that there is nothing technically stopping us from using the same performance optimizations w.r.t. path lookup in both VRL and Vector, by extracting out that logic into its own crate, having both VRL and Vector depend on it, and allowing us to pass paths between both without any conversion or allocation needed.

Obviously, this PR has lingered since, and VRL has shipped, so a lot has changed since then. But I never spent any large amount of time improving VRL's path lookup performance precisely because this work was on the horizon, so it might still be worth pushing this PR over the finish line if benchmarks show performance improvements with VRL left out of the picture.

It would require us to schedule work to actually improve VRL <> Lookup integration, since we can't ship this work in a new release without tackling that first.

Having said that, the PR is massive (a lot of work went into getting this done, as you mentioned), and it feels somewhat like a daunting task to get this in and then iterate on it further. One way forward is to take the idea proven in this PR, and build a new lookup crate that can be worked on in isolation, iterating on it in small PRs, until it's at a stage where both Vector and VRL can use it, and we can have one or two relatively larger PRs to do all the legwork of updating all path references to the new lookup API.

@StephenWakely

I would be interested in whether a naive implementation of TryFrom:

impl TryFrom<&vrl::Path> for LookupBuf {
    type Error = LookupError;
    fn try_from(target: &vrl::Path) -> Result<Self, Self::Error> {
        Ok(LookupBuf {
            segments: target
                .segments()
                .iter()
                .map(|segment| match segment {
                    vrl::Segment::Field(vrl::Field::Regular(field)) => SegmentBuf::Field {
                        name: field.clone(),
                        requires_quoting: false,
                    },
                    _ => unimplemented!(),
                })
                .collect(),
        })
    }
}

had any impact. It's not perfect as there are still allocations, but it should be better than running it through pest. If it shifts things in the right direction, it would be indicative that the issue is here and would warrant further investigation.

Note this only works with simple fields - no indexing or coalescing - but it looks like that is all the test config uses anyway.

The 28% potential improvement is IMO well worth pursuing.

@blt

blt commented Apr 9, 2021

> Having said that, the PR is massive (a lot of work went into getting this done, as you mentioned), and it feels somewhat like a daunting task to get this in and then iterate on it further. One way forward is to take the idea proven in this PR, and build a new lookup crate that can be worked on in isolation, iterating on it in small PRs, until it's at a stage where both Vector and VRL can use it, and we can have one or two relatively larger PRs to do all the legwork of updating all path references to the new lookup API.

This strikes me as the proper way forward. The ideas in this PR are absolutely worth pursuing but we need to be careful to separate out the new work -- the lookup crate -- from the mechanical work of integrating it. These two things got integrated into one PR and it makes interrogating the new work quite hard.

Personally, I'd be very happy to see an issue made to extract the path work here into a separate crate, PR and then a follow-up issue for integration into the wider project. Doing that we should be able to capture the performance gain of this PR without imposing the loss. I would also vote to prioritize this project.

@FungusHumungus @JeanMertz @binarylogic I know we talked about this some out-of-ticket. Does this take capture your thoughts on the subject?

@StephenWakely

> Does this take capture your thoughts on the subject?

Yes. I would suggest in the following order.

  1. Pull it out into a separate crate. However, you (@blt) mentioned above that you would like to see shared pulled back into the root crate. That will not be possible, as it needs to be shared with VRL. Is that still going to be an issue given we are now splitting the tasks over multiple stages?
  2. Get VRL working with the new crate. I think the main challenges here will involve enabling VRL to use the same parsing code as the Lookup, bearing in mind Lookup likes to parse as a reference type (Lookup<'a>). The VRL parser currently parses as an owned type, so it would need to handle both. Once VRL is working with the new crate we should be able to do some hefty benchmarking. I think the main problematic config was pretty much entirely using lookups via VRL, so I would expect to see a 28% performance gain right there. If not more (fingers crossed).
  3. Move the rest of the app to the new crate. I imagine we could temporarily support both APIs, so LogEvent would have an insert and a speedy_insert function. This part could be done at a leisurely pace. When everything is moved over, we can delete insert and sed speedy_insert -> insert.
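Step 3's transitional shape could look something like this sketch (all types here are simplified and hypothetical; the real LookupBuf parser handles quoting, indexing, and coalescing):

```rust
use std::collections::BTreeMap;

// Simplified stand-in for LookupBuf: a pre-parsed list of field segments.
struct LookupBuf(Vec<String>);

impl LookupBuf {
    fn parse(s: &str) -> Self {
        LookupBuf(s.split('.').map(str::to_string).collect())
    }
}

#[derive(Default)]
struct LogEvent {
    // Flattened storage keyed by segment list, for brevity.
    fields: BTreeMap<Vec<String>, i64>,
}

impl LogEvent {
    // Legacy API: re-parses the string on every call. Kept during migration.
    #[allow(dead_code)]
    #[deprecated(note = "parse the path once and use speedy_insert")]
    fn insert(&mut self, key: &str, value: i64) {
        self.speedy_insert(LookupBuf::parse(key), value);
    }

    // New API: takes an already-parsed path, so hot loops parse only once.
    fn speedy_insert(&mut self, path: LookupBuf, value: i64) {
        self.fields.insert(path.0, value);
    }
}

fn main() {
    let mut event = LogEvent::default();
    event.speedy_insert(LookupBuf::parse("foo.bar"), 1);
    assert_eq!(
        event.fields.get(&vec!["foo".to_string(), "bar".to_string()]),
        Some(&1)
    );
    println!("ok");
}
```

Once all callers use the path-taking method, the deprecated shim can be deleted and the new method renamed, exactly as described above.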

@blt

blt commented Apr 9, 2021

> Does this take capture your thoughts on the subject?
>
> Yes. I would suggest in the following order.
>
> 1. Pull it out into a separate crate. However, you (@blt) mentioned above that you would like to see `shared` pulled back into the root crate. That will not be possible, as it needs to be shared with VRL. Is that still going to be an issue given we are now splitting the tasks over multiple stages?

Could be I'm wrong there. I've only got vague notions for #7027 so far and this is part of that. As is, the only blocker I'm aware of with shared is the Metrics::from_metrics_kv. It does strike me that Event is a core concern of vector, but I do see that there's a dependency issue.

> 2. Get VRL working with the new crate. I think the main challenges here will involve enabling VRL to use the same parsing code as the Lookup, bearing in mind Lookup likes to parse as a reference type (`Lookup<'a>`). The VRL parser currently parses as an owned type.

FWIW if VRL could be made to parse by reference that would go a long way to addressing #6831.

> So it would need to handle both. Once VRL is working with the new crate we should be able to do some hefty benchmarking. I think the main problematic config was pretty much entirely using lookups via VRL, so I would expect to see a 28% performance gain right there. If not more (fingers crossed).

> 3. Move the rest of the app to the new crate. I imagine we could temporarily support both APIs, so `LogEvent` would have an `insert` and a `speedy_insert` function. This part could be done at a leisurely pace. When everything is moved over, we can delete `insert` and sed `speedy_insert` -> `insert`.

I buy the general approach, yeah. I like that it can be done gradually.

@StephenWakely

StephenWakely commented Apr 9, 2021

> It does strike me that Event is a core concern of vector, but I do see that there's a dependency issue.

VRL only needs the Lookup path data structures. It absolutely doesn't want anything to do with the Event. That way it can remain agnostic to the data it is running on top of. So the new crate should really only contain the core Lookup API. The actual lookup code, Event, Metrics, etc. can remain within Vector core.
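The boundary described here might be sketched like so (module and type names are illustrative, not the actual crate layout):

```rust
// The `lookup` module stands in for the proposed standalone crate: it owns
// only the path data structures, with no knowledge of Event or Metrics.
mod lookup {
    #[derive(Debug, Clone, PartialEq)]
    pub enum SegmentBuf {
        Field(String),
        Index(usize),
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct LookupBuf(pub Vec<SegmentBuf>);
}

// Vector core keeps Event and the traversal logic, depending on `lookup`.
mod vector_core {
    use crate::lookup::{LookupBuf, SegmentBuf};
    use std::collections::BTreeMap;

    #[derive(Default)]
    pub struct LogEvent(pub BTreeMap<String, i64>);

    impl LogEvent {
        pub fn get(&self, path: &LookupBuf) -> Option<i64> {
            match path.0.first()? {
                SegmentBuf::Field(name) => self.0.get(name).copied(),
                SegmentBuf::Index(_) => None, // flat map here, for brevity
            }
        }
    }
}

fn main() {
    use crate::lookup::{LookupBuf, SegmentBuf};
    let mut event = vector_core::LogEvent::default();
    event.0.insert("message".to_string(), 42);
    let path = LookupBuf(vec![SegmentBuf::Field("message".to_string())]);
    assert_eq!(event.get(&path), Some(42));
    println!("ok");
}
```

With this split, VRL would depend only on the path-structure crate, while Event stays in core, matching the division of concerns described above.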

@blt

blt commented Apr 9, 2021

> It does strike me that Event is a core concern of vector, but I do see that there's a dependency issue.
>
> VRL only needs the Lookup path data structures. It absolutely doesn't want anything to do with the Event. That way it can remain agnostic to the data it is running on top of. So the new crate should really only contain the core Lookup API. The actual lookup code, Event, Metrics, etc. can remain within Vector core.

Rad. Even better.

@blt

blt commented Apr 9, 2021

@binarylogic @JeanMertz @FungusHumungus I've made #7070 to replace this PR. I'd be grateful for your thoughts.

@blt

blt commented Apr 12, 2021

Closing in favor of #7070.

Labels
domain: data model Anything related to Vector's internal data model
9 participants