Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring pipelines transform more inline with general transform behavior #11784

Closed
2 of 3 tasks
blt opened this issue Mar 10, 2022 · 7 comments · Fixed by #12602
Closed
2 of 3 tasks

Bring pipelines transform more inline with general transform behavior #11784

blt opened this issue Mar 10, 2022 · 7 comments · Fixed by #12602
Assignees
Labels
domain: performance Anything related to Vector's performance transform: pipelines Anything `pipelines` transform related

Comments

@blt
Copy link
Contributor

blt commented Mar 10, 2022

In #10144 we've slowly come to understand why http -> pipelines -> blackhole does not perform well. The root culprit is the expansion mechanism in place for PipelinesConfig et al and the additional hops that adds to a running topology, compared to a non-pipeline configuration that achieves the same ends. We would like, then, to minimize the differences at runtime between an "unrolled" pipeline versus its pipelined peer.

Thoughts for how to minimize these differences:

The approach for this work will necessarily have to be a loop of removing one serious bottleneck, re-evaluating our progress and moving on to the next bottleneck. At the end it's entirely possible we will have removed the expansion mechanism in place for pipelines, possibly expanded the primitives available for all transforms to control branching etc.

@blt blt added domain: performance Anything related to Vector's performance transform: pipelines Anything `pipelines` transform related labels Mar 10, 2022
@jszwedko
Copy link
Member

With regards to double filtering, we have thought about adding a new named output to the filter transform in the past that would permit routing events that fail the condition. Maybe that could solve this problem as well as providing a generally useful addition to the filter transform?

@blt
Copy link
Contributor Author

blt commented Mar 10, 2022

With regards to double filtering, we have thought about adding a new named output to the filter transform in the past that would permit routing events that fail the condition. Maybe that could solve this problem as well as providing a generally useful addition to the filter transform?

That makes good sense to me. The semantics of the Filter transform are "take an input, drop if not matched else pass". We almost want a splitter where the transform has two outputs where the semantics are "take an input, route high if matches else route low or drop" if there's no low. You could implement the existing Filter in terms of that or change how Filter works as a primitive.

@jdrouet
Copy link
Contributor

jdrouet commented Mar 11, 2022

That makes good sense to me. The semantics of the Filter transform are "take an input, drop if not matched else pass". We almost want a splitter where the transform has two outputs where the semantics are "take an input, route high if matches else route low or drop" if there's no low. You could implement the existing Filter in terms of that or change how Filter works as a primitive.

What do you think about doing a switch transform? The only difference with a router transform would be that it would only forward the event to the first route that matches and therefore wouldn't need to clone the event.

@blt
Copy link
Contributor Author

blt commented Mar 11, 2022

That makes good sense to me. The semantics of the Filter transform are "take an input, drop if not matched else pass". We almost want a splitter where the transform has two outputs where the semantics are "take an input, route high if matches else route low or drop" if there's no low. You could implement the existing Filter in terms of that or change how Filter works as a primitive.

What do you think about doing a switch transform? The only difference with a router transform would be that it would only forward the event to the first route that matches and therefore wouldn't need to clone the event.

I would be interested to see how a switch performs for sure. Seems like a generally useful primitive as well. @lukesteensen's thoughts here would be valuable too.

@binarylogic
Copy link
Contributor

From a UX standpoint I would like to have a single route transform with an option to switch or not. This is inline with the transform boundaries in the UX design doc. If it's not possible, that's fine, but they are close enough in behavior that a single transform would be better.

@lukesteensen
Copy link
Member

We could add two options to the route config that would cover these:

  1. An else output for when no condition matches
  2. a first_match mode where we stop after the first matching condition instead of sending to all matches

I don't want to end up with too many knobs, but that seems less confusing than altering the filter semantics or adding a whole new similar-but-different transform.

@jszwedko
Copy link
Member

We could add two options to the route config that would cover these:

  1. An else output for when no condition matches
  2. a first_match mode where we stop after the first matching condition instead of sending to all matches

I don't want to end up with too many knobs, but that seems less confusing than altering the filter semantics or adding a whole new similar-but-different transform.

While both of those would be required to support the semantics of the switch transform in #11797 I think, strictly for the pipelines use-case which only has two routes, that simply supporting else would be sufficient.

blt added a commit that referenced this issue May 11, 2022
This commit adjusts pipeline expansion so that they are combined, rather than,
well, expanded. This means that the sub-transforms of a pipeline run in serial
but that each pipeline as a whole can run multiple copies of itself at
once. This also cleans up many low-priority tasks.

Resolves #11787
Resolves #11784
REF #10144

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue May 13, 2022
This commit adjusts pipeline expansion so that they are combined, rather than,
well, expanded. This means that the sub-transforms of a pipeline run in serial
but that each pipeline as a whole can run multiple copies of itself at
once. This also cleans up many low-priority tasks.

Resolves #11787
Resolves #11784
REF #10144

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue May 16, 2022
* Combine pipeline stages

This commit adjusts pipeline expansion so that they are combined, rather than,
well, expanded. This means that the sub-transforms of a pipeline run in serial
but that each pipeline as a whole can run multiple copies of itself at
once. This also cleans up many low-priority tasks.

Resolves #11787
Resolves #11784
REF #10144

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Tidy up errors, fix bug in vector.toml pipeline config

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* test dings

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* temporariliy disable http_datadog_filter_blackhole

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* PR feedback

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* PR feedback with regard to outputs

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* PR feedback

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* try 64 wide interior buffer

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

Co-authored-by: Brian L. Troutwine <brian@troutwine.us>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: performance Anything related to Vector's performance transform: pipelines Anything `pipelines` transform related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants