Unmapped aggs should not run pipelines if they delegate reduction #33528
UnmappedTerms aggs try to delegate reduction to a sibling object that is not unmapped. That delegated agg will run the reductions, and also reduce any pipeline aggs. But because delegation comes before running pipelines, the UnmappedTerms _also_ tries to run pipeline aggs. This causes the pipeline to run twice and potentially double its output in buckets, which can create invalid JSON and break when converting to maps. This fixes the issue by toggling a flag in UnmappedTerms if it delegated away reduction, so that it knows not to run pipeline aggs either. Closes elastic#33514
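To make the double execution concrete, here is a minimal sketch of the delegation pattern described above. It is a toy model, not the Elasticsearch code: the `Agg` class, its `reduce`/`doReduce` split, and the `"max_bucket"` label are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DelegatingReduceSketch {

    /** Hypothetical stand-in for one shard-level aggregation result. */
    static class Agg {
        final boolean mapped;
        // Each pipeline run appends one entry; two entries model the duplicated output.
        final List<String> pipelineOutputs = new ArrayList<>();

        Agg(boolean mapped) {
            this.mapped = mapped;
        }

        /** Full reduction: merge shard results, then run pipeline aggs on the result. */
        Agg reduce(List<Agg> all) {
            Agg reduced = doReduce(all);
            reduced.pipelineOutputs.add("max_bucket"); // illustrative pipeline name
            return reduced;
        }

        Agg doReduce(List<Agg> all) {
            if (!mapped) {
                for (Agg other : all) {
                    if (other.mapped) {
                        // Delegation calls the sibling's full reduce(), which already runs pipelines.
                        return other.reduce(all);
                    }
                }
                return this; // nothing mapped to delegate to
            }
            return new Agg(true); // bucket merging omitted in this sketch
        }
    }

    public static void main(String[] args) {
        List<Agg> shards = Arrays.asList(new Agg(false), new Agg(true));
        // Unmapped agg leads: pipelines run in the delegate and again in the leader.
        System.out.println(shards.get(0).reduce(shards).pipelineOutputs.size()); // prints 2
        // Mapped agg leads: pipelines run once.
        System.out.println(shards.get(1).reduce(shards).pipelineOutputs.size()); // prints 1
    }
}
```

In this model the duplicate appears only when the unmapped result happens to lead the reduction, which is why the order of shard results matters.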
Pinging @elastic/es-search-aggs
colings86
left a comment
I'm not a huge fan of the fact that unmapped aggs need to do this delegation, but this LGTM given that's how it works. I wonder if we can improve this down the road to avoid having to delegate like this?
Another easy fix would be sorting the aggs before reduction, so that we preferentially choose mapped aggs as the first reduction object. We'd probably need the unmapped aggs to implement a simple interface so we could tell them apart from other aggs, and would need to do the sort, but that might be overall cleaner than this sort of delegation?
What would happen if all the aggs were unmapped, would we end up having to do the same delegation we do now?
This could also be done by having a method on InternalAggregation that is something like […]. What would be great is if we could find a way to make Unmapped aggs not special at all, so we don't have to avoid using them for reduce or be careful about double execution.
Same delegation process, but no need for the flag to track state. If there are only unmapped aggs, the first one will get chosen to run. Then it will look through all the aggs, see they are also all unmapped and then just return itself skipping reduction entirely. Then it'll also run pipelines.
++ that'd work too, and be cleaner.
@colings86 Ok, reworked to use a sorting approach. It's an overall smaller change to the codebase, at the small cost of a sort. I also removed the delegation ability from unmapped aggs, since it's not needed anymore. If they get called to lead the reduction they will just return themselves, because there are no other mapped aggs. There is a bit of a danger here that the caller won't know to sort first, and won't know that unmapped aggs just return themselves. Right now there's only one place that calls this code, so it's reasonably contained. But we could make the unmapped aggs throw a […]. Thoughts?
Jenkins, run gradle build tests
Huzzah, tests passing again. Think this is ready for another look.
colings86
left a comment
Left a minor comment, but this LGTM. Thanks for spending the time to work toward a better solution.
     *
     * Applies to any pipeline agg, not just max.
     */
    public void testFieldGetsWrittenOutTwice() throws Exception {
Isn't this testing that the field does NOT get written twice?
Ah, yes, the name should be adjusted. Named it while it was failing, but that doesn't make sense now that it's fixed :)
Jenkins, run gradle build tests
Previously, unmapped aggs tried to delegate reduction to a sibling agg that is mapped. That delegated agg would run the reductions, and also reduce any pipeline aggs. But because delegation comes before running pipelines, the unmapped agg _also_ tried to run pipeline aggs. This caused the pipeline to run twice and potentially double its output in buckets, which can create invalid JSON (e.g. same key multiple times) and break when converting to maps. This fixes the issue by sorting the list of aggregations ahead of time so that mapped aggs appear first, meaning they preferentially lead the reduction. If all aggs are unmapped, the first unmapped agg simply creates a new unmapped object and returns that for the reduction. This means that unmapped aggs no longer defer and there is no chance for a secondary execution of pipelines (or other side effects caused by deferring execution). Closes #33514
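The sorting approach from the commit message can be sketched as follows. This is a simplified model under stated assumptions, not the merged Elasticsearch change: `Agg`, `reduceTopLevel`, the `docCount` sum, and the `pipelineLog` list are hypothetical names used only to show mapped results leading the reduction and pipelines running once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MappedFirstReduceSketch {

    /** Hypothetical stand-in for one shard-level aggregation result. */
    static class Agg {
        final String name;
        final boolean mapped;
        final long docCount;

        Agg(String name, boolean mapped, long docCount) {
            this.name = name;
            this.mapped = mapped;
            this.docCount = docCount;
        }

        /** Leads the reduction over all shard results, including itself. */
        Agg reduce(List<Agg> all) {
            if (!mapped) {
                // No delegation: an unmapped leader just returns a new unmapped result.
                return new Agg(name, false, 0);
            }
            long total = 0;
            for (Agg a : all) {
                total += a.docCount; // unmapped results contribute 0
            }
            return new Agg(name, true, total);
        }
    }

    /** Sorts mapped aggs first, lets the first one lead, then runs pipelines exactly once. */
    static Agg reduceTopLevel(List<Agg> shardResults, List<String> pipelineLog) {
        List<Agg> sorted = new ArrayList<>(shardResults);
        // Boolean.FALSE sorts before Boolean.TRUE, so keying on "is unmapped" puts mapped aggs first.
        sorted.sort(Comparator.comparing((Agg a) -> !a.mapped));
        Agg reduced = sorted.get(0).reduce(sorted);
        pipelineLog.add("pipeline ran on " + reduced.name);
        return reduced;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Agg result = reduceTopLevel(Arrays.asList(
            new Agg("terms", false, 0),
            new Agg("terms", true, 3),
            new Agg("terms", true, 4)), log);
        System.out.println(result.mapped + " " + result.docCount + " " + log.size()); // prints "true 7 1"
    }
}
```

Because the sort is stable, the relative order of mapped results is preserved. When every result is unmapped, the first one leads and returns a fresh unmapped object, so no state flag is needed.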
UnmappedTerms/UnmappedSampler/UnmappedSigTerms aggs try to delegate reduction to a sibling object that is not unmapped. That delegated agg will run the reductions, and also reduce any pipeline aggs. But because delegation comes before running pipelines, the Unmapped* agg also tries to run pipeline aggs. This causes the pipeline to run twice and potentially double its output in buckets, which can create invalid JSON and break when converting to maps.
This fixes the issue by toggling a flag in UnmappedTerms if it delegated away reduction so that it knows not to run pipeline aggs either.
Closes #33514