Reuse CompensatedSum object in agg collect loop #49548
Merged
The new `CompensatedSum` is a nice DRY refactor, but it had the unanticipated side effect of allocating many objects in the hot aggregation collection loop: one object per visited document, per aggregator. In some places it created two objects per document per aggregator (weighted avg, geo centroids, etc.) since multiple compensations were being maintained.

This PR moves the object creation out of the hot loop so that the object is now created once per segment and its internal state is reset each time through the loop.
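As a rough illustration of the pattern, here is a minimal, self-contained sketch. It is not the code from this PR: `KahanSum` is a simplified, hypothetical stand-in for `CompensatedSum` (it skips the handling of non-finite values), and `collectSegment`, `docValues`, and `docBuckets` are names invented for the example. The point is that the accumulator is allocated once, outside the per-document loop, and only reset inside it.

```java
public class ReusedKahanSumSketch {

    /** Hypothetical, simplified stand-in for CompensatedSum: a mutable Kahan accumulator. */
    static final class KahanSum {
        private double value;
        private double delta;

        KahanSum(double value, double delta) {
            reset(value, delta);
        }

        /** Overwrites the running sum and its compensation without allocating. */
        KahanSum reset(double value, double delta) {
            this.value = value;
            this.delta = delta;
            return this;
        }

        /** One Kahan step: add v while tracking the low-order bits lost to rounding. */
        KahanSum add(double v) {
            double corrected = v - delta;
            double updated = value + corrected;
            delta = (updated - value) - corrected;
            value = updated;
            return this;
        }

        double value() {
            return value;
        }

        double delta() {
            return delta;
        }
    }

    /** Sums one segment's values into per-bucket slots using a single accumulator. */
    public static double[] collectSegment(double[] docValues, int[] docBuckets, int bucketCount) {
        double[] sums = new double[bucketCount];
        double[] compensations = new double[bucketCount];
        KahanSum kahan = new KahanSum(0, 0); // allocated once per segment, not once per doc
        for (int doc = 0; doc < docValues.length; doc++) {
            int bucket = docBuckets[doc];
            // Load this bucket's state into the shared accumulator instead of creating a new one.
            kahan.reset(sums[bucket], compensations[bucket]);
            kahan.add(docValues[doc]);
            sums[bucket] = kahan.value();
            compensations[bucket] = kahan.delta();
        }
        return sums;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0};
        int[] buckets = {0, 1, 0, 1};
        double[] sums = collectSegment(values, buckets, 2);
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

Aggregators that maintain several compensations per document (the weighted avg and geo centroid cases mentioned above) would presumably keep one such reusable accumulator per compensated quantity.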
Closes #49506
Note: `CompensatedSum` is also used in the various Internal* classes. These have been left alone since they are not performance critical and the usage works well there.

Old text:
This is a refactor to introduce a new "compensated" DoubleArray abstraction. It allows aggregators to use Kahan summation without creating extra object garbage. Internally it uses the new `CompensatedSum` object, but continually resets its state for each new bucket instead of allocating a new object.

The syntax loosely follows DoubleArray, but differs in places. A user initializes and grows the array as normal, but instead of getting and setting values they reset, add, and commit them. Because of these changes, it doesn't currently implement DoubleArray or any of its relatives.

Still a bit of a WIP. I want to run some benchmarks to see if it makes a noticeable difference, and I'm not quite satisfied with the interface/abstraction yet.
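The reset/add/commit interface is only described in outline above, so the following is a guess at its shape, not the abandoned implementation. Everything here is hypothetical: the class name `CompensatedArraySketch`, the method signatures, and the reading that `reset` loads a bucket's state, `add` performs a Kahan step, and `commit` writes the state back. Plain `double[]` arrays stand in for the real array types.

```java
import java.util.Arrays;

public class CompensatedArraySketch {
    // Parallel arrays: running sum and Kahan compensation for each bucket.
    private double[] sums = new double[0];
    private double[] deltas = new double[0];

    // Working state for the bucket currently being updated.
    private int current = -1;
    private double value;
    private double delta;

    /** Grows both backing arrays so that indices below minSize are valid. */
    public void grow(int minSize) {
        if (minSize > sums.length) {
            sums = Arrays.copyOf(sums, minSize);
            deltas = Arrays.copyOf(deltas, minSize);
        }
    }

    /** Loads the stored state of a bucket into the working fields. */
    public void reset(int bucket) {
        current = bucket;
        value = sums[bucket];
        delta = deltas[bucket];
    }

    /** One Kahan summation step on the working state. */
    public void add(double v) {
        double corrected = v - delta;
        double updated = value + corrected;
        delta = (updated - value) - corrected;
        value = updated;
    }

    /** Writes the working state back to the bucket selected by reset. */
    public void commit() {
        sums[current] = value;
        deltas[current] = delta;
    }

    /** Returns the committed sum for a bucket. */
    public double sum(int bucket) {
        return sums[bucket];
    }

    public static void main(String[] args) {
        CompensatedArraySketch array = new CompensatedArraySketch();
        array.grow(2);
        array.reset(1);
        array.add(1.5);
        array.add(2.5);
        array.commit();
        System.out.println(array.sum(0) + " " + array.sum(1));
    }
}
```

Under this reading, no object is allocated per bucket update: the only allocations happen when the backing arrays grow.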