Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse CompensatedSum object in agg collect loop #49548

Merged
merged 2 commits into from
Nov 25, 2019

Conversation

polyfractal
Copy link
Contributor

@polyfractal polyfractal commented Nov 25, 2019

The new CompensatedSum is a nice DRY refactor, but had the unanticipated side effect of creating a lot of object allocation in the aggregation hot collection loop: one object per visited document, per aggregator. In some places it created two per-doc-per-agg (weighted avg, geo centroids, etc) since there were multiple compensations being maintained.

This PR moves the object creation out of the hot loop so that it is now created once per segment, and resets the internal state each time through the loop

Closes #49506

Note: CompensatedSum is also used for the various Internal* classes, and these have been left alone since they are not performance critical and the usage works great there.

Old text:

This is a refactor to introduce a new "compensated" DoubleArray abstraction. It allows aggregators to use kahan summation with no extra object garbage. Internally this uses the new CompensatedSum object, but just continually resets the state for each new bucket instead of allocating a new object.

The syntax loosely follows DoubleArray, but is different in places. A user will initialize/grow the array as normal, but instead of getting/setting they will reset/add/commit values. Because of these changes it doesn't currently implement DoubleArray or any of the relatives.

Still a bit of a WIP. I want to run some benchmarks to see if it makes a noticeable difference, and I'm not quite satisfied with the interface/abstraction yet.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@polyfractal
Copy link
Contributor Author

So @iverase kindly pointed out that all the ceremony around the BigArray/DoubleArray stuff is unnecessary, and we really only need the "reset" functionality and each agg can keep it's own scratch CompensatedSum object.

E.g. wildly over-complicated. :) Gonna re-work this

@polyfractal polyfractal force-pushed the compensated_sum_double_array branch from 3566bd0 to 9e5d3be Compare November 25, 2019 19:08
@polyfractal polyfractal removed the WIP label Nov 25, 2019
@polyfractal
Copy link
Contributor Author

Pushed an update, now 200% simpler. Thanks @iverase :)

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@polyfractal polyfractal changed the title Refactor aggregator sum compensation to use DoubleArray abstraction Reuse CompensatedSum object in agg collect loop Nov 25, 2019
@polyfractal polyfractal merged commit 5bc0a3f into elastic:master Nov 25, 2019
polyfractal added a commit that referenced this pull request Nov 25, 2019
The new CompensatedSum is a nice DRY refactor, but had the unanticipated 
side effect of creating a lot of object allocation in the aggregation hot collection 
loop: one object per visited document, per aggregator. In some places it 
created two per-doc-per-agg (weighted avg, geo centroids, etc) since there 
were multiple compensations being maintained.

This PR moves the object creation out of the hot loop so that it is now 
created once per segment, and resets the internal state each time through 
the loop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce object allocations of CompensatedSum in aggregations
4 participants