Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Kahan Summation algorithm #71339

Closed
wants to merge 2 commits into from
Closed

Implement Kahan Summation algorithm #71339

wants to merge 2 commits into from

Conversation

zkscpqm
Copy link

@zkscpqm zkscpqm commented Apr 6, 2021

Currently the Sum aggregation was simply done with += which can lead to inaccuracies when working with high-precision values.
The Kahan Summation algorithm checks for such inaccuracies by performing the sum and keeping track of lost precision points.

Edit 1: Updated to account for new values being larger than running sum

@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Apr 6, 2021
This takes into account the added value being higher than the running sum
@cbuescher cbuescher added the :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data label Apr 22, 2021
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 22, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Copy link
Member

nik9000 commented Apr 22, 2021

For what it's worth the sum does use Kahan on the data nodes. This is for rollup. @csoulios might be the best to have a look.

@csoulios
Copy link
Contributor

@zkscpqm thank you for contributing this improvement.

There is already an implementation of KahanSummation implemented in CompensatedSum

Please have a look

final CompensatedSum kahanSummation = new CompensatedSum(0, 0);

Do you think you could modify your code so that it uses CompensatedSum?

@pugnascotia
Copy link
Contributor

@zkscpqm are you still interested in working on this PR?

@csoulios csoulios self-assigned this Jun 1, 2022
@csoulios
Copy link
Contributor

csoulios commented Jun 9, 2022

This PR has been idle for long.

Also, this feature has been implemented for the TSDB downsampling project (#87554)

Closing it.

@csoulios csoulios closed this Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external-contributor Pull request authored by a developer outside the Elasticsearch team :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants