
Fixed bug where sum returns wrong values with long data type #6984

Closed

Conversation

matthewryanwells

Description

Fixes a bug where sum returns an incorrect value when working with large long values, because the current code uses the double data type. In addition, we extracted the null-source check into its own class so the check no longer needs to be repeated.
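For context, a minimal standalone snippet (illustrative only, not OpenSearch code) showing how routing large long values through double arithmetic produces a wrong sum:

```java
public class LongSumPrecisionDemo {
    public static void main(String[] args) {
        long a = 9_007_199_254_740_993L; // 2^53 + 1, not exactly representable as a double

        double viaDouble = (double) a + 1.0; // (double) a rounds down to 2^53; adding 1.0 leaves it at 2^53
        long exact = a + 1L;

        System.out.println((long) viaDouble); // 9007199254740992 (wrong)
        System.out.println(exact);            // 9007199254740994 (correct)
    }
}
```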

Issues Resolved

#5537
opensearch-project/sql#1052

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…null values to clean code and solve bug

Signed-off-by: Matthew Wells <matthew.wells@improving.com>
@github-actions
Contributor

github-actions bot commented Apr 4, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Apr 4, 2023

Gradle Check (Jenkins) Run Completed with:

…' for normal or high precision but slower summation

Signed-off-by: Matthew Wells <matthew.wells@improving.com>
@github-actions
Contributor

github-actions bot commented Jun 8, 2023

Gradle Check (Jenkins) Run Completed with:

@reta
Collaborator

reta commented Jun 9, 2023

Thanks a lot @matthewryanwells for moving it forward

@matthewryanwells
Author

Thanks a lot @matthewryanwells for moving it forward

Thank you! I had some other work to finish, but I can focus on this now. With the optional parameter implemented (a few updates are still needed), I can fully focus on a solution that fixes the bug.

…ional parameter, added tests, set correct default value, changed method to enum

Signed-off-by: Matthew Wells <matthew.wells@improving.com>
@matthewryanwells
Author

There is still some work I need to do; specifically, it seems the precise sum aggregator sometimes does not return the correct result, but we are getting close to being done.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@acarbonetto
Contributor

@reta I don't know if you have any suggestions to solve this:
SumPreciseAggregator.java only aggregates data within a shard. The fetch phase runs across several different shards and reduces the results using InternalAggregations.java. This means that if the index is split between multiple shards and processed by multiple threads, the longs are cast to doubles before finally being reduced by the InternalSum processor.
We observed this when running the IT tests: since the IT tests run on 5 shards, the data was aggregated correctly around 20% of the time and failed around 80% of the time because it was cast before aggregation.
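As a concrete illustration of that behaviour, here is a toy reproduction (plain Java, not the actual OpenSearch classes): each shard's partial sum is exact, but carrying it as a double before the final reduce already loses precision.

```java
import java.math.BigInteger;
import java.util.List;

public class ShardReduceDemo {
    public static void main(String[] args) {
        // Two "shards" of long values; shard 0 sums to 2^53 + 1, which a double cannot represent exactly.
        List<long[]> shards = List.of(
            new long[] { 4_503_599_627_370_497L, 4_503_599_627_370_496L },
            new long[] { 1L, 1L }
        );

        // Simplified version of the current behaviour: partial sums become doubles before the final reduce.
        double reducedViaDouble = 0;
        for (long[] shard : shards) {
            long partial = 0;
            for (long v : shard) partial += v;
            reducedViaDouble += (double) partial; // precision is lost here for the large partial sum
        }

        // Exact reduction for comparison.
        BigInteger exact = BigInteger.ZERO;
        for (long[] shard : shards) {
            for (long v : shard) exact = exact.add(BigInteger.valueOf(v));
        }

        System.out.println("reduced via double: " + (long) reducedViaDouble); // 9007199254740994
        System.out.println("exact sum:          " + exact);                   // 9007199254740995
    }
}
```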
To fix this, we would need to return long values from the precise sum aggregation, and update the InternalAggregation to aggregate using a similar mechanism as the SumPreciseAggregator. Right?

@reta
Collaborator

reta commented Jun 16, 2023

To fix this, we would need to return long values from the precise sum aggregation, and update the InternalAggregation to aggregate using a similar mechanism as the SumPreciseAggregator. Right?

@matthewryanwells I think the exact approach may look more complicated. We basically have 3 data types as of today: long, double, and unsigned long (which was recently merged). The valuesSource has two relevant properties:

  • isFloatingPoint - means the aggregation is over floating point, so we need to use double
  • isBigInteger - means the aggregation is over unsigned long, so we need to use BigInteger
  • all other cases - the aggregation is over other types (long / integer / ...), so we need to use BigInteger (since long could overflow)

The SumAggregator / SumPreciseAggregator still use InternalSum to represent the result, which only carries a double value - we would lose precision there. One option that comes to mind: what if InternalSum carried a BigDecimal instead? (InternalSum::reduce still uses CompensatedSum.)
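As a rough sketch of those three cases (the class and method names here are illustrative, not the actual ValuesSource / aggregator API):

```java
import java.math.BigInteger;

// Illustrative only: one accumulator per case described above.
final class SumCasesSketch {

    // isFloatingPoint: keep the double path (the existing code uses CompensatedSum here)
    static double sumFloatingPoint(double[] values) {
        double acc = 0;
        for (double v : values) acc += v;
        return acc;
    }

    // isBigInteger (unsigned long): interpret each long as unsigned and accumulate in BigInteger
    static BigInteger sumUnsignedLong(long[] values) {
        BigInteger acc = BigInteger.ZERO;
        for (long v : values) acc = acc.add(new BigInteger(Long.toUnsignedString(v)));
        return acc;
    }

    // all other integral types (long / integer / ...): BigInteger, since a long accumulator could overflow
    static BigInteger sumIntegral(long[] values) {
        BigInteger acc = BigInteger.ZERO;
        for (long v : values) acc = acc.add(BigInteger.valueOf(v));
        return acc;
    }
}
```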

Does it make sense?

@matthewryanwells
Author

To fix this, we would need to return long values from the precise sum aggregation, and update the InternalAggregation to aggregate using a similar mechanism as the SumPreciseAggregator. Right?

@matthewryanwells I think the exact approach may look more complicated. We basically have 3 data types as of today: long, double, and unsigned long (which was recently merged). The valuesSource has two relevant properties:

  • isFloatingPoint - means the aggregation is over floating point, so we need to use double
  • isBigInteger - means the aggregation is over unsigned long, so we need to use BigInteger
  • all other cases - the aggregation is over other types (long / integer / ...), so we need to use BigInteger (since long could overflow)

The SumAggregator / SumPreciseAggregator still use InternalSum to represent the result, which only carries a double value - we would lose precision there. One option that comes to mind: what if InternalSum carried a BigDecimal instead? (InternalSum::reduce still uses CompensatedSum.)

Does it make sense?

The explanation is slightly confusing for me.

For this improvement to work out, we would need SumPreciseAggregator to return a BigInteger value and have the aggregation at the shard level (I am assuming this is InternalSum) also do its arithmetic using BigIntegers (because both double and BigDecimal lose precision).

Do you know if something like this would be possible/a reasonable thing we could implement?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot added the stalled label on Jul 24, 2023
@reta
Collaborator

reta commented Jul 24, 2023

Apologies, @matthewryanwells, I missed your reply

Do you know if something like this would be possible/a reasonable thing we could implement?

I think there is a way to implement that (taking into account that the precise calculation is opt-in), but it would basically mean transferring both double and BigInteger/BigDecimal values. I will try to prototype the solution this week.

@opensearch-trigger-bot removed the stalled label on Jul 25, 2023
@reta
Collaborator

reta commented Jul 28, 2023

@matthewryanwells some updates

Do you know if something like this would be possible/a reasonable thing we could implement?

I was hacking around and I think we could implement that by passing both the double and BigDecimal representations of the sum in InternalSum & related classes.
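A loose sketch of what that could look like (hypothetical class, not the actual InternalSum):

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical shape of the idea above: carry the usual double alongside an optional
// BigDecimal that is populated only when the precise (opt-in) path is enabled.
final class DualRepresentationSum {
    final double doubleSum;
    final BigDecimal preciseSum; // null when the precise path is not enabled

    DualRepresentationSum(double doubleSum, BigDecimal preciseSum) {
        this.doubleSum = doubleSum;
        this.preciseSum = preciseSum;
    }

    // Reduce partial results from all shards; fall back to the double value
    // if any shard did not provide a precise sum.
    static DualRepresentationSum reduce(List<DualRepresentationSum> partials) {
        double d = 0;
        BigDecimal precise = BigDecimal.ZERO;
        boolean allPrecise = true;
        for (DualRepresentationSum p : partials) {
            d += p.doubleSum;
            if (p.preciseSum == null) {
                allPrecise = false;
            } else {
                precise = precise.add(p.preciseSum);
            }
        }
        return new DualRepresentationSum(d, allPrecise ? precise : null);
    }
}
```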

For this improvement to work out, we would need SumPreciseAggregator to return a BigInteger value and have the aggregation at the shard level (I am assuming this is InternalSum) also do its arithmetic using BigIntegers (because both double and BigDecimal lose precision).

I have trouble projecting the precise sum to BigInteger; aggregation over the double values would not be possible, I think. Maybe you could clarify the trick for me here? Thank you.

@acarbonetto
Contributor

@matthewryanwells has unfortunately paused work on this because of the increased scope and the experimental flag requirements. There isn't a huge use case for the 'fix' (it's becoming more of an enhancement now). Hopefully he will find some time to work on it in the near future and we can re-open the ticket.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot added the stalled label on Sep 1, 2023
@opensearch-trigger-bot
Contributor

This PR was closed because it has been stalled for 7 days with no activity.

Labels
stalled
4 participants