Skip to content

Conversation

@thomasrebele
Copy link
Contributor

What changes were proposed in this pull request?

As described in HIVE-29275, the compareTo method of Decimal leads to wrong result. This PR proposes a custom comparator handling many cases with a fallback to BigDecimal's compareTo.

Why are the changes needed?

Fixes the stats processed by DecimalColumnStatsMerger.

Does this PR introduce any user-facing change?

Yes, might change the statistics.

How was this patch tested?

Unit tests, qfile tests.

@thomasrebele thomasrebele changed the title Stats autogather calculates the min statistic incorrectly HIVE-29275: Stats autogather calculates the min statistic incorrectly Nov 12, 2025
@thomasrebele thomasrebele marked this pull request as ready for review November 13, 2025 09:16
Implement an optimized compare method for Decimal
@sonarqubecloud
Copy link

int compareInner(Decimal d1, Decimal d2, boolean useFallback, ApproachInfoCallback aic) {
byte[] unscaled1 = d1.getUnscaled();
byte[] unscaled2 = d2.getUnscaled();
boolean notNegative1 = isNotNegative(unscaled1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about naming on overall: isn't it better to understand isPositive instead of isNotNegative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the same:

  • isPositive: 0 < x
  • isNotNegative 0 <= x
    So the question is how 0 is handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately isPositive is not the same as isNotNegative, as isPositive(0) is false, while isNotNegative(0) is true. At least that's the standard use of the terms.


@Override
public int compare(Decimal o1, Decimal o2) {
return compareInner(o1, o2, true, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the entry point of the comparing itself. Is it possible to add some traditional checks like null checking and reference equality check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll add the checks.


HiveDecimal result = HiveDecimal.create(new BigInteger(val.getUnscaled()), val.getScale());
return (result != null) ? result.toString() : "";
return MetaStoreServerUtils.decimalToString(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of moving this logic to MetaStoreServerUtils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShowUtils is not accessible from DecimalColumnStatsMergerTest. Moving the logic of the method to MetaStoreServerUtils allows it to use it in the test as well, avoiding duplication of the same logic in different places.

@thomasrebele
Copy link
Contributor Author

Thank you for the reviews! As @kasakrisz observed, the comparator will not be called frequently, so maybe the custom comparator is not justifiable.

I'll prepare another PR using a simple HiveDecimal.compareTo, once #6189 has been merged (as both deal with statistics).

*
* <p>In that case it is enough to compare their unscaled value (or significand).
*/
private static int compareSameScale(byte pad, byte[] b1, byte[] b2, ApproachInfoCallback aic) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some explanation about the algorithm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what there is to say besides that the algorithm "compare[s] their unscaled value".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants