Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35576: [C++] Make Decimal{128,256}::FromReal more accurate #35997

Merged
merged 4 commits into from
Jun 14, 2023

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jun 8, 2023

The original algorithm for real-to-decimal conversion did its computations in the floating-point domain, accumulating rounding errors especially for large scale or precision values, such as:

>>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
[
  1234567889.99999995904
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
<pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
[
  1234567890.000000057344
]

The new algorithm strives to avoid precision loss by doing all its computations in the decimal domain. However, negative scales, which are presumably infrequent, fall back on the old algorithm.

@pitrou pitrou force-pushed the gh-35576-float-to-decimal branch from 18155dc to f8a6443 Compare June 8, 2023 16:02
@pitrou pitrou marked this pull request as ready for review June 8, 2023 19:52
@pitrou
Copy link
Member Author

pitrou commented Jun 8, 2023

@felipecrv @benibus Would either you have time to review this?

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't promise to follow the conversion algorithm entirely :) but this looks well tested and the bit shifting routines make sense.

cpp/src/arrow/util/decimal_internal.h Show resolved Hide resolved
Comment on lines -954 to -960
bool operator==(const BasicDecimal128& left, const BasicDecimal128& right) {
return left.high_bits() == right.high_bits() && left.low_bits() == right.low_bits();
}

bool operator!=(const BasicDecimal128& left, const BasicDecimal128& right) {
return !operator==(left, right);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going away because it is superceded by the version in GenericBasicDecimal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

cpp/src/arrow/util/decimal.cc Outdated Show resolved Hide resolved
@@ -802,34 +814,21 @@ class TestDecimalFromReal : public ::testing::Test {
const std::vector<ParamType> params{
// clang-format off
{0.0f, 1, 0, "0"},
{-0.0f, 1, 0, "0"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove these cases? We can still convert negative numbers right? Isn't it just less precise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because CheckDecimalFromReal and CheckDecimalFromRealIntegerString now automatically deduce the negative test cases from the positive ones.

f"diff_digits = {diff_digits!r}")


# XXX Cannot test float32 as case generators above assume float64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO (should we create an issue?) or is it a fact, in which case we can get rid of the XXX?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I wouldn't create an issue for it as the effort-benefit ratio is probably low.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Jun 9, 2023
Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't claim I went through all the details of the implementation, but it looks like the kind of code that doesn't have rarely-taken branches that wouldn't be exercised from the unit tests. LGTM.

cpp/src/arrow/util/decimal.cc Outdated Show resolved Hide resolved
cpp/src/arrow/util/basic_decimal.cc Outdated Show resolved Hide resolved
The original algorithm for real-to-decimal conversion did its computations in the floating-point domain,
accumulating rounding errors especially for large scale or precision values, such as:
```
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
[
  1234567889.99999995904
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
<pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
[
  1234567890.000000057344
]
```

The new algorithm strives to avoid precision loss by doing all its computations in the decimal domain.
However, negative scales, which are presumably infrequent, fall back on the old algorithm.
@pitrou pitrou force-pushed the gh-35576-float-to-decimal branch from 5f0abee to 87be93a Compare June 14, 2023 12:47
@pitrou
Copy link
Member Author

pitrou commented Jun 14, 2023

I think I have addressed all review comments. Would you like to take another look? @westonpace @felipecrv

@pitrou pitrou merged commit b6eab1f into apache:main Jun 14, 2023
@pitrou pitrou deleted the gh-35576-float-to-decimal branch June 14, 2023 14:49
@conbench-apache-arrow
Copy link

Conbench analyzed the 7 benchmark runs on commit b6eab1f4.

There were 35 benchmark results indicating a performance regression:

The full Conbench report has more details.

return {
// -- Stress the 24 bits of precision of a float
// 2**63 + 2**40
FromFloatTestParam{9.223373e+18f, 19, 0, "9223373136366403584"},
Copy link
Contributor

@huberylee huberylee Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou Hi, the expected return value of FromFloatTestParam{5.76460752e13f, 18, 4, "57646075230342.3488"} is 57646073774080.0000, which seems different from the original value, does that meet expectations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you call "the original value" in that context?
Both values are actually equal in float32:

>>> np.float32(5.76460752e13) == np.float32(57646073774080)
True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you call "the original value" in that context? Both values are actually equal in float32:

>>> np.float32(5.76460752e13) == np.float32(57646073774080)
True

I see what you mean. Thanks!

@AlenkaF AlenkaF removed their request for review November 1, 2023 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Decimal{128,256}::FromReal accuracy loss on non-small scale values
4 participants