GH-35576: [C++] Make Decimal{128,256}::FromReal more accurate #35997
Conversation
Force-pushed from 18155dc to f8a6443.
@felipecrv @benibus Would either of you have time to review this?
I won't promise to follow the conversion algorithm entirely :) but this looks well tested and the bit shifting routines make sense.
```c++
bool operator==(const BasicDecimal128& left, const BasicDecimal128& right) {
  return left.high_bits() == right.high_bits() && left.low_bits() == right.low_bits();
}

bool operator!=(const BasicDecimal128& left, const BasicDecimal128& right) {
  return !operator==(left, right);
}
```
Is this going away because it is superseded by the version in `GenericBasicDecimal`?
Yes!
```diff
@@ -802,34 +814,21 @@ class TestDecimalFromReal : public ::testing::Test {
   const std::vector<ParamType> params{
       // clang-format off
       {0.0f, 1, 0, "0"},
       {-0.0f, 1, 0, "0"},
```
Why remove these cases? We can still convert negative numbers, right? Isn't it just less precise?
Because `CheckDecimalFromReal` and `CheckDecimalFromRealIntegerString` now automatically deduce the negative test cases from the positive ones.
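For illustration, here is a minimal Python sketch of that deduction scheme (the real helpers are C++ templates in Arrow's test suite; the function names and the toy converter below are made up):

```python
from decimal import Decimal

def check_decimal_from_real(convert, real, precision, scale, expected):
    """Check one positive case, then the negative case deduced from it."""
    assert convert(real, precision, scale) == expected
    # Zero is its own negation, so only non-zero cases get mirrored.
    if real != 0:
        assert convert(-real, precision, scale) == "-" + expected

# Toy stand-in for the real conversion (precision is unused here),
# just to exercise the helper.
def toy_convert(real, precision, scale):
    return str(round(Decimal(real), scale))

check_decimal_from_real(toy_convert, 2.5, 8, 1, "2.5")  # also checks -2.5 -> "-2.5"
```

This is why the `{-0.0f, ...}` rows above could be dropped from the test table.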
python/pyarrow/tests/test_compute.py (outdated):
f"diff_digits = {diff_digits!r}") | ||
|
||
|
||
# XXX Cannot test float32 as case generators above assume float64 |
Is this a TODO (should we create an issue?) or is it a fact, in which case we can get rid of the `XXX`?
Not sure. I wouldn't create an issue for it as the effort-benefit ratio is probably low.
I can't claim I went through all the details of the implementation, but it looks like the kind of code that doesn't have rarely-taken branches that would go unexercised by the unit tests. LGTM.
The original algorithm for real-to-decimal conversion did its computations in the floating-point domain, accumulating rounding errors especially for large scale or precision values, such as:

```
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
[
  1234567889.99999995904
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
<pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
[
  1234567890.000000057344
]
```

The new algorithm strives to avoid precision loss by doing all its computations in the decimal domain. However, negative scales, which are presumably infrequent, fall back on the old algorithm.
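To see why the computation domain matters, here is a minimal Python sketch (not Arrow's actual C++ implementation; the function names are made up) contrasting the two approaches on the example above:

```python
from decimal import Decimal, localcontext

def from_real_float_domain(value: float, scale: int) -> int:
    # Old approach, in spirit: scale in the floating-point domain.
    # The product rounds to the nearest float64, corrupting low digits.
    return round(value * 10.0**scale)

def from_real_decimal_domain(value: float, scale: int) -> int:
    # New approach, in spirit: every float64 is exactly representable
    # as a Decimal, so the scaling can be done exactly in the decimal
    # domain and rounded only once, at the end.
    with localcontext() as ctx:
        ctx.prec = 80  # ample precision: no intermediate rounding
        return int(Decimal(value).scaleb(scale).to_integral_value())

print(from_real_float_domain(1234567890.0, 11))    # 123456788999999995904
print(from_real_decimal_domain(1234567890.0, 11))  # 123456789000000000000
```

The float-domain result reproduces the drift shown in the `decimal128(38, 11)` cast above, while the decimal-domain result is exact.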
Force-pushed from 5f0abee to 87be93a.
I think I have addressed all review comments. Would you like to take another look? @westonpace @felipecrv
Conbench analyzed the 7 benchmark runs on this commit. There were 35 benchmark results indicating a performance regression. The full Conbench report has more details.
```c++
return {
    // -- Stress the 24 bits of precision of a float
    // 2**63 + 2**40
    FromFloatTestParam{9.223373e+18f, 19, 0, "9223373136366403584"},
```
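As a quick sanity check (not part of the PR), one can confirm with numpy that the float literal above rounds to exactly 2**63 + 2**40, which is the expected decimal string:

```python
>>> import numpy as np
>>> float(np.float32(9.223373e+18)) == 2**63 + 2**40 == 9223373136366403584
True
```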
@pitrou Hi, the expected return value of `FromFloatTestParam{5.76460752e13f, 18, 4, "57646075230342.3488"}` is `57646073774080.0000`, which seems different from the original value. Does that meet expectations?
What do you call "the original value" in that context?

Both values are actually equal in `float32`:

```python
>>> np.float32(5.76460752e13) == np.float32(57646073774080)
True
```
> What do you call "the original value" in that context? Both values are actually equal in `float32`:
>
> ```python
> >>> np.float32(5.76460752e13) == np.float32(57646073774080)
> True
> ```
I see what you mean. Thanks!
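For context, a quick illustrative check (not from the PR thread) of why both literals collapse to the same `float32`: the float32 ulp at this magnitude is 2**22, which is larger than the gap between the two decimal values.

```python
>>> import numpy as np
>>> # float32 spacing (ulp) near 5.76e13 is 2**22 = 4194304
>>> float(np.spacing(np.float32(5.76460752e13))) == 2.0**22
True
>>> # the two literals differ by about 1.46e6, well under one ulp,
>>> # so they round to the same float32 value
>>> abs(57646075230342.3488 - 57646073774080) < 2.0**22
True
```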