[C++] Decimal{128,256}::FromReal accuracy loss on non-small scale values #35576
The first part of the error is the down-cast from float64 (Python's default representation) to float32.
Same happens with numpy:
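A minimal sketch of that down-cast (assuming numpy; float32's 24-bit significand cannot represent 1234567890 exactly, so the value moves to the nearest representable float32):

```python
>>> import numpy as np
>>> float(np.float32(1234567890.0))  # nearest float32 to 1234567890
1234567936.0
```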
The second part of the error is, I think, likely due to the implementation: it looks like we might somehow be doing an extra cast through an intermediate value.
I think the second source of error might be arrow/cpp/src/arrow/util/decimal.cc, line 158 (at cd6e2a4), since this looks like it is done in float space (instead of casting to double), which potentially causes further loss of precision.
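As a rough illustration of the extra loss from staying in single precision (a numpy sketch, not the actual C++ code path; the scale factor of 100 is arbitrary):

```python
>>> import numpy as np
>>> x32 = np.float32(1234567936.0)              # value after the float32 down-cast
>>> float(x32 * np.float32(100.0))              # scaling in float32 space rounds again
123456790528.0
>>> float(np.float64(x32) * np.float64(100.0))  # scaling in float64 space stays exact here
123456793600.0
```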
```
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 18))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f0a0>
[
  1234567889.999999973827018752
]
```
There are also very strange precision-related phenomena going on:

```
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 10))
<pyarrow.lib.Decimal128Array object at 0x7f05f49238e0>
[
  1234567890.0000000000
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
[
  1234567889.99999995904
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
<pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
[
  1234567890.000000057344
]
```
@rok @felipecrv Do either of you want to take a look at this?
The issue is with the algorithm used in `FromReal`:

```
>>> powers_of_ten = [1e-38, 1e-37, 1e-36, 1e-35, 1e-34, 1e-33, 1e-32, 1e-31, 1e-30, 1e-29, 1e-28,
...                  1e-27, 1e-26, 1e-25, 1e-24, 1e-23, 1e-22, 1e-21, 1e-20, 1e-19, 1e-18, 1e-17,
...                  1e-16, 1e-15, 1e-14, 1e-13, 1e-12, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6,
...                  1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4, 1e5,
...                  1e6, 1e7, 1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16,
...                  1e17, 1e18, 1e19, 1e20, 1e21, 1e22, 1e23, 1e24, 1e25, 1e26, 1e27,
...                  1e28, 1e29, 1e30, 1e31, 1e32, 1e33, 1e34, 1e35, 1e36, 1e37, 1e38]
>>> int(powers_of_ten[38+10]*1234567890.)
12345678900000000000
>>> int(powers_of_ten[38+11]*1234567890.)
123456788999999995904
>>> int(powers_of_ten[38+12]*1234567890.)
1234567890000000057344
```

Also, it looks like I'm the culprit.
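One way to see why scale 10 happens to round-trip exactly while scales 11 and 12 do not (a plain-Python sketch; the deciding factor is whether the exact product fits into float64's 53-bit significand):

```python
>>> 1234567890. * 1e10 == 1234567890 * 10**10  # exact: the product fits in 53 bits
True
>>> 1234567890. * 1e11 == 1234567890 * 10**11  # inexact: the significand overflows
False
```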
The original algorithm for real-to-decimal conversion did its computations in the floating-point domain, accumulating rounding errors especially for large scale or precision values, such as:

```
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
<pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
[
  1234567889.99999995904
]
>>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
<pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
[
  1234567890.000000057344
]
```

The new algorithm strives to avoid precision loss by doing all its computations in the decimal domain. However, negative scales, which are presumably infrequent, fall back on the old algorithm.

* Closes: #35576

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
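A minimal sketch of the decimal-domain idea in plain Python (a hypothetical helper, not the actual C++ implementation): decompose the float into an exact integer ratio, then scale and round purely with integer arithmetic, so no floating-point rounding can occur:

```python
def float_to_scaled_int(x: float, scale: int) -> int:
    # Sketch only: assumes x is finite and non-negative.
    num, den = x.as_integer_ratio()  # exact binary value of x, no rounding
    # Compute round(x * 10**scale) entirely over the integers.
    q, r = divmod(num * 10**scale, den)
    if 2 * r >= den:                 # round half up
        q += 1
    return q

print(float_to_scaled_int(1234567890.0, 11))  # 123456789000000000000
```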
Describe the bug, including details regarding any error messages, version, and platform.
While the encodings of decimal128 and float32 are very different, the difference after the cast seems much larger than expected.
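A reproduction along the lines of the thread above (assuming pyarrow; the input value and target scale are taken from the earlier comments):

```python
import pyarrow as pa

# Down-cast to float32 first, then cast to a high-scale decimal; both
# steps can move the value away from 1234567890.
arr = pa.array([1234567890.0], type=pa.float32())
print(arr.cast(pa.decimal128(38, 18)))
```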
Component(s)
Python