
Incorrect Decimal Division Coercion #6828

Closed
tustvold opened this issue Jul 3, 2023 · 10 comments · Fixed by #6832
Labels
bug Something isn't working

Comments

@tustvold
Contributor

tustvold commented Jul 3, 2023

Describe the bug

Currently, when performing decimal division, coercion_decimal_mathematics_type, as called by BinaryExpr::evaluate, coerces both inputs to the wider precision type. The division kernel then scales the left hand side by the output scale.

To see why this is an issue consider

1 DECIMAL(38, 20) / 5 DECIMAL(38, 0)

This computation shouldn't overflow, as it can simply compute 10^20 / 5 on the unscaled values

The issue is that the coercion logic scales the right hand side to 5·10^20, which in turn requires scaling the left hand side to 10^40, which overflows the 128-bit representation
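The overflow can be sketched with plain i128 arithmetic on the unscaled values. This is a simplified model of the behaviour described above, not the actual arrow-rs kernel; scale_by is a hypothetical helper:

```rust
/// Hypothetical helper: multiply an unscaled decimal value by 10^pow,
/// returning None on overflow (a sketch, not the arrow-rs kernel).
fn scale_by(v: i128, pow: u32) -> Option<i128> {
    v.checked_mul(10i128.checked_pow(pow)?)
}

fn main() {
    // 1 at DECIMAL(38, 20) is stored as the unscaled value 10^20.
    let lhs = 10i128.pow(20);
    // After coercion, the kernel scales lhs by the output scale (20):
    // 10^20 * 10^20 = 10^40, which exceeds i128::MAX (~1.7 * 10^38).
    assert_eq!(scale_by(lhs, 20), None);
    // Dividing at the input scales needs no rescaling at all:
    // 10^20 / 5 = 2 * 10^19, i.e. 0.2 at scale 20.
    assert_eq!(lhs / 5, 2 * 10i128.pow(19));
}
```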

To Reproduce

❯ create table foo (a DECIMAL(38, 20), b DECIMAL(38, 0));
0 rows in set. Query took 0.001 seconds.
❯ insert into foo VALUES (1, 1000000000000);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row in set. Query took 0.003 seconds.
❯ select a / b from foo;
Decimal128(38, 20) vs Decimal128(38, 20)
+------------------------------------------+
| foo.a / foo.b                            |
+------------------------------------------+
| 0.00000000000000000000000000000000743222 |
+------------------------------------------+
1 row in set. Query took 0.004 seconds.

Expected behavior

No response

Additional context

#6794

@tustvold tustvold added the bug Something isn't working label Jul 3, 2023
@viirya
Member

viirya commented Jul 3, 2023

The division kernel will then scale the left hand side by the output scale.

This scaling should allow precision loss instead of overflow directly.

@tustvold
Contributor Author

tustvold commented Jul 3, 2023

This scaling should allow precision loss instead of overflow directly.

The point here is that no overflow should occur; the only reason it does is that the type coercion machinery isn't respecting the input types. We don't need to perform fixed-point multiplication here if we're sensible about preserving the input scale.

Edit: see https://github.com/apache/arrow-rs/pull/4465/files#diff-12bc5282f3ea1d5a1e2efeacbede6aa8ac7308498533f527d43705398cdb56eaR501

@viirya
Member

viirya commented Jul 3, 2023

Read through the diff quickly.

let mul_pow = result_scale - (s2 - s1);

I think result_scale is min(38, 20 + 38 + 1) = 38?

Hmm, for the described case, isn't mul_pow equal to 58 (38 - (0 - 20))? As I assume s1, s2 are original scales of lhs, rhs arrays.

The issue is that the coercion logic will scale the right hand side to 5e20 requiring it to also scale the left hand side to 10e40 which will overflow

So scaling the left hand side to 10e58 won't overflow? Am I missing any point here? 🤔
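For concreteness, the arithmetic being questioned works out like this (a sketch assuming s1, s2 are the input scales and 38 is the Decimal128 precision cap):

```rust
fn main() {
    let (s1, s2) = (20i32, 0i32);           // scales of lhs and rhs
    let result_scale = 38.min(s1 + 38 + 1); // capped at the max precision
    let mul_pow = result_scale - (s2 - s1); // 38 - (0 - 20)
    assert_eq!(result_scale, 38);
    assert_eq!(mul_pow, 58); // lhs would be scaled by 10^58
}
```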

@tustvold
Contributor Author

tustvold commented Jul 3, 2023

isn't mul_pow equal to 58 (38 - (0 - 20))

Yeah, I need to keep iterating on this; it's all still a WIP. But it seems inherently wrong that dividing by a value greater than 1 could result in overflow and require fixed-point multiplication.

@tustvold
Contributor Author

tustvold commented Jul 3, 2023

Interestingly, DuckDB appears not to support decimal division at all; it falls back to a double result:

>>> duckdb.sql('select cast(5 as DECIMAL(38, 10)) / cast(10 as DECIMAL(38, 10))')
┌──────────────────────────────────────────────────────────┐
│ (CAST(5 AS DECIMAL(38,10)) / CAST(10 AS DECIMAL(38,10))) │
│                          double                          │
├──────────────────────────────────────────────────────────┤
│                                                      0.5 │
└──────────────────────────────────────────────────────────┘

MySQL appears to just increment the left-hand scale by 4:

DECIMAL(38, 10) / DECIMAL(38, 10) -> DECIMAL(52, 14)
DECIMAL(11, 10) / DECIMAL(2, 0) -> DECIMAL(15, 14)
DECIMAL(11, 10) / DECIMAL(13, 12) -> DECIMAL(27, 14)
DECIMAL(11, 10) / DECIMAL(17, 16) -> DECIMAL(31, 14)
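The pattern in the four examples above is consistent with a result type of DECIMAL(p1 + s2 + 4, s1 + 4). This is an inference from the examples, not MySQL's documented formula, and mysql_div_type is a hypothetical helper:

```rust
/// Hypothetical helper inferring MySQL's division result type from the
/// examples above: precision = p1 + s2 + 4, scale = s1 + 4.
fn mysql_div_type(p1: u32, s1: u32, _p2: u32, s2: u32) -> (u32, u32) {
    (p1 + s2 + 4, s1 + 4)
}

fn main() {
    assert_eq!(mysql_div_type(38, 10, 38, 10), (52, 14));
    assert_eq!(mysql_div_type(11, 10, 2, 0), (15, 14));
    assert_eq!(mysql_div_type(11, 10, 13, 12), (27, 14));
    assert_eq!(mysql_div_type(11, 10, 17, 16), (31, 14));
}
```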

Postgres appears to do something similar - https://github.com/postgres/postgres/blob/29cf61ade3f245aa40f427a1d6345287ef77e622/src/interfaces/ecpg/pgtypeslib/numeric.c#L1047

Interestingly the Hive specification states

But one thing is clear, for scale resulting from a division, the scale of the result is s1 plus a
system-wide increment, which has a default 4

But then goes on to show a table with something different

[image omitted: table from the Hive specification]

I think using a fixed increment of the dividend's scale makes a whole lot more sense than a value computed from the divisor's precision, which just seems to be a recipe for overflow.

Edit: I have updated apache/arrow-rs#4465 to do this

@comphead
Contributor

Adding more test cases to be validated on DF

Looks like there is a series of bugs in decimal multiplication/division in DF; this query in PG gives me all the same numbers:

select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) / 1,
       cast(6.4053151420411946063694043751862251568 as decimal(38,37)) / 1.0,
       cast(6.4053151420411946063694043751862251568 as decimal(38,37)) * 1,
       cast(6.4053151420411946063694043751862251568 as decimal(38,37)) * 1.0

6.40531514204119460636940437518622515680

In DF, the division doesn't happen at all, and multiplication gives different results:

❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) / 1;
Optimizer rule 'simplify_expressions' failed
caused by
Arrow error: Compute error: Overflow happened on: 64053151420411946063694043751862251568 * 100000000000000000000000000000000000000
❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) / 1.0;
Optimizer rule 'simplify_expressions' failed
caused by
Arrow error: Compute error: Overflow happened on: 64053151420411946063694043751862251568 * 100000000000000000000000000000000000000
❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) * 1;
+---------------------------------------------------------------------------+
| Decimal128(Some(64053151420411946063694043751862251568),38,37) * Int64(1) |
+---------------------------------------------------------------------------+
| 6.4053151420411946063694043751862251568                                   |
+---------------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
❯ select cast(6.4053151420411946063694043751862251568 as decimal(38,37)) * 1.0;
+-------------------------------------------------------------------------------------------+
| Decimal128(Some(64053151420411946063694043751862251568),38,37) * Decimal128(Some(10),2,1) |
+-------------------------------------------------------------------------------------------+
| -0.40033219637757466289808777344913907232                                                 |
+-------------------------------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.

@viirya
Member

viirya commented Jul 30, 2023

The overflow could be fixed after #6832, I think. Currently decimal multiplication with scalar doesn't handle overflow correctly.

@westonpace
Member

westonpace commented Aug 3, 2023

I haven't followed the entire discussion but, FWIW, Substrait currently agrees with the max(6, s1 + p2 + 1) formulation. SQL server also uses this formula. Let me also preface this description with the fact that this topic confuses me all the time so there is a chance I'm way off base here.

Postgres appears to do something similar - https://github.com/postgres/postgres/blob/29cf61ade3f245aa40f427a1d6345287ef77e622/src/interfaces/ecpg/pgtypeslib/numeric.c#L1047

I'm not sure this bears out empirically. Using https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/9659 I get CAST(1 AS DECIMAL(3, 0)) / CAST(1000000 AS DECIMAL(7, 0)) = 0.000001000000000000000000, which has a scale of at least 7, greater than 0 + 4.

I think using a fixed increment of the dividend's scale makes a whole lot more sense than a value computed based on the divisor's precision, which just seems to be a recipe for overflow.

I think the idea of involving the divisor's precision is for cases like:

1 / 1000000000
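A sketch of why: with only a dividend-based increment, small quotients vanish entirely. This is plain integer arithmetic on unscaled values, not engine code:

```rust
fn main() {
    // Unscaled quotient of 1 / 1_000_000_000 at a given result scale.
    let quot = |scale: u32| (1i128 * 10i128.pow(scale)) / 1_000_000_000;
    // Fixed increment only: s1 + 4 = 4 fractional digits -> result is 0.
    assert_eq!(quot(4), 0);
    // A scale accounting for the divisor's digits keeps the value:
    // unscaled 10, i.e. 0.0000000010 at scale 10.
    assert_eq!(quot(10), 10);
}
```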

which just seems to be a recipe for overflow.
This scaling should allow precision loss instead of overflow directly.

Agreed. In both SQL server and Substrait there is a second step to help with the overflow problem. This second step applies to all decimal arithmetic operations. The rule is basically like this:

If, after following the formulas, the resulting precision/scale is out of bounds, then sacrifice scale (i.e. throw away the really insignificant digits on the far right) but keep at least 6 digits of scale. If you can't keep at least 6 digits of scale (e.g. the integral part requires 33 digits or more), then overflow. The 6 is absolutely arbitrary, but it works.

Note, that the SQL server rule is slightly more complex, but more or less the same thing:

In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to store the integral part of the decimal number. If there isn't enough space to store them (that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale), the scale is reduced to provide enough space for the integral part. The resulting scale is min(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be rounded to fit into the resulting scale.

In multiplication and division operations, we need precision - scale places to store the integral part of the result. The scale might be reduced using the following rules:

  • The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the integral part is less than 32, because it can't be greater than 38 - (precision-scale). The result might be rounded in this case.
  • The scale isn't changed if it's less than 6 and if the integral part is greater than 32. In this case, an overflow error might be raised if it can't fit into decimal(38, scale).
  • The scale is set to 6 if it's greater than 6 and if the integral part is greater than 32. In this case, both the integral part and scale would be reduced and resulting type is decimal(38, 6). The result might be rounded to 6 decimal places, or the overflow error is thrown if the integral part can't fit into 32 digits.
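The bounding step described above can be sketched as follows. This is a paraphrase of the quoted SQL Server rules, not engine code, and bound_result_type is a hypothetical helper:

```rust
/// Bound an ideal (precision, scale) to Decimal(38, _) per the quoted
/// rules: shed fractional digits first, but keep at least 6 of them
/// (unless the input scale was already below 6).
fn bound_result_type(p: u32, s: u32) -> (u32, u32) {
    if p <= 38 {
        return (p, s);
    }
    let integral = p - s; // digits needed left of the decimal point
    let scale = if integral < 32 {
        s.min(38 - integral) // result may be rounded
    } else if s > 6 {
        6 // clamp; the integral part may still overflow at runtime
    } else {
        s // unchanged; may overflow at runtime
    };
    (38, scale)
}

fn main() {
    assert_eq!(bound_result_type(52, 14), (38, 6));  // integral 38 > 32
    assert_eq!(bound_result_type(40, 20), (38, 18)); // shed 2 scale digits
    assert_eq!(bound_result_type(30, 10), (30, 10)); // already in bounds
}
```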

@westonpace
Member

Also, if it helps, here are the calcite rules, which should be the same as, and were the motivation for, the substrait rules.

@tustvold
Contributor Author

tustvold commented Aug 3, 2023

I've attempted to adjust this in apache/arrow-rs#4640 PTAL
