
Conversation


@AdamGS AdamGS commented Sep 27, 2025

Which issue does this PR close?

This is a follow-up to #17501 and #17489 (should I open a dedicated issue?).

The changes are taken out of #17759 as the SQL-facing changes ended up making more of an impact than I expected, and should probably be discussed in #17747.

Rationale for this change

Improve support for working with decimal columns of different widths.

What changes are included in this PR?

The biggest change in this PR is that type coercion for binary expressions now not only supports the new decimal types, but also tries to widen one side into the other, a case that currently just fails.

Minor changes

  1. Support abs(Decimal32/64)
  2. Support sort-merge join (SMJ) on Decimal32/64 (Decimal256 was left out, and I'm unclear whether it was overlooked or omitted intentionally).

Are these changes tested?

Additional coercion tests

Are there any user-facing changes?

No public API changes, but more combinations of operations should work.

@github-actions github-actions bot added logical-expr Logical plan and expressions common Related to common crate functions Changes to functions implementation physical-plan Changes to the physical-plan crate spark labels Sep 27, 2025
@xudong963 xudong963 (Member) left a comment

Thank you @AdamGS

Comment on lines +1999 to +2000
DataType::Decimal32(..) => compare_value!(Decimal32Array),
DataType::Decimal64(..) => compare_value!(Decimal64Array),
Contributor

Yeah I am curious why 256 is omitted? Perhaps we can add it in if there's no compiler error doing so?

Contributor Author

Sure, I'll add it. Through this work I've noticed a few places where it was omitted, but I've yet to find a reason or even a failing test that would explain it; usually it's just a matter of adding a branch to a match statement, like here.

Comment on lines 99 to 106
Int8 | Int16 | Int32 | Int64 | Float32 | Float64 | Decimal128(_, _)
Int8 | Int16
| Int32
| Int64
| Float32
| Float64
| Decimal32(_, _)
| Decimal64(_, _)
| Decimal128(_, _)
Contributor

Can use is_signed_numeric() here potentially (though that brings in Float16)

Contributor Author

That seems like it's probably fine; I also talked to my local Spark expert and it seems like it should be fine (plus I assume there are other places where we deal with the conversion to Spark's type system).

Comment on lines +1022 to +1024
let s = s1.max(s2);
let range = (p1 as i8 - s1).max(p2 as i8 - s2);
let required_precision = (range + s) as u8;
Contributor

What happens if we have:

Decimal256 with precision 76 (max) and scale 0, and Decimal128 with precision 38 (max) with scale 1;

So s = 1, range = 76, required_precision = 76 + 1 -> overflow?

Is this a valid case?
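To make the edge case concrete, here is a minimal standalone sketch (my own, not DataFusion's actual code) of the quoted arithmetic applied to this example:

```rust
// Standalone sketch of the precision calculation quoted above, applied to
// Decimal256(precision 76, scale 0) vs Decimal128(precision 38, scale 1).
fn required_precision(p1: u8, s1: i8, p2: u8, s2: i8) -> i16 {
    let s = s1.max(s2);
    let range = (p1 as i8 - s1).max(p2 as i8 - s2);
    // Widened to i16 here so the example can show the out-of-range value;
    // the original code casts to u8 instead.
    range as i16 + s as i16
}

fn main() {
    const DECIMAL256_MAX_PRECISION: i16 = 76;
    let p = required_precision(76, 0, 38, 1);
    // s = 1 and range = 76, so the required precision is 77, exceeding
    // the Decimal256 maximum of 76.
    assert_eq!(p, 77);
    assert!(p > DECIMAL256_MAX_PRECISION);
    println!("required precision = {p}");
}
```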

Contributor Author

I don't think an overflow is valid; I'll have to think about it and maybe look into solutions in other systems.
We can also just return None in that case, which should force the user to add an explicit cast to one side.

Contributor Author

I looked around a bit, and what I could find is:

  1. DataFusion already has multiple issues regarding cast overflow/precision loss (decimal calculate overflow but not throw error #16406, Datafusion downcasts decimal loosing precision  #13492), which I'm happy to take on but are unrelated here.
  2. Spark (which seems to be the main inspiration for this code) has a configuration to control how it handles these cases (here and here).

I'm not sure what the desired behavior regarding precision loss is (should it be configurable? Is there currently an accepted desired behavior?). I think for this PR it should be fine to just return None if the precision overflows, and to take the bigger conversation into an issue where people can weigh in; I'll be glad to take that forward. What do you think?

Contributor

I think returning None in cases like this for this PR is fine 👍

Contributor Author

Done as part of fd1f043

Contributor

Cheers; left another minor comment related to the check below. Also would be nice if we had a test for this edge case.

@AdamGS AdamGS force-pushed the adamg/fill-decimal-gaps branch from 62eb314 to b4c7400 on September 29, 2025 10:58
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
@AdamGS AdamGS force-pushed the adamg/fill-decimal-gaps branch from b4c7400 to fd1f043 on September 29, 2025 11:05
@AdamGS AdamGS mentioned this pull request Sep 29, 2025
Comment on lines 1037 to 1061
// We currently don't handle cases where the required precision overflows
if required_precision > DECIMAL256_MAX_PRECISION {
return None;
}

// Choose the larger variant between the two input types
match (lhs_type, rhs_type) {
(Decimal32(_, _), Decimal64(_, _)) | (Decimal64(_, _), Decimal32(_, _)) => {
Some(Decimal64(required_precision, s))
}
(Decimal32(_, _), Decimal128(_, _)) | (Decimal128(_, _), Decimal32(_, _)) => {
Some(Decimal128(required_precision, s))
}
(Decimal32(_, _), Decimal256(_, _)) | (Decimal256(_, _), Decimal32(_, _)) => {
Some(Decimal256(required_precision, s))
}
(Decimal64(_, _), Decimal128(_, _)) | (Decimal128(_, _), Decimal64(_, _)) => {
Some(Decimal128(required_precision, s))
}
(Decimal64(_, _), Decimal256(_, _)) | (Decimal256(_, _), Decimal64(_, _)) => {
Some(Decimal256(required_precision, s))
}
(Decimal128(_, _), Decimal256(_, _)) | (Decimal256(_, _), Decimal128(_, _)) => {
Some(Decimal256(required_precision, s))
}
Contributor

Suggested change

// Choose the larger variant between the two input types, guarding each
// branch against overflowing that variant's maximum precision
match (lhs_type, rhs_type) {
    (Decimal32(_, _), Decimal64(_, _)) | (Decimal64(_, _), Decimal32(_, _))
        if required_precision <= DECIMAL64_MAX_PRECISION =>
    {
        Some(Decimal64(required_precision, s))
    }
    (Decimal32(_, _), Decimal128(_, _)) | (Decimal128(_, _), Decimal32(_, _))
        if required_precision <= DECIMAL128_MAX_PRECISION =>
    {
        Some(Decimal128(required_precision, s))
    }
    (Decimal32(_, _), Decimal256(_, _)) | (Decimal256(_, _), Decimal32(_, _))
        if required_precision <= DECIMAL256_MAX_PRECISION =>
    {
        Some(Decimal256(required_precision, s))
    }
    (Decimal64(_, _), Decimal128(_, _)) | (Decimal128(_, _), Decimal64(_, _))
        if required_precision <= DECIMAL128_MAX_PRECISION =>
    {
        Some(Decimal128(required_precision, s))
    }
    (Decimal64(_, _), Decimal256(_, _)) | (Decimal256(_, _), Decimal64(_, _))
        if required_precision <= DECIMAL256_MAX_PRECISION =>
    {
        Some(Decimal256(required_precision, s))
    }
    (Decimal128(_, _), Decimal256(_, _)) | (Decimal256(_, _), Decimal128(_, _))
        if required_precision <= DECIMAL256_MAX_PRECISION =>
    {
        Some(Decimal256(required_precision, s))
    }

Do we need to account for the other decimal types like so?
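For illustration, the variant selection these guards imply can be sketched independently of DataFusion. `smallest_fitting_variant` and the string labels below are hypothetical stand-ins of mine; the only facts assumed are Arrow's maximum precisions per variant (9/18/38/76):

```rust
// Hypothetical sketch, not the PR's actual code: choose the narrowest decimal
// variant that can represent the required precision, mirroring the guarded
// match arms suggested above.
const DECIMAL32_MAX_PRECISION: u8 = 9;
const DECIMAL64_MAX_PRECISION: u8 = 18;
const DECIMAL128_MAX_PRECISION: u8 = 38;
const DECIMAL256_MAX_PRECISION: u8 = 76;

fn smallest_fitting_variant(required_precision: u8) -> Option<&'static str> {
    match required_precision {
        0..=DECIMAL32_MAX_PRECISION => Some("Decimal32"),
        10..=DECIMAL64_MAX_PRECISION => Some("Decimal64"),
        19..=DECIMAL128_MAX_PRECISION => Some("Decimal128"),
        39..=DECIMAL256_MAX_PRECISION => Some("Decimal256"),
        // Beyond 76 digits nothing fits; coercion should return None.
        _ => None,
    }
}

fn main() {
    assert_eq!(smallest_fitting_variant(18), Some("Decimal64"));
    assert_eq!(smallest_fitting_variant(39), Some("Decimal256"));
    assert_eq!(smallest_fitting_variant(77), None);
    println!("variant selection behaves as expected");
}
```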

Contributor Author

I'll add some tests to pin down what the correct behavior here is.

Contributor Author

done as part of 4145a04

pub fn decimal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
use arrow::datatypes::DataType::*;

match (lhs_type, rhs_type) {
Contributor

This might be cleaner like so:

/// Decimal coercion rules.
pub fn decimal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
    use arrow::datatypes::DataType::*;

    // Prefer decimal data type over floating point for comparison operation
    match (lhs_type, rhs_type) {
        // Same decimal types
        (lhs_type, rhs_type)
            if std::mem::discriminant(lhs_type) == std::mem::discriminant(rhs_type) =>
        {
            get_wider_decimal_type(lhs_type, rhs_type)
        }
        // Mismatched decimal types
        (lhs_type, rhs_type)
            if is_decimal(lhs_type)
                && is_decimal(rhs_type)
                && std::mem::discriminant(lhs_type)
                    != std::mem::discriminant(rhs_type) =>
        {
            get_wider_decimal_type_cross_variant(lhs_type, rhs_type)
        }
        // Decimal + non-decimal types
        (Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _), _)
        | (_, Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _)) => {
            get_common_decimal_type(lhs_type, rhs_type)
        }
        (_, _) => None,
    }
}

Following what was done above
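As a side note on the `std::mem::discriminant` trick used here, a tiny self-contained sketch (with a stand-in enum of mine, not arrow's real `DataType`) shows what it does and does not compare:

```rust
use std::mem::discriminant;

// Minimal stand-in enum to illustrate the discriminant comparison used in the
// suggestion above; the real arrow DataType has many more variants.
#[derive(Debug)]
enum DataType {
    Decimal32(u8, i8),
    Decimal64(u8, i8),
}

fn main() {
    let a = DataType::Decimal32(9, 2);
    let b = DataType::Decimal32(7, 1);
    let c = DataType::Decimal64(18, 4);
    // Same variant compares equal regardless of precision/scale values...
    assert_eq!(discriminant(&a), discriminant(&b));
    // ...while different variants compare unequal.
    assert_ne!(discriminant(&a), discriminant(&c));
    println!("discriminant distinguishes variants, not field values");
}
```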

Contributor

Oops forgot the is_decimal() checks for the first branch

Contributor Author

yeah I've added them locally :) should have something soon

Contributor Author

done as part of 4145a04


AdamGS commented Sep 29, 2025

Added tests to check for precision overflow. This really brings things close to making decimal a logical type, which would be nice but would also make it behave very differently from any other type.

@Jefffrey Jefffrey left a comment


LGTM 👍

@Jefffrey Jefffrey added this pull request to the merge queue Sep 30, 2025
Merged via the queue into apache:main with commit 032117a Sep 30, 2025
28 checks passed
AdamGS added a commit to AdamGS/arrow-datafusion that referenced this pull request Oct 3, 2025
* More small decimal support

* CR comments

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

* Add tests and cleanup some code

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
AdamGS added a commit to AdamGS/arrow-datafusion that referenced this pull request Oct 3, 2025