[Variant] Define and use VariantDecimalType trait #8562

scovich · 2025-10-07T01:09:37Z

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Closes #NNN.

Rationale for this change

VariantDecimalXX structs are structurally near-identical, but they lack any trait that can expose that regularity.

What changes are included in this PR?

Define and use a new VariantDecimalType trait that exposes common functionality of all three variant decimal types.
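As a rough illustration of the idea, here is a self-contained sketch of such a trait plus a single macro that stamps out the three structs. Only try_new, MAX_UNSCALED_VALUE and the Native associated type are named in this thread. MAX_PRECISION, the integer/scale accessors and the DecimalError stand-in for ArrowError are assumptions made for the sketch, not the crate's actual API.

```rust
/// Stand-in for ArrowError so the sketch compiles on its own (assumption).
#[derive(Debug, PartialEq)]
pub struct DecimalError(pub String);

/// Hypothetical shape of a trait shared by the three variant decimal types.
pub trait VariantDecimalType: Sized {
    /// Backing integer type (i32, i64 or i128).
    type Native: Copy;
    /// Largest number of decimal digits the type can hold (assumed name).
    const MAX_PRECISION: u8;
    /// Largest absolute unscaled value, i.e. 10^MAX_PRECISION - 1.
    const MAX_UNSCALED_VALUE: Self::Native;

    fn try_new(integer: Self::Native, scale: u8) -> Result<Self, DecimalError>;
    fn integer(&self) -> Self::Native;
    fn scale(&self) -> u8;
}

/// One macro generates each near-identical struct together with its trait impl.
macro_rules! define_variant_decimal {
    ($name:ident, $native:ty, $max_precision:expr) => {
        #[derive(Debug, Clone, Copy, PartialEq)]
        pub struct $name {
            integer: $native,
            scale: u8,
        }

        impl VariantDecimalType for $name {
            type Native = $native;
            const MAX_PRECISION: u8 = $max_precision;
            const MAX_UNSCALED_VALUE: $native = <$native>::pow(10, $max_precision) - 1;

            fn try_new(integer: $native, scale: u8) -> Result<Self, DecimalError> {
                // Split range check: needs only comparison and negation, no unsigned_abs.
                if integer > Self::MAX_UNSCALED_VALUE || integer < -Self::MAX_UNSCALED_VALUE {
                    return Err(DecimalError(format!(
                        "{integer} is too large for {}",
                        stringify!($name)
                    )));
                }
                if scale > Self::MAX_PRECISION {
                    return Err(DecimalError(format!(
                        "scale {scale} exceeds max precision {}",
                        Self::MAX_PRECISION
                    )));
                }
                Ok(Self { integer, scale })
            }

            fn integer(&self) -> $native {
                self.integer
            }

            fn scale(&self) -> u8 {
                self.scale
            }
        }
    };
}

define_variant_decimal!(VariantDecimal4, i32, 9);
define_variant_decimal!(VariantDecimal8, i64, 18);
define_variant_decimal!(VariantDecimal16, i128, 38);
```

The design choice here is that everything type-specific (the native integer and its precision limit) is a macro argument, while everything callers depend on goes through the trait.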

Are these changes tested?

Yes, existing unit tests cover the changes.

Are there any user-facing changes?

New pub trait: VariantDecimalType.

scovich · 2025-10-07T01:10:06Z

@alamb @liamzwbao thoughts?

alamb

I think this looks like a nice cleanup / consolidation of responsibility to me -- thanks @scovich

alamb · 2025-10-07T18:13:50Z

parquet-variant-compute/src/unshred_variant.rs

    }
 }

-/// Trait to unify decimal unshredding across Decimal32/64/128 types


this is a nice way to avoid duplication in my opinion

Definitely nice. It directly inspired this PR and the new VariantDecimalType, and you'll notice that DecimalUnshredRowBuilder -- which used to take DecimalSpec and invoke its into_variant method -- now takes VariantDecimalType and invokes its try_new_with_signed_scale constructor.

Do you feel that this trait captured some redundancy or boilerplate that the new trait misses?
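A minimal sketch of the pattern described: a builder that is generic over the decimal type and only needs the trait's try_new_with_signed_scale constructor. DecimalRowBuilder, the String error type and the simplified constructor (which rejects negative scales instead of rescaling them) are hypothetical stand-ins, not the actual DecimalUnshredRowBuilder.

```rust
/// Hypothetical minimal trait: only the constructor the builder needs.
pub trait VariantDecimalType: Sized {
    type Native: Copy;
    fn try_new_with_signed_scale(integer: Self::Native, scale: i8) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct VariantDecimal4 {
    integer: i32,
    scale: u8,
}

impl VariantDecimalType for VariantDecimal4 {
    type Native = i32;
    fn try_new_with_signed_scale(integer: i32, scale: i8) -> Result<Self, String> {
        // Simplified stand-in: rejects negative scales and skips range checks.
        let scale = u8::try_from(scale).map_err(|_| format!("negative scale {scale}"))?;
        Ok(Self { integer, scale })
    }
}

/// Hypothetical builder, generic over any VariantDecimalType.
pub struct DecimalRowBuilder<'a, D: VariantDecimalType> {
    values: &'a [Option<D::Native>],
    scale: i8,
}

impl<'a, D: VariantDecimalType> DecimalRowBuilder<'a, D> {
    pub fn new(values: &'a [Option<D::Native>], scale: i8) -> Self {
        Self { values, scale }
    }

    /// Converts one row: nulls stay None, constructor errors propagate.
    pub fn row(&self, index: usize) -> Result<Option<D>, String> {
        match self.values[index] {
            Some(v) => D::try_new_with_signed_scale(v, self.scale).map(Some),
            None => Ok(None),
        }
    }
}
```

The builder never mentions i32, i64 or i128 directly; swapping the decimal width only changes the type argument.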

alamb · 2025-10-07T18:16:02Z

parquet-variant/src/variant/decimal.rs

+        impl $struct_name {
+            /// Attempts to create a new instance of this decimal type, failing if the value or
+            /// scale is too large.
+            pub fn try_new(integer: $native, scale: u8) -> Result<Self, ArrowError> {


I wonder if you can avoid a macro and instead give try_new a default implementation in the trait definition.

Trait-provided methods can only work with the methods and associated types that the trait provides (or pulls in as a type constraint on Self). Unfortunately, that doesn't include the actual struct members.

Also, unless we're willing to take a dependency on the num crate, there are no traits on integer types that can be used to make a generic validation method, because i32::unsigned_abs is a completely different and unrelated function than i64::unsigned_abs, as far as rustc is concerned. Without a trait to connect them, their similar names mean nothing to the compiler.
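To make the point concrete: inside a generic function, value.unsigned_abs() does not compile on a bare type parameter, because the inherent methods on i32, i64 and i128 are unrelated as far as rustc is concerned. A hand-rolled bridging trait like the hypothetical one below (not what the PR does) is the kind of glue that would be needed:

```rust
/// Hypothetical local trait that links the otherwise unrelated inherent
/// i32::unsigned_abs / i64::unsigned_abs / i128::unsigned_abs methods.
pub trait UnsignedAbs: Copy {
    type Unsigned: PartialOrd;
    fn abs_unsigned(self) -> Self::Unsigned;
}

macro_rules! impl_unsigned_abs {
    ($($signed:ty => $unsigned:ty),*) => {
        $(
            impl UnsignedAbs for $signed {
                type Unsigned = $unsigned;
                // Forward to the inherent method of the concrete type.
                fn abs_unsigned(self) -> $unsigned {
                    <$signed>::unsigned_abs(self)
                }
            }
        )*
    };
}

impl_unsigned_abs!(i32 => u32, i64 => u64, i128 => u128);

/// Generic range check; it only compiles because of the UnsignedAbs bound.
pub fn fits<T: UnsignedAbs>(value: T, max_unscaled: T::Unsigned) -> bool {
    value.abs_unsigned() <= max_unscaled
}
```

Note that the bridge still needs a macro to implement it for each integer type, which is part of why the extra layer buys little here.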

I guess I could require PartialOrd and then split the check in two?

if integer > Self::MAX_UNSCALED_VALUE || integer < -Self::MAX_UNSCALED_VALUE {

🤔

Hmm, the split is nice (I'll push that), but any attempt at a generic checker needs so many type constraints on VariantDecimalType::Native that the resulting code was significantly bigger and harder to read.
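For reference, here is just the split range check written as a free generic function (hypothetical name), showing the bounds it already needs on its own. Negating the bound rather than the value also sidesteps overflow: -i32::MIN overflows, but negating a positive MAX_UNSCALED_VALUE never does.

```rust
use std::ops::Neg;

/// Hypothetical generic form of the split check from the question above.
/// Works for i32/i64/i128 without needing unsigned_abs.
pub fn within_unscaled_range<T>(integer: T, max_unscaled: T) -> bool
where
    T: PartialOrd + Neg<Output = T> + Copy,
{
    // Only max_unscaled is negated, so T::MIN as input cannot overflow here.
    !(integer > max_unscaled || integer < -max_unscaled)
}
```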

BTW, there are really nice helper traits for working with arrow primitive numeric types... but they're in the arrow-compute crate which variant (intentionally) doesn't depend on.

scovich · 2025-10-07T18:59:13Z

parquet-variant/src/variant/decimal.rs

+                Self::try_new(integer, scale)
+            }
+
+            fn try_new_with_signed_scale(integer: $native, scale: i8) -> Result<Self, ArrowError> {


Note: I originally hoped this could be a provided trait method, but the checked_pow and checked_mul methods on integers are not available through any common trait (unless we take a dependency on the num crate).

Let's just go with this implementation for now
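A sketch of the rescaling logic that checked_pow and checked_mul make possible, stamped out per integer type by a macro because those are inherent methods. It assumes that a negative scale s means value = integer * 10^(-s), folds that factor into the integer with scale 0, and omits the precision check that a real constructor would still perform. Decimal and the with_signed_scale_* function names are hypothetical.

```rust
/// Hypothetical stand-in decimal; only the fields matter here.
#[derive(Debug, PartialEq)]
pub struct Decimal<T> {
    pub integer: T,
    pub scale: u8,
}

macro_rules! impl_signed_scale {
    ($fn_name:ident, $native:ty) => {
        /// Negative scale s means value = integer * 10^(-s). Fold that factor into
        /// the integer so the result has scale 0, reporting overflow instead of wrapping.
        pub fn $fn_name(integer: $native, scale: i8) -> Result<Decimal<$native>, String> {
            if scale >= 0 {
                return Ok(Decimal { integer, scale: scale as u8 });
            }
            let factor = (10 as $native)
                .checked_pow(scale.unsigned_abs() as u32)
                .ok_or_else(|| format!("10^{} overflows {}", -(scale as i16), stringify!($native)))?;
            let integer = integer
                .checked_mul(factor)
                .ok_or_else(|| format!("rescaled value overflows {}", stringify!($native)))?;
            Ok(Decimal { integer, scale: 0 })
        }
    };
}

impl_signed_scale!(with_signed_scale_i32, i32);
impl_signed_scale!(with_signed_scale_i64, i64);
impl_signed_scale!(with_signed_scale_i128, i128);
```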

liamzwbao

This looks very nice! It could also benefit the variant-to-decimal PR.

klion26

Looks great! Thanks for this work.

klion26 · 2025-10-10T12:58:47Z

parquet-variant-compute/src/unshred_variant.rs

+            DataType::Decimal32(_, _)
+            | DataType::Decimal64(_, _)
+            | DataType::Decimal128(_, _)
+            | DataType::Decimal256(_, _) => {


This is the match arm for typed_value, and Variant only has Decimal4/Decimal8/Decimal16. Is there any chance that DataType::Decimal256(_, _) will actually occur here?

Decimal256 is not a valid shredding type:

Shredded values must use the following Parquet types:

Variant Type	Parquet Physical Type	Parquet Logical Type
boolean	BOOLEAN
int8	INT32	INT(8, signed=true)
int16	INT32	INT(16, signed=true)
int32	INT32
int64	INT64
float	FLOAT
double	DOUBLE
decimal4	INT32	DECIMAL(P, S)
decimal8	INT64	DECIMAL(P, S)
decimal16	BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY	DECIMAL(P, S)
date	INT32	DATE
time	INT64	TIME(false, MICROS)
timestamptz(6)	INT64	TIMESTAMP(true, MICROS)
timestamptz(9)	INT64	TIMESTAMP(true, NANOS)
timestampntz(6)	INT64	TIMESTAMP(false, MICROS)
timestampntz(9)	INT64	TIMESTAMP(false, NANOS)
binary	BINARY
string	BINARY	STRING
uuid	FIXED_LEN_BYTE_ARRAY[len=16]	UUID
array	GROUP; see Arrays below	LIST
object	GROUP; see Objects below
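Assuming a one-to-one mapping by storage width (Decimal32 to decimal4, Decimal64 to decimal8, Decimal128 to decimal16), a typed_value match could reject Decimal256 explicitly. The enums below are hypothetical mirrors of Arrow's DataType decimal variants, used only to keep the sketch self-contained:

```rust
/// Hypothetical mirror of Arrow's decimal DataType variants: (precision, scale).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowDecimal {
    Decimal32(u8, i8),
    Decimal64(u8, i8),
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Variant decimal kinds allowed by the shredding table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VariantDecimalKind {
    Decimal4,
    Decimal8,
    Decimal16,
}

/// Maps a typed_value decimal type to its variant counterpart; Decimal256 has none.
pub fn shredded_decimal_kind(dt: ArrowDecimal) -> Result<VariantDecimalKind, String> {
    match dt {
        ArrowDecimal::Decimal32(..) => Ok(VariantDecimalKind::Decimal4),
        ArrowDecimal::Decimal64(..) => Ok(VariantDecimalKind::Decimal8),
        ArrowDecimal::Decimal128(..) => Ok(VariantDecimalKind::Decimal16),
        ArrowDecimal::Decimal256(..) => Err(format!("{dt:?} is not a valid shredding type")),
    }
}
```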

scovich · 2025-10-10T15:28:49Z

parquet-variant-compute/src/arrow_to_variant.rs

 macro_rules! define_row_builder {
    (
-        struct $name:ident<$lifetime:lifetime $(, $generic:ident: $bound:path )?>
+        struct $name:ident<$lifetime:lifetime $(, $generic:ident $( : $bound:path )? )*>


Make the bound optional, because decimals use the where clause instead for better readability.

Also allow multiple generic params instead of just one.

scovich · 2025-10-10T15:29:44Z

parquet-variant-compute/src/arrow_to_variant.rs

        {
            array: &$lifetime $array_type,
            $( $( $field: $field_type, )+ )?
+            _phantom: std::marker::PhantomData<($( $generic, )*)>, // capture all type params


Capture all generic params automatically (decimal needs this, doesn't hurt other types)
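A simplified, self-contained sketch of the macro pattern from these two hunks: each generic parameter takes an optional bound, any number of parameters are allowed, and all of them are captured in a single PhantomData tuple so that a parameter that appears in no field does not trigger E0392 (parameter is never used). The macro name define_builder and the single array field are simplifications, not the actual define_row_builder.

```rust
use std::marker::PhantomData;

// Hypothetical simplified macro: optional bound per generic, any number of generics.
macro_rules! define_builder {
    (struct $name:ident<$lt:lifetime $(, $generic:ident $( : $bound:path )? )*> { array: $array_type:ty }) => {
        pub struct $name<$lt $(, $generic $( : $bound )? )*> {
            array: &$lt $array_type,
            // Capture every type param; without this, unused params fail with E0392.
            _phantom: PhantomData<($( $generic, )*)>,
        }

        impl<$lt $(, $generic $( : $bound )? )*> $name<$lt $(, $generic )*> {
            pub fn new(array: &$lt $array_type) -> Self {
                Self { array, _phantom: PhantomData }
            }

            pub fn len(&self) -> usize {
                self.array.len()
            }
        }
    };
}

// No generics, one bounded generic, and two generics with a mix of bounds.
define_builder!(struct PlainBuilder<'a> { array: [i32] });
define_builder!(struct BoundedBuilder<'a, T: std::fmt::Debug> { array: [i64] });
define_builder!(struct TwoParamBuilder<'a, A, B: Clone> { array: [u8] });
```

With zero generics the phantom degrades to PhantomData<()>, which is why capturing unconditionally "doesn't hurt other types".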

scovich · 2025-10-10T15:30:33Z

parquet-variant-compute/src/type_conversion.rs

 pub(crate) use primitive_conversion_single_value;
-
-/// Convert a decimal value to a `VariantDecimal`
-macro_rules! decimal_to_variant_decimal {


Replaced by VariantDecimalType::try_new_with_signed_scale

scovich · 2025-10-10T16:10:53Z

parquet-variant/src/variant/decimal.rs

+                        // Track the sign explicitly, in case the quotient is zero
+                        let sign = if self.integer < 0 { "-" } else { "" };
+                        // Format an unsigned remainder with leading zeros and strip trailing zeros
+                        let remainder =
+                            format!("{:0width$}", remainder.abs(), width = self.scale as usize);
+                        let remainder = remainder.trim_end_matches('0');
+                        let quotient = (self.integer / divisor).abs();
+                        return write!(f, "{sign}{quotient}.{remainder}");


I don't think sign is helpful? We could omit it and also omit the abs call on quotient.

Update: Oh... it's needed "in case the quotient is zero" 🤦
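A standalone sketch of the formatting logic from this hunk, specialized to i64 (an assumption; the real code is generated per decimal type), next to the sign-free version suggested in review. For -5 at scale 1 the quotient is 0, so only the explicit sign preserves the minus: format_decimal yields "-0.5" while format_decimal_without_sign yields "0.5".

```rust
/// Formats an unscaled i64 with `scale` fractional digits (scale must be <= 18 for i64).
pub fn format_decimal(integer: i64, scale: u8) -> String {
    let divisor = 10i64.pow(scale as u32);
    let remainder = integer % divisor;
    if remainder == 0 {
        // No fractional part (also covers scale == 0, where divisor == 1).
        return (integer / divisor).to_string();
    }
    // Track the sign explicitly, in case the quotient is zero (e.g. -0.5).
    let sign = if integer < 0 { "-" } else { "" };
    // Zero-pad the fractional digits to `scale`, then strip trailing zeros.
    let remainder = format!("{:0width$}", remainder.abs(), width = scale as usize);
    let remainder = remainder.trim_end_matches('0');
    let quotient = (integer / divisor).abs();
    format!("{sign}{quotient}.{remainder}")
}

/// The variant suggested in review: no explicit sign and no abs on the quotient.
/// Loses the minus sign whenever the quotient is zero.
pub fn format_decimal_without_sign(integer: i64, scale: u8) -> String {
    let divisor = 10i64.pow(scale as u32);
    let remainder = integer % divisor;
    if remainder == 0 {
        return (integer / divisor).to_string();
    }
    let remainder = format!("{:0width$}", remainder.abs(), width = scale as usize);
    format!("{}.{}", integer / divisor, remainder.trim_end_matches('0'))
}
```

Both versions agree whenever the quotient is nonzero (for example, both print "-1.205" for -1205 at scale 3), which is why the need for the sign is easy to miss.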

scovich · 2025-10-10T16:21:58Z

@alamb -- this should be ready for your final review+merge

alamb · 2025-10-14T16:09:27Z

parquet-variant/src/variant/decimal.rs

+                Self::try_new(integer, scale)
+            }
+
+            fn try_new_with_signed_scale(integer: $native, scale: i8) -> Result<Self, ArrowError> {


Let's just go with this implementation for now

alamb · 2025-10-14T16:10:10Z

Thank you so much @scovich @liamzwbao and @klion26 -- this looks really nice now 👏

scovich added 2 commits October 6, 2025 16:20

[Variant] Define and use VariantDecimalType trait

32bc99c

additional cleanup

9f83ab8

github-actions bot added the parquet-variant parquet-variant* crates label Oct 7, 2025

alamb reviewed Oct 7, 2025


alamb mentioned this pull request Oct 7, 2025

[Variant] Support variant to Decimal32/64/128/256 #8552

Merged

scovich commented Oct 7, 2025


better approach to MAX_UNSCALED_VALUE

bd7f3a7

liamzwbao reviewed Oct 8, 2025


klion26 approved these changes Oct 10, 2025


scovich added 3 commits October 10, 2025 08:00

simpler macro

e6c88b9

simpler phantoms

3f66de4

another tweak

be9e2c9

scovich commented Oct 10, 2025


scovich added 3 commits October 10, 2025 09:15

minimize diff

0b9c5df

tiny tweak

a731191

Merge remote-tracking branch 'oss/main' into variant-decimal-trait

6c0823a

scovich marked this pull request as ready for review October 10, 2025 16:21

scovich requested a review from alamb October 10, 2025 16:21

alamb approved these changes Oct 14, 2025


alamb merged commit b8fdd90 into apache:main Oct 14, 2025
17 checks passed

4 participants

scovich Oct 7, 2025 •

edited

Loading