Skip to content

Conversation

@scovich
Copy link
Contributor

@scovich scovich commented Oct 7, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

  • Closes #NNN.

Rationale for this change

VariantDecimalXX structs are structurally near-identical but lack any trait to that can expose that regularity.

What changes are included in this PR?

Define and use a new VariantDecimalType trait that exposes common functionality of all three variant decimal types.

Are these changes tested?

Yes, existing unit tests cover the changes.

Are there any user-facing changes?

New pub trait.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Oct 7, 2025
@scovich
Copy link
Contributor Author

scovich commented Oct 7, 2025

@alamb @liamzwbao thoughts?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks like a nice cleanup / consolidation of responsibility to me me -- thanks @scovich

}
}

/// Trait to unify decimal unshredding across Decimal32/64/128 types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice way to avoid duplication in my opinion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely nice. It directly inspired this PR and the new VariantDecimalType, and you'll notice that DecimalUnshredRowBuilder -- which used to take DecimalSpec and invoke its into_variant method -- now takes VariantDecimalType and invokes its try_new_with_signed_scale constructor.

Do you feel that this trait captured some redundancy or boilerplate that the new trait misses?

impl $struct_name {
/// Attempts to create a new instance of this decimal type, failing if the value or
/// scale is too large.
pub fn try_new(integer: $native, scale: u8) -> Result<Self, ArrowError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you can avoid a macro and instead give try_new a default_impl in the trait definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trait-provided methods can only work with methods and associated types that trait provides (or pulls in as a type constraint on Self). Which unfortunately doesn't include the actual struct members.

Also, unless we're willing to take a dependency on the num crate, there are no traits on integer types that can be used to make a generic validation method, because i32::unsigned_abs is a completely different and unrelated function than i64::unsigned_abs, as far as rustc is concerned. Without a trait to connect them, their similar names mean nothing to the compiler.

I guess I could require PartialOrd and then split the check in two?

if integer > Self::MAX_UNSCALED_VALUE || integer < -Self::MAX_UNSCALED_VALUE {

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the split is nice (I'll push that), but any attempt at a generic checker needs so many type constraints on VariantDecimalType::Native that the resulting code was significantly bigger and harder to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, there are really nice helper traits for working with arrow primitive numeric types... but they're in the arrow-compute crate which variant (intentionally) doesn't depend on.

Self::try_new(integer, scale)
}

fn try_new_with_signed_scale(integer: $native, scale: i8) -> Result<Self, ArrowError> {
Copy link
Contributor Author

@scovich scovich Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I originally hoped this could be a provided trait method, but checked_pow and checked_mul methods on integers are not available through any common trait (unless we take a dependency on the num crate)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just go with this implementation for now

Copy link
Contributor

@liamzwbao liamzwbao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice! could also benefit the variant to decimal PR

Copy link
Member

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for this work.

DataType::Decimal32(_, _)
| DataType::Decimal64(_, _)
| DataType::Decimal128(_, _)
| DataType::Decimal256(_, _) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the match arm of typed_value, and Variant only have Decimal4/Decimal8/Decimal16, is there any chance the DataType::Decimal256(_, _) will happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decimal256 is not a valid shredding type:

Shredded values must use the following Parquet types:

Variant Type Parquet Physical Type Parquet Logical Type
boolean BOOLEAN
int8 INT32 INT(8, signed=true)
int16 INT32 INT(16, signed=true)
int32 INT32
int64 INT64
float FLOAT
double DOUBLE
decimal4 INT32 DECIMAL(P, S)
decimal8 INT64 DECIMAL(P, S)
decimal16 BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY DECIMAL(P, S)
date INT32 DATE
time INT64 TIME(false, MICROS)
timestamptz(6) INT64 TIMESTAMP(true, MICROS)
timestamptz(9) INT64 TIMESTAMP(true, NANOS)
timestampntz(6) INT64 TIMESTAMP(false, MICROS)
timestampntz(9) INT64 TIMESTAMP(false, NANOS)
binary BINARY
string BINARY STRING
uuid FIXED_LEN_BYTE_ARRAY[len=16] UUID
array GROUP; see Arrays below LIST
object GROUP; see Objects below

macro_rules! define_row_builder {
(
struct $name:ident<$lifetime:lifetime $(, $generic:ident: $bound:path )?>
struct $name:ident<$lifetime:lifetime $(, $generic:ident $( : $bound:path )? )*>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the bound optional, because decimals use the where clause instead for better readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also allow multiple generic params instead of just one

{
array: &$lifetime $array_type,
$( $( $field: $field_type, )+ )?
_phantom: std::marker::PhantomData<($( $generic, )*)>, // capture all type params
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capture all generic params automatically (decimal needs this, doesn't hurt other types)

pub(crate) use primitive_conversion_single_value;

/// Convert a decimal value to a `VariantDecimal`
macro_rules! decimal_to_variant_decimal {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced by VariantDecimalType::try_new_with_signed_scale

Comment on lines +187 to +194
// Track the sign explicitly, in case the quotient is zero
let sign = if self.integer < 0 { "-" } else { "" };
// Format an unsigned remainder with leading zeros and strip trailing zeros
let remainder =
format!("{:0width$}", remainder.abs(), width = self.scale as usize);
let remainder = remainder.trim_end_matches('0');
let quotient = (self.integer / divisor).abs();
return write!(f, "{sign}{quotient}.{remainder}");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think sign is helpful? We could omit it and also omit the abs call on quotient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Oh... it's needed "in case the quotient is zero" 🤦

@scovich scovich marked this pull request as ready for review October 10, 2025 16:21
@scovich scovich requested a review from alamb October 10, 2025 16:21
@scovich
Copy link
Contributor Author

scovich commented Oct 10, 2025

@alamb -- this should be ready for your final review+merge

Self::try_new(integer, scale)
}

fn try_new_with_signed_scale(integer: $native, scale: i8) -> Result<Self, ArrowError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just go with this implementation for now

@alamb
Copy link
Contributor

alamb commented Oct 14, 2025

Thank you so much @scovich @liamzwbao and @klion26 -- this looks really nice now 👏

@alamb alamb merged commit b8fdd90 into apache:main Oct 14, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants