-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] Define and use VariantDecimalType trait #8562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@alamb @liamzwbao thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks like a nice cleanup / consolidation of responsibility to me me -- thanks @scovich
| } | ||
| } | ||
|
|
||
| /// Trait to unify decimal unshredding across Decimal32/64/128 types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nice way to avoid duplication in my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely nice. It directly inspired this PR and the new VariantDecimalType, and you'll notice that DecimalUnshredRowBuilder -- which used to take DecimalSpec and invoke its into_variant method -- now takes VariantDecimalType and invokes its try_new_with_signed_scale constructor.
Do you feel that this trait captured some redundancy or boilerplate that the new trait misses?
| impl $struct_name { | ||
| /// Attempts to create a new instance of this decimal type, failing if the value or | ||
| /// scale is too large. | ||
| pub fn try_new(integer: $native, scale: u8) -> Result<Self, ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you can avoid a macro and instead give try_new a default_impl in the trait definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trait-provided methods can only work with methods and associated types that trait provides (or pulls in as a type constraint on Self). Which unfortunately doesn't include the actual struct members.
Also, unless we're willing to take a dependency on the num crate, there are no traits on integer types that can be used to make a generic validation method, because i32::unsigned_abs is a completely different and unrelated function than i64::unsigned_abs, as far as rustc is concerned. Without a trait to connect them, their similar names mean nothing to the compiler.
I guess I could require PartialOrd and then split the check in two?
if integer > Self::MAX_UNSCALED_VALUE || integer < -Self::MAX_UNSCALED_VALUE {🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the split is nice (I'll push that), but any attempt at a generic checker needs so many type constraints on VariantDecimalType::Native that the resulting code was significantly bigger and harder to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, there are really nice helper traits for working with arrow primitive numeric types... but they're in the arrow-compute crate which variant (intentionally) doesn't depend on.
| Self::try_new(integer, scale) | ||
| } | ||
|
|
||
| fn try_new_with_signed_scale(integer: $native, scale: i8) -> Result<Self, ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I originally hoped this could be a provided trait method, but checked_pow and checked_mul methods on integers are not available through any common trait (unless we take a dependency on the num crate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just go with this implementation for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very nice! could also benefit the variant to decimal PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for this work.
| DataType::Decimal32(_, _) | ||
| | DataType::Decimal64(_, _) | ||
| | DataType::Decimal128(_, _) | ||
| | DataType::Decimal256(_, _) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the match arm of typed_value, and Variant only have Decimal4/Decimal8/Decimal16, is there any chance the DataType::Decimal256(_, _) will happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decimal256 is not a valid shredding type:
Shredded values must use the following Parquet types:
Variant Type Parquet Physical Type Parquet Logical Type boolean BOOLEAN int8 INT32 INT(8, signed=true) int16 INT32 INT(16, signed=true) int32 INT32 int64 INT64 float FLOAT double DOUBLE decimal4 INT32 DECIMAL(P, S) decimal8 INT64 DECIMAL(P, S) decimal16 BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY DECIMAL(P, S) date INT32 DATE time INT64 TIME(false, MICROS) timestamptz(6) INT64 TIMESTAMP(true, MICROS) timestamptz(9) INT64 TIMESTAMP(true, NANOS) timestampntz(6) INT64 TIMESTAMP(false, MICROS) timestampntz(9) INT64 TIMESTAMP(false, NANOS) binary BINARY string BINARY STRING uuid FIXED_LEN_BYTE_ARRAY[len=16] UUID array GROUP; see Arrays below LIST object GROUP; see Objects below
| macro_rules! define_row_builder { | ||
| ( | ||
| struct $name:ident<$lifetime:lifetime $(, $generic:ident: $bound:path )?> | ||
| struct $name:ident<$lifetime:lifetime $(, $generic:ident $( : $bound:path )? )*> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the bound optional, because decimals use the where clause instead for better readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also allow multiple generic params instead of just one
| { | ||
| array: &$lifetime $array_type, | ||
| $( $( $field: $field_type, )+ )? | ||
| _phantom: std::marker::PhantomData<($( $generic, )*)>, // capture all type params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capture all generic params automatically (decimal needs this, doesn't hurt other types)
| pub(crate) use primitive_conversion_single_value; | ||
|
|
||
| /// Convert a decimal value to a `VariantDecimal` | ||
| macro_rules! decimal_to_variant_decimal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced by VariantDecimalType::try_new_with_signed_scale
| // Track the sign explicitly, in case the quotient is zero | ||
| let sign = if self.integer < 0 { "-" } else { "" }; | ||
| // Format an unsigned remainder with leading zeros and strip trailing zeros | ||
| let remainder = | ||
| format!("{:0width$}", remainder.abs(), width = self.scale as usize); | ||
| let remainder = remainder.trim_end_matches('0'); | ||
| let quotient = (self.integer / divisor).abs(); | ||
| return write!(f, "{sign}{quotient}.{remainder}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think sign is helpful? We could omit it and also omit the abs call on quotient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: Oh... it's needed "in case the quotient is zero" 🤦
|
@alamb -- this should be ready for your final review+merge |
| Self::try_new(integer, scale) | ||
| } | ||
|
|
||
| fn try_new_with_signed_scale(integer: $native, scale: i8) -> Result<Self, ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just go with this implementation for now
|
Thank you so much @scovich @liamzwbao and @klion26 -- this looks really nice now 👏 |
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
Rationale for this change
VariantDecimalXXstructs are structurally near-identical but lack any trait to that can expose that regularity.What changes are included in this PR?
Define and use a new
VariantDecimalTypetrait that exposes common functionality of all three variant decimal types.Are these changes tested?
Yes, existing unit tests cover the changes.
Are there any user-facing changes?
New pub trait.