Refactor arrow-cast decimal casting to unify the rescale logic used in Parquet variant casts #8689
Conversation
arrow-cast/src/cast/decimal.rs (outdated):

```diff
-pub(crate) fn cast_decimal_to_decimal_error<I, O>(
+fn cast_decimal_to_decimal_error<I, O>(
```
Downgraded the visibility since it's only used in this file.
arrow-cast/src/cast/decimal.rs (outdated):

```rust
let is_infallible_cast = (input_precision as i8) + delta_scale <= (output_precision as i8);
let f_infallible = is_infallible_cast
    .then_some(move |x| O::Native::from_decimal(x).unwrap().mul_wrapping(mul));
Some((f, f_infallible))
```
I chose to return f_infallible instead of is_infallible_cast because, unlike make_downscaler, we cannot derive an infallible closure from f. To keep the interface consistent, I applied the same approach to make_downscaler so that it also returns (f, f_infallible).
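For illustration, here is a hypothetical, heavily simplified sketch of this (f, f_infallible) shape, using plain i128 in place of the generic decimal native types. The names mirror the PR's discussion, but the bodies are illustrative only (negative delta_scale and rounding are not handled):

```rust
/// Hypothetical sketch: return a fallible closure plus an optional
/// infallible one, mirroring the (f, f_infallible) pair discussed above.
/// Assumes a non-negative delta_scale; i128 stands in for the generic
/// decimal native types used by the real code.
fn make_upscaler(
    input_precision: u8,
    delta_scale: i8,
    output_precision: u8,
) -> (impl Fn(i128) -> Option<i128>, Option<impl Fn(i128) -> i128>) {
    let mul = 10_i128.pow(delta_scale as u32);
    // Fallible path: checked multiply, None on overflow.
    let f = move |x: i128| x.checked_mul(mul);
    // If the output precision can absorb the scale increase, overflow is
    // impossible, so a plain (infallible) multiply is returned as well.
    let is_infallible_cast =
        (input_precision as i8) + delta_scale <= (output_precision as i8);
    let f_infallible = is_infallible_cast.then_some(move |x: i128| x * mul);
    (f, f_infallible)
}
```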
Benchmarked locally on an M3 Max Mac; the results show no performance improvement or regression against the main branch. Commands used:

```shell
git checkout a7572eb6
cargo bench -p arrow-cast --bench parse_decimal -- --save-baseline main
git checkout issue-8670-decimal-cast-refactor
cargo bench -p arrow-cast --bench parse_decimal -- --baseline main
```

🤖: Benchmark completed
Thank you @liamzwbao -- I ran benchmarks and agree with your assessment that this PR does not change the performance.
However, I am somewhat concerned about the APIs that are proposed to be made pub -- their implementations seem to rely on the caller correctly comparing scales and calling make_upscaler / make_downscaler, which seems error prone to me (and the code does not really validate any of the inputs).
What if you left make_upscaler / make_downscaler private, and instead added a single new API like this, which validated the input and then called make_upscaler / make_downscaler appropriately?
```rust
pub fn make_scaler<I: DecimalType, O: DecimalType>(
    input_precision: u8,
    input_scale: i8,
    output_precision: u8,
    output_scale: i8,
) -> Result<Option<(
    impl Fn(I::Native) -> Option<O::Native>,
    Option<impl Fn(I::Native) -> O::Native>,
)>> {
}
```

```rust
// then an increase of scale by 3 will have the following effect on the representation:
// [xxxxx] -> [xxxxx000], so for the cast to be infallible, the output type
// needs to provide at least 8 digits precision
let is_infallible_cast = (input_precision as i8) + delta_scale <= (output_precision as i8);
```
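To make the arithmetic in that comment easy to check in isolation, here is a hypothetical helper restating the condition: a 5-digit value upscaled by 3 occupies up to 8 digits, so the cast is only infallible if the output offers at least 8 digits of precision.

```rust
/// Hypothetical restatement of the infallibility check discussed above:
/// upscaling by delta_scale appends that many zero digits, so the output
/// precision must cover input_precision + delta_scale.
fn is_infallible_cast(input_precision: u8, delta_scale: i8, output_precision: u8) -> bool {
    (input_precision as i8) + delta_scale <= (output_precision as i8)
}
```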
I see the old code did this too, but it seems like the cast as i8 could potentially convert a number larger than 127 to a negative number -- maybe that is ok
Yeah, this becomes an issue once we expose the API. Normally it isn't a concern, since the precision is guaranteed to stay under 128. Let me redesign the public interface.
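A small demonstration of the wrap-around concern (all values here are hypothetical, chosen only to show the behavior): a u8 above 127 becomes negative when cast to i8, which could make the infallibility check pass when it should not; widening to i16 before comparing avoids the wrap.

```rust
fn main() {
    // 200 is out of range for i8, so the cast wraps to a negative number.
    let input_precision: u8 = 200;
    assert_eq!(input_precision as i8, -56);
    let delta_scale: i8 = 3;
    let output_precision: u8 = 38;
    // Wrapped comparison: -56 + 3 <= 38 passes spuriously.
    assert!((input_precision as i8) + delta_scale <= (output_precision as i8));
    // Widened comparison: 200 + 3 <= 38 correctly fails.
    assert!(!((input_precision as i16) + (delta_scale as i16) <= (output_precision as i16)));
}
```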
Force-pushed 4fe7d19 to 252eee7
Hi @alamb, I attempted to unify the API, but the return types of make_upscaler and make_downscaler are closures with incompatible types unless we box them. Because of that, I decided to move rescale_decimal from parquet-variant-compute into arrow-cast and expose it for use in the variant conversion.
I’ll add some tests for the new rescale_decimal API next. WDYT?
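The type-incompatibility mentioned here can be sketched in plain Rust (simplified to i128; the names are illustrative, not the actual arrow-cast API): each closure has its own anonymous type, so a function returning either branch must erase the type behind Box<dyn Fn>.

```rust
/// Hypothetical sketch of a unified scaler that must box its result:
/// the two branches produce closures of different anonymous types, and
/// Box<dyn Fn> is one way to give them a single return type.
fn make_scaler(upscale: bool, factor: i128) -> Box<dyn Fn(i128) -> Option<i128>> {
    if upscale {
        Box::new(move |x| x.checked_mul(factor)) // may overflow -> None
    } else {
        Box::new(move |x| Some(x / factor)) // truncating downscale
    }
}
```

The trade-off alluded to in the comment is that boxing introduces dynamic dispatch on a per-element hot path, which is why moving rescale_decimal into arrow-cast was chosen instead.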
Force-pushed 252eee7 to 49e72cd
Thank you @liamzwbao
I kicked off some benchmarks to verify that the performance is the same, but I expect that it will be
```rust
    I::Native: DecimalCast + ArrowNativeTypeOp,
    O::Native: DecimalCast + ArrowNativeTypeOp,
{
    let array = if let Some(f_infallible) = f_infallible {
```
this is a very nice formulation now
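The dispatch being praised here can be sketched as follows (a simplified, hypothetical version: Vec<i128> stands in for Arrow arrays, and the names are illustrative): prefer the infallible closure when present, otherwise run the fallible one and surface overflow as an error.

```rust
/// Hypothetical sketch of apply_decimal_cast: use the infallible closure
/// when one exists, else fall back to the fallible closure and report the
/// first failure.
fn apply_decimal_cast(
    values: &[i128],
    f: impl Fn(i128) -> Option<i128>,
    f_infallible: Option<impl Fn(i128) -> i128>,
) -> Result<Vec<i128>, String> {
    if let Some(g) = f_infallible {
        // Infallible path: no per-element Option checks.
        Ok(values.iter().map(|&v| g(v)).collect())
    } else {
        // Fallible path: any None becomes an error.
        values
            .iter()
            .map(|&v| f(v).ok_or_else(|| format!("cannot cast {v}")))
            .collect()
    }
}
```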
🤖: Benchmark completed
Thanks again @liamzwbao and @scovich
Which issue does this PR close?

Rationale for this change

We currently have two separate code paths that both handle decimal casting between different (precision, scale) pairs. Without unifying the logic, a fix in one place often needs to be duplicated in the other (e.g., #8579 fixed the `arrow-cast` side and #8552 fixed the `parquet-variant-compute` side), which can easily lead to divergence when contributors lack full context. This PR consolidates the decimal rescale logic for both `arrow-cast` and `parquet-variant-compute`.

What changes are included in this PR?

- Unify `convert_to_smaller_scale_decimal` and `convert_to_bigger_or_equal_scale_decimal` into `apply_decimal_cast`
- Expose `make_upscaler` and `make_downscaler` so that they can be used in `parquet-variant-compute`
- Refactor `rescale_decimal` in `parquet-variant-compute` to use the new `make_upscaler` and `make_downscaler` utilities

One challenge is incorporating the large-scale reduction path (i.e., when `delta_scale` cannot fit into `I::MAX_PRECISION`) into `make_downscaler` without hurting performance. Returning 0 directly is usually cheaper than applying a unary operation that returns zero. Therefore, `make_downscaler` may return `None`, and it is the caller's responsibility to handle this case appropriately based on the documented behavior.

Are these changes tested?
Covered by existing tests
Are there any user-facing changes?
No
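As a closing illustration, the caller-side contract from the challenge paragraph above might look like this (a hypothetical, simplified sketch in plain Rust; the real code operates on Arrow arrays, handles rounding, and uses the generic decimal types): when the scale reduction exceeds what the input precision can represent, every value rounds to zero, so the maker returns `None` and the caller emits zeros directly instead of evaluating a closure per element.

```rust
/// Hypothetical sketch: None signals that every input would rescale to 0.
fn make_downscaler(delta_scale: u32, input_max_precision: u32) -> Option<impl Fn(i128) -> i128> {
    if delta_scale > input_max_precision {
        return None; // all inputs round to 0; caller short-circuits
    }
    let div = 10_i128.pow(delta_scale);
    Some(move |x: i128| x / div) // truncating division, rounding ignored
}

/// Hypothetical caller honoring the documented None contract.
fn rescale_decimal(values: &[i128], delta_scale: u32, input_max_precision: u32) -> Vec<i128> {
    match make_downscaler(delta_scale, input_max_precision) {
        Some(f) => values.iter().map(|&v| f(v)).collect(),
        None => vec![0; values.len()], // cheaper than a per-element call
    }
}
```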