-
Notifications
You must be signed in to change notification settings - Fork 835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: cast from/to interval and duration #4020
Conversation
9082dff
to
6d58d1c
Compare
arrow-cast/src/cast.rs
Outdated
array: &dyn Array, | ||
cast_options: &CastOptions, | ||
) -> Result<ArrayRef, ArrowError> { | ||
let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we propagate the error?
arrow-cast/src/cast.rs
Outdated
for i in 0..array.len() { | ||
if array.is_null(i) { | ||
builder.append_null(); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For array with no nulls, I think we can have a faster path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrimitiveArray::from_trusted_len_iter
may fit here.
arrow-cast/src/cast.rs
Outdated
builder.append_null(); | ||
} else { | ||
let v = array.value(i) as i128; | ||
let v = v.mul_checked(scale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scale 1 case, maybe we can skip this call?
arrow-cast/src/cast.rs
Outdated
return Err(ArrowError::ComputeError(format!( | ||
"Cannot cast to {:?}. Overflowing on {:?}", | ||
D::DATA_TYPE, | ||
e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e
is ArrowError
, so you will get nested ArrowError
.
LGTM, thanks for the effort. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks as well for this PR @Weijun-H -- it is getting there. I am not sure the overflow handling is handled correctly. I left some comments and suggested tests that I think will help get this PR ready to go
arrow-cast/src/cast.rs
Outdated
if cast_options.safe { | ||
let iter = array | ||
.iter() | ||
.map(|v| v.and_then(|v| ((v as i128).checked_mul(scale)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this code correctly converts v
into an IntervalMonthDayNanoType
-- the IntervalMonthDayNanoType
is represented as two fields but this is treating it as a single native type)
I think we should construct IntervalMonthDayNano with this method:
https://docs.rs/arrow/37.0.0/arrow/datatypes/struct.IntervalMonthDayNanoType.html#method.make_value
So something like
let months = 0;
let days = 0;
let nanos: i64 = v.checked_mul(scale)?;
Ok(IntervalMonthDayNanoType::make_value(months, days, nanos))
#[test] | ||
fn test_cast_from_duration_to_interval() { | ||
// from duration second to interval month day nano | ||
let array = vec![1234567]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also please test a value if i64::MAX
here and ensure an error is returned?
assert_eq!(casted_array.value(0), 0); | ||
|
||
// from interval month day nano to duration millisecond | ||
let array = vec![1234567]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably refactor this repetition into a function and avoid so much boiler plate in the tests
arrow-cast/src/cast.rs
Outdated
&CastOptions { safe: false }, | ||
) | ||
.unwrap(); | ||
assert_eq!(casted_array.value(0), 9223372036854775807000000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I convert this value back into fields:
let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(9223372036854775807000000);
println!("months: {months}, days: {days}, nanos: {nanos}");
it prints out:
months: 0, days: 499999, nanos: -1000000
Which doesn't seem right for 9223372036854775807
milliseconds
I think what is happening can be explained here (the nanoseconds fields have overflowed into the days)
https://docs.rs/arrow-array/37.0.0/src/arrow_array/types.rs.html#435-445
/*
https://github.com/apache/arrow/blob/02c8598d264c839a5b5cf3109bfd406f3b8a6ba5/cpp/src/arrow/type.h#L1475
struct MonthDayNanos {
int32_t months;
int32_t days;
int64_t nanoseconds;
}
128 112 96 80 64 48 32 16 0
+-------+-------+-------+-------+-------+-------+-------+-------+
| months | days | nanos |
+-------+-------+-------+-------+-------+-------+-------+-------+
*/
I think what should happen is
- (preferred): the cast should return an error if the nanosecond field would have overflowed
- (secondarily): if the nanosecond field overflows the overflow should be converted into days
I think erroring (at least at first) is preferred because it avoids questions like "how many seconds are there in a day" which is not actually a fixed quantity given leap seconds, etc.
If someone has need of casting such durations to intervals, we can then sort out the mechanics
Thanks @Weijun-H -- I think this is getting close. From my perspecitive all that is left is handling nanosecond overflow correctly and this will be good to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bearing with my @Weijun-H -- sorry for the back and forth
if cast_options.safe { | ||
let iter = array | ||
.iter() | ||
.map(|v| v.and_then(|v| v.checked_mul(scale).map(|v| v as i128))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to check the overflow in the safe case as well -- and and set the result to Null / NONE when overflow happened
); | ||
assert_eq!(casted_array.value(0), 1234567000); | ||
|
||
let array = vec![i64::MAX]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also please add a test here (and the cases below) showing what happens when an overflow happens with DEFAULT_CAST_OPTIONS? I would expect the result to be Null
arrow-cast/src/cast.rs
Outdated
&DEFAULT_CAST_OPTIONS, | ||
) | ||
.unwrap(); | ||
assert_eq!(casted_array.value(0), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check should be that the casted_array.valid(0) is false, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is ready to go. I don't totally understand why https://github.com/apache/arrow-rs/pull/4020/files#r1167579187 is not needed, but the test coverage
let array = vec![i64::MAX];
let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
array.clone(),
&DEFAULT_CAST_OPTIONS,
)
.unwrap();
assert!(!casted_array.is_valid(0));
covers the case I was thinking of so 👍
Thank you @Weijun-H
Because v is |
Which issue does this PR close?
Closes #3998
Rationale for this change
Explain #3998
What changes are included in this PR?
Are there any user-facing changes?