-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-3882: [Rust] Cast Kernel for most types #3797
Conversation
@andygrove I created one cast fn to rule them all, but I need some help in DataFusion. When I replace the cast macros with this cast function, I get lifetime issues per below: error[E0621]: explicit lifetime required in the type of `expr`
--> datafusion\src\execution\expression.rs:340:24
|
299 | expr: &Expr,
| ----- help: add explicit lifetime `'static` to the type of `expr`: `&'static logicalplan::Expr`
...
340 | f: Rc::new(|batch: &RecordBatch| {
| ________________________^
341 | | // match compute::cast(batch.column(index), data_type) {
342 | | // Ok(array) => Ok(array),
343 | | // Err(e) => Err(e.into())
... |
346 | | .map_err(|e| ExecutionError::ArrowError(e))
347 | | }),
| |______________________^ lifetime `'static` required Adding a @paddyhoran @sunchao do you have any ideas on how we could overcome this? And does the implementation of cast itself look sound enough for me to continue with this? |
// Ok(array) => Ok(array), | ||
// Err(e) => Err(e.into()) | ||
// } | ||
compute::cast(batch.column(index).clone(), data_type.clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andygrove this is where I'm getting the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can help with this. I'll take a look now and see if it is easy to fix or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you go. This works:
&Expr::Column(index) => {
let col = input_schema.field(index);
let dt = data_type.clone();
Ok(RuntimeExpr::Compiled {
name: col.name().clone(),
t: col.data_type().clone(),
f: Rc::new(move |batch: &RecordBatch| {
compute::cast(batch.column(index), &dt)
.map_err(|e| ExecutionError::ArrowError(e))
}),
The issue was that the closure being built had a reference to data_type from the expression. The fix was to clone it before creating the closure, and use the move
keyword to move ownership of the variables into the closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I would have never figured. Code's compiling, so I should be able to complete this PR by the weekend
(List(_), _) => Err(ArrowError::ComputeError( | ||
"Cannot cast list to non-list data types".to_string(), | ||
)), | ||
(_, List(_)) => unimplemented!("Casting scalars to lists not yet supported"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be easy to change these unimplemented!
calls into Err
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the places where I'm not already returning an error are placeholders. I'll replace the unimplemented
with cast functions
@andygrove @paddyhoran @liurenjie1024 this is ready for review |
use crate::error::{ArrowError, Result}; | ||
|
||
/// Macro rule to cast between numeric types | ||
macro_rules! cast_numeric_arrays { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why these have to be macro? can we use generic function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had this problem in a lot of areas across the project. I don't know how I'd convert DateTime::Int32
to Int32Type
. Do you have an idea/example of how I could be able to make the cast generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just use:
fn cast_numeric_arrays<FROM, TO>(src: &ArrayRef) -> Result<ArrayRef>
where
FROM: ArrowNumericType,
TO: ArrowNumericType,
FROM::Native: num::NumCast,
TO::Native: num::NumCast,
{
match numeric_cast::<FROM, TO>(
src.as_any().downcast_ref::<PrimitiveArray<FROM>>().unwrap(),
) {
Ok(to) => Ok(Arc::new(to) as ArrayRef),
Err(e) => Err(e),
}
}
This is very similar to the numeric_cast
you have below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chao, I've removed all macros and replaced with the above
} | ||
|
||
/// Cast array to provided data type | ||
pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this to the top and add more comments on its behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't put it at the top as it depends on the macro definitions. I've expanded the doc comment to include behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved this by removing macros
(UInt64, Float32) => cast_numeric_arrays!(array, UInt64Type, Float32Type), | ||
(UInt64, Float64) => cast_numeric_arrays!(array, UInt64Type, Float64Type), | ||
|
||
(Int8, UInt8) => cast_numeric_arrays!(array, Int8Type, UInt8Type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these casts where the physical storage format is the same, can we just change the data type associated with ArrayData
such as how it is done in cpp .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be for going from UInt8
to Int{8|16|32|64}
and not the inverse like Int{x}
-> UInt{x}
? I've just done a quick test:
fn test_associated_data() {
let a = Int32Array::from(vec![-5, 6]);
let b: PrimitiveArray<UInt32Type> = PrimitiveArray::from(a.data());
let c: PrimitiveArray<Int32Type> = PrimitiveArray::from(b.data());
// I expect -5 to be invalid for UInt8, not sure how I'd change the bitmask
assert_eq!(false, b.is_valid(0));
assert_eq!(6, b.value(1));
// this is also not the case as I still get -5
assert_eq!(false, c.is_null(0));
assert_eq!(6, c.value(1));
}
I expect a negative Int cast to UInt to have is_valid(x) == false
, but because I haven't changed the bitmask, that's not the case.
I'll have a look at how I could do that if I need to cast from signed to unsigned. I also wouldn't mind getting help with those (and other more efficient) casts in follow-up JIRAs.
@@ -0,0 +1,543 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have a kernels
directory under compute
and rename this to cast.rs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is to be consistent with cpp? In that case we should move all the existing kernels too (boolean, comparison).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think we should do that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @nevi-me
This is an implementation of a cast kernel for most Arrow types.
Limitations (when PR is complete):
StructArray
not supportedListArray
to non-list array not supported