-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define eq_dyn_scalar API #1074
Define eq_dyn_scalar API #1074
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good @matthewmturner
One challenge is going to be making code that is maintainable and not just swaths of copy/paste (but different type code). It will be a fun exercise
builder.append_null().unwrap(); | ||
builder.append(223).unwrap(); | ||
let array = builder.finish(); | ||
let a_eq = eq_dyn_scalar(&array, 123).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good to me 👍
right | ||
)) | ||
})?; | ||
eq_scalar::<UInt8Type>(as_primitive_array::<UInt8Type>(left), right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
cc @shepmaster , @jorgecarleitao and @alippai |
@alamb yes agree. That's what I'm going to work on next, the first commit was basically just what you said (copy/paste) with minimal cleaning to show the high level api. Now going to try making macros that can be used for all dyn_scalar kernels. Just to confirm, we're still expecting type annotation on the scalar right? I.e 123u8 |
I am hoping that we can avoid those annotations, to be honest |
@alamb i think i might be getting close. I used your Not sure why im hitting a recursion limit error though - will need to do some more work on that. Do you think this is going in the right direction? |
@@ -898,6 +898,126 @@ pub fn gt_eq_utf8_scalar<OffsetSize: StringOffsetSizeTrait>( | |||
compare_op_scalar!(left, right, |a, b| a >= b) | |||
} | |||
|
|||
macro_rules! dyn_cmp_scalar { | |||
($LEFT: expr, $RIGHT: expr, $T: ident, $OP: ident, $TT: tt) => {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should $T
and $TT
be ty
?
type_name::<$T>(), | ||
)) | ||
})?; | ||
$OP::<$TT>(left, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$OP::<$TT>
could probably be fused as one expr
macro argument
} | ||
|
||
/// Perform `left == right` operation on an array and a numeric scalar | ||
/// value. Supports PrimtiveArrays, and DictionaryArrays that have primitive values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// value. Supports PrimtiveArrays, and DictionaryArrays that have primitive values | |
/// value. Supports PrimitiveArrays, and DictionaryArrays that have primitive values |
where | ||
K: ArrowNumericType, | ||
{ | ||
assert_eq!(dict_comparison.len(), left.values().len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why an assertion as opposed to an error or a "no this does not match?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is an invariant (namely that left
is the dictionary and dict_comparison
is the result of comparing those values).
Perhaps we could rename the left
parameter to dict
to make this clearer
|
||
macro_rules! dyn_compare_scalar { | ||
($LEFT: expr, $RIGHT: expr, $OP: ident) => {{ | ||
let right = $RIGHT.try_into().map_err(|_| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing where this right
value is used...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just hadn't yet updated where $RIGHT is used in all the match arms to use right instead
where | ||
T: IntoArrowNumericType + TryInto<i128> + Copy + std::fmt::Debug, | ||
{ | ||
dyn_compare_scalar!(left, right, eq_scalar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit this is a drive-by review, but I'm not seeing the benefit of the macros here yet. They don't do any repetition reduction. It looks like dyn_compare_scalar
could be inlined and dyn_cmp_scalar
could be a regular function.
Codecov Report
@@ Coverage Diff @@
## master #1074 +/- ##
==========================================
- Coverage 82.34% 82.31% -0.03%
==========================================
Files 168 168
Lines 49479 49577 +98
==========================================
+ Hits 40743 40810 +67
- Misses 8736 8767 +31
Continue to review full report at Codecov.
|
Ive made a number of updates to simplify this, such as de-macroing a level, and have finally made our test pass. Of course there would need to be considerable cleanup to this but I think that this could demonstrate an approach to making this work. To be fair, ive thought ive been close about 5 times now and have been wrong each time so definitely looking to get more insight from others. @alamb as always your feedback greatly appreciated. @shepmaster thank you for the review before. ive cleaned up based on some of your feedback with the goal of trying to demonstrate in simpler terms (i.e. less macros) what were trying to do. im still quite new to rust so im not sure how to manage inlining |
also as a note, i added the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really close @matthewmturner -- 🙏 -- thank you. I do wonder about having both IntoNumericType
and Into<i128>
I feel like like they are redundant somehow
where | ||
K: ArrowNumericType, | ||
{ | ||
assert_eq!(dict_comparison.len(), left.values().len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is an invariant (namely that left
is the dictionary and dict_comparison
is the result of comparing those values).
Perhaps we could rename the left
parameter to dict
to make this clearer
$LEFT.as_any().downcast_ref::<Int8Array>().ok_or_else(|| { | ||
ArrowError::CastError(String::from("Left array cannot be cast")) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use https://docs.rs/arrow/6.4.0/arrow/array/fn.as_primitive_array.html to simplify this
So something like
$LEFT.as_any().downcast_ref::<Int8Array>().ok_or_else(|| { | |
ArrowError::CastError(String::from("Left array cannot be cast")) | |
})?; | |
as_primitive_array::<Int8Array>($LEFT)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these functions require an Arc<dyn Array>
/ArrayRef
which would mean we need the end user to do that before passing the array to the kernel or do you think its okay to add the Arc
in the kernel?
let left = left | ||
.as_any() | ||
.downcast_ref::<DictionaryArray<UInt8Type>>() | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to use https://docs.rs/arrow/6.4.0/arrow/array/fn.as_dictionary_array.html here too:
let left = left | |
.as_any() | |
.downcast_ref::<DictionaryArray<UInt8Type>>() | |
.unwrap(); | |
let left = as_dictionary_array<DictionaryArray<UInt8Type>>()?; |
/// value. Supports PrimitiveArrays, and DictionaryArrays that have primitive values | ||
pub fn eq_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray> | ||
where | ||
T: IntoArrowNumericType + TryInto<i128> + Copy + std::fmt::Debug, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If T
is going to have TryInto<i128>
, I wonder if we still IntoArrowNumericType
at all? I think it may not be necessary any more.
My thinking is that since dyn_compare_scalar
converts right
into i128
immediately there are then conversion rules back to all of the primitive types needed to call eq_scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agree, I was starting to think the same. Will try it out.
@alamb to confirm, are you expecting this kernel to handle boolean and utf8 comparisons as well? Or is that only available with the future scalar API? |
@alamb to confirm, we arent handling floats with this kernel right? My understanding is our current ive also done some cleaning up - can you let me know if you think this is ok? In particular:
|
Ive added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @matthewmturner -- I think this is looking great ❤️
DataType::Dictionary(key_type, _) => { | ||
return dyn_compare_utf8_scalar!(&left, right, key_type, eq_utf8_scalar); | ||
} | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably wants to match on the DataType::Utf8
and DataType::LargeUtf8
but otherwise looks good to me
fn test_eq_dyn_scalar() { | ||
let array = Int32Array::from(vec![6, 7, 8, 8, 10]); | ||
let array = Arc::new(array); | ||
let a_eq = eq_dyn_scalar(array, 8).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn test_eq_dyn_utf8_scalar() { | ||
let array = StringArray::from(vec!["abc", "def", "xyz"]); | ||
let array = Arc::new(array); | ||
let a_eq = eq_dyn_utf8_scalar(array, "xyz").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
How shall we proceed -- get this one polished up (maybe also add |
Sure sounds good. Should I just add those on this PR? |
actually - are I get the following when trying to make one:
i also dont see any tests for |
@alamb if you're okay with the latest changes and can merge then ill prioritize doing another PR with the other comparison functions. thank you for all the guidance youve provided on this! |
DataType::Int8 => { | ||
let right: i8 = right.try_into().map_err(|_| { | ||
ArrowError::ComputeError(String::from( | ||
"Can not convert scalar to i128", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can not convert scalar to i128
-> Can not convert scalar to i8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It affects the messages below as well
_ => Err(ArrowError::ComputeError( | ||
"Kernel only supports PrimitiveArray or DictionaryArray with Primitive values".to_string(), | ||
)) | ||
} | ||
} | ||
|
||
/// Perform `left == right` operation on an array and a numeric scalar | ||
/// value. Supports StringArrays, and DictionaryArrays that have string values | ||
pub fn eq_dyn_utf8_scalar(left: Arc<dyn Array>, right: &str) -> Result<BooleanArray> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it
eq_bool_scalar(left, right) | ||
} | ||
_ => Err(ArrowError::ComputeError( | ||
"Kernel only supports BooleanArray".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah -- users can call cast
if they want to convert their array into boolean first
I recommend doing the code in multiple PRs to keep the reviews smaller.
I think they would be valid per the arrow spec, but I don't think they would be very useful -- A dictionary array of bools will be (much) less efficient both in term of space and CPU than a BooleanArray. This is probably why there is no array builder for them https://docs.rs/arrow/6.4.0/arrow/array/struct.DictionaryArray.html?search=dictionarybuilder I think it is fine to dictionary of bools unimplemented and people can implement them if they want |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks great to me. I had a few small suggestions, but otherwise all good 👍
DataType::Int8 => { | ||
let right: i8 = right.try_into().map_err(|_| { | ||
ArrowError::ComputeError(String::from( | ||
"Can not convert scalar to i128", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It affects the messages below as well
))), | ||
} | ||
}}; | ||
($LEFT: expr, $RIGHT: expr, $KT: ident, $OP: ident) => {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
($LEFT: expr, $RIGHT: expr, $KT: ident, $OP: ident) => {{ | |
/// Applies `LEFT OP RIGHT` when `LEFT` is a `DictionaryArray` with keys of type `KT` | |
($LEFT: expr, $RIGHT: expr, $KT: ident, $OP: ident) => {{ |
@@ -898,6 +900,305 @@ pub fn gt_eq_utf8_scalar<OffsetSize: StringOffsetSizeTrait>( | |||
compare_op_scalar!(left, right, |a, b| a >= b) | |||
} | |||
|
|||
macro_rules! dyn_compare_scalar { | |||
($LEFT: expr, $RIGHT: expr, $OP: ident) => {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
($LEFT: expr, $RIGHT: expr, $OP: ident) => {{ | |
/// Applies `LEFT OP RIGHT` when `LEFT` is a `DictionaryArray` with keys of type `KT` | |
($LEFT: expr, $RIGHT: expr, $OP: ident) => {{ |
@alamb updates made! |
seems im getting hit with that |
I think it has been resolved if you merge up from master again |
having issues with rebasing. for whatever reason when i rebase to upstream master im losing all my recent commits. still looking into it. |
25e3d7d
to
9d03e72
Compare
after wandering through git hell for a bit i think i figured it out. hopefully CI passes and were good here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @matthewmturner -- shall I merge this as is, or do you want to apply the suggestions from @liukun4515 first (fix error messages)?
Then, it seems like the next step might be to file tickets (I can do so if you wish) for the other kernels (lt_dyn_scalar
, gt_dyn_scalar
, etc)
@alamb odd, i had made updates for those suggestions. must have been missed with my rebase. ive made the fixes. i can make an issue for it - and im hoping to work on implementing it this weekend. do you have a preference for how small (i.e how many kernels) youd like each pr / issue to be? |
Thanks @matthewmturner ❤️
My personal bias is one PR per kernel (as that makes it easiest to review) but I am also happy to review a single PR too |
I think one pr per kernel is better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@alamb @liukun4515 one PR per kernel it is :) |
Thanks again @matthewmturner and @liukun4515 -- this is great. |
Which issue does this PR close?
Working on this in relation to #984 and #1068 with the end goal being to finalize how we want
eq_dyn_scalar
to work.Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?