-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dictionary like scalar kernels #2591
Conversation
A side effect of this PR resulted in some nice performance improvements as well. Some othese range form 2-3% to about 30/50% for some use cases. On my OCI 4 core arm machine these are the improvements I am getting Click me
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review properly tomorrow, but I do wonder if we could use ArrayAccessor instead of macros for this?
@@ -233,6 +233,91 @@ pub fn like_utf8<OffsetSize: OffsetSizeTrait>( | |||
}) | |||
} | |||
|
|||
macro_rules! like_scalar { | |||
($LEFT: expr, $RIGHT: expr) => {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be possibly rewritten as a function using ArrayAccessor.
I actually tried that. There were a lot of if else paths in the kernels. It would mean I would have to pick one of the below options
Let me know your thoughts here. |
Can you not just take the contents of the macro and make it into a free generic function on a concrete |
Okay , let me try that. |
nilike_scalar(left, right) | ||
} | ||
|
||
/// Perform SQL `left ILIKE right` operation on [`DictionaryArray`] with values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Perform SQL `left ILIKE right` operation on [`DictionaryArray`] with values | |
/// Perform SQL `left NOT ILIKE right` operation on [`DictionaryArray`] with values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thank you.
I wonder if we should add dyn versions of these kernels as a follow up? 🤔
bit_util::set_bit(bool_slice, i); | ||
unsafe { | ||
if left.value_unchecked(i).ends_with(ends_with) { | ||
bit_util::set_bit(bool_slice, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not something for this PR but using MutableBuffer::from_trusted_len_iter may be significantly faster as it performs byte-size writes instead bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried it just now , I didnt find it giving that significant performance gains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I guess the string comparisons are substantially more expensive than for primitives
match left.value_type() { | ||
DataType::Utf8 => { | ||
let left = left.downcast_dict::<GenericStringArray<i32>>().unwrap(); | ||
like_scalar(left, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually looking at the implementation of the other dictionary comparison kernels and they opt to instead evaluate the predicate against the dictionary, and then call unpack_dict_comparison
to translate this to the values as a whole. Might be something to explore, I could see it being very beneficial for DictionaryArray with lots of repeated values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, Shall I make this change in the same PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets do it as a follow up
Benchmark runs are scheduled for baseline = 63afe25 and contender = 9abc5f5. 9abc5f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Partially implements #1975.
Rationale for this change
Enhancement to add like kernels for dictionary.
What changes are included in this PR?
Like kernels for dictionary and string
Are there any user-facing changes?
No