-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support arithmetic scalar operation with DictionaryArray #5151
Conversation
2a33923
to
8b7fcc7
Compare
Thanks @viirya -- it is great to see someone else using Dictionary arrays ❤️ Looks like there are some compilation errors on this PR. Marking as draft until those are resolved.
|
Thanks @alamb . Yea, I'm trying to fix them locally. |
e8669d4
to
699d7c2
Compare
3a0d313
to
67cd4a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- it would be great to eventually be able to write sql level tests for this feature, but there is no good way to make dictionary arrays yet from SQL (but I am hoping to get #5016 which would allow it).
Thank you for the contribution @viirya
cc @liukun4515 as this also impacts Decimals (though I don't expect any behavior change)
(DataType::Dictionary(_, value_type), _) => { | ||
is_numeric(value_type) && is_numeric(rhs_type) | ||
} | ||
(_, DataType::Dictionary(_, value_type)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this need to handle the case when both lhs and rhs are Dictionaries 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm wondering what Dictionary for ScalaValue
(rhs here) means although I have seen DataFusion has it.
It sounds reasonable to add the following:
(DataType::Dictionary(_, lhs_value_type), DataType::Dictionary(_, rhs_value_type2)) => {
is_numeric(lhs_value_type) && is_numeric(rhs_value_type2)
}
|
||
let mut dict_builder = PrimitiveDictionaryBuilder::<Int8Type, Int32Type>::new(); | ||
|
||
dict_builder.append(2)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
Operator::Modulo => { | ||
// todo: change to binary_primitive_array_op_dyn_scalar! once modulo is implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo change: apache/arrow-rs#3649
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove binary_primitive_array_op_scalar
once this is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, once modulo is ready to do.
Thank you @alamb. Yea, we have internal e2e tests that use dictionary arrays but seems I don't find similar stuff on DataFusion so far. We can definitely expand test coverage for this in the future. |
…e coercion pattern
BTW, I think array op array is also not supported yet on dictionary arrays. I will work on it after this. |
macro_rules! binary_primitive_array_op_dyn_scalar { | ||
($LEFT:expr, $RIGHT:expr, $OP:ident) => {{ | ||
// unwrap underlying (non dictionary) value | ||
let right = unwrap_dict_value($RIGHT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that we have "scalar dictionary":
/// Dictionary type: index type and value
Dictionary(Box<DataType>, Box<ScalarValue>),
wonder how useful this is ...
} | ||
Operator::Modulo => { | ||
// todo: change to binary_primitive_array_op_dyn_scalar! once modulo is implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove binary_primitive_array_op_scalar
once this is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Benchmark runs are scheduled for baseline = 224c682 and contender = c759865. c759865 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #5150.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?