-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix comparison of dictionary arrays #1606
Conversation
assert_batches_eq!(expected, &actual); | ||
|
||
// comparison with another dictionary column | ||
let sql = "SELECT d1 FROM test WHERE d1 = d2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test fails without the change
98e574d
to
c4db8f3
Compare
@@ -77,7 +77,7 @@ pub(crate) fn coerce_types( | |||
} | |||
|
|||
fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { | |||
if lhs_type == rhs_type { | |||
if lhs_type == rhs_type && !is_dictionary(lhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the arrow eq
kernels don't support DictionaryArray
<==> DictionaryArray
comparisons yet, so we need to handle this case specially. I will also file a ticket to add native DictionaryArray comparison support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -147,6 +147,10 @@ pub fn temporal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Dat | |||
} | |||
} | |||
|
|||
pub(crate) fn is_dictionary(t: &DataType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplication of code between this module and binary_rule.rs
is not good -- I am going to try and consolidate it as a follow on PR cc @liukun4515
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1607 <-- doesn't remove duplication but it at least consolidates the logic into a single module
@@ -77,7 +77,7 @@ pub(crate) fn coerce_types( | |||
} | |||
|
|||
fn comparison_eq_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { | |||
if lhs_type == rhs_type { | |||
if lhs_type == rhs_type && !is_dictionary(lhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add TODO
or some comments for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do so -- it is a good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in e27110f
@@ -90,7 +90,7 @@ fn comparison_order_coercion( | |||
lhs_type: &DataType, | |||
rhs_type: &DataType, | |||
) -> Option<DataType> { | |||
if lhs_type == rhs_type { | |||
if lhs_type == rhs_type && !is_dictionary(lhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's same with above.
@@ -147,6 +147,10 @@ pub fn temporal_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Dat | |||
} | |||
} | |||
|
|||
pub(crate) fn is_dictionary(t: &DataType) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another point I want to point out is that the arrow-rs has some codes to check data type, and the logic is also in datafusion, for example, https://docs.rs/arrow/7.0.0/arrow/datatypes/enum.DataType.html#method.is_numeric, Maybe we should refactor them and make the codebase clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed as #1613
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @alamb
Which issue does this PR close?
Closes #1605
See ticket for more details