-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coerce Dictionary types for scalar functions #10077
Conversation
_ => comparison_binary_numeric_coercion(type_into, type_from).and_then( | ||
|coerced_type| { | ||
_ => comparison_binary_numeric_coercion(type_into, type_from) | ||
.or_else(|| dictionary_coercion(type_into, type_from, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check the inner data type with coerced_from
instead of comparison_coercion
🤔 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't quite tell the difference. What would be the benefit?
I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure I understand the comment too. What you mean "check inner data type with coerced_from
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"check inner data type with coerced_from"
Similar to the current implementation of Dict
https://github.com/apache/arrow-datafusion/blob/671cef85c550969ab2c86d644968a048cb181c0c/datafusion/expr/src/type_coercion/functions.rs#L316-L321
The above once checks if the inner type in Dict is coercible by the coerced_from
function.
But dictionary_coercion
checks the inner type of Dict with comparison_coercion
.
The coerced_from
and comparison_coercion
are slightly different.
comparison_coercion
cares about the scenario in comparison, so loss is allowed. For example, i64 and u64, we return i64, while we get None in coerced_from
for casting u64 to i64.
I had tried to find one coercion for all but ended up with the conclusion that we keep these two coercion functions. #8302.
I suggest we don't mix the logic for coerce_from
and comparison_coercion
. It would be nice to avoid using comparison_binary_numeric_coercion
in coerced_from
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I don't notice coerced_from
is updated to coerce dictionary type. I was working on a branch without update yet. The current update with dictionary coercion looks good now to fix the issue I encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I notice that the current implementation has a small issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @viirya and @jayzhan211 -- this looks like an improvement to me.
I don't fully understand the comment https://github.com/apache/arrow-datafusion/pull/10077/files#r1564548305 but it seems like something we could refine in a follow on PR as well
@@ -635,4 +635,19 @@ mod tests { | |||
.unwrap(); | |||
assert_eq!(return_type, DataType::Date32); | |||
} | |||
|
|||
#[test] | |||
fn test_coalesce_return_types_dictionary() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 we saw something similar in #9925
_ => comparison_binary_numeric_coercion(type_into, type_from).and_then( | ||
|coerced_type| { | ||
_ => comparison_binary_numeric_coercion(type_into, type_from) | ||
.or_else(|| dictionary_coercion(type_into, type_from, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't quite tell the difference. What would be the benefit?
I always get a little confused with the type coercion logic -- that there are different rules for certain operations I think.
531bd1a
to
4a3a6fd
Compare
@@ -314,8 +314,13 @@ fn coerced_from<'a>( | |||
// match Dictionary first | |||
match (type_into, type_from) { | |||
// coerced dictionary first | |||
(cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type) | |||
if coerced_from(cur_type, value_type).is_some() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When coercing into dictionary type, the type_into
and type_from
parameters are in incorrect order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an excellent find. Thanks @jayzhan211 for pointing that out
@@ -314,8 +314,13 @@ fn coerced_from<'a>( | |||
// match Dictionary first | |||
match (type_into, type_from) { | |||
// coerced dictionary first | |||
(cur_type, Dictionary(_, value_type)) | (Dictionary(_, value_type), cur_type) | |||
if coerced_from(cur_type, value_type).is_some() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an excellent find. Thanks @jayzhan211 for pointing that out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @viirya
Thank you @alamb @jayzhan211 |
Thank you @andygrove |
* Coerce Dictionary types for scalar functions * Fix * Fix format * Add test
* Coerce Dictionary types for scalar functions * Fix * Fix format * Add test
Which issue does this PR close?
Closes #10076.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?