-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Coalesce
casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion
#10268
Conversation
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
datafusion/expr/src/signature.rs
Outdated
@@ -92,14 +92,19 @@ pub enum TypeSignature { | |||
/// A function such as `concat` is `Variadic(vec![DataType::Utf8, DataType::LargeUtf8])` | |||
Variadic(Vec<DataType>), | |||
/// One or more arguments of an arbitrary but equal type. | |||
/// DataFusion attempts to coerce all argument types to match the first argument's type | |||
/// DataFusion attempts to coerce all argument types to match to the common type with comparision coercion. | |||
/// | |||
/// # Examples | |||
/// Given types in signature should be coercible to the same final type. | |||
/// A function such as `make_array` is `VariadicEqual`. | |||
/// | |||
/// `make_array(i32, i64) -> make_array(i64, i64)` | |||
VariadicEqual, |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
// Note that not all rules in `comparison_coercion` can be reused here. | ||
// For example, all numeric types can be coerced into Utf8 for comparison, | ||
// but not for function arguments. | ||
_ => comparison_binary_numeric_coercion(type_into, type_from).and_then( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is introduced in #9459, so I think it is safe to remove together with this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @viirya
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
coalesce
, introduce signature VariadicEqualOrNull for coalesce
.Coalesce
casting logic that follows Postgres and DuckDB. Introduce signature that do non-comparison coercion
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
In general I am concerned about the potential downstream effects of this change. I don't fully understand them What I would ideally like to do is to run the influxdb_iox regression tests with this change and see what happens. I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔 |
Basically I worry this is just changing behavior rather than fixing a bug and will result in churn for no benefit downstream. I may be mis understanding the change and rationale however |
I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual function. This way, it would be clear which coercion rule applies to each function or signature. Having a single function handle multiple coercion rules makes the code harder to reason about and could cause downstream issues, as you have pointed out. I've also filed an issue to track this at @10507. Would it be better to start by removing the rule from coerce_from and moving it to the specific function or signature? |
@alamb I'm happy to coordinate with our performance team and run an influxdb_iox regression tests if needed |
I don't think this needs a performance test -- I was thinking just the main test suite |
Hi @jayzhan211 -- I am sorry I haven't had a chance to devote more time to this and review. I will try and find some time in the next few days |
I was thinking performance regression 😅 Got it! I can put up an Influxdata test against this branch, sometime this afternoon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update here is that @appletreeisyellow did run the internal influx db tests against this branch and we didn't find any problems. Not that it is required that we don't have to change our code, but I have found that our test suite has a bit broader coverage and thus the fact it doesn't have problems suggests to me this PR is less likely to have knock on effects
I had another look at this PR this morning and Other than the changes in select.slt I think it looks great to me
Thanks again @jayzhan211
@@ -1473,7 +1473,7 @@ DROP TABLE t; | |||
|
|||
# related to https://github.com/apache/datafusion/issues/8814 | |||
statement ok | |||
create table t(x int, y int) as values (1,1), (2,2), (3,3), (0,0), (4,0); | |||
create table t(x bigint, y bigint) as values (1,1), (2,2), (3,3), (0,0), (4,0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep the existing test and add a new test that uses bigint
so it is clearer what behavior is changing?
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@@ -1778,7 +1778,7 @@ AS VALUES | |||
('BB', 6, 1); | |||
|
|||
query TII | |||
select col1, col2, coalesce(sum_col3, 0) as sum_col3 | |||
select col1, col2, coalesce(sum_col3, arrow_cast(0, 'UInt64')) as sum_col3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also we can change this test back too
Looks good, and existing tests are passing 👍 🚀 |
Thanks @alamb and @appletreeisyellow |
…Introduce signature that do non-comparison coercion (apache#10268) * remove casting for coalesce Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * crate only visibility Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * polish comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * improve test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * backup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * introduce new signautre for coalesce Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * ignore err msg Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix doc Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * switch to type_resolution coercion Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix i64 and u64 case Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more tests Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add null case Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rename to type_union_resolution Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup since rebase Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix msg Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm pure_string_coercion Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm duplicate Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change type in select.slt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix slt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Thank you for sticking with it @jayzhan211 -- so much is going on!@ |
…Introduce signature that do non-comparison coercion (apache#10268) * remove casting for coalesce Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * crate only visibility Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * polish comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * improve test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * backup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * introduce new signautre for coalesce Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * ignore err msg Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix doc Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * switch to type_resolution coercion Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix i64 and u64 case Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more tests Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add null case Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rename to type_union_resolution Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup since rebase Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add more test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix msg Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm pure_string_coercion Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm duplicate Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change type in select.slt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix slt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Which issue does this PR close?
Closes #10261
Closes #10241
Part of #10507
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?