-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move date_part, date_trunc, date_bin functions to datafusion-functions #9435
Conversation
# Conflicts: # datafusion/proto/src/logical_plan/from_proto.rs
I have no idea why the tpch test would be failing. When I run tpch locally it's q1 that fails first. Any suggestions are welcome |
…lain text to reflect the change. The physical plan is unchanged.
# Conflicts: # datafusion/proto/src/logical_plan/from_proto.rs
blocked by #9450 |
# Conflicts: # datafusion/physical-expr/src/equivalence/projection.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this @Omega359 -- it is looking very good. I am a little worried about some of the test changes, but I have some ideas on how to fix them.
Sorry for the delay in reviewing
@@ -0,0 +1,765 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so nice that the date_bin
function has been moved into its own module
@@ -761,10 +735,7 @@ mod tests { | |||
// expected | |||
vec![ | |||
// [a_new ASC, date_bin_res ASC] | |||
// Please note that result is not [a_new ASC, date_bin_res ASC, b_new ASC] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think this is important test coverage (it ensures that date_bin
is properly being treated as monotonic). Indeed perhaps that montonic
marking was not brought over to the the UDF 🤔
Though on the other hand, I think this case is already covered by .slt test (a plan test, specifically). @mustafasrepo do you remember offhand? Otherwise I will dig around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous version still have the test in there .. but then I had dependencies on the function crate. I saw some things with date_bin and ordering in ordering.slt that I thought might cover it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked whether this functionality is covered by the .slt
tests. It seems that, .slt
tests covers cases in the form where date_bin
argument (such as ts
) is leading ordering (e.g first ordering in lexicographical ordering expression). Then after date_bin
is applied, date_bin
result itself is ordered. In this sense, monotonicity
behaviour is covered.
However, this unit test, tests when table is ordered as [a ASC, ts ASC]
,after projection a, date_bin(ts)
result should have ordering: [a, date_bin(ts)]
. Meaning, when date_bin
argument (e.g ts
) is not leading ordering, lexicographical ordering is still preserved. As long as previous expressions are present in the projection (in the example a
).
I checked whether this behaviour is preserved in this PR. By running following test
# create an unbounded table that contains name, timestamp.
# where table is ordered by name DESC, ts DESC
statement ok
CREATE UNBOUNDED EXTERNAL TABLE unbounded_csv_with_timestamps2 (
name VARCHAR,
ts TIMESTAMP
)
STORED AS CSV
WITH ORDER (name DESC, ts DESC)
LOCATION '../core/tests/data/timestamps.csv'
query TT
EXPLAIN SELECT name, date_bin('15 minutes', ts) as time_chunks
FROM unbounded_csv_with_timestamps2
ORDER BY name DESC, time_chunks DESC
LIMIT 5;
----
logical_plan
Limit: skip=0, fetch=5
--Sort: unbounded_csv_with_timestamps2.name DESC NULLS FIRST, time_chunks DESC NULLS FIRST, fetch=5
----Projection: unbounded_csv_with_timestamps2.name, date_bin(IntervalMonthDayNano("900000000000"), unbounded_csv_with_timestamps2.ts) AS time_chunks
------TableScan: unbounded_csv_with_timestamps2 projection=[name, ts]
physical_plan
GlobalLimitExec: skip=0, fetch=5
--SortPreservingMergeExec: [name@0 DESC,time_chunks@1 DESC], fetch=5
----ProjectionExec: expr=[name@0 as name, date_bin(900000000000, ts@1) as time_chunks]
------RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
--------StreamingTableExec: partition_sizes=1, projection=[name, ts], infinite_source=true, output_ordering=[name@0 DESC, ts@1 DESC]
I think, resulting plan is correct. (e.g after projection ordering [name DESC, time_chunks DESC]
is still satisfied). So, we have desired behaviour but not test coverage currently.
However, it would be nice to have unit test. Given that in case of failure, error context is much more visible. It would really speed up debugging. With end to end test coverage, considerable time goes to understanding the root cause of the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this test has been added by @Omega359 so I think we are good to go. Thank you @mustafasrepo
# Conflicts: # datafusion/physical-expr/src/equivalence/projection.rs
I think, plan change comes from |
I have dubugged this further. It seemed like a small fix. Hence I sent a commit to fix this change. With this change, I also added the test in my previous comment to make sure we have desired coverage. |
Thank you very much for your assistance in resolving this issue in this PR - it's very much appreciated |
@@ -419,6 +419,11 @@ impl ScalarFunction { | |||
args, | |||
} | |||
} | |||
|
|||
/// Create a new ScalarFunction expression with a user-defined function (UDF) | |||
pub fn new_func_def(func_def: ScalarFunctionDefinition, args: Vec<Expr>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to inline the arguments directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do that, sure.
Thanks @Omega359 -- I plan to review this one in the next day or two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sticking with this one @Omega359 -- I think this PR looks great.
@@ -761,10 +735,7 @@ mod tests { | |||
// expected | |||
vec![ | |||
// [a_new ASC, date_bin_res ASC] | |||
// Please note that result is not [a_new ASC, date_bin_res ASC, b_new ASC] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this test has been added by @Omega359 so I think we are good to go. Thank you @mustafasrepo
I took the liberty of merging up from main to resolve conflicts (and then reapplied the changes from #9486 manually here) |
And another merge to resolve conflicts |
Thanks again @Omega359 and @mustafasrepo -- sorry this one took so long to get in. |
Which issue does this PR close?
Closes #9421
Rationale for this change
Function migration.
What changes are included in this PR?
Code, tests moved to new location, references updated.
Are these changes tested?
Yes.
Are there any user-facing changes?
Functions are moved to new location, otherwise none.