-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove builtin count #10893
Remove builtin count #10893
Conversation
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 -- this look great. The only concern I have is the new dependency from optimizer --> functions. Otherwise 🚀 🦾
@@ -33,8 +33,6 @@ use strum_macros::EnumIter; | |||
// https://datafusion.apache.org/contributor-guide/index.html#how-to-add-a-new-aggregate-function | |||
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash, EnumIter)] | |||
pub enum AggregateFunction { | |||
/// Count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There are still a bunch of these functions -- at some point we can probably file tickets and spread out the work / fun to migrate them over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I constantly file 1~2 issues for #8708 so the tickets could kept on the first page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps that is how PRs like #10898 show up -- I can't keep up anymore. Thanks @jayzhan211 -- really nice
@@ -56,5 +56,6 @@ regex-syntax = "0.8.0" | |||
[dev-dependencies] | |||
arrow-buffer = { workspace = true } | |||
ctor = { workspace = true } | |||
datafusion-functions-aggregate = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 -- I thought we were trying to avoid making the optimizer depend on the function library -- can we use the sub implementation instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was fine for dev-dependencies?
If not, when can we import functions for dev-dependencies, when we should not 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 -- I can't remember anymore.
It is probably cleaner to to avoid the dependencies, but we can clean it up as a follow on PR
)); | ||
let expr = count_udaf() | ||
.call(vec![col("a")]) | ||
.distinct() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
I took the liberty of merging up from main for this PR to get a clean CI run |
Thanks @alamb |
* rm expr fn Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm function Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix query and fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix example Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Update datafusion/expr/src/test/function_stub.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Builtin Count was removed upstream. TBD whether we want to re-implement `count_star` with new API. Ref: apache/datafusion#10893
Builtin Count was removed upstream. TBD whether we want to re-implement `count_star` with new API. Ref: apache/datafusion#10893
Builtin Count was removed upstream. TBD whether we want to re-implement `count_star` with new API. Ref: apache/datafusion#10893
* chore: update datafusion deps * feat: impl ExecutionPlan::static_name() for DatasetExec This required trait method was added upstream [0] and recommends to simply forward to `static_name`. [0]: apache/datafusion#10266 * feat: update first_value and last_value wrappers. Upstream signatures were changed for the new new `AggregateBuilder` api [0]. This simply gets the code to work. We should better incorporate that API into `datafusion-python`. [0] apache/datafusion#10560 * migrate count to UDAF Builtin Count was removed upstream. TBD whether we want to re-implement `count_star` with new API. Ref: apache/datafusion#10893 * migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF Ref: approx_distinct apache/datafusion#10851 Ref: approx_median apache/datafusion#10840 Ref: approx_percentile_cont and _with_weight apache/datafusion#10917 * migrate avg to UDAF Ref: apache/datafusion#10964 * migrage corr to UDAF Ref: apache/datafusion#10884 * migrate grouping to UDAF Ref: apache/datafusion#10906 * add alias `mean` for UDAF `avg` * migrate stddev to UDAF Ref: apache/datafusion#10827 * remove rust alias for stddev The python wrapper now provides stddev_samp alias. * migrage var_pop to UDAF Ref: apache/datafusion#10836 * migrate regr_* functions to UDAF Ref: apache/datafusion#10898 * migrate bitwise functions to UDAF The functions now take a single expression instead of a Vec<_>. Ref: apache/datafusion#10930 * add missing variants for ScalarValue with todo * fix typo in approx_percentile_cont * add distinct arg to count * comment out failing test `approx_percentile_cont` is now returning a DoubleArray instead of an IntArray. This may be a bug upstream; it requires further investigation. * update tests to expect lowercase `sum` in query plans This was changed upstream. Ref: apache/datafusion#10831 * update ScalarType data_type map * add docs dependency pickleshare * re-implement count_star * lint: ruff python lint * lint: rust cargo fmt * include name of window function in error for find_window_fn * refactor `find_window_fn` for debug clarity * search default aggregate functions by both name and aliases The alias list no longer includes the name of the function. Ref: apache/datafusion#10658 * fix markdown in find_window_fn docs * parameterize test_window_functions `first_value` and `last_value` are currently failing and marked as xfail. * add test ids to test_simple_select tests marked xfail * update find_window_fn to search built-ins first The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior. This allowed me to remove `marks=pytest.xfail` from the window tests. * improve first_call and last_call use of the builder API * remove trailing todos * fix examples/substrait.py * chore: remove explicit aliases from functions.rs Ref: #779 * remove `array_fn!` aliases * remove alias rules for `expr_fn_vec!` * remove alias rules from `expr_fn!` macro * remove unnecessary pyo3 var-arg signatures in functions.rs * remove pyo3 signatures that provided defaults for first_value and last_value * parametrize test_string_functions * test regr_ function wrappers Closes #778
Which issue does this PR close?
Part of #8708
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?