WIP: Convert ARRAY_AGG and NTH_VALUE to UDAF #11029
Still has some test failures that need to be addressed.
@@ -37,7 +37,7 @@ async fn csv_query_array_agg_distinct() -> Result<()> {
Schema::new(vec![Field::new_list(
"ARRAY_AGG(DISTINCT aggregate_test_100.c2)",
Field::new("item", DataType::UInt32, true),
false
Do we have access to the nullability of input arguments when defining UDAFs? I did not see how to maintain the behavior here.
|| name_lower_case == "array_agg"
|| name_lower_case == "nth_value"
Convert to the lowercase name in a follow-up to keep the PRs a manageable size.
@@ -46,6 +46,7 @@ arrow-array = { workspace = true }
arrow-schema = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
datafusion-functions-aggregate= { workspace = true }
This is probably not desired. Needed because of this function:
The comment claims that it should be possible to move this to ArrayAgg::simplify; I will need to look more into that API.
I am not sure how it would be possible to do with the current simplify API.
@@ -4974,7 +4974,7 @@ logical_plan
02)--Aggregate: groupBy=[[multiple_ordered_table.a, multiple_ordered_table.b]], aggr=[[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c DESC NULLS FIRST]]]
03)----TableScan: multiple_ordered_table projection=[a, b, c]
physical_plan
01)AggregateExec: mode=Single, gby=[a@0 as a, b@1 as b], aggr=[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c DESC NULLS FIRST]], ordering_mode=Sorted
01)AggregateExec: mode=Single, gby=[a@0 as a, b@1 as b], aggr=[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c ASC NULLS LAST]], ordering_mode=Sorted
I think this change is expected?
I don't think so 🤔
)
match agg_func.func_def {
AggregateFunctionDefinition::UDF(ref udf) => {
udf.inner().as_any().downcast_ref::<ArrayAgg>().is_some()
We can compare by name for the UDAF.
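To make the two options in this thread concrete, here is a minimal, self-contained sketch contrasting a type-based check (downcast) with a name-based check. `AggregateImpl`, `is_array_agg_by_type`, and `is_array_agg_by_name` are hypothetical stand-ins written for illustration; they are not DataFusion's actual trait or helpers, and the `ArrayAgg` and `NthValue` structs here are empty placeholders.

```rust
use std::any::Any;

/// Hypothetical stand-in for a UDAF implementation trait.
/// This is NOT DataFusion's real trait; it only models `as_any` and `name`.
trait AggregateImpl {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &str;
}

/// Placeholder types standing in for the real aggregate implementations.
struct ArrayAgg;
struct NthValue;

impl AggregateImpl for ArrayAgg {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "array_agg"
    }
}

impl AggregateImpl for NthValue {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "nth_value"
    }
}

/// Type-based check: the caller must be able to name the concrete `ArrayAgg` type.
fn is_array_agg_by_type(f: &dyn AggregateImpl) -> bool {
    f.as_any().downcast_ref::<ArrayAgg>().is_some()
}

/// Name-based check: only the function's registered name is needed.
fn is_array_agg_by_name(f: &dyn AggregateImpl) -> bool {
    f.name().eq_ignore_ascii_case("array_agg")
}
```

Under these assumptions, the name-based check does not need the concrete type in scope, which may be one way to avoid the extra crate dependency questioned earlier in this review. The trade-off is that a differently implemented function registered under the same name would also match.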
Will be replaced by smaller PRs like #11045
Still has some test failures that need to be addressed.
Which issue does this PR close?
Closes #10999.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?