WIP: Convert ARRAY_AGG and NTH_VALUE to UDAF #11029

eejbyfeldt · 2024-06-20T13:25:52Z

Still has some test failures that need to be addressed.

Which issue does this PR close?

Closes #10999.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

eejbyfeldt · 2024-06-20T13:39:07Z

datafusion/core/tests/sql/aggregates.rs

@@ -37,7 +37,7 @@ async fn csv_query_array_agg_distinct() -> Result<()> {
        Schema::new(vec![Field::new_list(
            "ARRAY_AGG(DISTINCT aggregate_test_100.c2)",
            Field::new("item", DataType::UInt32, true),
-            false


Do we have access to the nullability of input arguments when defining UDAFs? I did not see how to maintain the behavior here.

eejbyfeldt · 2024-06-20T13:40:36Z

datafusion/functions-aggregate/src/lib.rs

+                || name_lower_case == "array_agg"
+                || name_lower_case == "nth_value"


Convert to lowercase names in a follow-up to keep the PRs a manageable size.

eejbyfeldt · 2024-06-20T13:45:02Z

datafusion/sql/Cargo.toml

@@ -46,6 +46,7 @@ arrow-array = { workspace = true }
 arrow-schema = { workspace = true }
 datafusion-common = { workspace = true, default-features = true }
 datafusion-expr = { workspace = true }
+datafusion-functions-aggregate= { workspace = true }


This is probably not desired. Needed because of this function:

https://github.com/eejbyfeldt/datafusion/blob/b42c70bac4af5ef17ce24ca1a2e95efa3f7cece9/datafusion/sql/src/expr/mod.rs#L596-L626

The comment claims that it should be possible to move this to ArrayAgg::simplify; I will need to look more into that API.

I am not sure how it would be possible to do with the current simplify API.

eejbyfeldt · 2024-06-20T13:45:40Z

datafusion/sqllogictest/test_files/group_by.slt

@@ -4974,7 +4974,7 @@ logical_plan
 02)--Aggregate: groupBy=[[multiple_ordered_table.a, multiple_ordered_table.b]], aggr=[[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c DESC NULLS FIRST]]]
 03)----TableScan: multiple_ordered_table projection=[a, b, c]
 physical_plan
-01)AggregateExec: mode=Single, gby=[a@0 as a, b@1 as b], aggr=[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c DESC NULLS FIRST]], ordering_mode=Sorted
+01)AggregateExec: mode=Single, gby=[a@0 as a, b@1 as b], aggr=[ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c ASC NULLS LAST]], ordering_mode=Sorted


I think this change is expected?

I don't think so 🤔
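
For reference, the two orderings in the diff above are exact reverses of one another: reversing a sort flips both the direction and the null placement. The sketch below illustrates only that relationship, using a hypothetical `SortOpts` struct (an assumption for illustration, not a DataFusion or Arrow type). It does not claim to explain why the plan changed.

```rust
// Hypothetical stand-in for a sort specification; not a DataFusion/Arrow type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

impl SortOpts {
    // Reversing an ordering flips both the direction and the null placement.
    fn reversed(self) -> Self {
        SortOpts {
            descending: !self.descending,
            nulls_first: !self.nulls_first,
        }
    }

    // Render in the same style as the plan output in the diff.
    fn describe(self) -> String {
        format!(
            "{} NULLS {}",
            if self.descending { "DESC" } else { "ASC" },
            if self.nulls_first { "FIRST" } else { "LAST" }
        )
    }
}
```

Under these assumptions, `SortOpts { descending: true, nulls_first: true }.reversed().describe()` yields the `ASC NULLS LAST` form seen on the `+` line.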

jayzhan211 · 2024-06-20T14:12:40Z

datafusion/sql/src/expr/mod.rs

-                )
+            match agg_func.func_def {
+                AggregateFunctionDefinition::UDF(ref udf) => {
+                    udf.inner().as_any().downcast_ref::<ArrayAgg>().is_some()


We can compare by name for UDAFs.
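
A minimal sketch of the two checks being contrasted, written against a hypothetical `AggregateImpl` trait rather than DataFusion's real types (the trait, the `NthValue` struct, and both helper functions are assumptions for illustration). The downcast check needs the concrete `ArrayAgg` type in scope, while the name check only needs a string.

```rust
use std::any::Any;

// Hypothetical stand-in for a UDAF implementation trait; not DataFusion's API.
trait AggregateImpl {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &str;
}

struct ArrayAgg;
struct NthValue;

impl AggregateImpl for ArrayAgg {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "array_agg"
    }
}

impl AggregateImpl for NthValue {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "nth_value"
    }
}

// Type-based check: the caller must be able to name the concrete ArrayAgg type.
fn is_array_agg_by_type(f: &dyn AggregateImpl) -> bool {
    f.as_any().downcast_ref::<ArrayAgg>().is_some()
}

// Name-based check: no reference to the concrete type is required.
fn is_array_agg_by_name(f: &dyn AggregateImpl) -> bool {
    f.name().eq_ignore_ascii_case("array_agg")
}
```

In this sketch the name-based variant would let the calling crate avoid depending on the crate that defines `ArrayAgg`, which is the dependency concern raised on `datafusion/sql/Cargo.toml` above. The trade-off is that any function registered under the same name would also match.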

eejbyfeldt · 2024-06-21T08:01:29Z

Will be replaced by smaller PRs like #11045

github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jun 20, 2024

WIP: Convert ARRAY_AGG and NTH_VALUE to UDAF

6fc9a71

Still has some test failures that need to be addressed.

eejbyfeldt force-pushed the i10999 branch from 518b074 to 6fc9a71 Compare June 20, 2024 13:31

Remove commented out code

f50de77

eejbyfeldt mentioned this pull request Jun 20, 2024

Convert ArrayAgg to UDAF #10999

Closed

remove more commented out code

b42c70b

eejbyfeldt closed this Jun 21, 2024

eejbyfeldt deleted the i10999 branch September 1, 2024 18:00
