-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for ordering sensitive aggregation #6332
Add support for ordering sensitive aggregation #6332
Conversation
# Conflicts: # datafusion/core/tests/sqllogictests/test_files/aggregate.slt
# Conflicts: # datafusion/core/src/physical_plan/planner.rs # datafusion/core/tests/sqllogictests/test_files/aggregate.slt # datafusion/expr/src/expr.rs # datafusion/expr/src/tree_node/expr.rs # datafusion/expr/src/udaf.rs # datafusion/optimizer/src/analyzer/type_coercion.rs # datafusion/optimizer/src/common_subexpr_eliminate.rs # datafusion/proto/src/logical_plan/from_proto.rs # datafusion/proto/src/logical_plan/mod.rs # datafusion/proto/src/logical_plan/to_proto.rs # datafusion/sql/src/expr/function.rs # datafusion/sql/src/utils.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mustafasrepo
I will try and review this PR over the next day or two.
I think we should document this feature as well in https://arrow.apache.org/datafusion/user-guide/sql/select.html#group-by-clause if possible
I also noticed that window functions are not actually documented yet, so I filed #6338 to track that
Thanks @alamb, I wasn't aware of the documentation part. I have updated the documentation at https://arrow.apache.org/datafusion/user-guide/sql/select.html#group-by-clause and https://arrow.apache.org/datafusion/user-guide/sql/aggregate_functions.html#array-agg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mustafasrepo -- this looks really nice
I especially like how ordering is expressed as an updated "required_input_ordering" and then the rest of the datafusion planning machinery handles generating and optimizing the actual sorting. Very nice 👏
TUR [75.0, 100.0] | ||
|
||
# test_ordering_sensitive_aggregation2 | ||
# We should be able to satisfy the finest requirement among all aggregators, when we have multiple aggregators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test verifies that an aggregate with no sort and a single expr works
I think it would be valuable to also test more than one column, something like:
ARRAY_AGG(s.amount ORDER BY s.amount DESC, ts ASC) AS amounts,
SUM(s.amount ORDER BY s.amount DESC) AS sum1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we cannot support these kind of queries (because we cannot parse multiple requirement, PR in the sqlparser fixes the problem. However, it is not merged yet, I am not sure when it will be available to datafusion.). However, once this support is added. This case should be fine. I have added a test for this use case, we currently generate an error, once this is added. We can change the test.
# Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
Thanks again @mustafasrepo |
Which issue does this PR close?
Closes #6189.
Rationale for this change
Currently datafusion has support for queries as below
However, when aggregation queries expect their input to have an ordering, queries fail. As an example see the query below.
Above query can be parsed successfully, however it cannot be executed. With this PR we add support for queries in the form above, where aggregate function requires an ordering for its input
What changes are included in this PR?
In this PR
create_physical_sort_expr
function.Are these changes tested?
Yes new tests are added to the
aggregate.slt
file.Are there any user-facing changes?