feat: Support SQL filter clause for aggregate expressions, add SQL dialect support #5868
I plan to review this PR tomorrow. Thank you @yjshen
cc @andygrove @Dandandan and @jdye64
Thank you @alamb for the detailed review! I have made the following updates to the PR based on your feedback:
This PR is now ready for further review. Thank you again, @alamb!
Looks great to me @yjshen -- thank you. I will upstream the parsing for dialect names now.
The only other thing I really want to do prior to merging this PR is to verify it doesn't change performance. I don't expect that it will, but I want to double check to be sure.
// 1.2
let batch = match filter {
👍
cc @tustvold @mustafasrepo @crepererum and @Dandandan who I think are all interested in grouping performance
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
I benched
I ran this branch against main using https://github.com/alamb/datafusion-benchmarking and I see no performance difference 👍
#6616 -- PR to use upstreamed version of parse_sql_dialect
Which issue does this PR close?
Closes #5873.
Closes #5608
Closes #2214
Rationale for this change
This pull request introduces support for the FILTER (WHERE) clause in aggregate expressions. This feature enables users to filter the rows that are considered for aggregation, similar to how it is done in popular SQL databases such as PostgreSQL, SQLite, Spark, and Hive.
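To illustrate the semantics only, here is a minimal sketch in plain Rust (standard library only, no DataFusion APIs) of what a query such as `SELECT key, SUM(value) FILTER (WHERE value > 10) FROM t GROUP BY key` computes. The `Row` struct, `sum_with_filter`, and `sample_rows` are hypothetical names invented for this example and are not part of the PR. The sketch assumes standard SQL behavior: the filter restricts only the rows fed to the aggregate, so every group still appears in the output, and a group with no matching rows gets a NULL sum.

```rust
use std::collections::BTreeMap;

/// One input row: a grouping key and a value (hypothetical schema for illustration).
#[derive(Debug, Clone)]
struct Row {
    key: &'static str,
    value: i64,
}

/// Emulates `SELECT key, SUM(value) FILTER (WHERE <predicate>) FROM rows GROUP BY key`.
/// The predicate only decides which rows reach the accumulator; it does not
/// remove groups. A group with no matching rows keeps a NULL sum (`None`).
fn sum_with_filter<F>(rows: &[Row], predicate: F) -> BTreeMap<&'static str, Option<i64>>
where
    F: Fn(&Row) -> bool,
{
    let mut out: BTreeMap<&'static str, Option<i64>> = BTreeMap::new();
    for row in rows {
        // Register the group even if this row is filtered out.
        let acc = out.entry(row.key).or_insert(None);
        if predicate(row) {
            *acc = Some(acc.unwrap_or(0) + row.value);
        }
    }
    out
}

/// Small made-up data set used by the example.
fn sample_rows() -> Vec<Row> {
    vec![
        Row { key: "a", value: 5 },
        Row { key: "a", value: 20 },
        Row { key: "b", value: 3 },
        Row { key: "a", value: 30 },
        Row { key: "b", value: 12 },
        Row { key: "c", value: 1 },
    ]
}

fn main() {
    let rows = sample_rows();
    // Equivalent in spirit to: SUM(value) FILTER (WHERE value > 10)
    let sums = sum_with_filter(&rows, |r| r.value > 10);
    for (key, sum) in &sums {
        println!("{key}: {sum:?}");
    }
}
```

Note that this differs from putting the condition in the query's WHERE clause, which would drop group `c` entirely instead of reporting it with a NULL sum.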
What changes are included in this PR?
The physical_plan/aggregate module is where the majority of the work for this PR was completed.
Are these changes tested?
New tests were added in group_by.rs to cover various scenarios using the FILTER (WHERE) clause.
Are there any user-facing changes?
Yes.