-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: implement QUALIFY clause #16933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @haohuaijin, I accidentally worked on the same feature. The approach is the same. So, I don't want anyhow conflict with your contribution and will just wait until your PR is merged. I just want to highlight 2 cases that probably are not currently covered in your PR.
CREATE TABLE qt (i INT, p VARCHAR, o INT) AS VALUES
(1, 'A', 1),
(2, 'A', 2),
(3, 'B', 1),
(4, 'B', 2);
SELECT p, SUM(o) AS s
FROM qt
GROUP BY p
QUALIFY RANK() OVER (ORDER BY s DESC) = 1
ORDER BY p;
CREATE TABLE web_base_events_this_run (
domain_sessionid VARCHAR,
app_id VARCHAR,
page_view_id VARCHAR,
derived_tstamp TIMESTAMP,
dvce_created_tstamp TIMESTAMP,
event_id VARCHAR
) AS SELECT * FROM VALUES
('ds1', 'appA', NULL, '2025-01-01 10:00:00'::timestamp, '2025-01-01 10:05:00'::timestamp, 'e1'),
('ds1', 'appA', NULL, '2025-01-01 11:00:00'::timestamp, '2025-01-01 11:00:00'::timestamp, 'e2'),
('ds1', 'appA', 'pv', '2025-01-01 12:00:00'::timestamp, '2025-01-01 12:00:00'::timestamp, 'e3'),
('ds2', 'appB', NULL, '2025-01-01 09:00:00'::timestamp, '2025-01-01 09:10:00'::timestamp, 'e4'),
('ds2', 'appB', NULL, '2025-01-01 09:05:00'::timestamp, '2025-01-01 09:09:00'::timestamp, 'e5');
SELECT domain_sessionid, app_id
FROM web_base_events_this_run
WHERE page_view_id IS NULL
QUALIFY ROW_NUMBER() OVER (
PARTITION BY domain_sessionid
ORDER BY derived_tstamp, dvce_created_tstamp, event_id
) = 1
ORDER BY domain_sessionid; I covered the first one by adding let aggr_expr_haystack = select_exprs
.iter()
.chain(having_expr_opt.iter())
.chain(qualify_expr_opt_pre_aggr.iter()); The second one required me to change the logic in |
Hi @Vedin , thanks for bringing up these two use cases for @alamb @jayzhan211 , could you please take a look at this PR? If everything looks good, perhaps we can merge this as the initial version of qualify, and then @Vedin can follow up with the two additional cases in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @haohuaijin -- this looks good to me
@jonahgao as our resident SQL planner expert, perhaps you would like to review this PR as well?
I think we should also add documentation and an example about the QUALIFY clause in the documentation, similar to the HAVING clause: https://datafusion.apache.org/user-guide/sql/select.html#having-clause
However, we could do that as a separate PR if you prefer
let err = logical_plan(sql).unwrap_err(); | ||
assert_eq!( | ||
err.strip_backtrace(), | ||
"Error during planning: QUALIFY clause requires window functions in the SELECT list or QUALIFY clause" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this is consistent with the DuckDB behavior:
D SELECT person.id FROM person QUALIFY person.id > 1;
Binder Error:
at least one window function must appear in the SELECT column or QUALIFY clause
I think this sounds like a good plan. Once we merge this PR we can then file some tickets to track the other features. Thanks @haohuaijin and @Vedin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
``` | ||
|
||
## QUALIFY clause | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could provide a little context here (like explaining that the QUALIFY statement can refer to the output of window functions)
However, I think this is consistent with the sparse docs for HAVING
Maybe we can make a follow on PR to improve the documentation with a sentence or two for each clause
|
Thanks again @haohuaijin @Vedin and @jonahgao |
Which issue does this PR close?
Closes #15485
Rationale for this change
see #15485, support
qualify
can filter windows function's result without use subquerybefore
after
What changes are included in this PR?
support the
qualify
clause like what we do forhaving
clauseAre these changes tested?
yes, add integration test and sqllogictest test
Are there any user-facing changes?
It is easier to filter the windows function's result