haohuaijin
Contributor

Which issue does this PR close?

Closes #15485

Rationale for this change

See #15485. Supporting the QUALIFY clause lets a query filter on a window function's result without using a subquery.
Before:

with ranked as (
	select row_number() over (PARTITION BY region) as rk from t
) select * from ranked where rk > 1;

After:

select row_number() over (PARTITION BY region) as rk from t qualify rk > 1;

What changes are included in this PR?

Support the QUALIFY clause, following the same approach we use for the HAVING clause.

Are these changes tested?

Yes, this PR adds integration tests and sqllogictest tests.

Are there any user-facing changes?

Yes. Filtering on a window function's result is now easier, using the new QUALIFY clause.

@github-actions github-actions bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Jul 27, 2025

Vedin commented Aug 13, 2025

Hi @haohuaijin, I accidentally worked on the same feature, and the approach is the same. I don't want to conflict with your contribution in any way, so I will just wait until your PR is merged. I only want to highlight two cases that are probably not covered in your PR yet.

  1. Aggregate functions with QUALIFY
CREATE TABLE qt (i INT, p VARCHAR, o INT) AS VALUES
  (1, 'A', 1),
  (2, 'A', 2),
  (3, 'B', 1),
  (4, 'B', 2);

SELECT p, SUM(o) AS s
FROM qt
GROUP BY p
QUALIFY RANK() OVER (ORDER BY s DESC) = 1
ORDER BY p;
  2. Constant filter + QUALIFY
CREATE TABLE web_base_events_this_run (
  domain_sessionid VARCHAR,
  app_id VARCHAR,
  page_view_id VARCHAR,
  derived_tstamp TIMESTAMP,
  dvce_created_tstamp TIMESTAMP,
  event_id VARCHAR
) AS SELECT * FROM VALUES
  ('ds1', 'appA', NULL, '2025-01-01 10:00:00'::timestamp, '2025-01-01 10:05:00'::timestamp, 'e1'),
  ('ds1', 'appA', NULL, '2025-01-01 11:00:00'::timestamp, '2025-01-01 11:00:00'::timestamp, 'e2'),
  ('ds1', 'appA', 'pv', '2025-01-01 12:00:00'::timestamp, '2025-01-01 12:00:00'::timestamp, 'e3'),
  ('ds2', 'appB', NULL, '2025-01-01 09:00:00'::timestamp, '2025-01-01 09:10:00'::timestamp, 'e4'),
  ('ds2', 'appB', NULL, '2025-01-01 09:05:00'::timestamp, '2025-01-01 09:09:00'::timestamp, 'e5');

SELECT domain_sessionid, app_id
FROM web_base_events_this_run
WHERE page_view_id IS NULL
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY domain_sessionid
  ORDER BY derived_tstamp, dvce_created_tstamp, event_id
) = 1
ORDER BY domain_sessionid;
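
For reference, the rows the second query should return can be worked out by hand. Below is a minimal, std-only Rust sketch that models the logical evaluation order (WHERE first, then the window function, then QUALIFY) over the five sample rows above. It is not DataFusion code: the `Event` struct and the `ev`, `sample_events`, and `first_event_per_session` names are hypothetical, and timestamps are kept as strings on the assumption that the fixed `YYYY-MM-DD HH:MM:SS` format sorts correctly as text.

```rust
use std::collections::BTreeMap;

/// One row of `web_base_events_this_run` (hypothetical model, not a DataFusion type).
#[derive(Clone, Debug)]
struct Event {
    domain_sessionid: &'static str,
    app_id: &'static str,
    page_view_id: Option<&'static str>,
    derived_tstamp: &'static str,
    dvce_created_tstamp: &'static str,
    event_id: &'static str,
}

/// Small constructor helper to keep the sample data readable.
fn ev(
    domain_sessionid: &'static str,
    app_id: &'static str,
    page_view_id: Option<&'static str>,
    derived_tstamp: &'static str,
    dvce_created_tstamp: &'static str,
    event_id: &'static str,
) -> Event {
    Event { domain_sessionid, app_id, page_view_id, derived_tstamp, dvce_created_tstamp, event_id }
}

/// The five rows from the CREATE TABLE statement above.
fn sample_events() -> Vec<Event> {
    vec![
        ev("ds1", "appA", None, "2025-01-01 10:00:00", "2025-01-01 10:05:00", "e1"),
        ev("ds1", "appA", None, "2025-01-01 11:00:00", "2025-01-01 11:00:00", "e2"),
        ev("ds1", "appA", Some("pv"), "2025-01-01 12:00:00", "2025-01-01 12:00:00", "e3"),
        ev("ds2", "appB", None, "2025-01-01 09:00:00", "2025-01-01 09:10:00", "e4"),
        ev("ds2", "appB", None, "2025-01-01 09:05:00", "2025-01-01 09:09:00", "e5"),
    ]
}

/// Models: WHERE page_view_id IS NULL, ROW_NUMBER() per session, QUALIFY ... = 1.
fn first_event_per_session(events: &[Event]) -> Vec<(&'static str, &'static str)> {
    // Step 1 (WHERE): keep only rows whose page_view_id is NULL.
    let filtered = events.iter().filter(|e| e.page_view_id.is_none());

    // Step 2 (PARTITION BY domain_sessionid): group the surviving rows.
    let mut partitions: BTreeMap<&str, Vec<&Event>> = BTreeMap::new();
    for e in filtered {
        partitions.entry(e.domain_sessionid).or_default().push(e);
    }

    // Step 3 (window ORDER BY, then QUALIFY row_number = 1): sort each partition
    // and keep its first row. BTreeMap iteration order stands in for the final
    // ORDER BY domain_sessionid.
    partitions
        .into_values()
        .map(|mut rows| {
            rows.sort_by_key(|e| (e.derived_tstamp, e.dvce_created_tstamp, e.event_id));
            (rows[0].domain_sessionid, rows[0].app_id)
        })
        .collect()
}
```

Under these assumptions, `first_event_per_session(&sample_events())` yields `("ds1", "appA")` and `("ds2", "appB")`: row `e3` is dropped by the WHERE filter before row numbers are assigned, so the constant filter must not be moved past the window function.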

I covered the first one by adding qualify expressions to aggr_expr_haystack:

        let aggr_expr_haystack = select_exprs
            .iter()
            .chain(having_expr_opt.iter())
            .chain(qualify_expr_opt_pre_aggr.iter());
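
To illustrate why the QUALIFY expression has to take part in aggregate discovery, here is a minimal, std-only Rust sketch of the evaluation order the first query implies: the `SUM(o)` aggregate is computed first, the `RANK()` window function then runs over the aggregated rows, and QUALIFY filters last. This models standard SQL semantics only; `top_ranked_groups` is a hypothetical name and none of this is DataFusion's planner code.

```rust
use std::collections::BTreeMap;

/// Models: GROUP BY p with SUM(o) AS s, RANK() OVER (ORDER BY s DESC), QUALIFY rank = 1.
/// Each input row is `(i, p, o)` as in the `qt` table above.
fn top_ranked_groups(rows: &[(i32, &'static str, i32)]) -> Vec<(&'static str, i32)> {
    // Step 1 (GROUP BY p, SUM(o) AS s): aggregate before any window function runs.
    let mut sums: BTreeMap<&'static str, i32> = BTreeMap::new();
    for &(_i, p, o) in rows {
        *sums.entry(p).or_insert(0) += o;
    }

    // Step 2 (RANK() OVER (ORDER BY s DESC)): a group's rank is 1 plus the number
    // of groups with a strictly larger sum, so tied groups share a rank.
    // Step 3 (QUALIFY ... = 1): keep only the groups ranked first.
    // BTreeMap iteration order stands in for the final ORDER BY p.
    sums.iter()
        .filter(|&(_, &s)| 1 + sums.values().filter(|&&other| other > s).count() == 1)
        .map(|(&p, &s)| (p, s))
        .collect()
}
```

With the four sample rows, both groups sum to 3 and therefore tie at rank 1, so this model returns `("A", 3)` and `("B", 3)`.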

The second one required me to change the logic in common_subexpr_eliminate.rs. You can check it out here (Embucket#34). Maybe you'll come up with a better solution.
Hope this helps.

Contributor Author

haohuaijin commented Aug 14, 2025

Hi @Vedin, thanks for bringing up these two use cases for QUALIFY. I wasn't aware of them before. However, I've been quite busy lately and won't be able to include them in this PR.

@alamb @jayzhan211, could you please take a look at this PR? If everything looks good, perhaps we can merge this as the initial version of QUALIFY, and then @Vedin can follow up with the two additional cases in a separate PR.

Contributor

@alamb alamb left a comment

Thank you @haohuaijin -- this looks good to me

@jonahgao as our resident SQL planner expert, perhaps you would like to review this PR as well?

I think we should also document the QUALIFY clause, with an example, similar to the HAVING clause: https://datafusion.apache.org/user-guide/sql/select.html#having-clause

However, we could do that as a separate PR if you prefer

let err = logical_plan(sql).unwrap_err();
assert_eq!(
err.strip_backtrace(),
"Error during planning: QUALIFY clause requires window functions in the SELECT list or QUALIFY clause"
Contributor

I verified that this is consistent with the DuckDB behavior:

D SELECT person.id FROM person QUALIFY person.id > 1;
Binder Error:
at least one window function must appear in the SELECT column or QUALIFY clause
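
The rule behind both error messages can be sketched as a recursive check over the expressions in the query. The following std-only Rust sketch uses a deliberately tiny, hypothetical `Expr` enum; it is not DataFusion's `Expr` type and not the code in this PR, and only the error string is taken from the test above.

```rust
/// Hypothetical, heavily simplified expression tree (not DataFusion's `Expr`).
enum Expr {
    Column(&'static str),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    WindowFunction(&'static str),
}

/// Returns true if the expression contains a window function anywhere inside it.
fn contains_window(expr: &Expr) -> bool {
    match expr {
        Expr::WindowFunction(_) => true,
        Expr::Gt(left, right) => contains_window(left) || contains_window(right),
        Expr::Column(_) | Expr::Literal(_) => false,
    }
}

/// Accepts a QUALIFY clause only if a window function appears in the SELECT
/// list or in the QUALIFY expression itself.
fn validate_qualify(select_list: &[Expr], qualify: &Expr) -> Result<(), String> {
    if select_list.iter().any(contains_window) || contains_window(qualify) {
        Ok(())
    } else {
        Err("QUALIFY clause requires window functions in the SELECT list or QUALIFY clause"
            .to_string())
    }
}
```

In this model, `SELECT id ... QUALIFY id > 1` is rejected because neither side contains a window function, while `SELECT row_number() OVER (...) AS rk ... QUALIFY rk > 1` passes because the SELECT list does.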

Contributor

alamb commented Aug 14, 2025

> @alamb @jayzhan211 , could you please take a look at this PR? If everything looks good, perhaps we can merge this as the initial version of qualify, and then @Vedin can follow up with the two additional cases in a separate PR.

I think this sounds like a good plan. Once we merge this PR we can then file some tickets to track the other features.

Thanks @haohuaijin and @Vedin

Member

@jonahgao jonahgao left a comment

Looks good to me👍

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 15, 2025
Contributor Author

haohuaijin commented Aug 15, 2025

Thanks @alamb and @jonahgao for the reviews. I added documentation for the QUALIFY clause, modeled on the HAVING clause section.

Contributor

@alamb alamb left a comment

🚀

## QUALIFY clause

Contributor

It would be nice if we could provide a little context here (for example, explaining that the QUALIFY clause can refer to the output of window functions).

However, I think this is consistent with the sparse docs for HAVING

Maybe we can make a follow-on PR to improve the documentation with a sentence or two for each clause.

@alamb alamb merged commit f3941b2 into apache:main Aug 15, 2025
28 checks passed
Contributor

alamb commented Aug 15, 2025

Thanks again @haohuaijin @Vedin and @jonahgao

@haohuaijin haohuaijin deleted the support-qualify branch September 29, 2025 14:05
Labels
documentation Improvements or additions to documentation sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

QUALIFY clause
4 participants