zhuqi-lucas (Contributor) commented Jul 26, 2025

Which issue does this PR close?

Rationale for this change

Support DISTINCT for window functions.

Details:

  1. Handle DISTINCT in the SQL-to-rel conversion
  2. Carry DISTINCT through logical plan generation
  3. Carry DISTINCT through physical plan generation
  4. Add SlidingDistinctCountAccumulator so that distinct counts work with sliding window frames
  5. Add tests and fix existing tests

Follow-up: improve the performance of SlidingDistinctCountAccumulator, possibly by adding type-specific implementations.

What changes are included in this PR?

Support DISTINCT for window functions.

Details:

  1. Handle DISTINCT in the SQL-to-rel conversion
  2. Carry DISTINCT through logical plan generation
  3. Carry DISTINCT through physical plan generation
  4. Add SlidingDistinctCountAccumulator so that distinct counts work with sliding window frames
  5. Add tests and fix existing tests

Follow-up: improve the performance of SlidingDistinctCountAccumulator, possibly by adding type-specific implementations.
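
Item 4 depends on an accumulator that can add rows entering the window frame and also remove rows leaving it. A minimal sketch of that idea in plain Rust follows. It is not the code from this PR: `RefCountedDistinct` and `sliding_distinct_counts` are hypothetical names, the frame is assumed to be a fixed number of trailing rows, and NULL handling is left out. The sketch keeps a reference count per value, so a value stops counting as distinct only when its last occurrence leaves the frame.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical sliding distinct counter (illustration only, not DataFusion's
/// SlidingDistinctCountAccumulator). Maps each value to how many times it
/// currently occurs inside the window frame.
struct RefCountedDistinct<T: Eq + Hash> {
    counts: HashMap<T, usize>,
}

impl<T: Eq + Hash> RefCountedDistinct<T> {
    fn new() -> Self {
        Self { counts: HashMap::new() }
    }

    /// A row enters the frame: bump the occurrence count of its value.
    fn update(&mut self, value: T) {
        *self.counts.entry(value).or_insert(0) += 1;
    }

    /// A row leaves the frame: decrement the count and drop the entry once
    /// no occurrence of the value remains. Unknown values are ignored.
    fn retract(&mut self, value: &T) {
        let now_empty = match self.counts.get_mut(value) {
            Some(n) => {
                *n -= 1;
                *n == 0
            }
            None => false,
        };
        if now_empty {
            self.counts.remove(value);
        }
    }

    /// Number of distinct values currently inside the frame.
    fn evaluate(&self) -> usize {
        self.counts.len()
    }
}

/// Distinct count over a frame of the last `window` rows (current row
/// included), evaluated once per input row.
fn sliding_distinct_counts<T: Eq + Hash + Clone>(values: &[T], window: usize) -> Vec<usize> {
    let mut acc = RefCountedDistinct::new();
    let mut out = Vec::with_capacity(values.len());
    for (i, v) in values.iter().enumerate() {
        acc.update(v.clone());
        if i >= window {
            // The row that is `window` positions back has just left the frame.
            acc.retract(&values[i - window]);
        }
        out.push(acc.evaluate());
    }
    out
}
```

For the input `[1, 1, 2, 3, 3]` with a three-row frame, `sliding_distinct_counts` yields `[1, 1, 2, 3, 2]`. A generic hash map of counts works for any hashable type but pays for hashing and cloning on every row, which is one plausible reason type-specific variants are listed as a performance follow-up.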

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. This adds a new feature: DISTINCT is now supported for window functions.

@zhuqi-lucas marked this pull request as draft July 26, 2025 14:16
@github-actions bot added the sql, logical-expr, optimizer, core, sqllogictest, substrait, functions, and physical-plan labels Jul 26, 2025
@github-actions bot added the proto label Jul 26, 2025
@zhuqi-lucas marked this pull request as ready for review July 27, 2025 10:18
@zhuqi-lucas changed the title from "Draft: support distinct for window" to "feat: support distinct for window" Jul 27, 2025
@alamb requested a review from crepererum July 27, 2025 13:32
alamb (Contributor) commented Jul 27, 2025

FYI @crepererum

zhuqi-lucas (Contributor, Author) commented Jul 27, 2025

Follow-up: we need to support more sliding distinct accumulators.

Currently, this PR only supports a sliding distinct accumulator for count. Adding others should be easy, because this PR already wires DISTINCT through every step from SQL to the physical plan. I can create follow-up PRs after this one.

crepererum (Contributor) left a comment

This turned out to be way more involved than I originally thought. I was hoping that this was just a forgotten boolean flag, but it seems that there's actually some proper wiring code involved 😅

So thank you very much for implementing this so quickly ❤️

@crepererum merged commit c6d5520 into apache:main Jul 28, 2025 (28 checks passed)
zhuqi-lucas (Contributor, Author) commented

Thank you @crepererum for the review!

adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
* feat: support distinct for window

* fix

* fix

* fisx

* fix unparse

* fix test

* fix test

* easy way

* add test

* add comments
crepererum pushed a commit to influxdata/arrow-datafusion that referenced this pull request Jul 29, 2025
window_frame,
aggregate,
)
if distinct {
Omega359 (Contributor) commented

let aggregate = if distinct {..} else {..};
window_expr_from_aggregate_expr(
    partition_by,
    order_by,
    window_frame,
    aggregate,
)

?

zhuqi-lucas (Contributor, Author) commented

Thank you @Omega359 for the good suggestion and review. I will address it in a follow-up PR.

Standing-Man pushed a commit to Standing-Man/datafusion that referenced this pull request Aug 4, 2025
crepererum pushed a commit to influxdata/arrow-datafusion that referenced this pull request Aug 25, 2025
crepererum pushed a commit to influxdata/arrow-datafusion that referenced this pull request Sep 5, 2025
erratic-pattern pushed a commit to influxdata/arrow-datafusion that referenced this pull request Oct 6, 2025
fsdvh pushed a commit to coralogix/arrow-datafusion that referenced this pull request Oct 10, 2025
erratic-pattern pushed a commit to influxdata/arrow-datafusion that referenced this pull request Oct 21, 2025
Labels

core: Core DataFusion crate
functions: Changes to functions implementation
logical-expr: Logical plan and expressions
optimizer: Optimizer rules
physical-plan: Changes to the physical-plan crate
proto: Related to proto crate
sql: SQL Planner
sqllogictest: SQL Logic Tests (.slt)
substrait: Changes to the substrait crate

4 participants