
Introduce Partitioned aggregation mode #6892

Closed
Dandandan opened this issue Jul 8, 2023 · 8 comments

@Dandandan
Contributor

Is your feature request related to a problem or challenge?

Currently, we often use two modes in aggregations: Partial + FinalPartitioned. FinalPartitioned requires the input to be hash-repartitioned, so a RepartitionExec is added in between.

In certain cases, like when only aggregating on a single column, it is faster to skip the Partial aggregation and directly perform the aggregation on hash-partitioned input, as doing Partial + RepartitionExec + FinalPartitioned is more work than doing the aggregation in one step (RepartitionExec + Partitioned).

Reasoning: a hash RepartitionExec is by itself faster than a Partial aggregation (although it doesn't reduce the output), and it is needed for FinalPartitioned anyway.
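
To make the current behavior concrete, here is a minimal sketch (assuming the datafusion and tokio crates; exact DataFrame APIs vary between DataFusion versions, and data.csv is just a placeholder) that prints the physical plan for a grouped aggregation. Today that plan typically contains an AggregateExec in Partial mode, a hash RepartitionExec, and an AggregateExec in FinalPartitioned mode:

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    // "data.csv" is a placeholder for any table with a high-cardinality column `a`.
    ctx.register_csv("t", "data.csv", CsvReadOptions::new()).await?;

    // The same Partial + RepartitionExec(Hash) + FinalPartitioned pattern also
    // shows up inside queries such as SELECT COUNT(DISTINCT a) FROM t.
    let df = ctx.sql("SELECT a, COUNT(*) FROM t GROUP BY a").await?;
    // Print the logical and physical plans instead of executing the query.
    df.explain(false, false)?.show().await?;
    Ok(())
}
```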

Describe the solution you'd like

  1. Introduce a Partitioned aggregation mode that requires its input to be hash-partitioned on the group-by keys (a rough sketch follows below this list).
  2. Use it in certain cases, like aggregations over only one or a few columns (consider queries such as SELECT COUNT(DISTINCT a) FROM T).
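
As a rough sketch of what item 1 could look like, modeled on how FinalPartitioned declares its input requirement today: Distribution and PhysicalExpr are existing DataFusion types, while the Partitioned variant and the standalone function below are hypothetical illustrations of the proposal, not existing APIs.

```rust
use std::sync::Arc;
// Distribution and PhysicalExpr are existing DataFusion types; everything
// else below is a hypothetical illustration (import paths may vary by version).
use datafusion::physical_plan::{Distribution, PhysicalExpr};

// The existing aggregation modes plus the proposed one.
enum AggregateMode {
    Partial,
    Final,
    FinalPartitioned,
    Single,
    /// Proposed: aggregate in a single pass, but require the input to already
    /// be hash-partitioned on the group-by expressions.
    Partitioned,
}

// Standalone stand-in for an operator's `required_input_distribution`.
fn required_input_distribution(
    mode: &AggregateMode,
    group_by_exprs: Vec<Arc<dyn PhysicalExpr>>,
) -> Distribution {
    match mode {
        // Both FinalPartitioned and the proposed Partitioned mode need the
        // input hash-partitioned on the group-by keys, so the distribution
        // enforcer inserts a RepartitionExec(Hash) below them when needed.
        AggregateMode::FinalPartitioned | AggregateMode::Partitioned => {
            Distribution::HashPartitioned(group_by_exprs)
        }
        // Final needs all rows in a single partition.
        AggregateMode::Final => Distribution::SinglePartition,
        // Partial (and, in this sketch, Single) accepts any input partitioning.
        _ => Distribution::UnspecifiedDistribution,
    }
}
```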

Describe alternatives you've considered

No response

Additional context

No response

Dandandan added the enhancement (New feature or request) and performance (Make DataFusion faster) labels Jul 8, 2023
@alamb
Contributor

alamb commented Jul 9, 2023

cc @mingmwang who I think was considering something similar recently

Dandandan self-assigned this Jul 12, 2023
@Dandandan
Contributor Author

Working on this one

@Dandandan
Contributor Author

It probably makes more sense to do #6937 than to do this statically, so closing this issue.

Dandandan closed this as not planned Jul 13, 2023
@mingmwang
Contributor

@Dandandan @alamb
I guess the purpose of this ticket is to introduce some way to skip the partial aggregation when it cannot reduce the row count (for high-cardinality aggregations), correct?
Snowflake and Teradata call this adaptive partial aggregation.

I'm not sure whether this will help for TPC-H queries, but for some TPC-DS queries it is quite useful.

@alamb
Contributor

alamb commented Jul 13, 2023

I'm not sure whether this will help for TPC-H queries, but for some TPC-DS queries it is quite useful.

For what it's worth, I think at least q17 has a high-cardinality grouping (in the subquery) that it might help for.

@Dandandan
Contributor Author

Dandandan commented Jul 13, 2023

Yes, that is the idea.

  • I found that the existing Single aggregation mode already does what we want (it performs the aggregation in one go), so there is no need to create a new aggregation mode. I'm trying to clean up / improve that mode a bit in Fix required partitioning of Single aggregation mode #6950 so we could potentially use it in more scenarios.

  • I did some experiments with skipping the Partial aggregation based on a heuristic (e.g. for tables with up to a certain number of columns), but this gives mixed results:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ fast_gby_hash ┃ aggregate_partition_mode ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │      194.12ms │                 194.38ms │     no change │
│ QQuery 2     │       36.59ms │                  30.54ms │ +1.20x faster │
│ QQuery 3     │       44.99ms │                  45.69ms │     no change │
│ QQuery 4     │       37.34ms │                  37.88ms │     no change │
│ QQuery 5     │       91.69ms │                  90.49ms │     no change │
│ QQuery 6     │       10.27ms │                  10.25ms │     no change │
│ QQuery 7     │      193.05ms │                 190.58ms │     no change │
│ QQuery 8     │       69.37ms │                  69.39ms │     no change │
│ QQuery 9     │      132.29ms │                 132.95ms │     no change │
│ QQuery 10    │       91.51ms │                  90.86ms │     no change │
│ QQuery 11    │       40.53ms │                  39.73ms │     no change │
│ QQuery 12    │       67.70ms │                  66.71ms │     no change │
│ QQuery 13    │      130.96ms │                 132.62ms │     no change │
│ QQuery 14    │       11.87ms │                  12.10ms │     no change │
│ QQuery 15    │       14.80ms │                  19.92ms │  1.35x slower │
│ QQuery 16    │       37.79ms │                  37.07ms │     no change │
│ QQuery 17    │      210.67ms │                 209.54ms │     no change │
│ QQuery 18    │      315.60ms │                 381.18ms │  1.21x slower │
│ QQuery 19    │       57.40ms │                  57.59ms │     no change │
│ QQuery 20    │       70.88ms │                  58.71ms │ +1.21x faster │
│ QQuery 21    │      248.35ms │                 252.92ms │     no change │
│ QQuery 22    │       28.11ms │                  27.87ms │     no change │
└──────────────┴───────────────┴──────────────────────────┴───────────────┘

My hopes are higher for #6937, which I think might be similar to the "adaptive partial aggregation" of Snowflake / Teradata?

@Dandandan
Contributor Author

Based on the experiment, I think the only way to do this reliably enough during planning is to have cardinality statistics on the group-by columns, so that the cardinality of the aggregation can be estimated.
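
For illustration, a plan-time decision based on such statistics could look roughly like this (plain Rust, entirely hypothetical: the struct, field names, and the 0.8 threshold are assumptions, not DataFusion APIs):

```rust
// Hypothetical planning-time inputs: total input rows and the estimated number
// of distinct group values, e.g. derived from per-column distinct counts.
struct GroupByStats {
    input_rows: u64,
    estimated_distinct_groups: u64,
}

// Skip the Partial aggregate only when the group-by keys look high-cardinality,
// i.e. when a partial aggregate would barely reduce the row count.
fn skip_partial_aggregate(stats: &GroupByStats) -> bool {
    if stats.input_rows == 0 {
        return false;
    }
    let groups_per_row = stats.estimated_distinct_groups as f64 / stats.input_rows as f64;
    // Threshold chosen purely for illustration: if partial aggregation would
    // eliminate fewer than ~20% of the rows, aggregate in one pass instead.
    groups_per_row > 0.8
}
```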

@alamb
Contributor

alamb commented Jul 14, 2023

Based on the experiment, I think the only way to do this reliably enough during planning is to have cardinality statistics on the group-by columns, so that the cardinality of the aggregation can be estimated.

I think it is for this reason that most people argue for doing the adaptation at runtime (not at plan time): statistics always have edge cases (like data skew) that can result in poor plans without any degree of predictability.
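
For contrast with the plan-time heuristic above, a runtime-adaptive partial aggregation could look roughly like this (again hypothetical and simplified; it sketches the general idea, not the actual design of #6937, and the sample size and 0.8 cut-off are assumptions):

```rust
// The partial aggregate watches its own reduction ratio over the first rows
// and, if it is not reducing the data, switches to passing rows through
// unchanged to the final (hash-partitioned) aggregation.
struct AdaptivePartialAgg {
    rows_seen: u64,
    groups_seen: u64,
    passthrough: bool,
}

impl AdaptivePartialAgg {
    fn observe_batch(&mut self, batch_rows: u64, new_groups: u64) {
        self.rows_seen += batch_rows;
        self.groups_seen += new_groups;
        // After a warm-up sample, check whether aggregation is actually
        // collapsing rows into fewer groups.
        if !self.passthrough && self.rows_seen >= 100_000 {
            let ratio = self.groups_seen as f64 / self.rows_seen as f64;
            if ratio > 0.8 {
                // Nearly every row starts its own group: stop building the
                // hash table and emit input rows unchanged from now on.
                self.passthrough = true;
            }
        }
    }
}
```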
