Implement hash partitioned aggregation #320
Conversation
Codecov Report
@@ Coverage Diff @@
## master apache/arrow-datafusion#320 +/- ##
==========================================
- Coverage 75.72% 75.71% -0.01%
==========================================
Files 143 143
Lines 23832 23881 +49
==========================================
+ Hits 18046 18081 +35
- Misses 5786 5800 +14
Continue to review full report at Codecov.
    input_schema,
)?))
// TODO: dictionary type not yet supported in Hash Repartition
let contains_dict = groups
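The diff view cuts the new check off mid-expression. Presumably it scans the group-by expressions' output types for arrow's DataType::Dictionary and falls back to the single-partition plan if one is found; a minimal sketch of that idea (illustrative names, not the exact PR code):

use arrow::datatypes::DataType;

// Sketch: true if any group-by output type is a dictionary type, in which
// case the planner would skip the hash-repartitioned aggregate.
fn contains_dictionary(group_types: &[DataType]) -> bool {
    group_types.iter().any(|t| matches!(t, DataType::Dictionary(_, _)))
}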
Will create an issue for this
thank you
This looks really cool @Dandandan. I suggest you add a test, if possible, that shows the plan with the RepartitionExec operation, in order to prevent someone from accidentally turning off this optimization during a refactor.
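A sketch of the kind of plan-snapshot test being suggested, written against the DataFusion API of that era (the config method names and the expected plan text are assumptions, not necessarily what the merged test uses):

use datafusion::error::Result;
use datafusion::physical_plan::displayable;
use datafusion::prelude::*;

#[test]
fn aggregate_plan_uses_hash_repartition() -> Result<()> {
    // More than one target partition is needed for the optimization to apply.
    let mut ctx =
        ExecutionContext::with_config(ExecutionConfig::new().with_concurrency(4));
    ctx.register_csv("t", "tests/example.csv", CsvReadOptions::new())?;

    let logical = ctx.create_logical_plan("SELECT c1, COUNT(*) FROM t GROUP BY c1")?;
    let physical = ctx.create_physical_plan(&ctx.optimize(&logical)?)?;

    // Render the physical plan to text; if a refactor silently disables the
    // optimization, the hash repartition disappears and the assertion fails.
    let plan = format!("{}", displayable(physical.as_ref()).indent());
    assert!(plan.contains("RepartitionExec: partitioning=Hash"), "plan:\n{}", plan);
    Ok(())
}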
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
… into agg_partition
Good idea! Added a test for this.
I think it is looking great. Thanks @Dandandan
@@ -202,6 +209,9 @@ impl ExecutionPlan for HashAggregateExec {
    fn required_child_distribution(&self) -> Distribution {
        match &self.mode {
            AggregateMode::Partial => Distribution::UnspecifiedDistribution,
            AggregateMode::FinalPartitioned => Distribution::HashPartitioned(
                self.group_expr.iter().map(|x| x.0.clone()).collect(),
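The review view truncates this hunk; the complete match presumably closes the HashPartitioned variant and keeps the pre-existing Final arm, roughly as follows (a reconstruction, not verbatim PR code):

fn required_child_distribution(&self) -> Distribution {
    match &self.mode {
        AggregateMode::Partial => Distribution::UnspecifiedDistribution,
        // New mode: each child partition already holds every row for the group
        // keys it will finalize, so no merge to a single partition is needed.
        AggregateMode::FinalPartitioned => Distribution::HashPartitioned(
            self.group_expr.iter().map(|x| x.0.clone()).collect(),
        ),
        AggregateMode::Final => Distribution::SinglePartition,
    }
}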
👍
I went through this and it looks great. Thanks a lot @Dandandan, great work. 💯 from my side.
We may want to give some time in case someone else would like to go through this before merging.
❤️
@andygrove maybe? :)
I will make time this weekend to review this and take it for a spin!
Awesome, thanks @andygrove! I also have some nice followups this weekend for more performance improvements for hash aggregates :D
LGTM. I tested this out locally and confirmed that performance is much better for unpartitioned data, and about the same for partitioned data.
@Dandandan Looks like there is a conflict that needs fixing.
I filed apache/datafusion-ballista#23 for implementing this optimization in Ballista.
Somehow the coverage run seems to fail... but it doesn't show what is failing.
The check run at https://github.com/apache/arrow-datafusion/pull/320/checks?check_run_id=2591763548 shows this buried in the logs (not at the end, annoyingly):
thanks @alamb, will merge it now when it's green
Thanks all 🎉
Which issue does this PR close?
Closes #27
Rationale for this change
A more scalable hash aggregate that works well for group by expressions that have high cardinality in the output, and that scales better with the number of CPU cores.
The algorithm changes the steps from:
partial hash aggregate -> merge (to 1 partition) -> full hash aggregate
to
partial hash aggregate -> repartition on group by expressions -> hash aggregate (on partitions)
This is the same as what Spark is doing.
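To illustrate the scheme outside of DataFusion, here is a self-contained toy in plain Rust (names and data are made up): each input partition produces partial (key, count) pairs, the pairs are routed to an output partition by hash(key) % n, and each output partition finalizes only the keys it owns, so the merge-to-one-partition step disappears.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn hash_partition(key: &str, n_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % n_partitions
}

fn main() {
    let n_partitions = 4;
    // Partial counts produced independently by two input partitions.
    let partials = vec![vec![("a", 2u64), ("b", 1)], vec![("a", 3), ("c", 5)]];

    // Repartition: all partials for the same key land in the same bucket.
    let mut buckets: Vec<Vec<(&str, u64)>> = vec![Vec::new(); n_partitions];
    for part in partials {
        for (key, count) in part {
            buckets[hash_partition(key, n_partitions)].push((key, count));
        }
    }

    // Final aggregation runs per bucket (in parallel in the real scheme).
    for (i, bucket) in buckets.iter().enumerate() {
        let mut totals: HashMap<&str, u64> = HashMap::new();
        for &(key, count) in bucket {
            *totals.entry(key).or_insert(0) += count;
        }
        println!("partition {}: {:?}", i, totals);
    }
}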
This mostly has an effect on group bys with higher cardinality, but no substantial effect on lower cardinality (as the partial result is already small).
For Ballista this would also be required, I think @andygrove - currently every partition is merged into one, which can be problematic (and slow).
For example, TPC-H query 3 benefits quite a bit:
Master:
PR:
The db-benchmark group by queries improve by quite a bit:
Master:
PR:
What changes are included in this PR?
Are there any user-facing changes?