-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix grouping set subset satisfaction #19853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix grouping set subset satisfaction #19853
Conversation
|
cc @gene-bordegaray do this changes look good to you? |
|
I can confirm that this change does fix the issue, I'll let @gene-bordegaray chime in as the one assigned to the original issue. Unless there are any relevant comments, I'll merge this in as soon as CI passes. |
|
Yes this looks correct, thanks @freakyzoidberg . Let me unassign myself and comment take on issue only thing may be wanting to add plans in the sqllogictests 😄 cc: @gabotechs |
| 04)------AggregateExec: mode=FinalPartitioned, gby=[channel@0 as channel, brand@1 as brand, __grouping_id@2 as __grouping_id], aggr=[sum(sub.total)] | ||
| 05)--------RepartitionExec: partitioning=Hash([channel@0, brand@1, __grouping_id@2], 4), input_partitions=4 | ||
| 06)----------AggregateExec: mode=Partial, gby=[(NULL as channel, NULL as brand), (channel@0 as channel, NULL as brand), (channel@0 as channel, brand@1 as brand)], aggr=[sum(sub.total)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I imagine before this PR the RepartitionExec would not be there right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
gabotechs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good! thanks @freakyzoidberg and @gene-bordegaray
## Summary - Fixes incorrect results from ROLLUP/CUBE/GROUPING SETS queries when using multiple partitions - The subset satisfaction optimization was incorrectly allowing hash partitioning on fewer columns to satisfy requirements that include `__grouping_id` - This caused partial aggregates from different partitions to be finalized independently, producing duplicate grand totals Closes apache#19849
|
Love it -- thank you |
|
Thank you @freakyzoidberg and @gabotechs Do you we know what changes in 52 introduced this problem? |
|
You have the full conclusion here #19849 |
Brings #19853 into `branch-52` Co-authored-by: Pierre Lacave <pierre.lacave@datadoghq.com>
|
Thanks @gabotechs -- I'll ask some follow up questions there |
NGA-TRAN
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Summary
__grouping_idCloses #19849