Implement Underlying Grouping Sets #42631
Closed
Labels
sig/execution
SIG execution
sig/planner
SIG: Planner
type/feature-request
Categorizes issue or PR as related to a new feature.
Comments
AilinKid added the type/feature-request, sig/planner, and sig/execution labels on Mar 28, 2023
hawkingrei pushed a commit to hawkingrei/tidb that referenced this issue on Aug 1, 2024
Feature Request
Is your feature request related to a problem? Please describe:
Grouping Sets is the internal implementation mechanism that supports both the Multi-Distinct-Aggregate MPP optimization and the Rollup/Cube syntax.
Some modern engines, such as Spark SQL, let the user describe the wanted grouping sets explicitly with the GROUPING SETS syntax in the GROUP BY clause.
Each listed grouping set (grouping layout) requires the underlying data to be expanded into multiple copies, one per required aggregation granularity. The rows aggregated at each level are then returned to the user as a union.
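To make the expansion concrete, here is a minimal sketch in Python of the idea rather than TiDB's actual Expand operator. The function names, the `gid` column, and the sample rows are all assumptions made for illustration. Each input row is copied once per grouping set, columns outside the set are nulled, and one ordinary aggregation over the copies yields the union of all levels.

```python
from collections import defaultdict

def expand(rows, group_cols, grouping_sets):
    """Copy every row once per grouping set (hypothetical helper).

    Grouping columns that are not part of the set are replaced by None,
    and a 'gid' column records which grouping set the copy belongs to.
    """
    expanded = []
    for row in rows:
        for gid, gset in enumerate(grouping_sets):
            copy = dict(row)
            for col in group_cols:
                if col not in gset:
                    copy[col] = None
            copy["gid"] = gid
            expanded.append(copy)
    return expanded

def sum_by_grouping_sets(rows, group_cols, grouping_sets, value_col):
    """One aggregation over the expanded rows, keyed by (group cols..., gid)."""
    totals = defaultdict(int)
    for row in expand(rows, group_cols, grouping_sets):
        key = tuple(row[c] for c in group_cols) + (row["gid"],)
        totals[key] += row[value_col]
    return dict(totals)

# Illustrative data; roughly GROUP BY GROUPING SETS ((a, b), (a), ()) with SUM(c).
rows = [
    {"a": 1, "b": "x", "c": 10},
    {"a": 1, "b": "y", "c": 20},
    {"a": 2, "b": "x", "c": 5},
]
result = sum_by_grouping_sets(rows, ["a", "b"], [("a", "b"), ("a",), ()], "c")
```

Keeping `gid` in the grouping key is a design choice of this sketch: it stops a real NULL in the data from being merged with a NULL that only marks a nulled-out column.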
Besides the explicit SQL syntax, there is another way to describe a composed set of grouping sets implicitly, which is exactly what the rollup and cube syntax does. For example, rollup(a,b,c) implies N = 4 grouping sets derived by incremental composition of its expressions: (), (a), (a,b), (a,b,c). The cube syntax works the same way but is more complicated.
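The derivation of implicit grouping sets can be sketched in a few lines of Python. The helper names `rollup_sets` and `cube_sets` are hypothetical and this is not the planner's code. Rollup yields every prefix of the column list, and cube yields every subset, so n columns give n + 1 and 2^n grouping sets respectively.

```python
from itertools import combinations

def rollup_sets(cols):
    """Prefixes of cols: rollup(a, b, c) -> (), (a), (a, b), (a, b, c)."""
    return [tuple(cols[:i]) for i in range(len(cols) + 1)]

def cube_sets(cols):
    """All subsets of cols, ordered by size: 2 ** len(cols) grouping sets."""
    return [combo
            for size in range(len(cols) + 1)
            for combo in combinations(cols, size)]

rollup_abc = rollup_sets(["a", "b", "c"])
cube_abc = cube_sets(["a", "b", "c"])
```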
For the Multi Distinct Aggregate case, with distinct aggregates on columns a and b here, the distinct semantics require aggregating over groups keyed by a or by b, and a single copy of the data cannot satisfy grouping by a and grouping by b at the same time. As a consequence, we resort to different grouping sets such as (a) and (b), which ask the underlying data to be expanded so that each aggregation is fed its own copy.
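A rough Python sketch of this idea follows. It assumes a query that counts distinct values of a and of b; the function name and the sample rows are illustrative only, not TiDB's implementation. The data is expanded with grouping sets (a) and (b), a first aggregation deduplicates on (a, b, gid), and a second one counts the non-null values per column.

```python
def count_distinct_via_expand(rows, distinct_cols):
    """Compute COUNT(DISTINCT col) for several columns over one expanded input."""
    # Expand: one copy per distinct column, with the other columns nulled.
    expanded = []
    for row in rows:
        for gid, keep in enumerate(distinct_cols):
            values = tuple(row[c] if c == keep else None for c in distinct_cols)
            expanded.append(values + (gid,))
    # First aggregation: group by (cols..., gid), which removes duplicates.
    groups = set(expanded)
    # Second aggregation: count the non-null values of each column.
    return {
        col: sum(1 for g in groups if g[i] is not None)
        for i, col in enumerate(distinct_cols)
    }

rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "x"},
    {"a": 3, "b": "x"},
]
counts = count_distinct_via_expand(rows, ["a", "b"])
```

Because the first aggregation groups on ordinary keys, it is the kind of step that can be hash-partitioned across nodes, which is the motivation stated above for the MPP case.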
All three cases above depend on the implementation of Grouping Sets and the Expand operator, which is what this issue calls for.
Describe the feature you'd like:
Shown above.
Describe alternatives you've considered:
As a workaround for the Rollup syntax, rewrite the SQL as a union of several sub-queries, each with its own group-by items.
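As an illustration of that rewrite, the following Python sketch generates the union query for a rollup over a list of columns. The function name, the table name `t`, and the aggregate expression are assumptions for the example. It emits one sub-query per prefix of the column list and fills the dropped columns with NULL.

```python
def rollup_as_union(table, cols, agg_expr):
    """Build a UNION ALL query emulating GROUP BY rollup(cols) (hypothetical helper)."""
    branches = []
    # From the finest level (all columns) down to the grand total (no columns).
    for i in range(len(cols), -1, -1):
        kept = cols[:i]
        select_items = [c if c in kept else f"NULL AS {c}" for c in cols]
        select_items.append(agg_expr)
        sql = f"SELECT {', '.join(select_items)} FROM {table}"
        if kept:
            sql += f" GROUP BY {', '.join(kept)}"
        branches.append(sql)
    return "\nUNION ALL\n".join(branches)

query = rollup_as_union("t", ["a", "b"], "SUM(v) AS total")
```

Depending on the database, the NULL placeholders may need explicit casts so that all branches agree on column types. Each branch also reads the table on its own, which is typically the cost a shared expansion is meant to avoid.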
For the Multi Distinct Aggregate optimization there is no workaround: without it, the computation cannot be moved to multiple MPP nodes.
Teachability, Documentation, Adoption, Migration Strategy:
related issues schedule
Underlying Grouping Sets and Expand Operator
Rollup Syntax
grouping function for rollup
#42463 Grouping function TiDB side implementation
Infra Support
Plan Operator
Bug Fix