[feat](nereids) add rewrite rule: EliminateGroupByKeyByUniform #43391
Conversation
run buildall
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
import java.util.Map;
import java.util.Set;

/**ProjectFilterTransform*/
The class comment should describe what the rule does and how it does it.
run buildall
TPC-H: Total hot run time: 41389 ms

run buildall
TPC-H: Total hot run time: 41292 ms
run buildall
run buildall
run buildall
run buildall
run buildall
run buildall
run buildall
run buildall
ClickBench: Total hot run time: 32.14 s

run p0
sql "insert into test1 values(1,1),(2,1),(3,1);"
sql "create table test2(a int, b int) distributed by hash(a) properties('replication_num'='1');"
sql "insert into test2 values(1,105),(2,105);"
qt_full_join_uniform_should_not_eliminate_group_by_key "select t2.b,t1.b from test1 t1 full join (select * from test2 where b=105) t2 on t1.a=t2.a group by t2.b,t1.b order by 1,2;"
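The regression case above expects the rule not to fire across a full outer join. One way to see why: the filter `b=105` makes `t2.b` uniform inside the subquery, but a full outer join pads unmatched `t1` rows with NULL, so above the join `t2.b` takes both 105 and NULL. The Java sketch below replays that join on the test data in memory; the class, record, and method names are hypothetical and are not Doris code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FullJoinUniformDemo {
    /** One joined row: t1.a, t1.b, t2.a, t2.b; a Java null models SQL NULL. */
    record Joined(Integer t1a, Integer t1b, Integer t2a, Integer t2b) {}

    /** Nested-loop full outer join on t1.a = t2.a; each input row is {a, b}. */
    static List<Joined> fullJoin(int[][] t1, int[][] t2) {
        List<Joined> out = new ArrayList<>();
        boolean[] t2Matched = new boolean[t2.length];
        for (int[] l : t1) {
            boolean matched = false;
            for (int j = 0; j < t2.length; j++) {
                if (l[0] == t2[j][0]) {
                    out.add(new Joined(l[0], l[1], t2[j][0], t2[j][1]));
                    t2Matched[j] = true;
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Joined(l[0], l[1], null, null)); // unmatched t1 row: t2 side is NULL
            }
        }
        for (int j = 0; j < t2.length; j++) {
            if (!t2Matched[j]) {
                out.add(new Joined(null, null, t2[j][0], t2[j][1])); // unmatched t2 row: t1 side is NULL
            }
        }
        return out;
    }

    /** Distinct (t2.b, t1.b) pairs, i.e. the groups formed by "group by t2.b, t1.b". */
    static Set<List<Integer>> groups(List<Joined> rows) {
        Set<List<Integer>> result = new LinkedHashSet<>();
        for (Joined r : rows) {
            result.add(Arrays.asList(r.t2b(), r.t1b())); // Arrays.asList tolerates nulls
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] test1 = {{1, 1}, {2, 1}, {3, 1}};
        int[][] test2 = {{1, 105}, {2, 105}}; // test2 rows that pass "where b=105"
        List<Joined> rows = fullJoin(test1, test2);
        // Prints [[105, 1], [null, 1]]: two groups, so t2.b is not uniform above the join.
        System.out.println(groups(rows));
    }
}
```

Dropping `t2.b` from the group-by keys here would collapse both rows into a single group keyed on `t1.b = 1`, returning one row instead of the expected two.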
Does the original code have a bug for this case?
PR approved by at least one committer and no changes requested.
…e#43391)

This PR introduces two main changes:

1. Adds an optional constant value to the uniform attribute in DataTrait. A slot with a constant value that is not null is considered uniform and not null.
2. Introduces a new transform rule, EliminateGroupByKeyByUniform, which uses the newly added part of the uniform attribute.

The following is an example transformation:

from
+--aggregate(group by a,b output a,b,max(c))
(a is uniform and not null: e.g. a is projection 2 as a in logicalProject)
to
+--aggregate(group by b output b,any_value(a) as a,max(c))
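The first change, a uniform attribute that may carry a constant, can be pictured with a small Java model. This is a minimal sketch under stated assumptions: the class, the `Constant` record, and all method names are hypothetical and do not mirror the real DataTrait API. It only encodes the rule quoted above: a slot whose known constant is not NULL is uniform and not null, while a uniform slot with an unknown value needs a separate not-null fact.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of a uniform trait with an optional constant; not the DataTrait API. */
public class UniformTraitSketch {
    /** A known constant value; a Java null models the SQL NULL literal. */
    record Constant(Object value) {
        boolean isNullLiteral() {
            return value == null;
        }
    }

    // slot name -> constant if known; Optional.empty() means uniform with an unknown value
    private final Map<String, Optional<Constant>> uniform = new HashMap<>();
    private final Set<String> notNull = new HashSet<>();

    void addUniform(String slot) {
        uniform.putIfAbsent(slot, Optional.empty()); // never discards a known constant
    }

    void addUniformConstant(String slot, Object value) {
        uniform.put(slot, Optional.of(new Constant(value)));
    }

    void addNotNull(String slot) {
        notNull.add(slot);
    }

    boolean isUniformAndNotNull(String slot) {
        if (!uniform.containsKey(slot)) {
            return false; // not uniform at all
        }
        Optional<Constant> constant = uniform.get(slot);
        if (constant.isPresent()) {
            return !constant.get().isNullLiteral(); // a non-NULL constant can never be NULL
        }
        return notNull.contains(slot); // unknown value: needs a separate not-null fact
    }

    public static void main(String[] args) {
        UniformTraitSketch trait = new UniformTraitSketch();
        trait.addUniformConstant("a", 2);    // e.g. "2 as a" in a project
        trait.addUniformConstant("n", null); // e.g. "null as n"
        trait.addUniform("u");               // uniform, value unknown
        System.out.println(trait.isUniformAndNotNull("a")); // true
        System.out.println(trait.isUniformAndNotNull("n")); // false
        System.out.println(trait.isUniformAndNotNull("u")); // false
    }
}
```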
…niform when group sets exist (#56942)

Fix a "not in aggregate's output" error after group-by keys are eliminated by uniform when grouping sets exist. A query such as the following would fail with `ERROR 1105 (HY000): errCode = 2, detailMessage = GROUPING_PREFIX_event_name_group not in aggregate's output`; this PR fixes it.

```sql
SELECT
  CASE WHEN GROUPING(event_date) = 1 THEN '(TOTAL)' ELSE CAST(event_date AS VARCHAR) END AS event_date,
  user_id,
  MAX(conversion_level) AS conversion_level,
  CASE WHEN GROUPING(event_name_group) = 1 THEN '(TOTAL)' ELSE event_name_group END AS event_name_group
FROM (
  SELECT
    src.event_date,
    src.user_id,
    WINDOW_FUNNEL(3600 * 24 * 1, 'default', src.event_time,
                  src.event_name = 'shop_buy', src.event_name = 'shop_buy') AS conversion_level,
    src.event_name_group
  FROM (
    SELECT
      CAST(etb.`@dt` AS DATE) AS event_date,
      etb.`@event_name` AS event_name,
      etb.`@event_time` AS event_time,
      etb.`@event_name` AS event_name_group,
      etb.`@user_id` AS user_id
    FROM `test_event` AS etb
    WHERE etb.`@dt` BETWEEN '2025-09-03 02:00:00' AND '2025-09-10 01:59:59'
      AND etb.`@event_name` = 'shop_buy'
      AND etb.`@user_id` IS NOT NULL
      AND etb.`@user_id` > '0'
  ) AS src
  GROUP BY src.event_date, src.user_id, src.event_name_group
) AS fwt
GROUP BY GROUPING SETS (
  (user_id),
  (user_id, event_date),
  (user_id, event_name_group),
  (user_id, event_date, event_name_group)
);
```

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #43391

Problem Summary:

### Release note

None

### Check List (For Author)

- Test
  - [ ] Regression test
  - [ ] Unit Test
  - [ ] Manual test (add detailed scripts or steps below)
  - [ ] No need to test or manual test. Explain why:
    - [ ] This is a refactor/code format and no logic has been changed.
    - [ ] Previous test can cover this change.
    - [ ] No code files have been changed.
    - [ ] Other reason
- Behavior changed:
  - [ ] No.
  - [ ] Yes.
- Does this need documentation?
  - [ ] No.
  - [ ] Yes.

### Check List (For Reviewer who merges this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
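A hedged illustration of why grouping sets need care here; it is not a reproduction of the fix in #56942. In the query above, `event_name_group` is uniform and not null in the input because of the filter on `@event_name`, but once rows are expanded over grouping sets, the sets that omit the column see it as NULL, and `GROUPING(event_name_group)` is what tells the two kinds of row apart. The Java sketch below models only that expansion step for two of the four sets; every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GroupingSetsUniformDemo {
    /**
     * Expands input rows {user_id, event_name_group} over the given grouping sets.
     * Columns missing from a set become NULL; the last field models GROUPING(event_name_group).
     */
    static List<List<Object>> expand(List<String[]> rows, List<Set<String>> groupingSets) {
        List<List<Object>> out = new ArrayList<>();
        for (Set<String> set : groupingSets) {
            boolean keepsGroup = set.contains("event_name_group");
            for (String[] row : rows) {
                Object userId = set.contains("user_id") ? row[0] : null;
                Object group = keepsGroup ? row[1] : null;
                int groupingBit = keepsGroup ? 0 : 1; // 1 = column rolled up in this set
                out.add(Arrays.asList(userId, group, groupingBit));
            }
        }
        return out;
    }

    /** Distinct values of event_name_group after expansion. */
    static Set<Object> distinctGroupValues(List<List<Object>> expanded) {
        Set<Object> values = new LinkedHashSet<>();
        for (List<Object> r : expanded) {
            values.add(r.get(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Every input row has event_name_group = 'shop_buy', so it is uniform before expansion.
        List<String[]> rows = List.of(new String[] {"u1", "shop_buy"}, new String[] {"u2", "shop_buy"});
        List<Set<String>> sets = List.of(Set.of("user_id"), Set.of("user_id", "event_name_group"));
        // Prints [null, shop_buy]: after expansion the column is no longer uniform.
        System.out.println(distinctGroupValues(expand(rows, sets)));
    }
}
```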
What problem does this PR solve?
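This PR introduces two main changes:

1. Adds an optional constant value to the uniform attribute in DataTrait. A slot with a constant value that is not null is considered uniform and not null.
2. Introduces a new transform rule, EliminateGroupByKeyByUniform, which uses the newly added part of the uniform attribute.

The following is an example transformation: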
+--aggregate(group by a,b output a,b,max(c))
(a is uniform and not null: e.g. a is projection 2 as a in logicalProject)
->
+--aggregate(group by b output b,any_value(a) as a,max(c))
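The transformation above can be sketched in Java on a toy, string-based plan model. The `Aggregate` record and `rewrite` method are hypothetical, not the Nereids classes. The sketch keeps the original output order (`any_value(a) as a,b,max(c)` rather than `b,any_value(a) as a,max(c)`), which does not change the result. It also declines to remove the last remaining key, a conservative choice of this sketch only: a grouped aggregate over an empty input returns no rows, while an aggregate with no group-by keys returns one. Whether the real rule treats that case the same way is not covered by the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical string-level model of the rewrite; not the Nereids plan classes. */
public class EliminateUniformKeySketch {
    record Aggregate(List<String> groupBy, List<String> outputs) {
        @Override
        public String toString() {
            return "aggregate(group by " + String.join(",", groupBy)
                    + " output " + String.join(",", outputs) + ")";
        }
    }

    /** Drops group-by keys that are uniform and not null, wrapping their outputs in any_value. */
    static Aggregate rewrite(Aggregate agg, Set<String> uniformAndNotNull) {
        List<String> keptKeys = new ArrayList<>();
        for (String key : agg.groupBy()) {
            if (!uniformAndNotNull.contains(key)) {
                keptKeys.add(key);
            }
        }
        // Nothing to drop, or every key would go (turning a grouped aggregate
        // into a global one): keep the plan unchanged in this sketch.
        if (keptKeys.size() == agg.groupBy().size() || keptKeys.isEmpty()) {
            return agg;
        }
        List<String> newOutputs = new ArrayList<>();
        for (String output : agg.outputs()) {
            boolean dropped = agg.groupBy().contains(output) && !keptKeys.contains(output);
            // All input rows share the uniform value, so any_value returns it unchanged.
            newOutputs.add(dropped ? "any_value(" + output + ") as " + output : output);
        }
        return new Aggregate(keptKeys, newOutputs);
    }

    public static void main(String[] args) {
        Aggregate before = new Aggregate(List.of("a", "b"), List.of("a", "b", "max(c)"));
        // Prints aggregate(group by b output any_value(a) as a,b,max(c))
        System.out.println(rewrite(before, Set.of("a")));
    }
}
```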
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Release note
None
Check List (For Reviewer who merges this PR)