Changes to break plan when encountering JOIN node children with same CTE ID #55763
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
query64.sql from the TPC-DS test cases was failing with a TIMEOUT. The dataset used for this query had many duplicate rows in the DB tables.
In the generated plan, a JOIN node had the same CTE node as both of its children. This can cause a cyclic dependency where one CTE consumer waits for the other and vice versa.
In addition, the data size was large, so the probe-side buffer became full.
To avoid this, the changes in this PR inline the plan when such a situation occurs (i.e., when a JOIN node has two or more children referencing the same CTE).
Example:
Consider the following example query (against the tpcds database):
with cs_ui as (
select cs_item_sk, sum(cs_ext_list_price) as sale
from catalog_sales
group by cs_item_sk
) select a.cs_item_sk
from cs_ui a, cs_ui b
where a.cs_item_sk = b.cs_item_sk;
In this query, the CTE cs_ui is used twice (as a and b). Since both sides of the JOIN node reference the same CTE, we break the CTE optimization for this plan: each reference to cs_ui must remain distinct during the JOIN operation.
By breaking the plan and inlining it, each instance of the CTE is handled separately, which avoids the cyclic-dependency problem.
Note that these changes apply ONLY to JOIN nodes; UNION and AGGREGATE nodes are excluded. A minimal sketch of the check is shown below.
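To make the shape of this check concrete, here is a minimal, self-contained sketch. The plan-node classes below are simplified stand-ins invented for illustration (the real Nereids planner has its own CTE consumer nodes and CTE IDs), so treat this as a sketch of the idea, not this PR's actual code:

import java.util.HashSet;
import java.util.Set;

// Toy plan model for illustration only; not the Doris Nereids classes.
final class CteChildCheck {

    sealed interface PlanNode permits CteConsumer, Join, Scan {}

    // A consumer node that reads the output of a materialized CTE.
    record CteConsumer(int cteId) implements PlanNode {}
    record Join(PlanNode left, PlanNode right) implements PlanNode {}
    record Scan(String table) implements PlanNode {}

    /** Collect the IDs of all CTE consumers reachable under a subtree. */
    static void collectCteIds(PlanNode node, Set<Integer> out) {
        if (node instanceof CteConsumer c) {
            out.add(c.cteId());
        } else if (node instanceof Join j) {
            collectCteIds(j.left(), out);
            collectCteIds(j.right(), out);
        }
        // Leaves such as Scan contribute no CTE IDs.
    }

    /**
     * True when both join children reference at least one common CTE ID.
     * In that case the planner gives up CTE reuse for this plan and
     * inlines the CTE definition into each consumer instead.
     */
    static boolean shouldInline(Join join) {
        Set<Integer> leftIds = new HashSet<>();
        Set<Integer> rightIds = new HashSet<>();
        collectCteIds(join.left(), leftIds);
        collectCteIds(join.right(), rightIds);
        leftIds.retainAll(rightIds); // intersection of the two ID sets
        return !leftIds.isEmpty();
    }

    public static void main(String[] args) {
        // Mirrors the example query: both join children consume cs_ui.
        Join join = new Join(new CteConsumer(1), new CteConsumer(1));
        System.out.println("inline needed: " + shouldInline(join)); // prints true
    }
}

The UNION/AGGREGATE exclusion mentioned above would simply mean this check is invoked only for join nodes, not for other multi-child operators.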
A new session variable "enable_join_same_cte_child" (default: true) is added. When true, a JOIN node may have the same CTE as both of its children (the existing/default behavior).
To get the new behavior of breaking such joins, set:
"enable_join_same_cte_child" = "false"
Following is the explain output when "enable_join_same_cte_child" = "true" (EXISTING BEHAVIOR)
mysql> set enable_join_same_cte_child = "true";
Query OK, 0 rows affected (0.01 sec)
mysql> show variables like "%enable_join_same_cte_child%";
+----------------------------+-------+---------------+---------+
| Variable_name | Value | Default_Value | Changed |
+----------------------------+-------+---------------+---------+
| enable_join_same_cte_child | true | true | 1 |
+----------------------------+-------+---------------+---------+
1 row in set (0.00 sec)
mysql> explain with cs_ui as (select cs_item_sk, sum(cs_ext_list_price) as sale from catalog_sales group by cs_item_sk) select a.cs_item_sk from cs_ui a, cs_ui b where a.cs_item_sk = b.cs_item_sk;
+------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS: |
| cs_item_sk[#41] |
| PARTITION: RANDOM |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| VRESULT SINK |
| MYSQL_PROTOCAL |
| |
| 6:VHASH JOIN(375) |
| | join op: INNER JOIN(BROADCAST)[] |
| | equal join conjunct: (cs_item_sk[#38] = cs_item_sk[#37]) |
| | cardinality=200,414 |
| | vec output tuple id: 7 |
| | output tuple id: 7 |
| | vIntermediate tuple ids: 6 |
| | hash output slot ids: 38 |
| | final projections: cs_item_sk[#39] |
| | final project output tuple id: 7 |
| | distribute expr lists: |
| | distribute expr lists: |
| | |
| |----4:VEXCHANGE |
| | offset: 0 |
| | distribute expr lists: |
| | |
| 5:VEXCHANGE |
| offset: 0 |
| distribute expr lists: |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| cs_item_sk[#36] |
| PARTITION: HASH_PARTITIONED: cs_item_sk[#35] |
| |
| HAS_COLO_PLAN_NODE: true |
| |
| MultiCastDataSinks |
| STREAM DATA SINK |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| PROJECTIONS: cs_item_sk[#36] |
| PROJECTION TUPLE: 4 |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| RANDOM |
| PROJECTIONS: cs_item_sk[#36] |
| PROJECTION TUPLE: 5 |
| |
| 3:VAGGREGATE (merge finalize)(357) |
| | group by: cs_item_sk[#35] |
| | sortByGroupKey:false |
| | cardinality=200,414 |
| | distribute expr lists: cs_item_sk[#35] |
| | |
| 2:VEXCHANGE |
| offset: 0 |
| distribute expr lists: cs_item_sk[#35] |
| |
| PLAN FRAGMENT 2 |
| |
| PARTITION: HASH_PARTITIONED: cs_item_sk[#1], cs_order_number[#2] |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: cs_item_sk[#35] |
| |
| 1:VAGGREGATE (update finalize)(349) |
| | STREAMING |
| | group by: cs_item_sk[#34] |
| | sortByGroupKey:false |
| | cardinality=200,414 |
| | distribute expr lists: cs_item_sk[#34] |
| | |
| 0:VOlapScanNode(341) |
| TABLE: tpcds.catalog_sales(catalog_sales), PREAGGREGATION: ON |
| partitions=1/1 (catalog_sales) |
| tablets=96/96, tabletList=1757302757630,1757302757632,1757302757634 ... |
| cardinality=143997065, avgRowSize=337.41714, numNodes=1 |
| pushAggOp=NONE |
| final projections: cs_item_sk[#1] |
| final project output tuple id: 1 |
| |
| |
| |
| ========== STATISTICS ========== |
+------------------------------------------------------------------------------+
89 rows in set (0.02 sec)
Following is the explain output when "enable_join_same_cte_child" = "false" (NEW BEHAVIOR)
mysql> set enable_join_same_cte_child = "false";
Query OK, 0 rows affected (0.01 sec)
mysql> explain with cs_ui as (select cs_item_sk, sum(cs_ext_list_price) as sale from catalog_sales group by cs_item_sk) select a.cs_item_sk from cs_ui a, cs_ui b where a.cs_item_sk = b.cs_item_sk;
+------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS: |
| cs_item_sk[#78] |
| PARTITION: HASH_PARTITIONED: cs_item_sk[#73] |
| |
| HAS_COLO_PLAN_NODE: true |
| |
| VRESULT SINK |
| MYSQL_PROTOCAL |
| |
| 8:VHASH JOIN(504) |
| | join op: INNER JOIN(PARTITIONED)[] |
| | equal join conjunct: (cs_item_sk[#75] = cs_item_sk[#37]) |
| | cardinality=200,414 |
| | vec output tuple id: 11 |
| | output tuple id: 11 |
| | vIntermediate tuple ids: 10 |
| | hash output slot ids: 75 |
| | final projections: cs_item_sk[#76] |
| | final project output tuple id: 11 |
| | distribute expr lists: cs_item_sk[#75] |
| | distribute expr lists: cs_item_sk[#37] |
| | |
| |----3:VAGGREGATE (merge finalize)(496) |
| | | group by: cs_item_sk[#35] |
| | | sortByGroupKey:false |
| | | cardinality=200,414 |
| | | final projections: cs_item_sk[#36] |
| | | final project output tuple id: 4 |
| | | distribute expr lists: cs_item_sk[#35] |
| | | |
| | 2:VEXCHANGE |
| | offset: 0 |
| | distribute expr lists: cs_item_sk[#35] |
| | |
| 7:VAGGREGATE (merge finalize)(475) |
| | group by: cs_item_sk[#73] |
| | sortByGroupKey:false |
| | cardinality=200,414 |
| | final projections: cs_item_sk[#74] |
| | final project output tuple id: 9 |
| | distribute expr lists: cs_item_sk[#73] |
| | |
| 6:VEXCHANGE |
| offset: 0 |
| distribute expr lists: cs_item_sk[#73] |
| |
| PLAN FRAGMENT 1 |
| |
| PARTITION: HASH_PARTITIONED: cs_item_sk[#39], cs_order_number[#40] |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 06 |
| HASH_PARTITIONED: cs_item_sk[#73] |
| |
| 5:VAGGREGATE (update finalize)(467) |
| | STREAMING |
| | group by: cs_item_sk[#72] |
| | sortByGroupKey:false |
| | cardinality=200,414 |
| | distribute expr lists: cs_item_sk[#72] |
| | |
| 4:VOlapScanNode(459) |
| TABLE: tpcds.catalog_sales(catalog_sales), PREAGGREGATION: ON |
| partitions=1/1 (catalog_sales) |
| tablets=96/96, tabletList=1757302757630,1757302757632,1757302757634 ... |
| cardinality=143997065, avgRowSize=337.41714, numNodes=1 |
| pushAggOp=NONE |
| final projections: cs_item_sk[#39] |
| final project output tuple id: 6 |
| |
| PLAN FRAGMENT 2 |
| |
| PARTITION: HASH_PARTITIONED: cs_item_sk[#1], cs_order_number[#2] |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 02 |
| HASH_PARTITIONED: cs_item_sk[#35] |
| |
| 1:VAGGREGATE (update finalize)(488) |
| | STREAMING |
| | group by: cs_item_sk[#34] |
| | sortByGroupKey:false |
| | cardinality=200,414 |
| | distribute expr lists: cs_item_sk[#34] |
| | |
| 0:VOlapScanNode(480) |
| TABLE: tpcds.catalog_sales(catalog_sales), PREAGGREGATION: ON |
| partitions=1/1 (catalog_sales) |
| tablets=96/96, tabletList=1757302757630,1757302757632,1757302757634 ... |
| cardinality=143997065, avgRowSize=337.41714, numNodes=1 |
| pushAggOp=NONE |
| final projections: cs_item_sk[#1] |
| final project output tuple id: 1 |
| |
| |
| |
| ========== STATISTICS ========== |
+------------------------------------------------------------------------------+
102 rows in set (0.02 sec)
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)