-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Fix](rules) fix result wrong of PushDownAggThroughJoinOnPkFk #59498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 33970 ms |
TPC-DS: Total hot run time: 173217 ms |
ClickBench: Total hot run time: 26.83 s |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
1 similar comment
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
ee0201a to
67d181b
Compare
|
run buildall |
TPC-H: Total hot run time: 31162 ms |
TPC-DS: Total hot run time: 173051 ms |
ClickBench: Total hot run time: 27.22 s |
FE Regression Coverage ReportIncrement line coverage |
3215de3 to
4da7874
Compare
|
run buildall |
TPC-H: Total hot run time: 32489 ms |
TPC-DS: Total hot run time: 173347 ms |
ClickBench: Total hot run time: 26.78 s |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
4414e9a to
7b1898b
Compare
|
run buildall |
TPC-H: Total hot run time: 31880 ms |
TPC-DS: Total hot run time: 172991 ms |
ClickBench: Total hot run time: 26.84 s |
7b1898b to
1c48d96
Compare
1c48d96 to
3fd07b2
Compare
|
run buildall |
TPC-H: Total hot run time: 32025 ms |
TPC-DS: Total hot run time: 173235 ms |
ClickBench: Total hot run time: 27.19 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run vault_p0 |
1 similar comment
|
run vault_p0 |
Related PR: #36035 Problem Summary: The key of the aggregation must include the primary key of the primary key table (or contain a unique key that can form a bijection with the primary key) to push the aggregation to the foreign key table. Before this pr, doris have wrong results in this situation: drop table if exists customer_test; drop table if exists store_sales_test; CREATE TABLE customer_test ( c_customer_sk INT not null , c_first_name VARCHAR(50), c_last_name VARCHAR(50) ); CREATE TABLE store_sales_test ( ss_customer_sk INT, ss_date DATE ); INSERT INTO customer_test VALUES (1, 'John', 'Smith'); INSERT INTO customer_test VALUES (2, 'John', 'Smith'); INSERT INTO store_sales_test VALUES (1, '2024-01-01'); INSERT INTO store_sales_test VALUES (2, '2024-01-01'); alter table customer_test add constraint c_pk primary key (c_customer_sk); alter table store_sales_test add constraint ss_c_fk foreign key (ss_customer_sk) references customer_test(c_customer_sk); show constraints from customer_test; show constraints from store_sales_test; SELECT DISTINCT c_last_name, c_first_name, ss_date FROM store_sales_test inner join customer_test on store_sales_test.ss_customer_sk = customer_test.c_customer_sk; set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK'; set disable_nereids_rules=''; Turn on PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK will have different result with turn off PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK before this pr. This is because AGG (group by c_last_name, c_first_name, ss_date) should not be pushed down below the JOIN operation. The original transform was: Agg(group by c_last_name, c_first_name, ss_date ) +--Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--scan(store_sales_test) -> Join +--scan(customer_test) +--Agg(group by ss_customer_sk,ss_date) +--scan(store_sales_test) This is an incorrect rewrite because it is not equivalent. This pr corrects the rewrite, allowing the aggregation to be pushed down below the join only when there is a bijective relationship between the group by key from the primary table and the fields in the foreign table (a functional dependency exists from a to b, and also from b to a, then a and b have a bijective relationship). For example, Agg(group by c_customer_sk, c_first_name, ss_date ) +--Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--scan(store_sales_test) -> Join(c_customer_sk=ss_customer_sk) +--scan(customer_test) +--Agg(group by ss_customer_sk,ss_date) +--scan(store_sales_test) Since c_customer_sk is the primary key, c_first_name in the group by clause can be removed (based on functional dependencies). Furthermore, due to the equality relationship c_customer_sk = ss_customer_sk, there is a bijective relationship between c_customer_sk and ss_customer_sk. In this case, `group by c_customer_sk, ss_date` can be replaced with `group by ss_customer_sk, ss_date`. The aggregation group by key is entirely replaced with the output of the foreign table. Since a primary key-foreign key join does not expand the rows of the foreign table,In this situation, the aggregation can be pushed down.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #36035
Problem Summary:
The key of the aggregation must include the primary key of the primary key table (or contain a unique key that can form a bijection with the primary key) to push the aggregation to the foreign key table.
Before this pr, doris have wrong results in this situation:
set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK';
set disable_nereids_rules='';
Turn on PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK will have different result with turn off PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK before this pr.
This is because AGG (group by c_last_name, c_first_name, ss_date) should not be pushed down below the JOIN operation.
The original transform was:
This is an incorrect rewrite because it is not equivalent.
This pr corrects the rewrite, allowing the aggregation to be pushed down below the join only when there is a bijective relationship between the group by key from the primary table and the fields in the foreign table (a functional dependency exists from a to b, and also from b to a, then a and b have a bijective relationship).
For example,
Since c_customer_sk is the primary key, c_first_name in the group by clause can be removed (based on functional dependencies).
Furthermore, due to the equality relationship c_customer_sk = ss_customer_sk, there is a bijective relationship between c_customer_sk and ss_customer_sk. In this case,
group by c_customer_sk, ss_datecan be replaced withgroup by ss_customer_sk, ss_date.The aggregation group by key is entirely replaced with the output of the foreign table. Since a primary key-foreign key join does not expand the rows of the foreign table,In this situation, the aggregation can be pushed down.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)