branch-4.0: [Fix](rules) fix result wrong of PushDownAggThroughJoinOnPkFk #59498 #59703
Merged
Related PR: #36035

Problem Summary:

To push an aggregation down to the foreign key table, the aggregation key must include the primary key of the primary key table (or contain a unique key that forms a bijection with the primary key). Before this PR, Doris returned wrong results in this situation:

```sql
drop table if exists customer_test;
drop table if exists store_sales_test;

CREATE TABLE customer_test (
    c_customer_sk INT not null,
    c_first_name VARCHAR(50),
    c_last_name VARCHAR(50)
);

CREATE TABLE store_sales_test (
    ss_customer_sk INT,
    ss_date DATE
);

INSERT INTO customer_test VALUES (1, 'John', 'Smith');
INSERT INTO customer_test VALUES (2, 'John', 'Smith');
INSERT INTO store_sales_test VALUES (1, '2024-01-01');
INSERT INTO store_sales_test VALUES (2, '2024-01-01');

alter table customer_test add constraint c_pk primary key (c_customer_sk);
alter table store_sales_test add constraint ss_c_fk foreign key (ss_customer_sk) references customer_test(c_customer_sk);

show constraints from customer_test;
show constraints from store_sales_test;

SELECT DISTINCT c_last_name, c_first_name, ss_date
FROM store_sales_test inner join customer_test
on store_sales_test.ss_customer_sk = customer_test.c_customer_sk;

set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK';
set disable_nereids_rules='';
```

Before this PR, the query returned different results depending on whether PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK was enabled or disabled. This is because AGG (group by c_last_name, c_first_name, ss_date) should not be pushed down below the JOIN operation. The original transform was:

```
Agg(group by c_last_name, c_first_name, ss_date)
+--Join(c_customer_sk=ss_customer_sk)
   +--scan(customer_test)
   +--scan(store_sales_test)
->
Join
+--scan(customer_test)
+--Agg(group by ss_customer_sk,ss_date)
   +--scan(store_sales_test)
```

This rewrite is incorrect because the two plans are not equivalent. In the example data, two customers with different keys share the same name, so grouping by ss_customer_sk keeps two rows where grouping by the name columns should keep one.
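The mismatch can be reproduced outside Doris. The sketch below is a minimal illustration, not the Doris implementation: it uses Python's built-in `sqlite3` module as a stand-in engine (an assumption; SQLite has no PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK rule), so the pushed-down plan is written by hand as a subquery. The variable names `correct` and `pushed_down` are illustrative.

```python
import sqlite3

# In-memory database holding the same rows as the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_test (
    c_customer_sk INT NOT NULL PRIMARY KEY,
    c_first_name VARCHAR(50),
    c_last_name VARCHAR(50)
);
CREATE TABLE store_sales_test (
    ss_customer_sk INT REFERENCES customer_test(c_customer_sk),
    ss_date DATE
);
INSERT INTO customer_test VALUES (1, 'John', 'Smith');
INSERT INTO customer_test VALUES (2, 'John', 'Smith');
INSERT INTO store_sales_test VALUES (1, '2024-01-01');
INSERT INTO store_sales_test VALUES (2, '2024-01-01');
""")

# Original plan: aggregate (DISTINCT) above the join.
correct = conn.execute("""
    SELECT DISTINCT c_last_name, c_first_name, ss_date
    FROM store_sales_test
    JOIN customer_test ON ss_customer_sk = c_customer_sk
""").fetchall()

# Hand-written equivalent of the incorrect rewrite:
# aggregate the foreign key table first, then join.
pushed_down = conn.execute("""
    SELECT c_last_name, c_first_name, ss_date
    FROM customer_test
    JOIN (
        SELECT ss_customer_sk, ss_date
        FROM store_sales_test
        GROUP BY ss_customer_sk, ss_date
    ) AS s ON s.ss_customer_sk = c_customer_sk
""").fetchall()

print(len(correct), len(pushed_down))
```

The first query collapses both customers into a single row, while the hand-written pushed-down form returns the same name and date twice, because the aggregation below the join distinguishes the rows by `ss_customer_sk`.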
This PR corrects the rewrite. The aggregation may be pushed down below the join only when there is a bijective relationship between the group by key from the primary table and the fields in the foreign table (if a functional dependency exists from a to b and also from b to a, then a and b have a bijective relationship). For example:

```
Agg(group by c_customer_sk, c_first_name, ss_date)
+--Join(c_customer_sk=ss_customer_sk)
   +--scan(customer_test)
   +--scan(store_sales_test)
->
Join(c_customer_sk=ss_customer_sk)
+--scan(customer_test)
+--Agg(group by ss_customer_sk,ss_date)
   +--scan(store_sales_test)
```

Since c_customer_sk is the primary key, c_first_name in the group by clause can be removed (based on functional dependencies). Furthermore, because of the equality condition c_customer_sk = ss_customer_sk, there is a bijective relationship between c_customer_sk and ss_customer_sk. In this case, `group by c_customer_sk, ss_date` can be replaced with `group by ss_customer_sk, ss_date`, so the aggregation group by key is expressed entirely in terms of the output of the foreign table. Since a primary key-foreign key join does not expand the rows of the foreign table, the aggregation can be pushed down in this situation.
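The valid case can be checked the same way. This is a minimal sketch, again using Python's built-in `sqlite3` module as a stand-in engine (an assumption, not how Doris evaluates the rule), with the pushed-down plan written by hand. The duplicate sales row for customer 1 is an addition to the example data, included so that the aggregation actually removes a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_test (
    c_customer_sk INT NOT NULL PRIMARY KEY,
    c_first_name VARCHAR(50),
    c_last_name VARCHAR(50)
);
CREATE TABLE store_sales_test (
    ss_customer_sk INT REFERENCES customer_test(c_customer_sk),
    ss_date DATE
);
INSERT INTO customer_test VALUES (1, 'John', 'Smith');
INSERT INTO customer_test VALUES (2, 'John', 'Smith');
INSERT INTO store_sales_test VALUES (1, '2024-01-01');
INSERT INTO store_sales_test VALUES (1, '2024-01-01');  -- extra row, not in the PR example
INSERT INTO store_sales_test VALUES (2, '2024-01-01');
""")

# Aggregate above the join; the group by key contains the primary key.
original = sorted(conn.execute("""
    SELECT c_customer_sk, c_first_name, ss_date
    FROM store_sales_test
    JOIN customer_test ON ss_customer_sk = c_customer_sk
    GROUP BY c_customer_sk, c_first_name, ss_date
""").fetchall())

# Aggregate the foreign key table on (ss_customer_sk, ss_date), then join.
pushed_down = sorted(conn.execute("""
    SELECT c_customer_sk, c_first_name, ss_date
    FROM customer_test
    JOIN (
        SELECT ss_customer_sk, ss_date
        FROM store_sales_test
        GROUP BY ss_customer_sk, ss_date
    ) AS s ON s.ss_customer_sk = c_customer_sk
""").fetchall())

print(original == pushed_down)
```

Both forms return one row per customer and date here, which is consistent with the argument above: once the group by key includes the primary key, it can be rewritten in terms of foreign table columns without changing the result.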
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor
run buildall
yiguolei
approved these changes
Jan 12, 2026
Contributor Author
PR approved by at least one committer and no changes requested.
Contributor Author
PR approved by anyone and no changes requested.
Cherry-picked from #59498