@github-actions
Contributor

@github-actions github-actions bot commented Jan 9, 2026

Cherry-picked from #59498

Related PR: #36035

Problem Summary:
To push an aggregation down to the foreign key table, the group-by key
of the aggregation must include the primary key of the primary key table
(or a unique key that forms a bijection with the primary key).
Before this PR, Doris returned wrong results in the following case:

drop table if exists customer_test;
drop table if exists store_sales_test;

CREATE TABLE customer_test (
    c_customer_sk INT not null ,
    c_first_name VARCHAR(50),
    c_last_name VARCHAR(50)
);

CREATE TABLE store_sales_test (
    ss_customer_sk INT,
    ss_date DATE
);

INSERT INTO customer_test VALUES (1, 'John', 'Smith');
INSERT INTO customer_test VALUES (2, 'John', 'Smith');  

INSERT INTO store_sales_test VALUES (1, '2024-01-01');
INSERT INTO store_sales_test VALUES (2, '2024-01-01');

alter table customer_test add constraint c_pk primary key (c_customer_sk);
alter table store_sales_test add constraint ss_c_fk foreign key (ss_customer_sk) references customer_test(c_customer_sk);
show constraints from customer_test;
show constraints from store_sales_test;

SELECT DISTINCT c_last_name, c_first_name, ss_date
FROM store_sales_test inner join customer_test
on store_sales_test.ss_customer_sk = customer_test.c_customer_sk;

-- Disable the rule, run the query above, then re-enable it and compare.
set disable_nereids_rules='PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK';
set disable_nereids_rules='';

Before this PR, the query above returned different results depending on
whether PUSH_DOWN_AGG_THROUGH_JOIN_ON_PKFK was enabled or disabled.
The AGG (group by c_last_name, c_first_name, ss_date) was pushed down
below the JOIN, which is not valid here because the group-by key does
not contain the primary key c_customer_sk.
The original transform was:

Agg(group by c_last_name, c_first_name, ss_date )
  +--Join(c_customer_sk=ss_customer_sk)
     +--scan(customer_test)
     +--scan(store_sales_test)
->
Join
  +--scan(customer_test)
  +--Agg(group by ss_customer_sk,ss_date)
    +--scan(store_sales_test)
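
To make the difference between the two plans concrete, here is a minimal Python sketch that evaluates both on the rows from the reproduction script. It is only a model of the plan shapes, not Doris code; the helper `join_on_customer` and the variable names are made up for illustration.

```python
# Rows from the PR's reproduction script.
customer = [
    (1, "John", "Smith"),  # (c_customer_sk, c_first_name, c_last_name)
    (2, "John", "Smith"),
]
store_sales = [
    (1, "2024-01-01"),  # (ss_customer_sk, ss_date)
    (2, "2024-01-01"),
]


def join_on_customer(sales_rows):
    """Hypothetical helper: inner join on ss_customer_sk = c_customer_sk.

    Returns (c_last_name, c_first_name, ss_date) tuples.
    """
    by_pk = {c[0]: c for c in customer}
    return [
        (by_pk[sk][2], by_pk[sk][1], ss_date)
        for sk, ss_date in sales_rows
        if sk in by_pk
    ]


# Plan 1: join first, then group by (c_last_name, c_first_name, ss_date).
agg_above_join = sorted(set(join_on_customer(store_sales)))

# Plan 2: group by (ss_customer_sk, ss_date) on the foreign key table,
# then join. No aggregation happens above the join.
pushed_down = sorted(join_on_customer(sorted(set(store_sales))))

print(agg_above_join)  # a single ('Smith', 'John', '2024-01-01') row
print(pushed_down)     # the same row twice
```

The two customers share a name but have different keys, so grouping by ss_customer_sk keeps them apart while grouping by the name columns merges them.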

This rewrite is incorrect because the two plans are not equivalent:
customers 1 and 2 share the same name, so the original plan returns one
row, while the rewritten plan groups by ss_customer_sk and returns two.
This PR corrects the rule so that the aggregation is pushed down below
the join only when the group-by keys from the primary key table have a
bijective relationship with columns of the foreign key table (a and b
are in bijection when a functional dependency exists from a to b and
also from b to a).
For example,

Agg(group by c_customer_sk, c_first_name, ss_date )
  +--Join(c_customer_sk=ss_customer_sk)
     +--scan(customer_test)
     +--scan(store_sales_test)
->
Join(c_customer_sk=ss_customer_sk)
  +--scan(customer_test)
  +--Agg(group by ss_customer_sk,ss_date)
    +--scan(store_sales_test)
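
The bijection condition can be illustrated on sample rows. The sketch below checks functional dependencies against a data instance; this is an assumption made for illustration, since an optimizer would reason from declared constraints and join conditions rather than scan data. The function names `determines` and `is_bijection` are hypothetical.

```python
def determines(rows, a, b):
    """Return True if column index a functionally determines column index b."""
    seen = {}
    for row in rows:
        # setdefault records the first b seen for this a; any later
        # mismatch means a does not determine b.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def is_bijection(rows, a, b):
    """a and b are in bijection when a -> b and b -> a both hold."""
    return determines(rows, a, b) and determines(rows, b, a)


# Joined rows: (c_customer_sk, c_first_name, c_last_name, ss_customer_sk, ss_date)
joined = [
    (1, "John", "Smith", 1, "2024-01-01"),
    (2, "John", "Smith", 2, "2024-01-01"),
]

pk_fk_bijective = is_bijection(joined, 0, 3)  # c_customer_sk <-> ss_customer_sk
pk_to_name = determines(joined, 0, 1)         # c_customer_sk -> c_first_name
name_to_pk = determines(joined, 1, 0)         # c_first_name -> c_customer_sk

print(pk_fk_bijective, pk_to_name, name_to_pk)  # True True False
```

The last check fails because two different keys share the name John, which is why the name columns alone cannot stand in for the key.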

Since c_customer_sk is the primary key, c_first_name can be removed from
the group by clause (based on functional dependencies).
Furthermore, because of the join condition c_customer_sk =
ss_customer_sk, there is a bijective relationship between c_customer_sk
and ss_customer_sk. In this case, `group by c_customer_sk, ss_date` can
be replaced with `group by ss_customer_sk, ss_date`.
The group-by key then consists entirely of columns output by the
foreign key table. Since a primary key-foreign key join does not expand
the rows of the foreign key table, the aggregation can be pushed down in
this situation.
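
As a rough check of the valid case, the following Python sketch evaluates both plan shapes for `group by c_customer_sk, c_first_name, ss_date`. It models the plans only and is not Doris code; the duplicate sales row is a hypothetical addition (not in the PR's data) so that the aggregation has something to collapse, and the helper `join` is made up for illustration.

```python
# c_customer_sk -> (c_first_name, c_last_name), from the PR's script.
customer = {1: ("John", "Smith"), 2: ("John", "Smith")}

store_sales = [
    (1, "2024-01-01"),  # (ss_customer_sk, ss_date)
    (1, "2024-01-01"),  # hypothetical duplicate, not in the PR's data
    (2, "2024-01-01"),
]


def join(sales_rows):
    """Hypothetical helper: inner join on the PK-FK condition.

    Returns (c_customer_sk, c_first_name, ss_date) tuples.
    """
    return [(sk, customer[sk][0], d) for sk, d in sales_rows if sk in customer]


# Join first, then group by (c_customer_sk, c_first_name, ss_date).
agg_above_join = sorted(set(join(store_sales)))

# Group by (ss_customer_sk, ss_date) on the foreign key table, then join.
# Each sales row matches at most one customer, so the join adds no rows.
pushed_down = sorted(join(set(store_sales)))

print(agg_above_join == pushed_down)  # True
```

Because the group-by key contains the primary key, each group above the join corresponds to exactly one (ss_customer_sk, ss_date) group below it.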
@github-actions github-actions bot requested a review from yiguolei as a code owner January 9, 2026 02:26
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Jan 9, 2026
@dataroaring dataroaring reopened this Jan 9, 2026
@hello-stephen
Contributor

run buildall

@github-actions
Contributor Author

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jan 12, 2026
@github-actions
Contributor Author

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 09ed1fe into branch-4.0 Jan 12, 2026
25 of 27 checks passed
@github-actions github-actions bot deleted the auto-pick-59498-branch-4.0 branch January 12, 2026 02:20