-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](nereids) Fix aggregate source repeat output is different from child repeat #57840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](nereids) Fix aggregate source repeat output is different from child repeat #57840
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 34372 ms |
TPC-DS: Total hot run time: 187623 ms |
ClickBench: Total hot run time: 27.48 s |
FE Regression Coverage ReportIncrement line coverage |
847d978 to
410eb1a
Compare
|
run buildall |
TPC-H: Total hot run time: 34023 ms |
TPC-DS: Total hot run time: 187121 ms |
ClickBench: Total hot run time: 27.16 s |
FE Regression Coverage ReportIncrement line coverage |
410eb1a to
846b10d
Compare
|
run buildall |
TPC-H: Total hot run time: 35257 ms |
TPC-DS: Total hot run time: 187624 ms |
ClickBench: Total hot run time: 27.21 s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug in the deep copy logic for LogicalAggregate nodes that have a sourceRepeat reference. When CTEs containing GROUP SETS are inlined, the aggregate's sourceRepeat field needs to point to the newly copied repeat node rather than the original one. The fix ensures that after deep copying, the sourceRepeat reference is updated to point to the copied child repeat node.
Key Changes:
- Updates
LogicalPlanDeepCopier.visitLogicalAggregate()to re-establish the sourceRepeat reference after copying - Adds comprehensive test coverage for both cases (with and without sourceRepeat)
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/copier/LogicalPlanDeepCopier.java | Updates the deep copy logic to find and update the sourceRepeat reference to the copied repeat node after creating a deep copy of the aggregate |
| fe/fe-core/src/test/java/org/apache/doris/nereids/trees/copier/LogicalPlanDeepCopierTest.java | Adds two new test cases to verify correct deep copying behavior for aggregates with and without sourceRepeat references |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/copier/LogicalPlanDeepCopier.java
Outdated
Show resolved
Hide resolved
| Optional<LogicalRepeat<? extends Plan>> childRepeat = | ||
| copiedAggregate.collectFirst(LogicalRepeat.class::isInstance); | ||
| if (childRepeat.isPresent()) { | ||
| copiedAggregate = copiedAggregate.withSourceRepeat(childRepeat.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not call withXXXX twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have fixed
|
run buildall |
TPC-H: Total hot run time: 34720 ms |
TPC-DS: Total hot run time: 187803 ms |
ClickBench: Total hot run time: 27.52 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
…hild repeat (#57840) if cte def has group sets, after cte inline, the aggregate source repeat output is different from child repeat the pr fix this such query as following ```sql with t as ( select t1.l_partkey, t1.l_suppkey, o_orderdate, grouping(t1.l_suppkey), grouping(o_orderdate), grouping_id(t1.l_partkey, t1.l_suppkey), grouping_id(t1.l_partkey, t1.l_suppkey, o_orderdate), sum(o_totalprice), max(o_totalprice), min(o_totalprice), count(*), count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end) from (select * from lineitem where l_shipdate = '2023-12-11') t1 left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate group by GROUPING SETS ((l_shipdate, o_orderdate, l_partkey), (l_partkey, l_suppkey), (l_suppkey), ()) ) select t1.l_suppkey, t2.o_orderdate from t t1 inner join t t2 on t1.l_partkey = t2.l_partkey; ```
…hild repeat (apache#57840) if cte def has group sets, after cte inline, the aggregate source repeat output is different from child repeat the pr fix this such query as following ```sql with t as ( select t1.l_partkey, t1.l_suppkey, o_orderdate, grouping(t1.l_suppkey), grouping(o_orderdate), grouping_id(t1.l_partkey, t1.l_suppkey), grouping_id(t1.l_partkey, t1.l_suppkey, o_orderdate), sum(o_totalprice), max(o_totalprice), min(o_totalprice), count(*), count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end) from (select * from lineitem where l_shipdate = '2023-12-11') t1 left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate group by GROUPING SETS ((l_shipdate, o_orderdate, l_partkey), (l_partkey, l_suppkey), (l_suppkey), ()) ) select t1.l_suppkey, t2.o_orderdate from t t1 inner join t t2 on t1.l_partkey = t2.l_partkey; ```
…hild repeat (apache#57840) if cte def has group sets, after cte inline, the aggregate source repeat output is different from child repeat the pr fix this such query as following ```sql with t as ( select t1.l_partkey, t1.l_suppkey, o_orderdate, grouping(t1.l_suppkey), grouping(o_orderdate), grouping_id(t1.l_partkey, t1.l_suppkey), grouping_id(t1.l_partkey, t1.l_suppkey, o_orderdate), sum(o_totalprice), max(o_totalprice), min(o_totalprice), count(*), count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end) from (select * from lineitem where l_shipdate = '2023-12-11') t1 left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate group by GROUPING SETS ((l_shipdate, o_orderdate, l_partkey), (l_partkey, l_suppkey), (l_suppkey), ()) ) select t1.l_suppkey, t2.o_orderdate from t t1 inner join t t2 on t1.l_partkey = t2.l_partkey; ```
What problem does this PR solve?
if cte def has group sets, after cte inline, the aggregate source repeat output is different from child repeat
the pr fix this
such query as following
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)