[fix](agg_strategy) Fix wrong results when multi_distinct_func and count distinct with multiple expressions appear together #56271
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
    = ImmutableList.builderWithExpectedSize(aggOutput.size());
    for (NamedExpression output : aggOutput) {
        Expression rewrittenExpr = output.rewriteDownShortCircuit(
                e -> e instanceof MultiDistinction ? ((MultiDistinction) e).withMustUseMultiDistinctAgg(true) : e);
        newAggOutputBuilder.add((NamedExpression) rewrittenExpr);
    }
    newAggOutputBuilder.addAll(aggOutput);
    ImmutableList<NamedExpression> normalizedAggOutput = newAggOutputBuilder.build();
Doesn't normalizedAggOutput always equal aggOutput? Just use aggOutput.
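The reviewer's point can be checked in isolation: copying every element of a list into a fresh builder and then building it yields a list equal to the original, so the extra variable adds nothing. The sketch below is not Doris code. It uses `java.util` stand-ins (`ArrayList` and `List.copyOf`) instead of Guava's `ImmutableList`, and `String` in place of `NamedExpression`; the class name `AggOutputSketch` and the method `normalize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AggOutputSketch {
    // Mirrors the "builder + addAll + build" pattern from the diff,
    // using java.util types as stand-ins for Guava's ImmutableList.
    public static List<String> normalize(List<String> aggOutput) {
        List<String> builder = new ArrayList<>(aggOutput.size());
        builder.addAll(aggOutput);
        return List.copyOf(builder);
    }

    public static void main(String[] args) {
        // Made-up output expressions, for illustration only.
        List<String> aggOutput = List.of("count(distinct a, b)", "sum(distinct c)");
        // The copy has the same elements in the same order as the input.
        System.out.println(normalize(aggOutput).equals(aggOutput)); // prints "true"
    }
}
```

Because the copy is element-for-element equal to its input, passing `aggOutput` directly would behave the same in this simplified setting.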
TPC-H: Total hot run time: 1534 ms
TPC-DS: Total hot run time: 2823 ms
ClickBench: Total hot run time: 0.06 s
PR approved by at least one committer and no changes requested.
…d count distinct multi expr exists same time (#56271)

### What problem does this PR solve?

Related PR: #54079

Problem Summary:

1. Added a check in the MultiDistinctCount constructor that rejects multi_distinct_count(a, b), preventing the use of multiple columns. BE does not report an error in this case; it silently uses only the first argument of the multi_distinct function, which produces incorrect results.
2. In scenarios without a group by key, when a multi_distinct_func and count(distinct a, b) appear together, the original code converted count(distinct a, b) to multi_distinct_count(a), which produces incorrect results. The correct approach is to use multi-stage splitting whenever a count distinct over multiple expressions appears.
3. Removed the mustUseMultiDistinct flag, which served no purpose.
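The first two points of the summary can be illustrated with a small, self-contained Java sketch. It is not Doris code: the class `DistinctSketch`, its methods, and the sample rows are all hypothetical, and NULL handling is ignored. `countDistinctTuples` models count(distinct a, b), `countDistinctFirstColumn` models what happens when only the first argument is used, `checkSingleArgument` mimics the kind of argument-count guard described for the MultiDistinctCount constructor, and `canUseMultiDistinct` is an assumed simplification of the strategy choice (fall back to multi-stage splitting as soon as any distinct aggregate has more than one argument).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctSketch {
    // Models count(distinct a, b): each distinct (a, b) pair counts once.
    public static long countDistinctTuples(List<int[]> rows) {
        Set<List<Integer>> seen = new HashSet<>();
        for (int[] row : rows) {
            seen.add(Arrays.asList(row[0], row[1]));
        }
        return seen.size();
    }

    // Models the faulty behavior: only the first argument is considered.
    public static long countDistinctFirstColumn(List<int[]> rows) {
        Set<Integer> seen = new HashSet<>();
        for (int[] row : rows) {
            seen.add(row[0]);
        }
        return seen.size();
    }

    // Hypothetical guard: fail fast instead of silently dropping arguments.
    public static void checkSingleArgument(String functionName, int argumentCount) {
        if (argumentCount > 1) {
            throw new IllegalArgumentException(
                    functionName + " does not support multiple columns, got " + argumentCount);
        }
    }

    // Assumed simplification of the strategy choice: the multi_distinct rewrite
    // is only safe when every distinct aggregate has exactly one argument.
    public static boolean canUseMultiDistinct(List<Integer> distinctArgumentCounts) {
        for (int count : distinctArgumentCounts) {
            if (count != 1) {
                return false; // use multi-stage splitting instead
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up rows of (a, b).
        List<int[]> rows = List.of(
                new int[] {1, 10}, new int[] {1, 20}, new int[] {2, 10}, new int[] {1, 10});
        System.out.println(countDistinctTuples(rows));             // prints 3
        System.out.println(countDistinctFirstColumn(rows));        // prints 2
        System.out.println(canUseMultiDistinct(List.of(1, 2)));    // prints false
        try {
            checkSingleArgument("multi_distinct_count", 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

On the sample rows the pair-based count is 3 while the first-column count is 2, which is the kind of discrepancy the summary attributes to converting count(distinct a, b) into multi_distinct_count(a).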
Release note
None