Skip to content

Conversation

@feiniaofeiafei
Copy link
Contributor

@feiniaofeiafei feiniaofeiafei commented Sep 21, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #54079

Problem Summary:

  1. Added a check for multi_distinct_count(a,b) in the MultiDistinctCount constructor to prevent the use of multiple columns. Because BE doesn't report an error in this case, it only uses the first argument of the multi_distinct function, resulting in incorrect results.
  2. In scenarios without a group by key, when multi_distinct_func and count(distinct a,b) appear together, the original code converts count(distinct a,b) to multi_distinct_count(a), resulting in incorrect results. The correct approach is to use multi-stage splitting when count distinct multi_expr appears.
  3. Removed the mustUseMultiDistinct flag. This flag is useless.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/1) 🎉
Increment coverage report
Complete coverage report

@feiniaofeiafei
Copy link
Contributor Author

run cloud_p0

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/1) 🎉
Increment coverage report
Complete coverage report

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@feiniaofeiafei
Copy link
Contributor Author

run buildall

morrySnow
morrySnow previously approved these changes Sep 22, 2025
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 22, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 88.89% (8/9) 🎉
Increment coverage report
Complete coverage report

Comment on lines 274 to 276
= ImmutableList.builderWithExpectedSize(aggOutput.size());
for (NamedExpression output : aggOutput) {
Expression rewrittenExpr = output.rewriteDownShortCircuit(
e -> e instanceof MultiDistinction ? ((MultiDistinction) e).withMustUseMultiDistinctAgg(true) : e);
newAggOutputBuilder.add((NamedExpression) rewrittenExpr);
}
newAggOutputBuilder.addAll(aggOutput);
ImmutableList<NamedExpression> normalizedAggOutput = newAggOutputBuilder.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalizedAggOutput always equals to aggOutput? just use aggOutput

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Sep 22, 2025
@feiniaofeiafei
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 1534 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8810b98ab0b872b1f335b8b0fc58b2c73a4732c8, data reload: false

------ Round 1 ----------------------------------
q1	6602	28	19	19
q2	579	25	25	25
q3	881	15	15	15
q4	957	15	14	14
q5	2252	16	17	16
q6	222	14	14	14
q7	960	23	21	21
q8	1198	15	12	12
q9	16610	13	12	12
q10	4826	16	17	16
q11	466	23	21	21
q12	335	13	12	12
q13	17603	12	11	11
q14	251	13	11	11
q15	626	12	11	11
q16	1034	994	983	983
q17	547	13	12	12
q18	7994	11	11	11
q19	1115	13	12	12
q20	344	365	254	254
q21	3999	20	20	20
q22	1096	12	12	12
Total cold run time: 70497 ms
Total hot run time: 1534 ms

----- Round 2, with runtime_filter_mode=off -----
q1	13	13	12	12
q2	24	20	21	20
q3	13	11	11	11
q4	12	12	11	11
q5	12	11	12	11
q6	12	11	11	11
q7	19	19	18	18
q8	11	10	10	10
q9	10	11	11	11
q10	11	10	11	10
q11	19	20	19	19
q12	11	10	10	10
q13	10	9	10	9
q14	10	11	11	11
q15	11	10	9	9
q16	1079	1092	1027	1027
q17	12	10	11	10
q18	10	9	12	9
q19	12	12	11	11
q20	1854	1989	1849	1849
q21	20	19	19	19
q22	10	10	10	10
Total cold run time: 3195 ms
Total hot run time: 3118 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 2823 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8810b98ab0b872b1f335b8b0fc58b2c73a4732c8, data reload: false

query1	1088	19	12	12
query2	7106	19	15	15
query3	7636	12	11	11
query4	26569	13	12	12
query5	4446	13	12	12
query6	397	11	10	10
query7	5430	11	11	11
query8	355	18	18	18
query9	9169	12	10	10
query10	737	12	12	12
query11	16061	12	11	11
query12	213	10	10	10
query13	1708	10	9	9
query14	11252	13	12	12
query15	439	13	11	11
query16	7544	10	10	10
query17	2166	10	9	9
query18	2931	11	10	10
query19	248	9	8	8
query20	143	11	10	10
query21	219	10	10	10
query22	4183	10	9	9
query23	34097	27	13	13
query24	10863	13	11	11
query25	795	11	9	9
query26	1819	12	11	11
query27	3412	12	10	10
query28	6257	11	10	10
query29	1795	10	9	9
query30	635	12	11	11
query31	1731	12	11	11
query32	119	12	10	10
query33	1342	11	11	11
query34	1630	837	547	547
query35	1068	11	10	10
query36	995	10	9	9
query37	260	10	9	9
query38	3602	10	9	9
query39	1488	744	748	744
query40	327	13	12	12
query41	94	12	11	11
query42	152	10	9	9
query43	488	10	9	9
query44	1384	11	9	9
query45	385	11	10	10
query46	1191	10	9	9
query47	1782	11	10	10
query48	429	10	11	10
query49	1318	11	11	11
query50	780	11	10	10
query51	3922	10	9	9
query52	130	11	11	11
query53	286	10	10	10
query54	754	10	9	9
query55	95	10	8	8
query56	337	10	9	9
query57	1239	9	9	9
query58	386	10	8	8
query59	2613	9	8	8
query60	370	10	9	9
query61	173	9	8	8
query62	832	9	10	9
query63	274	10	9	9
query64	5491	9	8	8
query65	4163	11	13	11
query66	1784	13	11	11
query67	16953	36	11	11
query68	3548	9	9	9
query69	495	10	9	9
query70	1441	10	10	10
query71	475	391	359	359
query72	6626	10	10	10
query73	563	14	10	10
query74	9623	11	11	11
query75	3437	9	9	9
query76	2979	10	9	9
query77	882	12	10	10
query78	9677	12	8	8
query79	1025	10	10	10
query80	747	11	9	9
query81	676	10	13	10
query82	393	11	9	9
query83	332	10	8	8
query84	285	9	9	9
query85	1646	9	11	9
query86	503	10	9	9
query87	3989	11	9	9
query88	2983	15	11	11
query89	442	9	9	9
query90	2186	10	10	10
query91	171	9	9	9
query92	104	11	9	9
query93	1066	10	9	9
query94	821	10	10	10
query95	551	10	8	8
query96	453	10	10	10
query97	3073	8	9	8
query98	230	231	247	231
query99	1555	10	11	10
Total cold run time: 293057 ms
Total hot run time: 2823 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 0.06 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8810b98ab0b872b1f335b8b0fc58b2c73a4732c8, data reload: false

query1	0.06	0.02	0.01
query2	0.10	0.01	0.00
query3	0.27	0.01	0.01
query4	1.76	0.01	0.01
query5	0.29	0.01	0.00
query6	1.65	0.01	0.01
query7	0.05	0.01	0.00
query8	0.07	0.00	0.01
query9	0.62	0.00	0.00
query10	0.59	0.00	0.00
query11	0.18	0.00	0.00
query12	0.17	0.00	0.00
query13	0.63	0.01	0.00
query14	1.07	0.01	0.01
query15	0.87	0.00	0.01
query16	0.43	0.00	0.01
query17	1.03	0.01	0.00
query18	0.23	0.00	0.00
query19	2.30	0.00	0.01
query20	0.02	0.00	0.00
query21	15.94	0.00	0.00
query22	6.81	0.00	0.01
query23	15.74	0.00	0.01
query24	1.35	0.00	0.01
query25	0.24	0.00	0.00
query26	0.17	0.01	0.01
query27	0.12	0.00	0.00
query28	1.29	0.00	0.00
query29	13.04	0.00	0.00
query30	0.30	0.00	0.00
query31	2.29	0.01	0.00
query32	5.97	0.01	0.00
query33	4.39	0.00	0.01
query34	7.60	0.00	0.01
query35	6.17	0.00	0.00
query36	0.69	0.00	0.00
query37	0.12	0.00	0.00
query38	0.08	0.00	0.01
query39	0.06	0.00	0.01
query40	0.19	0.00	0.00
query41	0.09	0.00	0.01
query42	0.06	0.01	0.00
query43	0.05	0.00	0.00
Total cold run time: 95.15 s
Total hot run time: 0.06 s

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 86.67% (13/15) 🎉
Increment coverage report
Complete coverage report

@feiniaofeiafei
Copy link
Contributor Author

run cloud_p0

@feiniaofeiafei
Copy link
Contributor Author

run nonconcurrent

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 86.67% (13/15) 🎉
Increment coverage report
Complete coverage report

@feiniaofeiafei
Copy link
Contributor Author

run nonConcurrent

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 86.67% (13/15) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 23, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit 2148a6d into apache:master Sep 23, 2025
27 of 31 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 24, 2025
…d count distinct multi expr exists same time (#56271)

### What problem does this PR solve?

Related PR: #54079

Problem Summary:
1. Added a check for multi_distinct_count(a,b) in the MultiDistinctCount
constructor to prevent the use of multiple columns. Because BE doesn't
report an error in this case, it only uses the first argument of the
multi_distinct function, resulting in incorrect results.
2. In scenarios without a group by key, when multi_distinct_func and
count(distinct a,b) appear together, the original code converts
count(distinct a,b) to multi_distinct_count(a), resulting in incorrect
results. The correct approach is to use multi-stage splitting when count
distinct multi_expr appears.
3. Removed the mustUseMultiDistinct flag. This flag is useless.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants