@sollhui
Contributor

@sollhui sollhui commented Jul 4, 2025

pick (#52654)

What problem does this PR solve?

A routine load task can block the scheduling queue in the following case:

  1. A user creates a job as the admin user of clusterA; later, clusterA is deleted and clusterB is renamed to clusterA.
  2. The cluster ID saved in the job is now invalid, so no BE can be found for it.
  3. The task is repeatedly taken out of the queue and put back because there is no BE to execute it, which causes the other tasks to get stuck.
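
To illustrate the case above, here is a minimal, self-contained sketch of the failure mode. It is not the code changed in this PR. The class `RoutineLoadQueueSketch`, the `Task` type, the `schedule` method, and the cluster ID strings are all hypothetical, and modelling the put-back as a return to the head of the queue is an assumption. The sketch only shows why a task that can never find a BE starves the tasks behind it, and how setting such a task aside lets the others run.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

public class RoutineLoadQueueSketch {
    /** Hypothetical task: a name plus the cluster ID saved when its job was created. */
    static final class Task {
        final String name;
        final String clusterId;

        Task(String name, String clusterId) {
            this.name = name;
            this.clusterId = clusterId;
        }
    }

    /**
     * Runs at most maxRounds scheduling rounds and returns the names of the tasks that ran.
     * When setAsideUnschedulable is false, a task whose cluster has no BE is put back at the
     * head of the queue (assumed behavior); when true, it is removed from the queue instead.
     */
    static List<String> schedule(List<Task> tasks, Set<String> clustersWithBackends,
                                 boolean setAsideUnschedulable, int maxRounds) {
        Deque<Task> queue = new ArrayDeque<>(tasks);
        List<String> executed = new ArrayList<>();
        for (int round = 0; round < maxRounds && !queue.isEmpty(); round++) {
            Task task = queue.pollFirst();
            if (clustersWithBackends.contains(task.clusterId)) {
                executed.add(task.name);   // a BE exists, so the task runs
            } else if (!setAsideUnschedulable) {
                queue.addFirst(task);      // same task is picked again next round
            }
            // Otherwise the unschedulable task is set aside and the loop moves on.
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task("stale", "deleted-clusterA-id"), // cluster ID no longer valid
                new Task("t1", "clusterB-id"),
                new Task("t2", "clusterB-id"));
        Set<String> live = Set.of("clusterB-id");

        System.out.println(schedule(tasks, live, false, 10)); // prints []
        System.out.println(schedule(tasks, live, true, 10));  // prints [t1, t2]
    }
}
```

With the put-back policy, all ten rounds are spent on the stale task and nothing else runs; with the set-aside policy, the two healthy tasks complete in three rounds.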

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@sollhui sollhui requested a review from dataroaring as a code owner July 4, 2025 09:15
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Jul 4, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 40087 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6408ff5be2b54eb0ee50a3b11ccc73d18168e15d, data reload: false

------ Round 1 ----------------------------------
q1	17573	6848	6611	6611
q2	2055	200	170	170
q3	10502	1131	1122	1122
q4	10231	768	738	738
q5	7866	2916	2875	2875
q6	228	140	140	140
q7	973	633	615	615
q8	9530	2007	2013	2007
q9	6707	6405	6420	6405
q10	6983	2306	2266	2266
q11	459	274	260	260
q12	406	216	211	211
q13	17792	3009	2991	2991
q14	241	208	207	207
q15	505	466	483	466
q16	466	397	376	376
q17	957	597	585	585
q18	7253	6648	6805	6648
q19	1397	1058	1053	1053
q20	492	202	204	202
q21	3876	3145	3282	3145
q22	1114	1031	994	994
Total cold run time: 107606 ms
Total hot run time: 40087 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6591	6577	7218	6577
q2	336	229	231	229
q3	2988	3023	3035	3023
q4	2023	1804	1786	1786
q5	5705	5747	5688	5688
q6	207	126	123	123
q7	2181	1759	1750	1750
q8	3394	3502	3488	3488
q9	8827	8875	8803	8803
q10	3612	3545	3540	3540
q11	578	491	506	491
q12	825	602	603	602
q13	3928	3157	3182	3157
q14	289	260	280	260
q15	501	451	465	451
q16	484	432	428	428
q17	1845	1599	1617	1599
q18	8276	7947	7535	7535
q19	1678	1472	1584	1472
q20	2109	1888	1868	1868
q21	5077	4980	4768	4768
q22	1085	1035	981	981
Total cold run time: 62539 ms
Total hot run time: 58619 ms

@doris-robot

TPC-DS: Total hot run time: 190200 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6408ff5be2b54eb0ee50a3b11ccc73d18168e15d, data reload: false

query1	984	368	362	362
query2	6522	1935	1821	1821
query3	6700	213	212	212
query4	34176	23950	23597	23597
query5	4342	459	464	459
query6	278	180	167	167
query7	4641	312	307	307
query8	272	215	222	215
query9	9476	2545	2555	2545
query10	487	273	266	266
query11	18086	15162	15332	15162
query12	161	106	104	104
query13	1653	425	425	425
query14	9656	7321	6672	6672
query15	264	169	182	169
query16	8085	493	481	481
query17	1618	571	571	571
query18	2129	305	312	305
query19	239	159	164	159
query20	114	111	108	108
query21	211	104	107	104
query22	4434	4307	4272	4272
query23	34319	33775	34637	33775
query24	11858	2848	2905	2848
query25	732	420	428	420
query26	1882	174	171	171
query27	3056	350	352	350
query28	7999	2090	2103	2090
query29	1097	454	443	443
query30	336	163	167	163
query31	1021	793	811	793
query32	100	60	60	60
query33	794	308	306	306
query34	973	501	539	501
query35	904	719	734	719
query36	1078	948	941	941
query37	272	75	73	73
query38	3920	3875	3766	3766
query39	1475	1408	1419	1408
query40	290	101	99	99
query41	50	49	50	49
query42	113	105	101	101
query43	523	490	473	473
query44	1228	789	769	769
query45	183	166	169	166
query46	1147	732	709	709
query47	1930	1796	1815	1796
query48	459	368	392	368
query49	1235	396	383	383
query50	816	406	420	406
query51	7189	7198	7148	7148
query52	105	91	90	90
query53	265	183	190	183
query54	1123	462	450	450
query55	79	75	78	75
query56	272	246	253	246
query57	1262	1135	1155	1135
query58	235	210	232	210
query59	3073	2960	2806	2806
query60	284	256	265	256
query61	110	106	108	106
query62	817	660	675	660
query63	225	192	203	192
query64	5211	641	631	631
query65	3305	3191	3222	3191
query66	1210	312	308	308
query67	15967	15545	15475	15475
query68	3496	598	604	598
query69	383	273	267	267
query70	1167	1142	1083	1083
query71	323	253	260	253
query72	6079	4204	4144	4144
query73	746	346	352	346
query74	9699	9154	8999	8999
query75	3380	2612	2632	2612
query76	2093	1080	1047	1047
query77	406	274	273	273
query78	10409	9634	9466	9466
query79	1120	601	611	601
query80	696	448	448	448
query81	500	225	221	221
query82	1245	93	90	90
query83	166	155	143	143
query84	240	85	82	82
query85	952	327	297	297
query86	320	258	304	258
query87	4375	4222	4226	4222
query88	3476	2351	2328	2328
query89	411	299	308	299
query90	1938	185	188	185
query91	185	155	150	150
query92	61	51	53	51
query93	1104	565	550	550
query94	680	312	294	294
query95	357	255	255	255
query96	602	269	280	269
query97	3296	3104	3116	3104
query98	227	192	195	192
query99	1510	1341	1282	1282
Total cold run time: 296692 ms
Total hot run time: 190200 ms

@doris-robot

ClickBench: Total hot run time: 29.88 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6408ff5be2b54eb0ee50a3b11ccc73d18168e15d, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.11	0.10
query5	0.51	0.53	0.51
query6	1.14	0.72	0.73
query7	0.02	0.04	0.02
query8	0.03	0.03	0.03
query9	0.56	0.50	0.49
query10	0.55	0.56	0.56
query11	0.14	0.10	0.11
query12	0.13	0.11	0.11
query13	0.61	0.60	0.59
query14	0.78	0.78	0.80
query15	0.85	0.82	0.83
query16	0.38	0.40	0.39
query17	1.00	1.01	0.99
query18	0.23	0.22	0.22
query19	1.99	1.78	1.81
query20	0.01	0.01	0.02
query21	15.40	0.59	0.60
query22	2.09	2.22	1.84
query23	17.10	0.83	0.82
query24	2.62	0.97	0.71
query25	0.19	0.12	0.11
query26	0.38	0.14	0.13
query27	0.05	0.04	0.05
query28	11.17	0.50	0.45
query29	12.56	3.23	3.21
query30	0.25	0.06	0.06
query31	2.86	0.39	0.38
query32	3.24	0.47	0.46
query33	2.95	2.96	3.03
query34	17.14	4.50	4.52
query35	4.50	4.56	4.49
query36	0.67	0.48	0.49
query37	0.08	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.02
query40	0.15	0.12	0.12
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.51 s
Total hot run time: 29.88 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Jul 7, 2025

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jul 7, 2025
@github-actions
Contributor

github-actions bot commented Jul 7, 2025

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 9110c77 into apache:branch-3.0 Jul 8, 2025
23 of 25 checks passed