@github-actions
Contributor

Cherry-picked from #50587

…eta to avoid jitter (#50587)

### What problem does this PR solve?

Introduce a blacklist of backends used when a load job fetches meta, to avoid
load-speed jitter:

1. The fetch-meta operation selects a node at random. If one node stays
abnormal, fetch-meta requests sent to it time out and cause load-speed
jitter.

2. When a backend is added to the blacklist:

- Its fetch-meta RPC failed, and
- a retry on another backend succeeded.

3. When a backend is removed from the blacklist:

- Its entry expires automatically after two minutes.
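The add/remove rules above amount to a time-expiring set of backend IDs. The following is a minimal Python sketch under stated assumptions, not the code from #50587: the class `BackendBlacklist`, its `record_request` method, integer backend IDs, and the injectable clock are all hypothetical. Only the add conditions (RPC failed, retry elsewhere succeeded) and the two-minute expiry come from the description.

```python
import time
from typing import Callable, Dict, Iterable

# Expiry window from the PR description: two minutes.
BLACKLIST_TTL_SECONDS = 120.0


class BackendBlacklist:
    """Hypothetical time-expiring blacklist of backend IDs (illustrative only)."""

    def __init__(self, ttl: float = BLACKLIST_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        # backend_id -> clock value at which the entry expires
        self._expire_at: Dict[int, float] = {}

    def add(self, backend_id: int) -> None:
        # (Re)start the expiry window for this backend.
        self._expire_at[backend_id] = self._clock() + self._ttl

    def record_request(self, failed_ids: Iterable[int], succeeded: bool) -> None:
        # Blacklist backends only when their RPC failed AND a retry on
        # another backend then succeeded, i.e. the fault looks node-specific.
        if succeeded:
            for backend_id in failed_ids:
                self.add(backend_id)

    def contains(self, backend_id: int) -> bool:
        expire_at = self._expire_at.get(backend_id)
        if expire_at is None:
            return False
        if self._clock() >= expire_at:
            # Entry has expired: drop it lazily on lookup.
            del self._expire_at[backend_id]
            return False
        return True
```

Expired entries are removed lazily on lookup, so no background cleanup task is needed; the injectable `clock` makes the two-minute window easy to test with a fake time source.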

Another improvement to fetch-meta retries: a retry will not choose a backend
that has already failed within the same request.
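The retry rule can be sketched in the same spirit. The hypothetical `fetch_meta_with_retry` helper below (Python, illustrative only, not the Doris code) picks a random backend per attempt, never re-picks one that already failed in the same request, and skips blacklisted backends. The function name, its parameters, `FetchMetaError`, and the fallback to blacklisted backends when nothing else remains are all assumptions that the PR text does not state.

```python
import random
from typing import Callable, List, Optional, Set, Tuple, TypeVar

T = TypeVar("T")


class FetchMetaError(Exception):
    """Hypothetical error raised when every attempt in one request failed."""


def fetch_meta_with_retry(
    backend_ids: List[int],
    fetch: Callable[[int], T],
    blacklisted: Set[int],
    max_attempts: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[T, List[int]]:
    """Try random backends, never re-picking one that failed in this request.

    Returns the fetch result plus the IDs that failed before the success,
    which a caller could then add to its blacklist.
    """
    rng = rng or random.Random()
    failed: List[int] = []
    for _ in range(max_attempts):
        untried = [b for b in backend_ids if b not in failed]
        # Prefer non-blacklisted backends; fall back to blacklisted ones only
        # if nothing else is left (an assumption, not stated in the PR).
        candidates = [b for b in untried if b not in blacklisted] or untried
        if not candidates:
            break
        backend_id = rng.choice(candidates)
        try:
            return fetch(backend_id), failed
        except Exception:
            # Treat any exception as a failed fetch-meta RPC on this backend.
            failed.append(backend_id)
    raise FetchMetaError(f"fetch meta failed on backends {failed}")
```

The returned `failed` list holds exactly the backends that failed before another backend succeeded, which is the condition the description gives for adding a backend to the blacklist.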
@github-actions github-actions bot requested a review from dataroaring as a code owner May 19, 2025 11:38
Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring
Contributor

run buildall

@dataroaring dataroaring reopened this May 20, 2025
@github-actions
Contributor Author

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels May 20, 2025
@github-actions
Contributor Author

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 39596 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit df0e0e41a9c9e48281aa4fe83a86ee5d9158998d, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17596	6759	6567	6567
q2	2059	164	167	164
q3	10663	1083	1154	1083
q4	10521	758	775	758
q5	7736	2825	2744	2744
q6	212	134	130	130
q7	967	605	590	590
q8	9357	1942	2011	1942
q9	6557	6365	6367	6365
q10	6980	2256	2235	2235
q11	472	267	252	252
q12	406	213	202	202
q13	17778	2934	2948	2934
q14	229	201	209	201
q15	495	465	463	463
q16	648	594	572	572
q17	965	583	505	505
q18	7187	6484	6701	6484
q19	1389	1039	1070	1039
q20	489	205	213	205
q21	3968	3274	3204	3204
q22	1136	1004	957	957
Total cold run time: 107810 ms
Total hot run time: 39596 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	6589	6568	6526	6526
q2	320	233	227	227
q3	2859	2731	2771	2731
q4	2002	1752	1740	1740
q5	5764	5691	5759	5691
q6	210	132	127	127
q7	2217	1836	1791	1791
q8	3348	3555	3511	3511
q9	8898	8742	8874	8742
q10	3531	3500	3442	3442
q11	610	490	494	490
q12	811	606	636	606
q13	10013	3187	3121	3121
q14	300	286	277	277
q15	505	473	467	467
q16	712	671	642	642
q17	1826	1637	1600	1600
q18	8142	7673	7601	7601
q19	1656	1547	1579	1547
q20	2040	1828	1876	1828
q21	5472	5272	5383	5272
q22	1121	1025	1041	1025
Total cold run time: 68946 ms
Total hot run time: 59004 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/3) 🎉


Category Coverage
Function Coverage 41.11% (10857/26412)
Line Coverage 31.89% (92640/290494)
Region Coverage 30.97% (47800/154361)
Branch Coverage 27.44% (24476/89212)

@doris-robot

TPC-DS: Total hot run time: 196574 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit df0e0e41a9c9e48281aa4fe83a86ee5d9158998d, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	1324	882	867	867
query2	6235	1999	1997	1997
query3	10855	4373	4220	4220
query4	61265	29485	23399	23399
query5	5203	463	443	443
query6	380	173	183	173
query7	5416	317	307	307
query8	294	230	212	212
query9	8445	2609	2604	2604
query10	487	276	266	266
query11	17304	15167	15732	15167
query12	155	103	107	103
query13	1402	443	436	436
query14	10485	7161	7145	7145
query15	207	180	175	175
query16	7072	484	512	484
query17	1179	557	563	557
query18	1817	314	308	308
query19	201	148	156	148
query20	113	110	107	107
query21	204	101	100	100
query22	4603	4480	4667	4480
query23	34604	33655	33799	33655
query24	6095	2884	2883	2883
query25	492	402	425	402
query26	648	175	167	167
query27	1882	360	363	360
query28	4117	2473	2448	2448
query29	709	471	446	446
query30	245	160	165	160
query31	1020	832	821	821
query32	72	53	59	53
query33	410	304	290	290
query34	940	511	519	511
query35	859	732	731	731
query36	1096	979	977	977
query37	122	77	71	71
query38	4159	3976	4033	3976
query39	1545	1477	1517	1477
query40	211	111	104	104
query41	52	59	50	50
query42	123	107	103	103
query43	519	485	492	485
query44	1208	816	819	816
query45	184	171	167	167
query46	1158	729	735	729
query47	2045	1937	1967	1937
query48	485	382	394	382
query49	750	413	428	413
query50	849	427	426	426
query51	7288	7345	7306	7306
query52	105	96	93	93
query53	270	189	190	189
query54	592	485	461	461
query55	80	84	82	82
query56	268	253	287	253
query57	1258	1175	1157	1157
query58	243	216	212	212
query59	3204	3080	3071	3071
query60	300	249	261	249
query61	104	111	159	111
query62	794	688	673	673
query63	213	188	191	188
query64	1449	686	660	660
query65	3256	3184	3167	3167
query66	712	287	300	287
query67	15761	15553	15449	15449
query68	4073	584	575	575
query69	424	264	266	264
query70	1175	1094	1106	1094
query71	351	255	247	247
query72	6366	4061	4087	4061
query73	747	346	345	345
query74	10570	8842	8943	8842
query75	3368	2654	2659	2654
query76	1996	1057	1088	1057
query77	482	268	269	268
query78	10729	9641	9493	9493
query79	2201	618	600	600
query80	1376	428	420	420
query81	535	241	231	231
query82	1230	89	84	84
query83	168	140	139	139
query84	284	76	74	74
query85	1015	287	291	287
query86	378	301	292	292
query87	4370	4195	4276	4195
query88	3853	2386	2384	2384
query89	419	285	286	285
query90	1995	180	184	180
query91	187	147	147	147
query92	62	48	50	48
query93	2826	570	565	565
query94	778	291	295	291
query95	350	261	259	259
query96	618	279	283	279
query97	3295	3083	3116	3083
query98	214	206	196	196
query99	1563	1289	1331	1289
Total cold run time: 315140 ms
Total hot run time: 196574 ms

@doris-robot

ClickBench: Total hot run time: 32.02 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit df0e0e41a9c9e48281aa4fe83a86ee5d9158998d, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.03	0.03	0.02
query2	0.06	0.03	0.03
query3	0.23	0.07	0.06
query4	1.62	0.10	0.10
query5	0.51	0.50	0.52
query6	1.14	0.73	0.73
query7	0.03	0.02	0.02
query8	0.03	0.03	0.03
query9	0.57	0.50	0.50
query10	0.55	0.56	0.55
query11	0.14	0.09	0.10
query12	0.14	0.12	0.13
query13	0.61	0.59	0.60
query14	2.72	2.74	2.77
query15	0.88	0.81	0.83
query16	0.38	0.38	0.38
query17	1.03	1.07	0.99
query18	0.24	0.22	0.22
query19	1.97	1.83	2.01
query20	0.02	0.01	0.01
query21	15.35	0.59	0.58
query22	2.72	2.82	2.14
query23	16.99	1.02	0.83
query24	3.04	0.64	1.57
query25	0.32	0.11	0.06
query26	0.33	0.14	0.14
query27	0.05	0.04	0.07
query28	10.46	0.50	0.46
query29	12.54	3.25	3.22
query30	0.24	0.06	0.06
query31	2.85	0.38	0.39
query32	3.30	0.47	0.45
query33	2.96	2.94	2.96
query34	17.01	4.51	4.49
query35	4.50	4.49	4.56
query36	0.64	0.47	0.51
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.16	0.12	0.13
query41	0.07	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.68 s
Total hot run time: 32.02 s

@dataroaring dataroaring merged commit 898fe41 into branch-3.0 May 22, 2025
22 of 24 checks passed
@github-actions github-actions bot deleted the auto-pick-50587-branch-3.0 branch May 22, 2025 06:18
Labels

approved Indicates a PR has been approved by one committer. reviewed

4 participants