[fix](move-memtable) multi replica tables should tolerate minority failures #38003

Merged: 6 commits, Aug 8, 2024

kaijchen
Contributor

Proposed changes

A load job for a multi-replica table shouldn't fail immediately on an error from any single replica.
Errors should be recorded and reported per tablet replica, and checked against the commit info.
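
A minimal Python sketch of the record-then-check pattern described above, assuming a majority quorum (more than half of a tablet's replicas must succeed). The names `ReplicaErrorTracker`, `record_failure` and `check_commit` are hypothetical and are not taken from the Doris code base; they only illustrate the idea.

```python
from collections import defaultdict


class ReplicaErrorTracker:
    """Hypothetical helper: collects per-replica errors instead of failing fast."""

    def __init__(self, replicas_per_tablet):
        # replicas_per_tablet: {tablet_id: [backend_id, ...]}
        self.replicas_per_tablet = replicas_per_tablet
        # tablet_id -> {backend_id: error message}
        self.failed = defaultdict(dict)

    def record_failure(self, tablet_id, backend_id, error):
        # Record the error and keep going; the load is not aborted here.
        self.failed[tablet_id][backend_id] = error

    def check_commit(self):
        """Return (ok, problems); ok is False if any tablet lost its majority."""
        problems = {}
        for tablet_id, backends in self.replicas_per_tablet.items():
            errors = self.failed.get(tablet_id, {})
            succeeded = len(backends) - len(errors)
            quorum = len(backends) // 2 + 1  # assumed majority rule
            if succeeded < quorum:
                problems[tablet_id] = dict(errors)
        return (not problems, problems)


tracker = ReplicaErrorTracker({1001: ["be1", "be2", "be3"]})
tracker.record_failure(1001, "be2", "write failed")
ok_after_one, _ = tracker.check_commit()        # 2 of 3 replicas still succeeded
tracker.record_failure(1001, "be3", "write failed")
ok_after_two, problems = tracker.check_commit()  # only 1 of 3 succeeded
```

With three replicas, one failure is tolerated and the second one makes the commit check fail, which is the "minority failures" behaviour named in the title.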

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaijchen kaijchen marked this pull request as draft July 17, 2024 09:12
@kaijchen
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39987 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a6aba5b74dc6576031f6d8c320237b98632df786, data reload: false

------ Round 1 ----------------------------------
q1	17607	4504	4319	4319
q2	2019	187	185	185
q3	10462	1260	1140	1140
q4	10200	864	813	813
q5	7556	2731	2640	2640
q6	223	144	139	139
q7	983	603	585	585
q8	9217	2062	2111	2062
q9	8839	6579	6583	6579
q10	8827	3822	3787	3787
q11	455	238	241	238
q12	400	220	225	220
q13	17855	2956	2951	2951
q14	269	238	230	230
q15	525	489	495	489
q16	503	385	376	376
q17	973	702	627	627
q18	8127	7413	7437	7413
q19	6327	1406	1454	1406
q20	711	323	332	323
q21	4927	3178	3280	3178
q22	343	291	287	287
Total cold run time: 117348 ms
Total hot run time: 39987 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4514	4331	4321	4321
q2	366	267	257	257
q3	3027	2960	2928	2928
q4	2028	1717	1763	1717
q5	5633	5512	5495	5495
q6	231	135	142	135
q7	2219	1899	1842	1842
q8	3252	3446	3438	3438
q9	8794	8854	8831	8831
q10	4170	3813	3858	3813
q11	618	524	497	497
q12	824	645	671	645
q13	17157	3149	3222	3149
q14	330	298	283	283
q15	537	490	487	487
q16	487	438	455	438
q17	1844	1520	1522	1520
q18	8078	7991	7769	7769
q19	2532	1489	1516	1489
q20	2145	1886	1848	1848
q21	5166	5003	4886	4886
q22	570	513	518	513
Total cold run time: 74522 ms
Total hot run time: 56301 ms
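
The bot output does not label its columns. The reading used below (query, cold run, two hot runs, best hot run, all in ms) is an inference, not documented behaviour of the benchmark scripts; it is consistent with the reported Round 1 totals, where the cold total is the sum of the first numeric column and the hot total is the sum of the per-query minimum of the two hot runs. A short Python sketch of that arithmetic, with the hypothetical helper name `summarize`:

```python
def summarize(rows):
    """rows: (query, cold_ms, hot1_ms, hot2_ms) tuples.

    Returns (total_cold_ms, total_hot_ms), where the hot total uses the
    faster of the two hot runs for each query (assumed interpretation).
    """
    total_cold = sum(cold for _, cold, _, _ in rows)
    total_hot = sum(min(hot1, hot2) for _, _, hot1, hot2 in rows)
    return total_cold, total_hot


# Three rows copied from Round 1 of the first TPC-H run above.
sample = [
    ("q1", 17607, 4504, 4319),
    ("q2", 2019, 187, 185),
    ("q6", 223, 144, 139),
]
totals = summarize(sample)
```

Applied to all 22 rows of Round 1, the same computation gives the reported 117348 ms cold and 39987 ms hot totals.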

@kaijchen
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40164 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9609bb5bef82f8966c059804d71d48e7bd9e43bf, data reload: false

------ Round 1 ----------------------------------
q1	17611	4521	4419	4419
q2	2024	195	186	186
q3	10449	1226	1126	1126
q4	10191	856	949	856
q5	7581	2706	2662	2662
q6	223	135	137	135
q7	964	625	605	605
q8	9209	2085	2114	2085
q9	8832	6562	6610	6562
q10	8726	3835	3801	3801
q11	456	237	245	237
q12	394	222	228	222
q13	17758	3017	3013	3013
q14	273	242	254	242
q15	528	479	478	478
q16	496	382	376	376
q17	987	680	707	680
q18	8213	7451	7375	7375
q19	3418	1458	1309	1309
q20	715	315	339	315
q21	4939	3192	3217	3192
q22	358	288	290	288
Total cold run time: 114345 ms
Total hot run time: 40164 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4357	4275	4231	4231
q2	389	270	265	265
q3	2995	2746	2777	2746
q4	1897	1672	1613	1613
q5	5315	5336	5356	5336
q6	215	132	132	132
q7	2137	1741	1781	1741
q8	3214	3313	3303	3303
q9	8432	8468	8418	8418
q10	3908	3712	3745	3712
q11	587	511	494	494
q12	767	615	608	608
q13	16794	3001	3001	3001
q14	308	282	279	279
q15	517	486	484	484
q16	465	408	441	408
q17	1781	1479	1481	1479
q18	7689	7374	7292	7292
q19	1681	1584	1571	1571
q20	1966	1775	1776	1775
q21	4846	4652	4709	4652
q22	586	481	486	481
Total cold run time: 70846 ms
Total hot run time: 54021 ms

@doris-robot

TPC-DS: Total hot run time: 173918 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9609bb5bef82f8966c059804d71d48e7bd9e43bf, data reload: false

query1	914	383	377	377
query2	6878	1845	1829	1829
query3	6651	215	230	215
query4	28415	17771	17294	17294
query5	4194	484	474	474
query6	278	169	159	159
query7	4595	293	284	284
query8	243	199	186	186
query9	8666	2363	2392	2363
query10	445	282	259	259
query11	10660	10010	9965	9965
query12	132	84	81	81
query13	1624	362	353	353
query14	10235	7674	7045	7045
query15	229	164	167	164
query16	7723	319	321	319
query17	1545	552	534	534
query18	1902	275	269	269
query19	195	147	143	143
query20	88	83	79	79
query21	206	127	125	125
query22	4371	4075	4215	4075
query23	33652	33074	33165	33074
query24	11462	2900	2845	2845
query25	624	356	364	356
query26	1166	149	146	146
query27	2913	263	284	263
query28	7575	1976	1967	1967
query29	879	633	619	619
query30	296	149	148	148
query31	961	801	728	728
query32	103	54	54	54
query33	772	299	292	292
query34	960	481	487	481
query35	692	577	572	572
query36	1093	931	913	913
query37	145	83	77	77
query38	2839	2784	2784	2784
query39	878	835	825	825
query40	198	117	117	117
query41	49	47	45	45
query42	114	99	100	99
query43	498	497	474	474
query44	1184	718	719	718
query45	191	162	162	162
query46	1088	705	707	705
query47	1862	1758	1810	1758
query48	360	287	288	287
query49	1079	441	423	423
query50	789	397	397	397
query51	6916	6830	6823	6823
query52	101	92	96	92
query53	362	292	294	292
query54	928	449	454	449
query55	78	75	77	75
query56	283	266	275	266
query57	1132	1077	1039	1039
query58	257	238	261	238
query59	2847	2785	2906	2785
query60	310	277	285	277
query61	96	93	114	93
query62	879	649	654	649
query63	335	298	287	287
query64	9570	2202	7473	2202
query65	3151	3156	3112	3112
query66	825	339	332	332
query67	15417	14817	14887	14817
query68	4636	536	546	536
query69	487	338	331	331
query70	1105	1174	1178	1174
query71	416	286	281	281
query72	7322	5934	5947	5934
query73	728	330	328	328
query74	6112	5618	5648	5618
query75	3518	2714	2694	2694
query76	2579	1029	995	995
query77	482	320	320	320
query78	10288	9626	10388	9626
query79	3103	527	522	522
query80	2240	497	499	497
query81	597	224	223	223
query82	716	143	137	137
query83	293	173	177	173
query84	286	95	98	95
query85	2108	369	355	355
query86	488	326	317	317
query87	3294	3158	3112	3112
query88	3739	2370	2380	2370
query89	477	393	389	389
query90	1915	204	199	199
query91	146	112	111	111
query92	65	52	51	51
query93	3353	520	509	509
query94	1286	222	225	222
query95	411	333	462	333
query96	649	278	269	269
query97	3211	3019	3073	3019
query98	222	202	190	190
query99	1535	1289	1239	1239
Total cold run time: 285905 ms
Total hot run time: 173918 ms

@doris-robot

ClickBench: Total hot run time: 30.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9609bb5bef82f8966c059804d71d48e7bd9e43bf, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.22	0.04	0.05
query4	1.68	0.09	0.09
query5	0.49	0.48	0.48
query6	1.14	0.73	0.73
query7	0.02	0.01	0.02
query8	0.05	0.05	0.05
query9	0.54	0.48	0.48
query10	0.52	0.52	0.54
query11	0.16	0.11	0.11
query12	0.15	0.12	0.12
query13	0.59	0.58	0.58
query14	0.75	0.78	0.82
query15	0.85	0.83	0.82
query16	0.36	0.37	0.36
query17	0.97	1.04	0.99
query18	0.22	0.21	0.22
query19	1.80	1.71	1.73
query20	0.01	0.01	0.01
query21	15.45	0.80	0.67
query22	4.19	7.47	1.91
query23	18.27	1.39	1.24
query24	2.04	0.23	0.23
query25	0.16	0.10	0.08
query26	0.29	0.21	0.22
query27	0.47	0.22	0.23
query28	13.77	1.03	1.01
query29	12.62	3.24	3.30
query30	0.25	0.06	0.06
query31	2.86	0.38	0.39
query32	3.28	0.48	0.47
query33	2.90	2.87	2.86
query34	17.21	4.38	4.36
query35	4.48	4.43	4.43
query36	0.65	0.47	0.48
query37	0.19	0.16	0.15
query38	0.15	0.15	0.14
query39	0.04	0.04	0.03
query40	0.15	0.12	0.12
query41	0.09	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.04	0.03
Total cold run time: 110.24 s
Total hot run time: 30.55 s

@kaijchen
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39787 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 693c8b7bd34c04b3e1db3b68606134be8cba73df, data reload: false

------ Round 1 ----------------------------------
q1	17630	4340	4345	4340
q2	2011	192	197	192
q3	10445	1271	1118	1118
q4	10187	793	760	760
q5	7535	2702	2647	2647
q6	225	137	134	134
q7	956	603	599	599
q8	9242	2065	2086	2065
q9	8910	6560	6551	6551
q10	8736	3750	3762	3750
q11	453	244	257	244
q12	435	220	225	220
q13	17916	2980	2999	2980
q14	276	234	248	234
q15	523	496	495	495
q16	506	382	374	374
q17	950	734	764	734
q18	7962	7546	7313	7313
q19	5593	1366	1395	1366
q20	645	316	324	316
q21	5036	3072	3188	3072
q22	347	283	284	283
Total cold run time: 116519 ms
Total hot run time: 39787 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4531	4229	4195	4195
q2	374	261	258	258
q3	3008	2784	2882	2784
q4	1991	1736	1698	1698
q5	5711	5537	5527	5527
q6	220	133	132	132
q7	2196	1865	1864	1864
q8	3249	3412	3399	3399
q9	8737	8849	8941	8849
q10	4033	3904	3751	3751
q11	572	503	508	503
q12	830	667	642	642
q13	17256	3182	3142	3142
q14	306	290	288	288
q15	528	496	484	484
q16	507	443	433	433
q17	1853	1561	1508	1508
q18	8227	7911	7813	7813
q19	1784	1652	1633	1633
q20	2899	1887	1856	1856
q21	5008	5003	4785	4785
q22	591	505	494	494
Total cold run time: 74411 ms
Total hot run time: 56038 ms

@doris-robot

TPC-DS: Total hot run time: 173924 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 693c8b7bd34c04b3e1db3b68606134be8cba73df, data reload: false

query1	915	383	359	359
query2	6470	1910	1778	1778
query3	6646	205	217	205
query4	28461	17505	17424	17424
query5	3715	483	497	483
query6	270	201	161	161
query7	4585	292	296	292
query8	253	203	220	203
query9	8426	2497	2461	2461
query10	433	279	267	267
query11	10608	10058	9983	9983
query12	118	87	82	82
query13	1622	372	364	364
query14	9610	7556	7703	7556
query15	208	164	164	164
query16	7185	324	318	318
query17	1348	562	546	546
query18	1868	279	268	268
query19	198	147	143	143
query20	89	81	82	81
query21	210	126	129	126
query22	4425	4052	3921	3921
query23	33948	33872	33713	33713
query24	11305	2969	2944	2944
query25	635	411	389	389
query26	1071	154	153	153
query27	2533	288	281	281
query28	6901	2098	2101	2098
query29	897	626	636	626
query30	258	160	159	159
query31	993	747	759	747
query32	101	55	80	55
query33	752	310	305	305
query34	976	503	523	503
query35	707	572	580	572
query36	1153	981	966	966
query37	161	88	87	87
query38	3008	2844	2832	2832
query39	898	851	864	851
query40	219	125	128	125
query41	46	46	45	45
query42	109	102	99	99
query43	498	475	469	469
query44	1215	742	730	730
query45	196	159	163	159
query46	1096	718	748	718
query47	1863	1757	1753	1753
query48	371	304	302	302
query49	865	427	430	427
query50	778	405	394	394
query51	6921	6831	6797	6797
query52	109	95	100	95
query53	371	306	301	301
query54	898	467	463	463
query55	78	75	76	75
query56	320	304	287	287
query57	1165	1078	1042	1042
query58	266	258	270	258
query59	2789	2622	2546	2546
query60	324	307	308	307
query61	120	111	111	111
query62	825	637	657	637
query63	325	303	299	299
query64	9764	2307	1754	1754
query65	3169	3120	3156	3120
query66	834	338	338	338
query67	15592	14888	15111	14888
query68	6191	566	558	558
query69	767	510	464	464
query70	1250	1135	1163	1135
query71	493	281	287	281
query72	9096	5315	5392	5315
query73	793	329	327	327
query74	6041	5669	5692	5669
query75	4023	2684	2693	2684
query76	4194	893	917	893
query77	702	324	306	306
query78	11901	10793	8993	8993
query79	12088	532	525	525
query80	928	484	516	484
query81	555	222	223	222
query82	569	138	138	138
query83	197	163	165	163
query84	280	88	86	86
query85	756	324	289	289
query86	462	310	329	310
query87	3414	3143	3147	3143
query88	5922	2455	2465	2455
query89	499	384	375	375
query90	2037	196	193	193
query91	132	111	98	98
query92	71	49	51	49
query93	4682	520	520	520
query94	1234	214	216	214
query95	417	321	320	320
query96	612	270	279	270
query97	3173	2987	3008	2987
query98	217	192	192	192
query99	1477	1264	1261	1261
Total cold run time: 300259 ms
Total hot run time: 173924 ms

@doris-robot

ClickBench: Total hot run time: 30.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 693c8b7bd34c04b3e1db3b68606134be8cba73df, data reload: false

query1	0.03	0.03	0.04
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.08
query5	0.50	0.48	0.50
query6	1.15	0.73	0.72
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.54	0.50	0.48
query10	0.55	0.55	0.53
query11	0.16	0.11	0.12
query12	0.15	0.12	0.12
query13	0.60	0.58	0.57
query14	0.76	0.78	0.78
query15	0.84	0.81	0.80
query16	0.38	0.37	0.35
query17	1.03	0.98	0.94
query18	0.22	0.21	0.22
query19	1.78	1.65	1.64
query20	0.01	0.01	0.01
query21	15.45	0.73	0.65
query22	4.87	6.41	2.24
query23	18.26	1.41	1.23
query24	2.09	0.22	0.25
query25	0.16	0.08	0.08
query26	0.29	0.20	0.20
query27	0.45	0.23	0.23
query28	13.28	1.01	0.98
query29	12.58	3.29	3.24
query30	0.25	0.06	0.05
query31	2.88	0.38	0.38
query32	3.29	0.48	0.47
query33	2.86	3.00	2.88
query34	17.01	4.39	4.36
query35	4.42	4.38	4.45
query36	0.66	0.47	0.47
query37	0.18	0.15	0.15
query38	0.16	0.15	0.15
query39	0.04	0.04	0.04
query40	0.15	0.11	0.12
query41	0.09	0.06	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 110.28 s
Total hot run time: 30.63 s

@kaijchen
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39944 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 762616a58485867c7f72aab124e12a5efdee49ee, data reload: false

------ Round 1 ----------------------------------
q1	17675	4312	4231	4231
q2	2013	195	196	195
q3	10509	1160	1065	1065
q4	10194	883	820	820
q5	7533	2693	2639	2639
q6	222	145	138	138
q7	943	608	618	608
q8	9228	2065	2086	2065
q9	8752	6557	6543	6543
q10	8755	3724	3803	3724
q11	457	249	261	249
q12	466	229	230	229
q13	17776	3021	3026	3021
q14	280	240	230	230
q15	523	491	495	491
q16	499	385	391	385
q17	969	731	733	731
q18	8090	7454	7407	7407
q19	7694	1431	1379	1379
q20	661	333	330	330
q21	4968	3242	3181	3181
q22	347	283	283	283
Total cold run time: 118554 ms
Total hot run time: 39944 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4369	4226	4207	4207
q2	375	268	270	268
q3	3025	2985	2938	2938
q4	1979	1700	1784	1700
q5	5593	5602	5509	5509
q6	222	137	133	133
q7	2269	1871	1829	1829
q8	3252	3536	3404	3404
q9	8750	8844	8835	8835
q10	4138	3768	3798	3768
q11	609	505	506	505
q12	818	663	642	642
q13	17003	3181	3197	3181
q14	317	275	291	275
q15	519	492	494	492
q16	510	466	447	447
q17	1821	1524	1507	1507
q18	8004	8174	7812	7812
q19	1765	1448	1497	1448
q20	2622	1866	1870	1866
q21	8352	4873	4745	4745
q22	586	529	537	529
Total cold run time: 76898 ms
Total hot run time: 56040 ms

@doris-robot

TPC-DS: Total hot run time: 173034 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 762616a58485867c7f72aab124e12a5efdee49ee, data reload: false

query1	920	372	363	363
query2	6454	1931	1889	1889
query3	6635	207	221	207
query4	28332	17513	17397	17397
query5	3643	483	483	483
query6	271	179	163	163
query7	4593	290	287	287
query8	237	203	193	193
query9	8544	2474	2454	2454
query10	443	311	263	263
query11	11706	9925	10117	9925
query12	112	84	85	84
query13	1655	373	368	368
query14	10150	8077	7715	7715
query15	221	160	165	160
query16	7593	315	307	307
query17	1634	543	521	521
query18	1909	276	268	268
query19	186	148	155	148
query20	88	83	82	82
query21	203	129	126	126
query22	4349	4084	4034	4034
query23	34191	33689	33557	33557
query24	11174	2875	2955	2875
query25	624	414	410	410
query26	706	148	149	148
query27	2293	275	285	275
query28	6337	2116	2104	2104
query29	892	661	634	634
query30	256	151	150	150
query31	983	774	794	774
query32	93	51	52	51
query33	699	293	299	293
query34	897	489	519	489
query35	688	594	599	594
query36	1137	965	956	956
query37	150	80	83	80
query38	2962	2811	2815	2811
query39	924	788	826	788
query40	204	117	117	117
query41	45	45	42	42
query42	115	102	108	102
query43	518	475	475	475
query44	1222	739	734	734
query45	190	161	163	161
query46	1086	745	715	715
query47	1844	1767	1767	1767
query48	364	299	298	298
query49	828	404	411	404
query50	778	391	398	391
query51	6769	6777	6723	6723
query52	110	91	97	91
query53	363	288	285	285
query54	859	458	449	449
query55	74	73	73	73
query56	281	269	271	269
query57	1125	1062	1061	1061
query58	249	235	264	235
query59	2840	2504	2596	2504
query60	335	281	273	273
query61	100	95	94	94
query62	785	660	648	648
query63	317	293	286	286
query64	9115	2202	1627	1627
query65	3156	3087	3133	3087
query66	704	334	322	322
query67	15551	15087	15061	15061
query68	4498	544	549	544
query69	526	459	353	353
query70	1148	1126	1139	1126
query71	400	286	283	283
query72	6800	5748	5004	5004
query73	742	368	326	326
query74	6011	5702	5695	5695
query75	3402	2710	2678	2678
query76	2349	947	876	876
query77	455	306	310	306
query78	9529	8957	8873	8873
query79	2329	535	520	520
query80	2135	494	471	471
query81	606	222	221	221
query82	825	129	137	129
query83	293	172	173	172
query84	263	98	88	88
query85	1931	370	307	307
query86	475	325	323	323
query87	3213	3084	3121	3084
query88	3818	2379	2397	2379
query89	474	382	378	378
query90	1779	197	195	195
query91	131	100	102	100
query92	60	52	51	51
query93	2649	521	508	508
query94	1108	219	216	216
query95	404	323	330	323
query96	610	281	272	272
query97	3197	3063	3064	3063
query98	230	196	198	196
query99	1494	1284	1254	1254
Total cold run time: 278695 ms
Total hot run time: 173034 ms

@doris-robot

ClickBench: Total hot run time: 30.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 762616a58485867c7f72aab124e12a5efdee49ee, data reload: false

query1	0.04	0.03	0.03
query2	0.09	0.04	0.03
query3	0.23	0.05	0.05
query4	1.68	0.07	0.07
query5	0.51	0.50	0.49
query6	1.13	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.56	0.49	0.48
query10	0.54	0.55	0.54
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.59	0.58	0.58
query14	0.77	0.79	0.78
query15	0.87	0.84	0.84
query16	0.38	0.37	0.38
query17	1.09	1.09	1.09
query18	0.23	0.23	0.23
query19	1.95	1.88	1.80
query20	0.01	0.01	0.01
query21	15.43	0.75	0.65
query22	3.67	7.44	1.78
query23	18.29	1.48	1.27
query24	1.89	0.27	0.22
query25	0.18	0.09	0.08
query26	0.28	0.20	0.20
query27	0.46	0.24	0.23
query28	13.30	1.01	1.00
query29	12.64	3.32	3.34
query30	0.27	0.06	0.06
query31	2.89	0.40	0.38
query32	3.27	0.48	0.47
query33	2.91	2.90	2.90
query34	17.31	4.36	4.33
query35	4.40	4.37	4.43
query36	0.66	0.48	0.48
query37	0.18	0.15	0.16
query38	0.15	0.14	0.15
query39	0.04	0.04	0.03
query40	0.16	0.12	0.12
query41	0.09	0.05	0.04
query42	0.06	0.05	0.04
query43	0.04	0.05	0.04
Total cold run time: 109.61 s
Total hot run time: 30.69 s

@kaijchen
Contributor Author

run buildall

@kaijchen kaijchen marked this pull request as ready for review July 31, 2024 08:54
@kaijchen
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"


@doris-robot

TPC-H: Total hot run time: 41739 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b422bdaebaaf94a8a0cb404bad18b5f333e6a7e1, data reload: false

------ Round 1 ----------------------------------
q1	17604	4152	4094	4094
q2	2033	199	199	199
q3	10451	1293	1348	1293
q4	10172	833	934	833
q5	7586	2974	2990	2974
q6	221	140	143	140
q7	1055	616	627	616
q8	9451	1936	1939	1936
q9	8563	6601	6590	6590
q10	8763	3824	3827	3824
q11	433	250	256	250
q12	410	231	235	231
q13	17763	2959	2931	2931
q14	268	237	247	237
q15	528	483	491	483
q16	535	391	389	389
q17	976	936	929	929
q18	8107	7366	7236	7236
q19	1389	1227	1224	1224
q20	573	323	337	323
q21	5350	4723	4807	4723
q22	357	284	286	284
Total cold run time: 112588 ms
Total hot run time: 41739 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4112	3977	4065	3977
q2	330	220	224	220
q3	2975	2979	3142	2979
q4	2061	2037	1934	1934
q5	5650	5500	5435	5435
q6	221	129	132	129
q7	2121	1778	1842	1778
q8	3596	3378	3356	3356
q9	8719	8703	8863	8703
q10	3946	4045	3909	3909
q11	568	469	470	469
q12	813	623	578	578
q13	15578	3114	3141	3114
q14	296	281	260	260
q15	535	498	473	473
q16	479	408	424	408
q17	1781	1749	1727	1727
q18	8281	7708	7785	7708
q19	1745	1724	1737	1724
q20	2071	1835	1831	1831
q21	5666	5425	5496	5425
q22	529	483	475	475
Total cold run time: 72073 ms
Total hot run time: 56612 ms

@doris-robot

TPC-DS: Total hot run time: 169763 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b422bdaebaaf94a8a0cb404bad18b5f333e6a7e1, data reload: false

query1	909	385	372	372
query2	6464	1772	1731	1731
query3	6660	215	222	215
query4	19329	17560	17348	17348
query5	3626	510	539	510
query6	275	175	161	161
query7	4587	293	300	293
query8	248	200	200	200
query9	8490	2339	2341	2339
query10	418	268	266	266
query11	10536	9976	10086	9976
query12	120	97	90	90
query13	1640	380	380	380
query14	9970	7008	7064	7008
query15	204	163	171	163
query16	7029	443	445	443
query17	971	593	558	558
query18	1923	290	285	285
query19	196	153	147	147
query20	93	85	88	85
query21	204	103	96	96
query22	4232	3995	4135	3995
query23	33887	33676	33593	33593
query24	10038	3157	3123	3123
query25	701	421	431	421
query26	1660	154	155	154
query27	2897	283	297	283
query28	7388	2010	2013	2010
query29	1357	433	439	433
query30	236	155	154	154
query31	944	766	802	766
query32	102	68	56	56
query33	667	310	329	310
query34	925	500	524	500
query35	881	768	790	768
query36	1026	910	878	878
query37	290	88	87	87
query38	3009	2825	2872	2825
query39	914	817	810	810
query40	284	116	112	112
query41	48	44	44	44
query42	122	104	99	99
query43	487	441	420	420
query44	1186	721	741	721
query45	208	175	179	175
query46	1085	828	791	791
query47	1790	1701	1706	1701
query48	365	288	290	288
query49	993	417	426	417
query50	894	438	438	438
query51	6819	6704	6711	6704
query52	109	90	95	90
query53	269	186	181	181
query54	594	474	451	451
query55	74	72	78	72
query56	266	264	267	264
query57	1138	1024	1037	1024
query58	272	262	264	262
query59	2568	2388	2495	2388
query60	304	271	278	271
query61	94	93	93	93
query62	880	663	646	646
query63	224	184	181	181
query64	5576	1936	1868	1868
query65	3169	3128	3110	3110
query66	1306	339	327	327
query67	15256	14669	15024	14669
query68	4333	575	572	572
query69	489	303	299	299
query70	1157	1050	1095	1050
query71	420	289	283	283
query72	7069	2668	2484	2484
query73	765	330	333	330
query74	6020	5723	5594	5594
query75	3401	2696	2745	2696
query76	2689	1214	1275	1214
query77	464	312	314	312
query78	9532	9030	8925	8925
query79	2790	540	549	540
query80	1467	516	511	511
query81	559	224	221	221
query82	836	132	131	131
query83	263	171	175	171
query84	279	81	81	81
query85	1588	347	351	347
query86	509	286	287	286
query87	3217	3112	3087	3087
query88	3867	2385	2395	2385
query89	393	288	296	288
query90	1865	200	196	196
query91	128	100	99	99
query92	66	52	51	51
query93	2208	608	611	608
query94	825	300	295	295
query95	380	269	274	269
query96	612	280	278	278
query97	3252	3056	3086	3056
query98	226	201	196	196
query99	1702	1269	1309	1269
Total cold run time: 264797 ms
Total hot run time: 169763 ms

@doris-robot

ClickBench: Total hot run time: 29.9 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b422bdaebaaf94a8a0cb404bad18b5f333e6a7e1, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.04	0.03
query3	0.22	0.05	0.06
query4	1.68	0.08	0.07
query5	0.51	0.48	0.48
query6	1.15	0.72	0.71
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.58	0.51	0.52
query10	0.57	0.58	0.56
query11	0.15	0.12	0.12
query12	0.15	0.12	0.13
query13	0.62	0.61	0.60
query14	0.78	0.78	0.81
query15	0.90	0.87	0.86
query16	0.36	0.36	0.35
query17	0.97	0.99	1.02
query18	0.22	0.21	0.20
query19	1.81	1.80	1.75
query20	0.01	0.01	0.00
query21	15.43	0.73	0.65
query22	4.06	7.52	1.17
query23	18.01	1.29	1.36
query24	2.20	0.23	0.22
query25	0.18	0.08	0.08
query26	0.31	0.21	0.21
query27	0.46	0.23	0.24
query28	13.17	1.00	0.97
query29	12.52	3.31	3.34
query30	0.26	0.05	0.05
query31	2.86	0.41	0.41
query32	3.24	0.49	0.49
query33	2.93	2.98	2.93
query34	15.44	4.26	4.26
query35	4.31	4.32	4.30
query36	0.67	0.48	0.49
query37	0.19	0.16	0.16
query38	0.18	0.15	0.14
query39	0.04	0.03	0.03
query40	0.16	0.13	0.12
query41	0.10	0.05	0.05
query42	0.05	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 107.68 s
Total hot run time: 29.9 s

@kaijchen kaijchen requested a review from liaoxin01 August 7, 2024 09:20
Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

Contributor

github-actions bot commented Aug 7, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved (Indicates a PR has been approved by one committer.) and reviewed labels Aug 7, 2024
Contributor

github-actions bot commented Aug 7, 2024

PR approved by anyone and no changes requested.

Contributor

@sollhui sollhui left a comment

LGTM

@liaoxin01 liaoxin01 merged commit 0a647ed into apache:master Aug 8, 2024
29 of 31 checks passed
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
…ilures (#38003)

A load job for a multi-replica table shouldn't fail immediately on an
error from any single replica.
Errors should be recorded and reported per tablet replica, and checked
against the commit info.
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Aug 14, 2024
…ilures (apache#38003)

dataroaring pushed a commit that referenced this pull request Aug 16, 2024
…ilures (#38003)

kaijchen added a commit to kaijchen/doris that referenced this pull request Sep 6, 2024
…ilures (apache#38003)

yiguolei pushed a commit that referenced this pull request Sep 9, 2024
@yiguolei yiguolei mentioned this pull request Nov 6, 2024
liaoxin01 pushed a commit that referenced this pull request Nov 21, 2024
…44344)


Problem Summary:

#38003 introduced a problem where the last sink node could report
success even when close wait timed out, which may cause data loss.

Previously we made that change hoping to tolerate minority replica
failures in this step.
However, it turns out the last sink node could miss tablet reports from
downstream nodes when close wait fails.

This PR fixes the problem by returning the close_wait error immediately.
The most common error in close wait is a timeout, and it should not be
tolerated on a per-replica basis anyway.
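
The difference between the two behaviours can be sketched in Python as follows. This is only an illustration under stated assumptions: `FakeStream`, `collect_reports_tolerant` and `collect_reports_strict` are hypothetical names, and the real sink and stream classes in Doris are not shown here.

```python
class FakeStream:
    """Hypothetical stand-in for a downstream load stream."""

    def __init__(self, reports, times_out=False):
        self.reports = reports        # tablet ids this downstream would report
        self.times_out = times_out    # simulate a close-wait timeout

    def close_wait(self):
        if self.times_out:
            raise TimeoutError("close wait timeout")
        return self.reports


def collect_reports_tolerant(streams):
    # Problematic behaviour: the close-wait error is swallowed, so the
    # tablet reports from that downstream are silently missing.
    reports = []
    for stream in streams:
        try:
            reports.extend(stream.close_wait())
        except TimeoutError:
            continue
    return reports


def collect_reports_strict(streams):
    # Behaviour described by the fix: a close-wait error propagates
    # immediately instead of being tolerated per replica.
    reports = []
    for stream in streams:
        reports.extend(stream.close_wait())
    return reports


streams = [FakeStream([1, 2]), FakeStream([3], times_out=True)]
partial = collect_reports_tolerant(streams)  # tablet 3 is never reported
```

In the tolerant variant the caller cannot tell an incomplete report list from a complete one, which is how a load could be reported as successful with data missing; the strict variant surfaces the timeout to the caller.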
github-actions bot pushed a commit that referenced this pull request Nov 21, 2024
…44344)


github-actions bot pushed a commit that referenced this pull request Nov 21, 2024
…44344)


csun5285 pushed a commit to csun5285/doris that referenced this pull request Nov 22, 2024
…pache#44344)


morningman pushed a commit that referenced this pull request Nov 25, 2024
…44344)


yiguolei pushed a commit that referenced this pull request Nov 25, 2024
…44344)


Labels: approved (Indicates a PR has been approved by one committer.), dev/2.1.7-merged, dev/3.0.2-merged, doing, reviewed
6 participants