@BePPPower
Contributor

@BePPPower BePPPower commented Apr 21, 2025

What problem does this PR solve?

Issue Number: #50238
Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@BePPPower
Contributor Author

run buildall

@Thearas
Contributor

Thearas commented Apr 21, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 34091 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8569ce0dfe1856590751adb8f9877746d17b7060, data reload: false

------ Round 1 ----------------------------------
q1	26592	5175	5080	5080
q2	2089	286	186	186
q3	10507	1245	705	705
q4	10235	1065	558	558
q5	7795	2417	2366	2366
q6	189	169	132	132
q7	918	754	622	622
q8	9322	1258	1107	1107
q9	6827	5059	5166	5059
q10	6828	2296	1882	1882
q11	497	292	267	267
q12	338	349	228	228
q13	17773	3679	3111	3111
q14	232	223	215	215
q15	525	478	494	478
q16	458	442	396	396
q17	604	868	356	356
q18	7429	7206	7154	7154
q19	1876	971	563	563
q20	341	350	226	226
q21	4043	3399	2434	2434
q22	1037	1018	966	966
Total cold run time: 116455 ms
Total hot run time: 34091 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5140	5097	5051	5051
q2	234	329	237	237
q3	2141	2676	2268	2268
q4	1444	1842	1549	1549
q5	4516	4501	4321	4321
q6	210	171	126	126
q7	1943	1895	1755	1755
q8	2580	2435	2523	2435
q9	7281	7111	7236	7111
q10	2991	3203	2725	2725
q11	583	505	478	478
q12	694	774	568	568
q13	4221	3987	3196	3196
q14	282	298	281	281
q15	511	479	479	479
q16	469	511	474	474
q17	1161	1575	1335	1335
q18	7655	7526	7474	7474
q19	779	762	801	762
q20	1986	2053	1872	1872
q21	5132	4734	4620	4620
q22	1079	1036	1011	1011
Total cold run time: 53032 ms
Total hot run time: 50128 ms
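
A note on reading these tables: the bot does not label the columns. From the numbers, they appear to be the query name, the cold run, two hot runs, and the faster of the two hot runs. The first column sums to the reported cold total and the last column sums to the reported hot total. A minimal Java sketch of that arithmetic follows; `RunTotals` and its methods are hypothetical names used only for illustration and are not part of the tpch-tools scripts.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Doris benchmark tooling.
final class RunTotals {
    // Each row holds {cold, hot1, hot2} in milliseconds for one query.
    static long totalCold(List<long[]> rows) {
        long total = 0;
        for (long[] row : rows) {
            total += row[0];
        }
        return total;
    }

    // Assumption inferred from the tables: the per-query hot time is the
    // faster of the two hot runs.
    static long totalHot(List<long[]> rows) {
        long total = 0;
        for (long[] row : rows) {
            total += Math.min(row[1], row[2]);
        }
        return total;
    }

    // Two rows copied from Round 1 above (q9 and q15).
    static List<long[]> sampleRows() {
        return Arrays.asList(
                new long[] {6827, 5059, 5166},
                new long[] {525, 478, 494});
    }
}
```

For the two sample rows, `totalCold` gives 7352 and `totalHot` gives 5537.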

@doris-robot

TPC-DS: Total hot run time: 185889 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8569ce0dfe1856590751adb8f9877746d17b7060, data reload: false

query1	1008	492	481	481
query2	6561	1798	1792	1792
query3	6754	225	216	216
query4	27103	23240	23298	23240
query5	4363	614	484	484
query6	310	211	180	180
query7	4614	513	288	288
query8	286	236	220	220
query9	8627	2557	2558	2557
query10	446	305	269	269
query11	15988	14980	14766	14766
query12	166	108	103	103
query13	1660	521	403	403
query14	8783	6115	6008	6008
query15	212	185	172	172
query16	7143	682	456	456
query17	1214	728	608	608
query18	1974	418	314	314
query19	193	188	170	170
query20	127	122	124	122
query21	210	127	110	110
query22	4166	4209	4114	4114
query23	33919	33035	32970	32970
query24	8553	2357	2438	2357
query25	530	459	379	379
query26	1235	293	150	150
query27	2747	517	325	325
query28	4329	2097	2082	2082
query29	753	580	415	415
query30	279	214	194	194
query31	921	860	802	802
query32	76	66	67	66
query33	625	372	304	304
query34	822	891	507	507
query35	792	824	738	738
query36	980	989	867	867
query37	115	104	79	79
query38	4221	4137	4181	4137
query39	1471	1391	1371	1371
query40	216	122	119	119
query41	58	52	53	52
query42	120	105	102	102
query43	477	493	461	461
query44	1301	793	773	773
query45	175	171	168	168
query46	869	1064	604	604
query47	1771	1767	1734	1734
query48	371	419	301	301
query49	785	509	425	425
query50	694	699	392	392
query51	4154	4190	4147	4147
query52	109	109	99	99
query53	229	264	176	176
query54	577	569	512	512
query55	84	80	85	80
query56	314	301	284	284
query57	1162	1184	1065	1065
query58	267	279	254	254
query59	2550	2673	2647	2647
query60	323	315	297	297
query61	138	130	129	129
query62	800	735	656	656
query63	224	181	195	181
query64	4399	1048	692	692
query65	4304	4212	4215	4212
query66	1158	417	344	344
query67	15665	15584	15440	15440
query68	8222	919	513	513
query69	482	300	250	250
query70	1184	1135	1096	1096
query71	467	324	291	291
query72	5776	4773	5034	4773
query73	712	639	335	335
query74	8947	9243	8964	8964
query75	3863	3245	2716	2716
query76	3742	1307	745	745
query77	783	400	284	284
query78	9981	10118	9275	9275
query79	2928	861	568	568
query80	656	524	442	442
query81	468	255	220	220
query82	448	141	98	98
query83	282	267	241	241
query84	297	105	88	88
query85	787	367	322	322
query86	338	307	275	275
query87	4506	4393	4335	4335
query88	3078	2158	2174	2158
query89	436	311	279	279
query90	1960	217	213	213
query91	138	143	118	118
query92	80	62	60	60
query93	1977	970	585	585
query94	679	446	321	321
query95	382	297	278	278
query96	487	618	275	275
query97	3205	3231	3114	3114
query98	243	218	200	200
query99	1468	1414	1295	1295
Total cold run time: 276572 ms
Total hot run time: 185889 ms

@doris-robot

ClickBench: Total hot run time: 29.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8569ce0dfe1856590751adb8f9877746d17b7060, data reload: false

query1	0.04	0.04	0.03
query2	0.11	0.10	0.10
query3	0.24	0.20	0.20
query4	1.59	0.20	0.11
query5	0.57	0.54	0.56
query6	1.16	0.72	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.58	0.52	0.52
query10	0.58	0.57	0.56
query11	0.16	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	1.20	1.16	1.17
query15	0.88	0.88	0.85
query16	0.38	0.38	0.38
query17	1.04	1.07	1.05
query18	0.21	0.20	0.20
query19	1.90	1.85	1.83
query20	0.02	0.01	0.02
query21	15.40	0.91	0.56
query22	0.75	1.15	0.64
query23	15.05	1.39	0.65
query24	6.67	1.69	0.86
query25	0.50	0.19	0.06
query26	0.60	0.17	0.13
query27	0.05	0.05	0.04
query28	9.90	0.91	0.46
query29	12.60	3.98	3.30
query30	0.25	0.09	0.06
query31	2.84	0.60	0.39
query32	3.22	0.55	0.48
query33	3.00	3.05	3.11
query34	15.72	5.17	4.53
query35	4.58	4.49	4.51
query36	0.66	0.49	0.47
query37	0.08	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 103.74 s
Total hot run time: 29.62 s

@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34061 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a4e8f406343d28473c8c999732cf277ee0902fa, data reload: false

------ Round 1 ----------------------------------
q1	25687	5059	5004	5004
q2	2060	278	183	183
q3	10387	1267	683	683
q4	10217	1015	572	572
q5	7537	2416	2305	2305
q6	187	166	133	133
q7	935	765	607	607
q8	9330	1260	1121	1121
q9	6906	5147	5087	5087
q10	6864	2303	1881	1881
q11	465	281	267	267
q12	352	343	219	219
q13	17768	3697	3116	3116
q14	234	215	205	205
q15	515	486	466	466
q16	444	443	405	405
q17	596	839	360	360
q18	7414	7260	7262	7260
q19	1816	964	575	575
q20	354	350	237	237
q21	4085	3408	2426	2426
q22	1051	1053	949	949
Total cold run time: 115204 ms
Total hot run time: 34061 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5233	5048	5090	5048
q2	247	326	239	239
q3	2134	2616	2272	2272
q4	1421	1797	1343	1343
q5	4494	4408	4415	4408
q6	213	164	129	129
q7	1958	1919	1756	1756
q8	2569	2693	2577	2577
q9	7303	7304	6996	6996
q10	3028	3166	2714	2714
q11	600	515	492	492
q12	682	776	599	599
q13	3570	3904	3275	3275
q14	278	288	263	263
q15	525	478	480	478
q16	468	497	457	457
q17	1141	1547	1391	1391
q18	7678	7632	7397	7397
q19	790	812	894	812
q20	2001	2003	1792	1792
q21	5158	4901	4834	4834
q22	1122	1071	1029	1029
Total cold run time: 52613 ms
Total hot run time: 50301 ms

@doris-robot

TPC-DS: Total hot run time: 192693 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4a4e8f406343d28473c8c999732cf277ee0902fa, data reload: false

query1	1396	1096	1051	1051
query2	6124	1773	1805	1773
query3	11124	4410	4785	4410
query4	53126	24473	23397	23397
query5	5195	555	448	448
query6	344	194	194	194
query7	4920	495	298	298
query8	306	248	226	226
query9	5615	2593	2596	2593
query10	436	323	266	266
query11	15239	15017	14865	14865
query12	163	112	103	103
query13	1029	520	397	397
query14	10222	6318	6357	6318
query15	210	207	187	187
query16	6979	661	528	528
query17	1123	766	611	611
query18	1564	425	322	322
query19	203	196	176	176
query20	139	130	120	120
query21	206	125	107	107
query22	4330	4401	4468	4401
query23	34129	33278	33567	33278
query24	6438	2456	2515	2456
query25	458	465	401	401
query26	634	281	148	148
query27	2059	501	337	337
query28	2981	2131	2120	2120
query29	574	582	423	423
query30	276	217	195	195
query31	865	856	780	780
query32	74	66	63	63
query33	456	357	302	302
query34	776	862	523	523
query35	828	866	814	814
query36	985	997	894	894
query37	114	103	73	73
query38	4284	4271	4296	4271
query39	1518	1423	1447	1423
query40	217	122	105	105
query41	52	50	49	49
query42	120	121	114	114
query43	511	510	489	489
query44	1383	799	810	799
query45	183	176	170	170
query46	838	1025	662	662
query47	1852	1851	1806	1806
query48	385	412	309	309
query49	714	490	416	416
query50	671	698	404	404
query51	4214	4279	4181	4181
query52	108	107	104	104
query53	231	266	195	195
query54	603	589	518	518
query55	87	90	83	83
query56	320	307	289	289
query57	1165	1202	1126	1126
query58	268	259	275	259
query59	2732	2816	2745	2745
query60	339	337	317	317
query61	129	126	128	126
query62	747	736	680	680
query63	232	196	197	196
query64	1391	1037	693	693
query65	4433	4232	4221	4221
query66	734	397	299	299
query67	15864	15372	15361	15361
query68	7882	915	511	511
query69	530	310	264	264
query70	1237	1138	1124	1124
query71	498	317	283	283
query72	5756	4888	4899	4888
query73	1185	665	348	348
query74	9195	9272	8663	8663
query75	4018	3194	2692	2692
query76	4247	1194	768	768
query77	651	371	290	290
query78	10021	10292	9206	9206
query79	1269	817	578	578
query80	596	501	434	434
query81	475	269	220	220
query82	310	134	98	98
query83	258	243	229	229
query84	295	101	87	87
query85	827	360	313	313
query86	331	310	277	277
query87	4448	4488	4402	4402
query88	2801	2190	2175	2175
query89	397	316	285	285
query90	2073	217	219	217
query91	140	140	111	111
query92	74	58	54	54
query93	1104	970	595	595
query94	693	428	302	302
query95	368	299	279	279
query96	489	583	272	272
query97	3168	3275	3146	3146
query98	228	216	213	213
query99	1406	1404	1361	1361
Total cold run time: 294935 ms
Total hot run time: 192693 ms

@doris-robot

ClickBench: Total hot run time: 30.12 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4a4e8f406343d28473c8c999732cf277ee0902fa, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.11
query3	0.26	0.19	0.19
query4	1.60	0.18	0.11
query5	0.57	0.55	0.54
query6	1.17	0.71	0.72
query7	0.03	0.01	0.01
query8	0.04	0.04	0.04
query9	0.58	0.52	0.51
query10	0.56	0.57	0.58
query11	0.16	0.11	0.11
query12	0.14	0.11	0.12
query13	0.61	0.60	0.59
query14	1.14	1.17	1.16
query15	0.88	0.84	0.85
query16	0.39	0.38	0.38
query17	1.06	1.06	1.05
query18	0.21	0.20	0.20
query19	1.94	1.81	1.82
query20	0.02	0.02	0.01
query21	15.39	0.94	0.55
query22	0.78	1.28	0.76
query23	14.76	1.40	0.61
query24	6.96	1.53	1.16
query25	0.48	0.19	0.15
query26	0.58	0.16	0.15
query27	0.06	0.05	0.05
query28	10.31	0.89	0.45
query29	12.65	4.02	3.33
query30	0.25	0.09	0.06
query31	2.84	0.59	0.39
query32	3.23	0.56	0.47
query33	3.14	3.11	3.07
query34	15.89	5.19	4.48
query35	4.61	4.56	4.56
query36	0.66	0.50	0.49
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.03
query43	0.04	0.03	0.02
Total cold run time: 104.6 s
Total hot run time: 30.12 s

@morningman morningman changed the title [Fix](file-format) add file format configurator [feat](refactor-param) add file format configuration Apr 21, 2025
@BePPPower BePPPower force-pushed the addFileFormatConfigurator branch from 4a4e8f4 to 00c292e Compare April 23, 2025 07:06
@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33719 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 00c292e4c88bc8c59ca818194a31f5148ef8107a, data reload: false

------ Round 1 ----------------------------------
q1	25920	5027	5039	5027
q2	2072	269	183	183
q3	10415	1264	679	679
q4	10224	1007	548	548
q5	7536	2411	2299	2299
q6	180	161	134	134
q7	925	742	609	609
q8	9315	1255	1101	1101
q9	6937	5166	5076	5076
q10	6807	2323	1883	1883
q11	484	298	264	264
q12	345	359	217	217
q13	17750	3622	3053	3053
q14	226	238	209	209
q15	538	500	493	493
q16	445	449	400	400
q17	587	866	373	373
q18	7658	7185	6990	6990
q19	1223	948	554	554
q20	336	325	216	216
q21	4103	3389	2455	2455
q22	1081	1035	956	956
Total cold run time: 115107 ms
Total hot run time: 33719 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5070	5080	5051	5051
q2	245	326	230	230
q3	2140	2663	2288	2288
q4	1504	1820	1430	1430
q5	4464	4431	4381	4381
q6	211	168	127	127
q7	1974	1917	1714	1714
q8	2599	2570	2502	2502
q9	7236	7135	7056	7056
q10	2981	3178	2740	2740
q11	567	505	483	483
q12	680	748	594	594
q13	3494	3920	3288	3288
q14	290	299	272	272
q15	514	495	490	490
q16	466	514	455	455
q17	1156	1557	1361	1361
q18	7713	7485	7334	7334
q19	786	762	916	762
q20	1958	2049	1894	1894
q21	5176	4615	4678	4615
q22	1077	1043	991	991
Total cold run time: 52301 ms
Total hot run time: 50058 ms

@doris-robot

TPC-DS: Total hot run time: 184971 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 00c292e4c88bc8c59ca818194a31f5148ef8107a, data reload: false

query1	1011	476	496	476
query2	6559	1836	1869	1836
query3	6736	215	215	215
query4	26049	23207	23265	23207
query5	4384	628	467	467
query6	301	192	185	185
query7	4633	475	284	284
query8	280	243	233	233
query9	8607	2544	2584	2544
query10	486	313	274	274
query11	15508	15070	14663	14663
query12	168	110	104	104
query13	1652	530	417	417
query14	8725	6144	6173	6144
query15	217	185	172	172
query16	7147	630	491	491
query17	940	724	577	577
query18	1975	395	310	310
query19	202	188	157	157
query20	127	115	125	115
query21	211	127	107	107
query22	4222	4247	4055	4055
query23	33950	33033	33082	33033
query24	8471	2400	2403	2400
query25	531	442	392	392
query26	1248	263	143	143
query27	2768	491	327	327
query28	4314	2089	2067	2067
query29	769	558	427	427
query30	283	222	182	182
query31	927	863	791	791
query32	77	64	66	64
query33	555	362	324	324
query34	784	836	507	507
query35	785	804	740	740
query36	941	972	880	880
query37	113	101	83	83
query38	4138	4056	4045	4045
query39	1454	1360	1392	1360
query40	206	115	104	104
query41	56	55	52	52
query42	121	107	107	107
query43	504	513	454	454
query44	1270	811	803	803
query45	172	175	170	170
query46	824	999	610	610
query47	1795	1801	1735	1735
query48	376	409	305	305
query49	780	527	420	420
query50	641	671	387	387
query51	4164	4094	4162	4094
query52	121	101	95	95
query53	225	247	181	181
query54	574	569	487	487
query55	80	78	79	78
query56	300	293	279	279
query57	1121	1142	1119	1119
query58	262	255	261	255
query59	2619	2784	2593	2593
query60	343	315	303	303
query61	128	139	127	127
query62	800	739	642	642
query63	221	180	185	180
query64	4350	991	707	707
query65	4325	4270	4254	4254
query66	1159	405	296	296
query67	15548	15367	15256	15256
query68	8339	870	502	502
query69	476	298	263	263
query70	1199	1090	1065	1065
query71	433	333	298	298
query72	5636	4691	4723	4691
query73	724	603	342	342
query74	9326	9014	8628	8628
query75	3757	3215	2706	2706
query76	3628	1174	733	733
query77	784	367	277	277
query78	9918	9894	9180	9180
query79	6255	797	545	545
query80	667	507	429	429
query81	465	268	211	211
query82	716	130	97	97
query83	270	249	244	244
query84	292	109	83	83
query85	763	345	317	317
query86	404	300	317	300
query87	4397	4370	4289	4289
query88	2810	2220	2227	2220
query89	447	312	278	278
query90	1926	207	211	207
query91	140	145	111	111
query92	77	58	57	57
query93	3254	913	554	554
query94	672	417	295	295
query95	361	294	281	281
query96	484	557	275	275
query97	3163	3270	3106	3106
query98	222	205	202	202
query99	1424	1418	1294	1294
Total cold run time: 278762 ms
Total hot run time: 184971 ms

@doris-robot

ClickBench: Total hot run time: 29.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 00c292e4c88bc8c59ca818194a31f5148ef8107a, data reload: false

query1	0.03	0.03	0.03
query2	0.12	0.10	0.10
query3	0.26	0.19	0.20
query4	1.58	0.19	0.19
query5	0.58	0.57	0.57
query6	1.19	0.71	0.71
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.57	0.52	0.51
query10	0.57	0.56	0.58
query11	0.16	0.10	0.11
query12	0.14	0.12	0.12
query13	0.61	0.60	0.60
query14	1.17	1.20	1.15
query15	0.87	0.84	0.86
query16	0.38	0.38	0.38
query17	1.05	1.06	1.03
query18	0.21	0.20	0.19
query19	1.93	1.74	1.77
query20	0.01	0.02	0.01
query21	15.42	0.90	0.56
query22	0.75	1.11	0.68
query23	15.03	1.38	0.65
query24	7.78	0.71	1.36
query25	0.48	0.21	0.14
query26	0.57	0.15	0.14
query27	0.05	0.06	0.05
query28	9.77	0.83	0.42
query29	12.92	3.98	3.29
query30	0.25	0.10	0.06
query31	2.82	0.59	0.37
query32	3.22	0.55	0.47
query33	3.06	3.11	3.06
query34	15.76	5.12	4.59
query35	4.56	4.55	4.53
query36	0.66	0.49	0.48
query37	0.08	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 105.06 s
Total hot run time: 29.59 s

@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34175 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 040aa749c767a91a95bb2ef2a56851961e2646a8, data reload: false

------ Round 1 ----------------------------------
q1	25980	5751	5064	5064
q2	2092	280	193	193
q3	10380	1226	668	668
q4	10232	998	537	537
q5	7533	2288	2360	2288
q6	176	162	133	133
q7	899	746	599	599
q8	9331	1236	1077	1077
q9	6852	5151	5106	5106
q10	6943	2330	1900	1900
q11	485	282	267	267
q12	361	367	220	220
q13	17782	3667	3132	3132
q14	232	237	209	209
q15	529	481	487	481
q16	446	452	396	396
q17	603	845	363	363
q18	7560	7194	7148	7148
q19	1249	985	598	598
q20	356	343	246	246
q21	4595	3489	2494	2494
q22	1073	1075	1056	1056
Total cold run time: 115689 ms
Total hot run time: 34175 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5124	5097	5128	5097
q2	243	328	232	232
q3	2181	2638	2325	2325
q4	1404	1807	1492	1492
q5	4589	4624	4545	4545
q6	220	176	138	138
q7	2143	1982	1839	1839
q8	2683	2693	2622	2622
q9	7576	7444	7258	7258
q10	2993	3135	2784	2784
q11	574	496	491	491
q12	678	759	660	660
q13	3594	3900	3250	3250
q14	268	313	289	289
q15	522	466	491	466
q16	465	499	447	447
q17	1126	1539	1416	1416
q18	7706	7638	7465	7465
q19	817	846	941	846
q20	1976	1964	1840	1840
q21	5229	4815	4801	4801
q22	1134	1079	1036	1036
Total cold run time: 53245 ms
Total hot run time: 51339 ms

@doris-robot

TPC-DS: Total hot run time: 193610 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 040aa749c767a91a95bb2ef2a56851961e2646a8, data reload: false

query1	1416	1087	1084	1084
query2	6138	1793	1777	1777
query3	11179	4519	4621	4519
query4	53328	24689	23825	23825
query5	5300	480	437	437
query6	356	197	197	197
query7	4946	492	286	286
query8	360	254	234	234
query9	5779	2566	2566	2566
query10	419	312	256	256
query11	15048	15529	14771	14771
query12	156	113	107	107
query13	1072	521	413	413
query14	10116	6477	6500	6477
query15	197	193	182	182
query16	7055	631	536	536
query17	1084	765	579	579
query18	1583	427	346	346
query19	206	196	174	174
query20	132	129	123	123
query21	209	136	114	114
query22	4384	4578	4347	4347
query23	34168	33446	33390	33390
query24	6582	2426	2435	2426
query25	448	474	406	406
query26	671	268	151	151
query27	2348	517	342	342
query28	3084	2133	2118	2118
query29	587	543	459	459
query30	271	217	188	188
query31	861	859	785	785
query32	74	66	67	66
query33	459	353	318	318
query34	767	883	530	530
query35	793	825	752	752
query36	963	1034	881	881
query37	114	101	77	77
query38	4199	4284	4328	4284
query39	1497	1432	1443	1432
query40	230	117	111	111
query41	55	54	54	54
query42	124	109	108	108
query43	503	511	475	475
query44	1359	817	845	817
query45	190	175	164	164
query46	850	1025	637	637
query47	1878	1896	1821	1821
query48	380	415	306	306
query49	677	498	416	416
query50	664	711	414	414
query51	4266	4342	4169	4169
query52	114	103	94	94
query53	224	252	180	180
query54	592	568	519	519
query55	86	79	79	79
query56	317	295	296	295
query57	1187	1220	1129	1129
query58	261	268	258	258
query59	2748	2778	2637	2637
query60	353	334	315	315
query61	129	144	123	123
query62	740	755	677	677
query63	221	191	200	191
query64	1496	1027	675	675
query65	4398	4212	4274	4212
query66	722	394	300	300
query67	15929	15840	15758	15758
query68	7562	887	575	575
query69	547	311	263	263
query70	1242	1093	1102	1093
query71	487	327	287	287
query72	5908	4778	5015	4778
query73	1404	810	353	353
query74	9384	9055	8916	8916
query75	3780	3215	2712	2712
query76	4238	1200	765	765
query77	633	369	285	285
query78	10099	10051	9227	9227
query79	3377	828	551	551
query80	636	517	495	495
query81	484	254	217	217
query82	458	123	94	94
query83	357	250	225	225
query84	283	109	81	81
query85	806	348	335	335
query86	409	316	270	270
query87	4365	4422	4288	4288
query88	3250	2218	2218	2218
query89	410	310	281	281
query90	1955	219	217	217
query91	144	147	108	108
query92	68	60	54	54
query93	1936	934	578	578
query94	673	398	318	318
query95	366	303	283	283
query96	492	567	273	273
query97	3242	3249	3108	3108
query98	229	214	195	195
query99	1431	1414	1278	1278
Total cold run time: 299697 ms
Total hot run time: 193610 ms

@doris-robot

ClickBench: Total hot run time: 29.8 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 040aa749c767a91a95bb2ef2a56851961e2646a8, data reload: false

query1	0.04	0.03	0.04
query2	0.13	0.10	0.12
query3	0.26	0.20	0.20
query4	1.60	0.20	0.19
query5	0.60	0.58	0.61
query6	1.22	0.72	0.72
query7	0.03	0.01	0.02
query8	0.04	0.03	0.04
query9	0.57	0.53	0.52
query10	0.56	0.57	0.57
query11	0.16	0.11	0.11
query12	0.15	0.11	0.11
query13	0.61	0.59	0.60
query14	1.16	1.17	1.19
query15	0.88	0.86	0.86
query16	0.38	0.39	0.37
query17	1.05	1.03	1.01
query18	0.22	0.20	0.20
query19	1.91	1.85	1.82
query20	0.02	0.01	0.01
query21	15.41	0.89	0.54
query22	0.76	1.17	0.74
query23	14.90	1.38	0.64
query24	7.17	0.92	0.84
query25	0.50	0.18	0.11
query26	0.65	0.17	0.15
query27	0.06	0.05	0.04
query28	9.13	0.86	0.43
query29	12.58	3.98	3.31
query30	0.26	0.09	0.08
query31	2.81	0.60	0.38
query32	3.22	0.55	0.47
query33	3.10	3.08	2.99
query34	15.87	5.13	4.50
query35	4.57	4.57	4.57
query36	0.66	0.50	0.48
query37	0.08	0.06	0.07
query38	0.05	0.03	0.03
query39	0.02	0.02	0.02
query40	0.17	0.14	0.14
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.7 s
Total hot run time: 29.8 s

@morningman morningman requested a review from Copilot April 23, 2025 20:20

Copilot AI left a comment


Pull Request Overview

This PR adds file format configuration support by introducing new property classes and comprehensive unit tests for multiple file formats (Parquet, ORC, JSON, CSV, Avro, and WAL).

  • New property classes for each file format type have been implemented.
  • Extensive unit tests have been added to validate the behavior for valid and invalid property inputs.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.

File	Description
ParquetFileFormatPropertiesTest.java	Adds tests for default Parquet property values.
OrcFileFormatPropertiesTest.java	Adds tests for ORC file format property analysis.
JsonFileFormatPropertiesTest.java	Covers property validations for JSON input configurations.
CsvFileFormatPropertiesTest.java	Introduces validations and error handling for CSV settings.
AvroFileFormatPropertiesTest.java	Provides a basic test for Avro file format properties.
WalFileFormatProperties.java	Implements a skeleton for WAL file format properties.
FileFormatProperties and constants	Implements property creation and common defaults.
Comments suppressed due to low confidence (2)

fe/fe-core/src/main/java/org/apache/doris/datasource/property/fileformat/WalFileFormatProperties.java:53

  • The analyzeFileFormatProperties method in WalFileFormatProperties is empty, which could lead to silent failures when unexpected properties are provided. Consider adding an explicit exception or implementation to handle unsupported properties.
public void analyzeFileFormatProperties(Map<String, String> formatProperties, boolean isRemoveOriginProperty) throws AnalysisException {
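
One way to act on this suggestion, sketched under assumptions: the class below is a standalone stand-in rather than the real WalFileFormatProperties, `AnalysisException` is redefined locally so the snippet compiles by itself, and the `ALLOWED_KEYS` set is hypothetical. Whether rejecting unknown keys is appropriate depends on what else the caller passes in the property map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Local stand-in for Doris's AnalysisException so the sketch is self-contained.
class AnalysisException extends Exception {
    AnalysisException(String message) {
        super(message);
    }
}

// Hypothetical sketch; not the class from this PR.
final class WalFileFormatPropertiesSketch {
    // Assumption: the only key the WAL format tolerates. The real list may differ.
    private static final Set<String> ALLOWED_KEYS = Collections.singleton("format");

    // isRemoveOriginProperty is unused here; it is kept only to mirror the quoted signature.
    public void analyzeFileFormatProperties(Map<String, String> formatProperties,
            boolean isRemoveOriginProperty) throws AnalysisException {
        // Collect every key that is not explicitly allowed (sorted for a stable message).
        Set<String> unexpected = new TreeSet<>(formatProperties.keySet());
        unexpected.removeAll(ALLOWED_KEYS);
        if (!unexpected.isEmpty()) {
            // Fail loudly instead of silently ignoring unsupported properties.
            throw new AnalysisException("Unsupported properties for WAL format: " + unexpected);
        }
        // Nothing else to parse: no WAL-specific options are modeled in this sketch.
    }
}
```

With this shape, a map containing only `format` passes, while a map containing, for example, `column_separator` raises an `AnalysisException`.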

fe/fe-core/src/main/java/org/apache/doris/datasource/property/fileformat/JsonFileFormatProperties.java:91

  • [nitpick] The method name 'setJsonpaths' should follow consistent camelCase naming conventions (e.g., setJsonPaths) to improve readability and maintain consistency.
fileAttributes.setJsonpaths(jsonPaths);


package org.apache.doris.datasource.property.constants;

public class JsonProperties {
Contributor


I think we can merge the classes under the constants package into the classes under the fileformat package.
For example, JsonProperties only defines some string constants, which could be placed in JsonFileFormatProperties.

@Override
public void analyzeFileFormatProperties(Map<String, String> formatProperties, boolean isRemoveOriginProperty)
throws AnalysisException {
// These JSON items should be moved into the JSON checker (original comment, written in Chinese: 这几个json应该移到json checker中)
Contributor


English

}
}

public static FileFormatProperties createFileFormatProperties(Map<String, String> formatProperties)
Contributor


Sometimes we may use the suffix of a file to guess its format.
For example, if the file path is /path/to/1.parquet, we know that this is the parquet format, even if the user does not specify the "file_format" property.
We should handle this.

  1. Create a utility method to infer the file format from the path.
  2. Change the signature of createFileFormatProperties to createFileFormatProperties(TFileFormatType fileFormat, properties).

And I think that when there is neither a suffix nor a user-specified "file_format" property, we should fall back to a "default" file format, e.g., csv. But it depends, because I am not sure what the previous logic is.
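
A rough sketch of what such a helper could look like, under stated assumptions: `FileFormat` is a small local stand-in for `TFileFormatType` (only a few values are modeled), `FileFormatInference`, `inferFromPath`, and `resolve` are hypothetical names, the suffix list is illustrative, and the csv fallback follows the tentative suggestion above rather than any confirmed previous behavior.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Local stand-in for Doris's TFileFormatType; only a few formats are modeled.
enum FileFormat { CSV, PARQUET, ORC, JSON, AVRO }

// Hypothetical utility; the names and the suffix list are assumptions.
final class FileFormatInference {
    // Guess the format from the file name suffix, e.g. "/path/to/1.parquet" -> PARQUET.
    static Optional<FileFormat> inferFromPath(String path) {
        if (path == null) {
            return Optional.empty();
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return Optional.empty(); // no suffix at all, or a trailing dot
        }
        String suffix = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (suffix) {
            case "parquet": return Optional.of(FileFormat.PARQUET);
            case "orc":     return Optional.of(FileFormat.ORC);
            case "json":    return Optional.of(FileFormat.JSON);
            case "avro":    return Optional.of(FileFormat.AVRO);
            case "csv":     return Optional.of(FileFormat.CSV);
            default:        return Optional.empty();
        }
    }

    // Precedence: explicit "file_format" property, then the path suffix, then csv.
    static FileFormat resolve(String path, Map<String, String> properties) {
        String explicit = properties.get("file_format");
        if (explicit != null) {
            // valueOf throws IllegalArgumentException for names this sketch does not model.
            return FileFormat.valueOf(explicit.trim().toUpperCase(Locale.ROOT));
        }
        return inferFromPath(path).orElse(FileFormat.CSV);
    }
}
```

The resolved value could then be passed as the first argument of the proposed `createFileFormatProperties(TFileFormatType fileFormat, properties)` signature, so that the factory no longer has to look up the format itself.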

@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33755 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d409a9f5a667ebb42e4a4f57307c9234e0348a9d, data reload: false

------ Round 1 ----------------------------------
q1	26238	5023	4974	4974
q2	2056	270	183	183
q3	10413	1268	703	703
q4	10233	984	525	525
q5	7551	2358	2331	2331
q6	184	183	131	131
q7	899	735	614	614
q8	9305	1217	1051	1051
q9	6748	5120	5086	5086
q10	6854	2328	1859	1859
q11	482	278	259	259
q12	349	351	214	214
q13	17783	3683	3061	3061
q14	224	231	209	209
q15	550	517	487	487
q16	443	448	392	392
q17	573	856	358	358
q18	7576	7221	7164	7164
q19	1613	968	541	541
q20	328	335	222	222
q21	4018	3343	2396	2396
q22	1080	1030	995	995
Total cold run time: 115500 ms
Total hot run time: 33755 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5155	5064	5078	5064
q2	241	329	227	227
q3	2121	2653	2274	2274
q4	1461	1843	1467	1467
q5	4421	4368	4372	4368
q6	216	165	137	137
q7	2048	1906	1770	1770
q8	2584	2530	2563	2530
q9	7304	7254	7037	7037
q10	3044	3226	2757	2757
q11	569	501	476	476
q12	676	759	604	604
q13	3512	3849	3319	3319
q14	279	317	272	272
q15	533	494	489	489
q16	473	504	454	454
q17	1137	1553	1406	1406
q18	7809	7639	7493	7493
q19	782	775	894	775
q20	1988	2023	1833	1833
q21	5194	4839	4872	4839
q22	1135	1070	1030	1030
Total cold run time: 52682 ms
Total hot run time: 50621 ms

@doris-robot

TPC-DS: Total hot run time: 192764 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d409a9f5a667ebb42e4a4f57307c9234e0348a9d, data reload: false

query1	1394	1076	1075	1075
query2	6180	1822	1802	1802
query3	11160	4709	4623	4623
query4	55234	26489	23509	23509
query5	4697	579	445	445
query6	342	211	193	193
query7	4879	493	288	288
query8	289	244	227	227
query9	5498	2593	2586	2586
query10	417	332	248	248
query11	15094	15066	14730	14730
query12	154	112	102	102
query13	1018	507	398	398
query14	10129	6346	6444	6346
query15	226	195	187	187
query16	7143	696	527	527
query17	1116	780	604	604
query18	1580	417	331	331
query19	202	203	169	169
query20	128	127	121	121
query21	212	129	108	108
query22	4389	4538	4359	4359
query23	34265	33437	33544	33437
query24	6539	2417	2402	2402
query25	448	476	421	421
query26	721	279	154	154
query27	2283	508	343	343
query28	3065	2161	2158	2158
query29	631	558	442	442
query30	271	227	191	191
query31	885	846	786	786
query32	76	69	60	60
query33	448	361	315	315
query34	754	861	515	515
query35	802	855	756	756
query36	935	978	908	908
query37	111	96	79	79
query38	4214	4213	4122	4122
query39	1526	1478	1442	1442
query40	233	120	110	110
query41	58	56	52	52
query42	124	110	114	110
query43	506	501	464	464
query44	1374	837	817	817
query45	184	174	176	174
query46	841	1029	632	632
query47	1863	1896	1835	1835
query48	391	420	314	314
query49	687	528	435	435
query50	650	708	415	415
query51	4215	4234	4156	4156
query52	109	106	94	94
query53	225	253	191	191
query54	578	581	516	516
query55	84	80	83	80
query56	319	296	301	296
query57	1220	1202	1126	1126
query58	273	259	259	259
query59	2820	2800	2736	2736
query60	333	326	316	316
query61	135	138	146	138
query62	753	738	668	668
query63	224	190	180	180
query64	1723	1070	746	746
query65	4450	4338	4246	4246
query66	690	457	305	305
query67	15885	15458	15395	15395
query68	7712	834	511	511
query69	547	321	267	267
query70	1176	1106	1112	1106
query71	559	328	295	295
query72	6006	4700	4732	4700
query73	1497	637	340	340
query74	8996	9192	8763	8763
query75	4113	3218	2682	2682
query76	4274	1197	755	755
query77	657	366	288	288
query78	10052	10059	9182	9182
query79	5830	806	546	546
query80	648	504	437	437
query81	482	254	222	222
query82	631	124	97	97
query83	318	254	236	236
query84	297	119	87	87
query85	784	363	315	315
query86	349	314	282	282
query87	4394	4544	4377	4377
query88	3521	2260	2247	2247
query89	450	313	288	288
query90	1987	214	217	214
query91	137	136	110	110
query92	80	59	57	57
query93	2963	938	580	580
query94	686	429	305	305
query95	371	295	290	290
query96	476	629	272	272
query97	3126	3251	3106	3106
query98	234	210	203	203
query99	1457	1441	1289	1289
Total cold run time: 305212 ms
Total hot run time: 192764 ms

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 27, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 080ead1 into apache:master Apr 27, 2025
28 checks passed
morningman pushed a commit that referenced this pull request Apr 29, 2025
Issue Number: #50238

Problem Summary:

Previously, we refactored the code for the `fileFormat` attribute
(#50225), but that change only added the new code and left the business
code untouched. This pull request (PR) updates the table-valued function
(TVF) code that handles `fileFormat`.
CalvinKirs pushed a commit that referenced this pull request Apr 29, 2025
Issue Number: #50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute (#50225),
but that change only added the new code and left the business code
untouched. This pull request updates the `SELECT INTO OUTFILE` code that
handles the fileFormat attribute.
morningman pushed a commit that referenced this pull request May 14, 2025
…50552)

Issue Number: #50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute (#50225),
but that change only added the new code and left the business code
untouched. This pull request updates the `RoutineLoad` code that handles
the fileFormat attribute.
morningman pushed a commit that referenced this pull request May 29, 2025
…50882)

Issue Number: #50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute (#50225),
but that change only added the new code and left the business code
untouched. This pull request updates the BrokerLoad code that handles the
fileFormat attribute.
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…0463)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the `fileFormat` attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request (PR) updates the table-valued
function (TVF) code that handles `fileFormat`.
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…he#50471)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request updates the
`SELECT INTO OUTFILE` code that handles the fileFormat attribute.
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…pache#50552)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request updates the `RoutineLoad` code
that handles the fileFormat attribute.
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…pache#50882)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request updates the BrokerLoad code
that handles the fileFormat attribute.
morningman pushed a commit to morningman/doris that referenced this pull request Jun 21, 2025
morningman pushed a commit to morningman/doris that referenced this pull request Jun 21, 2025
…0463)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the `fileFormat` attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request (PR) updates the table-valued
function (TVF) code that handles `fileFormat`.
morningman pushed a commit to morningman/doris that referenced this pull request Jun 21, 2025
…he#50471)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request updates the
`SELECT INTO OUTFILE` code that handles the fileFormat attribute.
morrySnow pushed a commit that referenced this pull request Jun 23, 2025
…and outfile #50225 #50463 #50471 (#52101)

bp #50225 #50463 #50471

---------

Co-authored-by: Tiewei Fang <fangtiewei@selectdb.com>
CalvinKirs pushed a commit to CalvinKirs/incubator-doris that referenced this pull request Jun 24, 2025
…ileformat (apache#50552)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request updates the `RoutineLoad` code
that handles the fileFormat attribute.

(cherry picked from commit b3abfab)
morningman pushed a commit to morningman/doris that referenced this pull request Jul 12, 2025
…pache#50882)

Issue Number: apache#50238

Problem Summary:

Previously, we refactored the code for the fileFormat attribute
(apache#50225), but that change only added the new code and left the
business code untouched. This pull request updates the BrokerLoad code
that handles the fileFormat attribute.

Labels

approved (indicates a PR has been approved by one committer), dev/3.1.0-merged, reviewed

7 participants