@morningman
Contributor

@morningman morningman commented Apr 12, 2025

What problem does this PR solve?

PR #34520 only handled HivePartitionWriter, but the same logic should apply to all HDFS writers.
This PR fixes that by unifying the logic so that it works with both the Hive and Iceberg writers.

If the path is an absolute full path such as hdfs://host/path/to/file, use hdfs://host/ as the fs name;
otherwise, use the default fs name.
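
The rule above can be sketched roughly as follows. This is only an illustration, not the code changed in this PR: the helper name `resolve_fs_name` and its parameters are hypothetical, and treating a path with an empty authority (such as `hdfs:///path`) as "use the default" is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the function from this PR.
// For a full path such as "hdfs://host/path/to/file" it returns "hdfs://host/".
// For any other path it returns default_fs_name unchanged.
std::string resolve_fs_name(const std::string& path, const std::string& default_fs_name) {
    const std::string sep = "://";
    const std::size_t scheme_end = path.find(sep);
    if (scheme_end == std::string::npos || scheme_end == 0) {
        // No scheme present: not an absolute full path.
        return default_fs_name;
    }
    const std::size_t authority_begin = scheme_end + sep.size();
    std::size_t authority_end = path.find('/', authority_begin);
    if (authority_end == std::string::npos) {
        // "hdfs://host" with no path component: the authority runs to the end.
        authority_end = path.size();
    }
    if (authority_end == authority_begin) {
        // Empty authority, e.g. "hdfs:///path" (assumption: fall back to the default).
        return default_fs_name;
    }
    // Keep "scheme://authority" and append a single trailing slash.
    return path.substr(0, authority_end) + "/";
}
```

With this sketch, `resolve_fs_name("hdfs://host/path/to/file", "hdfs://default/")` yields `hdfs://host/`, while a relative or scheme-less path such as `/path/to/file` yields the default value passed in.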

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Apr 12, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34132 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 14d9588c945ff789fc7cf8857b8863892d5e5abe, data reload: false

------ Round 1 ----------------------------------
q1	26184	5050	5034	5034
q2	2055	267	186	186
q3	10406	1235	700	700
q4	10219	1006	512	512
q5	7516	2370	2331	2331
q6	181	160	133	133
q7	912	749	612	612
q8	9306	1284	1110	1110
q9	6908	5100	5096	5096
q10	6802	2271	1886	1886
q11	459	281	268	268
q12	355	359	222	222
q13	17778	3618	3104	3104
q14	231	231	213	213
q15	528	469	481	469
q16	621	615	585	585
q17	573	847	351	351
q18	7449	7082	7124	7082
q19	1219	929	564	564
q20	341	332	232	232
q21	3984	3331	2493	2493
q22	1056	978	949	949
Total cold run time: 115083 ms
Total hot run time: 34132 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5186	5173	5158	5158
q2	243	327	228	228
q3	2221	2697	2336	2336
q4	1491	1836	1452	1452
q5	4561	4476	4374	4374
q6	211	169	132	132
q7	2001	1880	1766	1766
q8	2559	2549	2530	2530
q9	7338	7240	6965	6965
q10	3027	3164	2727	2727
q11	576	491	486	486
q12	676	798	632	632
q13	3464	3840	3333	3333
q14	283	294	299	294
q15	534	485	494	485
q16	633	685	645	645
q17	1156	1459	1429	1429
q18	7878	7584	7531	7531
q19	838	874	971	874
q20	1886	1982	1883	1883
q21	5508	4856	4861	4856
q22	1078	1049	987	987
Total cold run time: 53348 ms
Total hot run time: 51103 ms

@doris-robot

TPC-DS: Total hot run time: 193025 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 14d9588c945ff789fc7cf8857b8863892d5e5abe, data reload: false

query1	1393	1103	1053	1053
query2	6226	1923	1926	1923
query3	10988	4386	4634	4386
query4	54285	24865	23472	23472
query5	5055	540	467	467
query6	345	198	200	198
query7	4895	493	281	281
query8	310	237	231	231
query9	5631	2556	2565	2556
query10	432	325	260	260
query11	15036	14900	14866	14866
query12	158	107	104	104
query13	1037	511	387	387
query14	10119	6188	6307	6188
query15	205	200	189	189
query16	7070	664	524	524
query17	1079	779	598	598
query18	1522	437	329	329
query19	202	206	180	180
query20	146	125	122	122
query21	208	128	106	106
query22	4477	4566	4448	4448
query23	34172	33334	33682	33334
query24	7030	2425	2466	2425
query25	465	468	405	405
query26	744	269	162	162
query27	2294	501	335	335
query28	3110	2425	2408	2408
query29	629	584	409	409
query30	278	232	199	199
query31	846	890	769	769
query32	76	65	63	63
query33	469	369	332	332
query34	807	873	529	529
query35	818	852	792	792
query36	974	990	905	905
query37	118	100	86	86
query38	4114	4151	4296	4151
query39	1513	1468	1434	1434
query40	230	122	108	108
query41	53	54	50	50
query42	119	114	112	112
query43	504	515	510	510
query44	1297	804	837	804
query45	184	176	171	171
query46	874	1027	647	647
query47	1854	1860	1818	1818
query48	391	422	318	318
query49	698	548	426	426
query50	686	696	419	419
query51	4271	4272	4183	4183
query52	110	110	102	102
query53	232	269	186	186
query54	581	583	516	516
query55	80	78	83	78
query56	311	327	297	297
query57	1166	1201	1118	1118
query58	269	258	259	258
query59	2735	2857	2761	2761
query60	341	339	308	308
query61	138	137	130	130
query62	737	744	693	693
query63	220	189	195	189
query64	2114	1080	741	741
query65	4334	4340	4250	4250
query66	815	407	309	309
query67	15769	15539	15302	15302
query68	7576	890	514	514
query69	536	297	262	262
query70	1181	1133	1116	1116
query71	511	331	295	295
query72	5951	4821	5002	4821
query73	1424	697	362	362
query74	9196	8946	8666	8666
query75	3837	3171	2698	2698
query76	4247	1185	740	740
query77	632	366	273	273
query78	10035	10221	9265	9265
query79	1971	823	564	564
query80	612	503	453	453
query81	480	253	219	219
query82	457	130	96	96
query83	250	263	230	230
query84	296	117	91	91
query85	805	365	323	323
query86	354	296	308	296
query87	4451	4492	4370	4370
query88	3170	2208	2180	2180
query89	400	311	282	282
query90	1954	209	211	209
query91	145	142	120	120
query92	80	62	54	54
query93	1101	944	576	576
query94	680	420	315	315
query95	380	290	276	276
query96	493	539	271	271
query97	3201	3201	3182	3182
query98	225	202	205	202
query99	1447	1393	1294	1294
Total cold run time: 298467 ms
Total hot run time: 193025 ms

@doris-robot

ClickBench: Total hot run time: 31.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 14d9588c945ff789fc7cf8857b8863892d5e5abe, data reload: false

query1	0.05	0.04	0.03
query2	0.12	0.11	0.10
query3	0.26	0.19	0.19
query4	1.58	0.20	0.19
query5	0.58	0.58	0.59
query6	1.18	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.57	0.51	0.50
query10	0.57	0.57	0.57
query11	0.16	0.11	0.11
query12	0.14	0.11	0.12
query13	0.62	0.60	0.60
query14	2.75	2.69	2.69
query15	0.94	0.85	0.85
query16	0.39	0.39	0.38
query17	1.05	1.06	1.00
query18	0.21	0.19	0.20
query19	1.92	1.94	1.85
query20	0.02	0.01	0.01
query21	15.36	0.88	0.54
query22	0.75	1.25	0.77
query23	14.76	1.37	0.63
query24	7.28	1.57	1.06
query25	0.51	0.14	0.13
query26	0.59	0.16	0.13
query27	0.05	0.05	0.05
query28	10.17	0.82	0.42
query29	12.65	4.18	3.41
query30	0.24	0.09	0.06
query31	2.82	0.57	0.37
query32	3.23	0.55	0.45
query33	3.04	3.02	3.11
query34	15.83	5.10	4.48
query35	4.45	4.50	4.52
query36	0.66	0.50	0.48
query37	0.09	0.07	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.14	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 106.05 s
Total hot run time: 31.55 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/12) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.46% (14104/26883)
Line Coverage 41.26% (121899/295461)
Region Coverage 40.02% (62089/155132)
Branch Coverage 34.66% (31077/89656)

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34322 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 53af28e8913d573a609a7bb7f2686725393ced2b, data reload: false

------ Round 1 ----------------------------------
q1	26212	5060	5064	5060
q2	2074	269	184	184
q3	10400	1211	679	679
q4	10241	1027	554	554
q5	7537	2214	2396	2214
q6	191	162	132	132
q7	892	757	614	614
q8	9313	1278	1160	1160
q9	6871	5085	5096	5085
q10	6860	2290	1912	1912
q11	503	284	277	277
q12	362	367	224	224
q13	17784	3666	3121	3121
q14	232	230	224	224
q15	525	482	466	466
q16	634	620	573	573
q17	601	861	359	359
q18	7733	7267	7163	7163
q19	1832	974	587	587
q20	342	321	214	214
q21	4392	3486	2546	2546
q22	1064	1048	974	974
Total cold run time: 116595 ms
Total hot run time: 34322 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5274	5112	5095	5095
q2	242	338	236	236
q3	2181	2656	2271	2271
q4	1459	1868	1567	1567
q5	4522	4399	4376	4376
q6	220	164	124	124
q7	1960	1887	1778	1778
q8	2572	2577	2547	2547
q9	7315	7121	7254	7121
q10	2989	3167	2764	2764
q11	574	509	477	477
q12	666	753	607	607
q13	3536	3886	3376	3376
q14	290	305	273	273
q15	545	500	498	498
q16	662	673	656	656
q17	1161	1486	1430	1430
q18	7635	7609	7470	7470
q19	846	891	922	891
q20	1935	1956	1901	1901
q21	5338	4772	4832	4772
q22	1097	1071	1013	1013
Total cold run time: 53019 ms
Total hot run time: 51243 ms

@doris-robot

TPC-DS: Total hot run time: 192136 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 53af28e8913d573a609a7bb7f2686725393ced2b, data reload: false

query1	1391	1101	1077	1077
query2	6203	1893	1884	1884
query3	11035	4451	4525	4451
query4	52935	25933	23043	23043
query5	5115	542	461	461
query6	371	201	200	200
query7	5025	504	281	281
query8	314	247	230	230
query9	6192	2569	2571	2569
query10	421	323	266	266
query11	15058	15032	14926	14926
query12	161	115	110	110
query13	1120	531	428	428
query14	10115	6279	6520	6279
query15	205	193	180	180
query16	7114	676	498	498
query17	1059	740	558	558
query18	1571	414	322	322
query19	192	192	166	166
query20	126	127	121	121
query21	212	123	108	108
query22	4449	4402	4231	4231
query23	34105	33497	33495	33495
query24	6559	2434	2384	2384
query25	457	484	427	427
query26	659	295	154	154
query27	2203	505	336	336
query28	3230	2421	2477	2421
query29	606	592	484	484
query30	289	233	195	195
query31	920	862	820	820
query32	75	70	63	63
query33	469	375	322	322
query34	778	872	532	532
query35	807	825	767	767
query36	953	1006	878	878
query37	121	104	80	80
query38	4196	4302	4026	4026
query39	1512	1458	1469	1458
query40	213	126	116	116
query41	94	50	51	50
query42	125	110	109	109
query43	495	517	493	493
query44	1338	816	819	816
query45	180	173	173	173
query46	848	1047	660	660
query47	1846	1852	1798	1798
query48	402	407	325	325
query49	746	501	425	425
query50	656	721	414	414
query51	4219	4280	4145	4145
query52	116	108	99	99
query53	241	262	181	181
query54	594	580	532	532
query55	87	77	80	77
query56	303	313	303	303
query57	1155	1187	1103	1103
query58	262	265	271	265
query59	2675	2726	2575	2575
query60	332	324	314	314
query61	126	130	132	130
query62	748	730	659	659
query63	230	191	191	191
query64	1629	1050	704	704
query65	4444	4332	4200	4200
query66	712	403	304	304
query67	15893	15915	15212	15212
query68	7240	872	512	512
query69	536	299	271	271
query70	1231	1121	1080	1080
query71	522	313	291	291
query72	5554	4657	4608	4608
query73	1270	575	346	346
query74	8874	9205	8923	8923
query75	3929	3213	2696	2696
query76	4338	1187	753	753
query77	632	365	294	294
query78	10045	10079	9262	9262
query79	2292	810	569	569
query80	605	504	426	426
query81	474	259	220	220
query82	430	127	96	96
query83	362	250	226	226
query84	290	108	83	83
query85	797	341	319	319
query86	370	301	288	288
query87	4387	4446	4300	4300
query88	3397	2226	2319	2226
query89	404	310	289	289
query90	1962	222	212	212
query91	138	136	111	111
query92	77	57	55	55
query93	1442	923	576	576
query94	673	421	292	292
query95	365	284	284	284
query96	507	554	274	274
query97	3158	3264	3139	3139
query98	231	218	205	205
query99	1441	1394	1306	1306
Total cold run time: 297005 ms
Total hot run time: 192136 ms

@doris-robot

ClickBench: Total hot run time: 31.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 53af28e8913d573a609a7bb7f2686725393ced2b, data reload: false

query1	0.03	0.03	0.03
query2	0.13	0.11	0.11
query3	0.25	0.19	0.20
query4	1.59	0.20	0.18
query5	0.58	0.59	0.58
query6	1.19	0.71	0.73
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.54	0.52
query10	0.55	0.56	0.58
query11	0.15	0.11	0.11
query12	0.14	0.10	0.12
query13	0.61	0.60	0.61
query14	2.81	2.70	2.67
query15	0.93	0.85	0.86
query16	0.37	0.38	0.36
query17	1.02	1.02	1.01
query18	0.21	0.19	0.19
query19	1.90	1.97	1.81
query20	0.02	0.01	0.01
query21	15.36	0.89	0.54
query22	0.75	1.10	0.63
query23	15.04	1.37	0.62
query24	7.03	2.04	0.84
query25	0.47	0.21	0.14
query26	0.69	0.17	0.13
query27	0.05	0.05	0.05
query28	10.01	0.80	0.44
query29	12.53	4.00	3.34
query30	0.24	0.08	0.07
query31	2.82	0.59	0.38
query32	3.23	0.55	0.46
query33	3.02	3.05	3.09
query34	15.67	5.08	4.45
query35	4.47	4.52	4.48
query36	0.66	0.50	0.49
query37	0.08	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 105.65 s
Total hot run time: 31.07 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.45% (14102/26884)
Line Coverage 41.25% (121869/295453)
Region Coverage 40.01% (62065/155126)
Branch Coverage 34.65% (31067/89650)

@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 14, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman added the usercase label (Important user case type label) Apr 15, 2025
Contributor

@dataroaring dataroaring left a comment

LGTM

@morningman morningman merged commit b32f8c6 into apache:master Apr 19, 2025
29 of 33 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 19, 2025
…49998)

dataroaring pushed a commit that referenced this pull request Apr 22, 2025
…ontains fs #49998 (#50197)

Cherry-picked from #49998

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…pache#49998)


Labels

  • approved: Indicates a PR has been approved by one committer.
  • dev/3.0.6-merged
  • reviewed
  • usercase: Important user case type label

7 participants