@CalvinKirs
Member

What problem does this PR solve?

When inserting into a Hive partitioned table stored on oss-hdfs, the following issue occurs:

First insert succeeds: Since the partition does not exist yet, HiveTableSink#setPartitionValues does not set storage-related information for the partition.

Subsequent inserts fail: Once the partition exists, the system tries to resolve the partition’s storage information. At this stage, oss-hdfs is incorrectly treated as s3 instead of being recognized as hdfs, leading to insert failure.

This PR fixes the storage type handling logic so that oss-hdfs partitions are correctly recognized as hdfs.
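
As a rough illustration of the distinction this fix draws, the following Java sketch classifies a location by scheme and host. Everything in it is hypothetical (the class `StorageTypeSketch`, the `classify` method, the `StorageType` enum); it is not the code changed by this PR. It assumes that an oss-hdfs endpoint can be recognized by the `oss-dls.aliyuncs.com` host suffix seen in the error message below, and that plain `oss://` locations go through the S3-compatible path.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch; names are illustrative and not taken from the Doris code base.
final class StorageTypeSketch {
    enum StorageType { HDFS, S3 }

    // Assumption: oss-hdfs endpoints are identified by this host suffix,
    // as in the URI from the reported error message.
    private static final String OSS_HDFS_HOST_SUFFIX = ".oss-dls.aliyuncs.com";

    static StorageType classify(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "hdfs":
                return StorageType.HDFS;
            case "oss":
                // The scheme alone is ambiguous: check the host before
                // falling back to the S3-compatible path.
                return host.endsWith(OSS_HDFS_HOST_SUFFIX) ? StorageType.HDFS : StorageType.S3;
            case "s3":
            case "s3a":
                return StorageType.S3;
            default:
                throw new IllegalArgumentException("Unsupported location: " + location);
        }
    }
}
```

Under these assumptions, a partition location on an `oss-dls` endpoint is classified as HDFS, so it would never reach S3 URI validation.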

How to Reproduce

Step 1: Create a Hive catalog whose storage is configured to use oss-hdfs.

Step 2: In that catalog, create a partitioned table and run the same insert twice:

CREATE TABLE hive_partition_table
(
  `ts` DATETIME COMMENT 'ts',
  `col1` BOOLEAN COMMENT 'col1',
  `col2` INT COMMENT 'col2',
  `col3` BIGINT COMMENT 'col3',
  `col4` FLOAT COMMENT 'col4',
  `col5` DOUBLE COMMENT 'col5',
  `col6` DECIMAL(9,4) COMMENT 'col6',
  `col7` STRING COMMENT 'col7',
  `col8` DATE COMMENT 'col8',
  `col9` DATETIME COMMENT 'col9',
  `pt1` STRING COMMENT 'pt1',
  `pt2` STRING COMMENT 'pt2'
)
PARTITION BY LIST (day(ts), pt1, pt2) ()
PROPERTIES (
  'write-format'='orc',
  'compression-codec'='zlib'
);

-- First insert (works fine)
INSERT INTO hive_partition_table VALUES
  ('2023-01-01 00:00:00', true, 1, 1, 1.0, 1.0, 1.0000, '1', '2023-01-01', '2023-01-01 00:00:00', 'a', '1'),
  ('2023-01-02 00:00:00', false, 2, 2, 2.0, 2.0, 2.0000, '2', '2023-01-02', '2023-01-02 00:00:00', 'b', '2'),
  ('2023-01-03 00:00:00', true, 3, 3, 3.0, 3.0, 3.0000, '3', '2023-01-03', '2023-01-03 00:00:00', 'c', '3');

-- Second insert (fails)
INSERT INTO hive_partition_table VALUES
  ('2023-01-01 00:00:00', true, 1, 1, 1.0, 1.0, 1.0000, '1', '2023-01-01', '2023-01-01 00:00:00', 'a', '1'),
  ('2023-01-02 00:00:00', false, 2, 2, 2.0, 2.0, 2.0000, '2', '2023-01-02', '2023-01-02 00:00:00', 'b', '2'),
  ('2023-01-03 00:00:00', true, 3, 3, 3.0, 3.0, 3.0000, '3', '2023-01-03', '2023-01-03 00:00:00', 'c', '3');


Error message on the second insert:

[INVALID_ARGUMENT] Invalid S3 URI: oss://emr-ssss-oss.cn-beijing.oss-dls.aliyuncs.com/tmp/.sss/root/4118a835d5d948f8adc34107230c9b9b/pt1=a/pt2=1/727bd17a7b9541db-8f4bb2fbfda35b86_6ec0a4b4-cacc-4dd3-b3fc-b130cadcd508-0.zlib.orc

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Sep 2, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39647 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5346e7da00cf986762a139ca28206238f7389ca3, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17594	7328	6658	6658
q2	2045	180	165	165
q3	10527	1123	1149	1123
q4	10222	760	734	734
q5	7731	2799	2833	2799
q6	211	134	135	134
q7	984	602	604	602
q8	9349	1927	2018	1927
q9	6719	6426	6393	6393
q10	7049	2253	2278	2253
q11	462	260	256	256
q12	435	210	213	210
q13	17776	2956	2976	2956
q14	226	213	210	210
q15	513	459	466	459
q16	492	370	370	370
q17	978	609	567	567
q18	7111	6557	6711	6557
q19	1411	1115	987	987
q20	458	215	202	202
q21	3814	3121	3119	3119
q22	1087	966	990	966
Total cold run time: 107194 ms
Total hot run time: 39647 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6777	6715	6648	6648
q2	332	246	247	246
q3	2977	3014	2984	2984
q4	2069	1837	1854	1837
q5	5767	5799	5791	5791
q6	216	132	132	132
q7	2308	1833	1879	1833
q8	3476	3558	3565	3558
q9	8910	9020	8950	8950
q10	3573	3561	3503	3503
q11	591	489	490	489
q12	789	589	586	586
q13	9647	3132	3144	3132
q14	309	280	277	277
q15	509	460	459	459
q16	489	439	435	435
q17	1854	1650	1602	1602
q18	8319	7752	7678	7678
q19	1668	1613	1533	1533
q20	2097	1868	1870	1868
q21	5183	5091	4960	4960
q22	1189	1045	1028	1028
Total cold run time: 69049 ms
Total hot run time: 59529 ms

@doris-robot

TPC-DS: Total hot run time: 192289 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5346e7da00cf986762a139ca28206238f7389ca3, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	930	399	395	395
query2	6287	1932	1871	1871
query3	8694	209	205	205
query4	33810	23811	23412	23412
query5	3614	466	453	453
query6	278	185	172	172
query7	4209	305	310	305
query8	283	217	226	217
query9	9557	2593	2597	2593
query10	502	266	267	266
query11	18130	15688	15220	15220
query12	169	104	100	100
query13	1556	440	440	440
query14	8990	6588	6543	6543
query15	234	173	173	173
query16	7916	435	482	435
query17	1652	623	584	584
query18	2141	344	330	330
query19	219	165	165	165
query20	120	122	124	122
query21	207	112	106	106
query22	4593	4421	4430	4421
query23	34877	34324	34192	34192
query24	13258	2979	2987	2979
query25	765	445	441	441
query26	1807	182	175	175
query27	2976	361	365	361
query28	7418	2156	2152	2152
query29	1090	461	461	461
query30	286	163	158	158
query31	1070	828	854	828
query32	102	58	57	57
query33	792	311	329	311
query34	1004	578	544	544
query35	829	753	759	753
query36	1147	986	986	986
query37	268	66	69	66
query38	4126	3901	3938	3901
query39	1514	1473	1492	1473
query40	253	100	100	100
query41	54	48	54	48
query42	110	96	103	96
query43	522	471	483	471
query44	1309	824	824	824
query45	190	176	171	171
query46	1179	742	727	727
query47	1981	1930	1934	1930
query48	496	399	378	378
query49	1104	406	402	402
query50	839	429	450	429
query51	7425	7270	7378	7270
query52	103	97	95	95
query53	262	190	188	188
query54	1288	483	492	483
query55	84	74	79	74
query56	273	258	253	253
query57	1327	1216	1187	1187
query58	221	208	226	208
query59	3374	2998	3010	2998
query60	286	263	248	248
query61	125	135	138	135
query62	871	710	688	688
query63	222	191	191	191
query64	4903	679	647	647
query65	3400	3355	3273	3273
query66	1420	301	339	301
query67	16451	15773	15549	15549
query68	4496	586	594	586
query69	427	263	269	263
query70	1202	1088	1095	1088
query71	414	273	248	248
query72	6180	3994	4006	3994
query73	766	352	358	352
query74	10356	8950	8961	8950
query75	3458	2643	2697	2643
query76	2774	1077	1017	1017
query77	401	274	275	274
query78	10492	9622	9572	9572
query79	1195	595	597	595
query80	842	435	434	434
query81	508	220	220	220
query82	415	88	85	85
query83	168	140	144	140
query84	280	83	78	78
query85	1374	320	298	298
query86	448	285	304	285
query87	4408	4309	4293	4293
query88	4536	2439	2411	2411
query89	408	292	297	292
query90	2024	188	191	188
query91	184	153	179	153
query92	60	48	47	47
query93	1854	565	564	564
query94	750	304	298	298
query95	356	265	263	263
query96	633	283	283	283
query97	3309	3170	3137	3137
query98	221	199	195	195
query99	1631	1313	1317	1313
Total cold run time: 304062 ms
Total hot run time: 192289 ms

@doris-robot

ClickBench: Total hot run time: 29.75 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5346e7da00cf986762a139ca28206238f7389ca3, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.04	0.03
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.50	0.51	0.51
query6	1.13	0.74	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.57	0.51	0.49
query10	0.56	0.55	0.55
query11	0.14	0.11	0.10
query12	0.14	0.11	0.11
query13	0.61	0.60	0.58
query14	0.77	0.82	0.79
query15	0.84	0.83	0.83
query16	0.42	0.40	0.38
query17	1.03	1.07	1.06
query18	0.24	0.23	0.22
query19	1.95	1.84	1.83
query20	0.02	0.01	0.01
query21	15.40	0.59	0.58
query22	2.29	2.49	1.72
query23	16.85	0.88	0.82
query24	3.22	0.50	0.74
query25	0.22	0.10	0.06
query26	0.32	0.14	0.13
query27	0.04	0.04	0.05
query28	11.18	0.48	0.53
query29	12.60	3.18	3.19
query30	0.25	0.06	0.06
query31	2.85	0.40	0.39
query32	3.23	0.47	0.45
query33	3.03	3.05	3.03
query34	17.10	4.52	4.50
query35	4.58	4.57	4.59
query36	0.66	0.48	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.25 s
Total hot run time: 29.75 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39958 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 002d64dd54e4d36973b41bfd0ae896526f47f8c6, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17616	6797	6664	6664
q2	2059	206	166	166
q3	10486	1127	1182	1127
q4	10217	761	763	761
q5	7751	2892	2833	2833
q6	214	130	129	129
q7	982	598	593	593
q8	9396	1996	2037	1996
q9	7486	6413	6411	6411
q10	7002	2287	2320	2287
q11	457	269	264	264
q12	411	213	208	208
q13	17806	2972	2975	2972
q14	231	214	203	203
q15	508	462	450	450
q16	449	374	380	374
q17	966	612	574	574
q18	7295	6633	6676	6633
q19	1401	1122	1027	1027
q20	492	206	209	206
q21	3850	3109	3168	3109
q22	1118	1014	971	971
Total cold run time: 108193 ms
Total hot run time: 39958 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6576	6587	6553	6553
q2	333	243	230	230
q3	2866	3003	2920	2920
q4	2051	1855	1812	1812
q5	5857	5568	5682	5568
q6	206	122	126	122
q7	2168	1789	1771	1771
q8	3336	3501	3525	3501
q9	8901	8947	8910	8910
q10	3572	3524	3587	3524
q11	574	491	500	491
q12	828	580	616	580
q13	7785	3202	3159	3159
q14	303	286	285	285
q15	498	472	477	472
q16	517	447	457	447
q17	1860	1617	1605	1605
q18	8170	7642	7757	7642
q19	1696	1478	1476	1476
q20	2082	1885	1853	1853
q21	5231	5097	5011	5011
q22	1158	1083	1044	1044
Total cold run time: 66568 ms
Total hot run time: 58976 ms

@doris-robot

TPC-DS: Total hot run time: 191277 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 002d64dd54e4d36973b41bfd0ae896526f47f8c6, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	802	369	388	369
query2	6242	1956	1909	1909
query3	8689	203	204	203
query4	33723	23755	23464	23464
query5	3687	465	438	438
query6	298	181	193	181
query7	4209	323	328	323
query8	293	224	218	218
query9	9558	2571	2569	2569
query10	483	260	268	260
query11	17955	15180	15084	15084
query12	153	105	104	104
query13	1529	415	410	410
query14	8753	6635	7306	6635
query15	237	180	183	180
query16	8049	402	473	402
query17	1576	622	567	567
query18	2089	327	319	319
query19	246	165	153	153
query20	121	113	118	113
query21	212	105	106	105
query22	4729	4521	4478	4478
query23	34727	34257	33869	33869
query24	11798	2929	2887	2887
query25	697	411	403	403
query26	1808	175	172	172
query27	2932	368	360	360
query28	7993	2152	2158	2152
query29	1104	482	478	478
query30	288	160	158	158
query31	1044	840	819	819
query32	99	66	66	66
query33	786	312	316	312
query34	1257	516	530	516
query35	871	759	733	733
query36	1103	946	929	929
query37	266	76	73	73
query38	4118	3964	3992	3964
query39	1522	1485	1465	1465
query40	282	105	104	104
query41	51	51	54	51
query42	118	112	101	101
query43	520	477	484	477
query44	1258	819	826	819
query45	185	173	173	173
query46	1191	771	752	752
query47	2057	1947	1897	1897
query48	492	411	398	398
query49	1156	418	427	418
query50	833	437	435	435
query51	7392	7144	7083	7083
query52	106	103	89	89
query53	254	187	179	179
query54	1220	488	477	477
query55	83	81	81	81
query56	289	247	251	247
query57	1256	1198	1197	1197
query58	234	228	219	219
query59	3023	2943	2861	2861
query60	306	258	254	254
query61	109	107	113	107
query62	851	661	685	661
query63	218	181	185	181
query64	4945	630	607	607
query65	3271	3218	3194	3194
query66	1364	295	283	283
query67	15852	15417	15615	15417
query68	4840	591	587	587
query69	426	273	268	268
query70	1192	1043	1072	1043
query71	333	247	259	247
query72	6150	4022	3939	3939
query73	744	351	356	351
query74	10331	9052	9209	9052
query75	3344	2643	2651	2643
query76	2661	1108	1150	1108
query77	380	298	268	268
query78	10369	9579	9620	9579
query79	1407	598	605	598
query80	893	424	424	424
query81	532	218	222	218
query82	337	87	91	87
query83	239	147	148	147
query84	251	78	88	78
query85	1448	288	304	288
query86	465	302	298	298
query87	4443	4326	4277	4277
query88	4249	2537	2390	2390
query89	400	298	295	295
query90	2216	185	186	185
query91	180	168	174	168
query92	62	53	52	52
query93	1853	573	569	569
query94	1032	299	305	299
query95	364	274	267	267
query96	615	282	286	282
query97	3296	3166	3141	3141
query98	234	201	201	201
query99	1492	1326	1326	1326
Total cold run time: 301740 ms
Total hot run time: 191277 ms

@doris-robot

ClickBench: Total hot run time: 30.13 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 002d64dd54e4d36973b41bfd0ae896526f47f8c6, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.06	0.04	0.03
query3	0.24	0.06	0.06
query4	1.62	0.10	0.10
query5	0.53	0.52	0.53
query6	1.13	0.73	0.72
query7	0.04	0.01	0.01
query8	0.04	0.03	0.04
query9	0.58	0.51	0.51
query10	0.56	0.54	0.54
query11	0.14	0.11	0.11
query12	0.13	0.10	0.10
query13	0.60	0.59	0.58
query14	0.81	0.83	0.77
query15	0.85	0.82	0.83
query16	0.38	0.39	0.39
query17	1.00	1.05	1.04
query18	0.24	0.24	0.22
query19	1.96	1.90	1.89
query20	0.02	0.02	0.01
query21	15.39	0.58	0.58
query22	2.57	2.00	1.81
query23	16.88	0.98	0.86
query24	3.68	0.84	1.63
query25	0.25	0.18	0.04
query26	0.43	0.14	0.13
query27	0.04	0.04	0.03
query28	9.74	0.54	0.45
query29	12.63	3.19	3.18
query30	0.25	0.06	0.06
query31	2.87	0.39	0.37
query32	3.26	0.47	0.45
query33	2.98	3.03	3.03
query34	17.05	4.51	4.54
query35	4.60	4.58	4.54
query36	0.69	0.47	0.47
query37	0.09	0.07	0.06
query38	0.05	0.03	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.76 s
Total hot run time: 30.13 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@morningman changed the title from "[Fix](oss-hdfs)Fix insert failure on Hive partitioned table with oss-hdfs" to "branch-3.0: [Fix](oss-hdfs)Fix insert failure on Hive partitioned table with oss-hdfs" on Sep 3, 2025
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Sep 4, 2025
@dataroaring
Contributor

dataroaring left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Sep 5, 2025
@github-actions
Contributor

github-actions bot commented Sep 5, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Sep 5, 2025

PR approved by anyone and no changes requested.

@dataroaring merged commit 32710fd into apache:branch-3.0 on Sep 5, 2025
25 of 26 checks passed
yiguolei pushed a commit that referenced this pull request Sep 5, 2025

Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.12-merged, reviewed
