@hubgeter
Contributor

@hubgeter hubgeter commented Jul 8, 2025

What problem does this PR solve?

Related PR: #51341

Problem Summary:
PR #51341 deleted `hudiOrcReader`; this PR reintroduces it so that Doris can read Hudi tables stored in ORC format.
Note that during testing, reading the ORC table back through spark-hudi failed with the error below, even though the ORC files were written by spark-hudi itself.

```
java.lang.UnsupportedOperationException: Base file format is not currently supported (ORC)
        at org.apache.hudi.HoodieBaseRelation.createBaseFileReader(HoodieBaseRelation.scala:574) ~[hudi-spark3.4-bundle_2.12-0.14.0-1.jar:0.14.0-1]
        at org.apache.hudi.BaseFileOnlyRelation.composeRDD(BaseFileOnlyRelation.scala:96) ~[hudi-spark3.4-bundle_2.12-0.14.0-1.jar:0.14.0-1]
        at org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:381) ~[hudi-spark3.4-bundle_2.12-0.14.0-1.jar:0.14.0-1]
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$apply$4(DataSourceStrategy.scala:329) ~[spark-sql_2.12-3.4.2.jar:0.14.0-1]
```

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

hubgeter commented Jul 8, 2025

run buildall

@hubgeter
Contributor Author

hubgeter commented Jul 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33026 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e345146bd3057a819374d0cb5990509d66cb5196, data reload: false

------ Round 1 ----------------------------------
q1	17636	5220	5093	5093
q2	1939	280	183	183
q3	10529	1280	698	698
q4	10298	997	525	525
q5	8939	2298	2412	2298
q6	198	163	130	130
q7	868	772	592	592
q8	9308	1302	1100	1100
q9	6798	5131	5057	5057
q10	6876	2353	1939	1939
q11	465	280	267	267
q12	348	353	213	213
q13	17765	3667	3074	3074
q14	217	229	214	214
q15	546	480	490	480
q16	417	432	384	384
q17	587	873	351	351
q18	7522	7170	7091	7091
q19	1365	944	545	545
q20	330	343	215	215
q21	3576	2538	2285	2285
q22	356	308	292	292
Total cold run time: 106883 ms
Total hot run time: 33026 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5146	5144	5038	5038
q2	237	319	217	217
q3	2150	2663	2260	2260
q4	1371	1757	1281	1281
q5	4157	4306	4632	4306
q6	216	178	129	129
q7	1957	1919	1826	1826
q8	2615	2719	2639	2639
q9	7394	7435	7118	7118
q10	3053	3361	2842	2842
q11	578	537	493	493
q12	717	769	647	647
q13	3576	4005	3372	3372
q14	284	314	259	259
q15	511	527	485	485
q16	468	475	443	443
q17	1196	1571	1365	1365
q18	7942	7683	7633	7633
q19	844	808	889	808
q20	2094	2031	1968	1968
q21	5041	4560	4579	4560
q22	622	604	562	562
Total cold run time: 52169 ms
Total hot run time: 50251 ms

@doris-robot

TPC-DS: Total hot run time: 184737 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e345146bd3057a819374d0cb5990509d66cb5196, data reload: false

query1	997	386	386	386
query2	6529	1708	1758	1708
query3	6736	212	207	207
query4	26267	23239	23133	23133
query5	4359	570	427	427
query6	273	195	180	180
query7	4612	480	276	276
query8	274	210	205	205
query9	8593	2595	2608	2595
query10	462	323	283	283
query11	15297	14920	14680	14680
query12	159	103	101	101
query13	1659	527	408	408
query14	8489	5633	5587	5587
query15	198	187	176	176
query16	7141	641	476	476
query17	1040	698	559	559
query18	2018	393	293	293
query19	188	185	174	174
query20	118	113	109	109
query21	207	121	102	102
query22	4008	4180	4344	4180
query23	34733	33944	33428	33428
query24	8473	2379	2341	2341
query25	526	451	377	377
query26	1224	262	140	140
query27	2777	504	333	333
query28	4309	2102	2098	2098
query29	741	552	429	429
query30	283	212	179	179
query31	895	819	732	732
query32	68	64	58	58
query33	550	396	289	289
query34	801	833	508	508
query35	761	835	732	732
query36	925	978	857	857
query37	107	96	72	72
query38	4068	4177	3965	3965
query39	1482	1392	1384	1384
query40	206	116	103	103
query41	57	57	49	49
query42	123	104	112	104
query43	473	506	464	464
query44	1298	850	831	831
query45	214	164	166	164
query46	827	991	617	617
query47	1745	1796	1712	1712
query48	376	414	319	319
query49	732	444	391	391
query50	634	688	398	398
query51	4136	4225	4125	4125
query52	114	106	100	100
query53	217	245	179	179
query54	583	545	493	493
query55	83	82	82	82
query56	290	290	276	276
query57	1178	1169	1125	1125
query58	260	248	246	246
query59	2710	2725	2712	2712
query60	316	313	317	313
query61	153	148	149	148
query62	793	726	645	645
query63	218	190	183	183
query64	4497	1010	641	641
query65	4259	4165	4187	4165
query66	1135	396	303	303
query67	15842	15493	15404	15404
query68	7802	868	533	533
query69	482	300	265	265
query70	1181	1119	1099	1099
query71	406	316	308	308
query72	5828	4762	4825	4762
query73	657	642	347	347
query74	8980	9114	8919	8919
query75	3171	3155	2638	2638
query76	3244	1133	728	728
query77	469	372	367	367
query78	9931	10106	9285	9285
query79	2217	883	578	578
query80	655	505	451	451
query81	485	252	217	217
query82	176	131	95	95
query83	253	239	221	221
query84	278	96	93	93
query85	758	415	310	310
query86	370	277	281	277
query87	4366	4408	4334	4334
query88	3450	2247	2235	2235
query89	370	303	280	280
query90	1994	208	200	200
query91	132	136	110	110
query92	75	62	57	57
query93	2204	927	573	573
query94	682	420	312	312
query95	364	312	282	282
query96	483	551	276	276
query97	2703	2738	2643	2643
query98	236	209	203	203
query99	1330	1375	1280	1280
Total cold run time: 271548 ms
Total hot run time: 184737 ms

@doris-robot

ClickBench: Total hot run time: 29.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e345146bd3057a819374d0cb5990509d66cb5196, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.24	0.08	0.07
query4	1.62	0.11	0.10
query5	0.43	0.42	0.40
query6	1.17	0.67	0.66
query7	0.03	0.02	0.02
query8	0.04	0.03	0.04
query9	0.60	0.50	0.50
query10	0.58	0.57	0.57
query11	0.16	0.11	0.11
query12	0.16	0.11	0.11
query13	0.62	0.61	0.61
query14	0.80	0.81	0.80
query15	0.89	0.88	0.85
query16	0.39	0.40	0.38
query17	1.06	1.07	1.10
query18	0.22	0.21	0.21
query19	1.99	1.88	1.78
query20	0.01	0.02	0.01
query21	15.42	0.89	0.54
query22	0.75	1.10	0.71
query23	15.07	1.37	0.63
query24	6.80	0.82	1.14
query25	0.51	0.15	0.09
query26	0.55	0.16	0.13
query27	0.06	0.05	0.05
query28	9.37	0.92	0.44
query29	12.53	3.87	3.26
query30	0.25	0.09	0.06
query31	2.85	0.61	0.38
query32	3.24	0.56	0.47
query33	3.05	3.06	3.08
query34	16.14	5.43	4.79
query35	4.84	4.88	4.89
query36	0.70	0.49	0.49
query37	0.09	0.07	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.16	0.15	0.13
query41	0.09	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.74 s
Total hot run time: 29.47 s

@hubgeter
Contributor Author

hubgeter commented Jul 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32767 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f6f44bb67e0bb52266a6b58504c478a561545898, data reload: false

------ Round 1 ----------------------------------
q1	17583	5213	5027	5027
q2	1922	277	185	185
q3	10299	1265	701	701
q4	10227	1033	500	500
q5	7508	2915	2303	2303
q6	173	159	128	128
q7	882	732	584	584
q8	9305	1276	1027	1027
q9	6755	5104	5117	5104
q10	6883	2366	1957	1957
q11	468	293	270	270
q12	346	341	214	214
q13	17783	3660	3011	3011
q14	223	226	206	206
q15	553	490	472	472
q16	424	410	372	372
q17	594	833	373	373
q18	7471	7034	6978	6978
q19	1504	925	522	522
q20	330	348	221	221
q21	3719	2514	2329	2329
q22	358	314	283	283
Total cold run time: 105310 ms
Total hot run time: 32767 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5096	5407	5298	5298
q2	242	320	215	215
q3	2157	2666	2233	2233
q4	1354	1750	1300	1300
q5	4166	4096	4535	4096
q6	221	166	127	127
q7	1998	1972	1767	1767
q8	2618	2554	2591	2554
q9	7318	7290	7229	7229
q10	3093	3282	2851	2851
q11	572	536	516	516
q12	654	771	644	644
q13	3611	3976	3314	3314
q14	278	293	304	293
q15	516	477	479	477
q16	461	482	455	455
q17	1157	1550	1375	1375
q18	8009	7779	7748	7748
q19	767	788	927	788
q20	1951	1970	1873	1873
q21	4750	4321	4283	4283
q22	639	635	551	551
Total cold run time: 51628 ms
Total hot run time: 49987 ms

@doris-robot

TPC-DS: Total hot run time: 184789 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f6f44bb67e0bb52266a6b58504c478a561545898, data reload: false

query1	1023	376	409	376
query2	6544	1761	1738	1738
query3	6752	212	208	208
query4	27503	23372	23601	23372
query5	4729	566	433	433
query6	309	209	192	192
query7	4632	493	294	294
query8	273	216	214	214
query9	8601	2572	2583	2572
query10	491	323	271	271
query11	15676	15030	14758	14758
query12	153	107	98	98
query13	1632	500	393	393
query14	8753	5735	5844	5735
query15	201	183	165	165
query16	7463	417	253	253
query17	1317	713	561	561
query18	1973	397	288	288
query19	186	182	145	145
query20	120	123	108	108
query21	205	120	103	103
query22	4015	4188	3964	3964
query23	33838	33133	33032	33032
query24	8392	2349	2345	2345
query25	520	490	387	387
query26	1225	260	144	144
query27	2778	498	334	334
query28	4360	2096	2069	2069
query29	727	554	448	448
query30	290	223	193	193
query31	920	817	735	735
query32	68	61	63	61
query33	561	342	308	308
query34	794	854	521	521
query35	614	667	591	591
query36	952	964	891	891
query37	117	103	80	80
query38	4140	4170	4076	4076
query39	1503	1389	1440	1389
query40	213	110	100	100
query41	54	54	51	51
query42	117	107	104	104
query43	501	507	469	469
query44	1296	827	808	808
query45	176	169	166	166
query46	823	1028	623	623
query47	1720	1797	1713	1713
query48	379	407	306	306
query49	784	476	388	388
query50	616	685	414	414
query51	4111	4120	4124	4120
query52	105	103	96	96
query53	221	245	192	192
query54	563	549	511	511
query55	81	80	80	80
query56	381	300	275	275
query57	1189	1197	1104	1104
query58	256	245	252	245
query59	2643	2680	2581	2581
query60	330	312	297	297
query61	127	122	123	122
query62	807	705	637	637
query63	220	179	193	179
query64	4409	1356	1015	1015
query65	4271	4176	4147	4147
query66	1139	485	310	310
query67	15497	15386	15314	15314
query68	8560	884	521	521
query69	501	307	265	265
query70	1208	1148	1082	1082
query71	484	328	292	292
query72	5612	4663	4728	4663
query73	688	595	354	354
query74	9143	8816	8798	8798
query75	3939	3167	2675	2675
query76	3656	1131	699	699
query77	783	382	290	290
query78	10920	11051	10110	10110
query79	1930	810	563	563
query80	583	515	433	433
query81	471	251	221	221
query82	437	125	99	99
query83	249	252	233	233
query84	252	98	87	87
query85	844	360	315	315
query86	330	285	292	285
query87	4491	4396	4223	4223
query88	3351	2278	2261	2261
query89	384	322	300	300
query90	1931	211	199	199
query91	139	138	108	108
query92	78	56	58	56
query93	1197	952	593	593
query94	688	313	192	192
query95	372	287	283	283
query96	497	563	272	272
query97	2672	2783	2644	2644
query98	230	211	198	198
query99	1451	1415	1263	1263
Total cold run time: 275450 ms
Total hot run time: 184789 ms

@doris-robot

ClickBench: Total hot run time: 29.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f6f44bb67e0bb52266a6b58504c478a561545898, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.04
query3	0.25	0.08	0.07
query4	1.62	0.11	0.11
query5	0.45	0.45	0.41
query6	1.16	0.67	0.65
query7	0.03	0.02	0.02
query8	0.04	0.03	0.04
query9	0.61	0.52	0.53
query10	0.58	0.56	0.58
query11	0.15	0.11	0.11
query12	0.15	0.12	0.11
query13	0.62	0.61	0.62
query14	0.78	0.81	0.82
query15	0.89	0.86	0.86
query16	0.39	0.39	0.40
query17	1.10	1.05	1.03
query18	0.22	0.21	0.20
query19	1.95	1.79	1.86
query20	0.01	0.01	0.01
query21	15.39	0.90	0.53
query22	0.75	1.25	0.72
query23	14.84	1.42	0.65
query24	6.90	1.41	0.59
query25	0.52	0.16	0.09
query26	0.58	0.17	0.14
query27	0.06	0.06	0.05
query28	9.78	0.88	0.43
query29	12.58	3.93	3.26
query30	0.26	0.09	0.07
query31	2.85	0.58	0.40
query32	3.24	0.56	0.47
query33	3.07	3.08	3.19
query34	16.06	5.40	4.77
query35	4.84	4.82	4.80
query36	0.71	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.14	0.14
query41	0.07	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.04 s
Total hot run time: 29.22 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 32.35% (11/34) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.32% (15507/27053)
Line Coverage 46.30% (140933/304359)
Region Coverage 45.56% (71265/156414)
Branch Coverage 40.28% (37551/93226)

Contributor

@morningman morningman left a comment


LGTM

@github-actions
Contributor

github-actions bot commented Jul 9, 2025

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jul 9, 2025
@github-actions
Contributor

github-actions bot commented Jul 9, 2025

PR approved by anyone and no changes requested.

@morningman morningman merged commit 052105b into apache:master Jul 9, 2025
27 of 28 checks passed
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 15, 2025
…e#52964)
morrySnow pushed a commit that referenced this pull request Jul 16, 2025
…ables after schema changes. #51341 #52964 #52954 #53055 (#53170)

bp #51341: support read hudi/paimon/iceberg schema change
bp #52964: add hudi orc reader
bp #52954: support timestamp to bigint
bp #53055: fix paimon docker version

Labels

approved, dev/3.1.0-merged, reviewed

6 participants