@hubgeter
Contributor

@hubgeter hubgeter commented Mar 13, 2025

What problem does this PR solve?

Similar to PR #48723.
Problem Summary:

  1. Supports the native reader reading Hudi tables after the top-level schema has changed. Tables whose struct-internal schema has changed are not supported yet; that will be supported in the next PR.

  2. Unifies the logic that the Iceberg/Paimon/Hudi native readers use to handle tables with schema changes.
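One way to picture top-level schema-change handling is a lookup keyed by field id: the table schema and an older data file may use different names for the same id. The sketch below is an illustration under that assumption only; `FieldIdToName` and `map_table_to_file_names` are hypothetical names and are not taken from the Doris code base.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using FieldIdToName = std::map<int32_t, std::string>;

// Hypothetical helper, not the Doris API.
// Returns table column name -> file column name for every field id present in
// both schemas. Table columns whose field id is absent from the file (for
// example, columns added after the file was written) are appended to
// *missing_in_file so the caller can decide how to fill them.
std::map<std::string, std::string> map_table_to_file_names(
        const FieldIdToName& table_schema, const FieldIdToName& file_schema,
        std::vector<std::string>* missing_in_file) {
    assert(missing_in_file != nullptr);
    std::map<std::string, std::string> table_to_file;
    for (const auto& entry : table_schema) {
        auto it = file_schema.find(entry.first);
        if (it == file_schema.end()) {
            missing_in_file->push_back(entry.second);  // no such id in the file
        } else {
            table_to_file[entry.second] = it->second;  // same id, maybe renamed
        }
    }
    return table_to_file;
}
```

With such a mapping, a column renamed from `name` to `full_name` still resolves to the `name` column in files written before the rename, and a column dropped from the table is simply never requested from the file.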

Release note

Supports the native reader reading Hudi tables after the top-level schema has changed.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.88% (1075/1297)
Line Coverage: 65.85% (17731/26927)
Region Coverage: 65.19% (8729/13391)
Branch Coverage: 55.13% (4706/8536)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f356e2dea1324da74b85948729797c2cc76c9ef1_f356e2dea1324da74b85948729797c2cc76c9ef1_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 32485 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f356e2dea1324da74b85948729797c2cc76c9ef1, data reload: false

------ Round 1 ----------------------------------
q1	17585	5262	5069	5069
q2	2048	288	164	164
q3	10416	1350	768	768
q4	10207	1048	532	532
q5	7540	2409	2795	2409
q6	196	161	132	132
q7	926	745	616	616
q8	9317	1450	1140	1140
q9	5022	4799	4638	4638
q10	6824	2309	1870	1870
q11	477	277	257	257
q12	356	359	211	211
q13	17785	3748	3124	3124
q14	235	236	208	208
q15	549	503	480	480
q16	654	604	589	589
q17	612	878	345	345
q18	6776	6433	6247	6247
q19	2591	969	549	549
q20	318	327	194	194
q21	2872	2137	1951	1951
q22	1060	1064	992	992
Total cold run time: 104366 ms
Total hot run time: 32485 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5319	5149	5137	5137
q2	239	327	240	240
q3	2164	2694	2299	2299
q4	1469	1851	1387	1387
q5	4257	4115	4196	4115
q6	207	162	124	124
q7	2035	1954	1768	1768
q8	2635	2573	2645	2573
q9	7278	7259	7219	7219
q10	3025	3212	2765	2765
q11	592	515	498	498
q12	715	728	565	565
q13	3457	3926	3303	3303
q14	276	297	275	275
q15	547	483	465	465
q16	641	671	648	648
q17	1164	1581	1374	1374
q18	7930	7748	7513	7513
q19	847	854	961	854
q20	1943	2051	1879	1879
q21	5581	5082	4716	4716
q22	1157	1056	1048	1048
Total cold run time: 53478 ms
Total hot run time: 50765 ms

@doris-robot

TPC-DS: Total hot run time: 192091 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f356e2dea1324da74b85948729797c2cc76c9ef1, data reload: false

query1	1404	996	1013	996
query2	6152	1933	1862	1862
query3	11172	4618	4624	4618
query4	26182	23652	22976	22976
query5	3988	644	468	468
query6	326	206	191	191
query7	4006	504	296	296
query8	290	271	246	246
query9	8506	2605	2600	2600
query10	473	310	267	267
query11	15791	15283	15013	15013
query12	167	110	114	110
query13	1579	548	391	391
query14	9841	6095	6307	6095
query15	211	188	175	175
query16	7638	663	490	490
query17	1187	767	592	592
query18	2024	418	347	347
query19	198	200	175	175
query20	128	125	123	123
query21	210	128	111	111
query22	4500	4513	4272	4272
query23	34361	33500	33388	33388
query24	7952	2403	2436	2403
query25	511	475	411	411
query26	1164	276	156	156
query27	2491	507	347	347
query28	4425	2477	2436	2436
query29	703	575	438	438
query30	273	220	190	190
query31	918	875	811	811
query32	74	67	65	65
query33	521	358	303	303
query34	795	877	531	531
query35	809	855	775	775
query36	963	1016	898	898
query37	119	103	69	69
query38	4261	4277	4173	4173
query39	1507	1440	1495	1440
query40	222	143	102	102
query41	56	56	48	48
query42	124	105	108	105
query43	517	513	470	470
query44	1312	782	797	782
query45	188	168	169	168
query46	850	1034	646	646
query47	1854	1897	1771	1771
query48	398	429	316	316
query49	766	496	443	443
query50	742	752	426	426
query51	4276	4318	4258	4258
query52	108	100	100	100
query53	245	268	213	213
query54	491	510	414	414
query55	81	83	81	81
query56	277	278	267	267
query57	1185	1200	1112	1112
query58	250	243	250	243
query59	2726	2908	2839	2839
query60	292	275	253	253
query61	121	123	115	115
query62	806	731	696	696
query63	228	193	202	193
query64	4168	1023	733	733
query65	4614	4530	4507	4507
query66	1105	395	287	287
query67	16266	15602	15419	15419
query68	8913	874	496	496
query69	478	296	258	258
query70	1203	1111	1103	1103
query71	467	299	258	258
query72	5241	3628	3757	3628
query73	784	738	357	357
query74	9234	8929	8785	8785
query75	3921	3177	2689	2689
query76	3727	1195	737	737
query77	781	378	326	326
query78	9985	10362	9373	9373
query79	2510	821	594	594
query80	614	532	463	463
query81	491	261	223	223
query82	724	126	96	96
query83	174	175	158	158
query84	247	96	73	73
query85	822	365	311	311
query86	383	327	291	291
query87	4601	4478	4425	4425
query88	3677	2301	2291	2291
query89	429	370	287	287
query90	1930	214	213	213
query91	144	139	108	108
query92	78	59	56	56
query93	1828	1102	575	575
query94	656	413	300	300
query95	360	276	267	267
query96	490	569	280	280
query97	3327	3380	3305	3305
query98	237	212	203	203
query99	1315	1394	1307	1307
Total cold run time: 281859 ms
Total hot run time: 192091 ms

@doris-robot

ClickBench: Total hot run time: 31.49 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f356e2dea1324da74b85948729797c2cc76c9ef1, data reload: false

query1	0.03	0.03	0.04
query2	0.10	0.05	0.05
query3	0.27	0.05	0.05
query4	1.61	0.07	0.08
query5	0.55	0.57	0.56
query6	1.20	0.71	0.73
query7	0.04	0.02	0.02
query8	0.06	0.05	0.04
query9	0.62	0.53	0.53
query10	0.58	0.57	0.58
query11	0.25	0.12	0.12
query12	0.25	0.13	0.13
query13	0.64	0.62	0.62
query14	2.69	2.72	2.81
query15	1.02	0.89	0.87
query16	0.37	0.38	0.37
query17	1.04	1.03	1.05
query18	0.18	0.18	0.19
query19	2.05	1.97	1.84
query20	0.02	0.01	0.01
query21	15.35	0.97	0.68
query22	0.92	1.05	0.78
query23	14.69	1.51	0.76
query24	5.27	0.61	0.31
query25	0.17	0.10	0.09
query26	0.56	0.21	0.18
query27	0.09	0.08	0.09
query28	11.04	1.15	0.58
query29	12.52	4.02	3.37
query30	0.28	0.09	0.06
query31	2.84	0.62	0.42
query32	3.23	0.60	0.49
query33	3.09	3.13	3.08
query34	16.61	5.23	4.44
query35	4.51	4.53	4.50
query36	0.63	0.50	0.49
query37	0.20	0.17	0.17
query38	0.17	0.17	0.16
query39	0.05	0.04	0.04
query40	0.19	0.15	0.15
query41	0.12	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.05	0.04
Total cold run time: 106.21 s
Total hot run time: 31.49 s

@hubgeter hubgeter changed the title from "[hudi](schema change)support hudi top level schema change." to "[enhancement](hudi)support read hudi top level schema change table." Mar 14, 2025
@hubgeter hubgeter changed the title from "[enhancement](hudi)support read hudi top level schema change table." to "[enhancement](hudi)support native read hudi top level schema change table." Mar 14, 2025
@hubgeter
Contributor Author

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.88% (1075/1297)
Line Coverage: 65.88% (17737/26923)
Region Coverage: 65.21% (8728/13385)
Branch Coverage: 55.20% (4710/8532)
Coverage Report: http://coverage.selectdb-in.cc/coverage/48fcb9db5b65a868a67043dd4d02867af1f599d5_48fcb9db5b65a868a67043dd4d02867af1f599d5_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 32527 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 48fcb9db5b65a868a67043dd4d02867af1f599d5, data reload: false

------ Round 1 ----------------------------------
q1	17584	5209	5061	5061
q2	2047	303	164	164
q3	10415	1285	747	747
q4	10215	1020	547	547
q5	7537	2392	2377	2377
q6	186	163	133	133
q7	935	755	626	626
q8	9335	1331	1111	1111
q9	4965	4737	4863	4737
q10	6857	2309	1898	1898
q11	484	283	279	279
q12	355	353	220	220
q13	17785	3692	3096	3096
q14	229	225	214	214
q15	527	475	484	475
q16	617	613	590	590
q17	581	885	355	355
q18	6954	6478	6232	6232
q19	2134	955	570	570
q20	308	331	199	199
q21	2915	2303	1925	1925
q22	1031	1023	971	971
Total cold run time: 103996 ms
Total hot run time: 32527 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5281	5584	5124	5124
q2	238	324	234	234
q3	2176	2710	2314	2314
q4	1440	1855	1397	1397
q5	4239	4168	4192	4168
q6	203	163	123	123
q7	1958	1969	1805	1805
q8	2606	2563	2589	2563
q9	7233	7239	7139	7139
q10	3060	3268	2781	2781
q11	574	507	501	501
q12	678	748	590	590
q13	3484	3913	3275	3275
q14	290	301	278	278
q15	517	466	495	466
q16	650	725	642	642
q17	1174	1603	1383	1383
q18	7892	7578	7453	7453
q19	840	865	834	834
q20	1998	2036	1904	1904
q21	5677	5100	4814	4814
q22	1116	1055	1037	1037
Total cold run time: 53324 ms
Total hot run time: 50825 ms

@doris-robot

TPC-DS: Total hot run time: 193035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 48fcb9db5b65a868a67043dd4d02867af1f599d5, data reload: false

query1	1422	1002	977	977
query2	6154	1940	1930	1930
query3	10995	4635	4535	4535
query4	53932	25421	23635	23635
query5	5258	537	491	491
query6	376	205	182	182
query7	5183	498	295	295
query8	327	243	239	239
query9	6901	2614	2612	2612
query10	421	313	247	247
query11	15396	15014	14956	14956
query12	166	107	106	106
query13	1190	546	431	431
query14	11053	6500	7022	6500
query15	207	212	178	178
query16	7027	647	486	486
query17	1112	769	580	580
query18	1566	411	343	343
query19	205	198	170	170
query20	130	128	130	128
query21	214	123	104	104
query22	4261	4428	4510	4428
query23	34275	33464	33503	33464
query24	5865	2471	2495	2471
query25	488	452	419	419
query26	769	285	157	157
query27	1833	511	334	334
query28	2838	2473	2469	2469
query29	624	560	428	428
query30	274	224	184	184
query31	920	884	819	819
query32	82	65	62	62
query33	481	363	304	304
query34	757	892	511	511
query35	788	833	775	775
query36	958	983	944	944
query37	121	105	74	74
query38	4283	4273	4241	4241
query39	1497	1457	1451	1451
query40	211	115	104	104
query41	51	53	49	49
query42	121	103	113	103
query43	499	517	480	480
query44	1331	826	833	826
query45	180	177	166	166
query46	860	1026	652	652
query47	1822	1909	1780	1780
query48	378	424	334	334
query49	715	529	447	447
query50	711	762	417	417
query51	4251	4337	4278	4278
query52	122	109	99	99
query53	235	258	194	194
query54	502	487	412	412
query55	94	90	80	80
query56	285	260	260	260
query57	1160	1205	1100	1100
query58	257	245	240	240
query59	2852	2975	2798	2798
query60	286	291	257	257
query61	119	119	119	119
query62	763	750	676	676
query63	235	188	191	188
query64	2236	1020	681	681
query65	4563	4444	4365	4365
query66	759	397	288	288
query67	15805	15716	15187	15187
query68	7737	873	508	508
query69	536	321	259	259
query70	1205	1091	1139	1091
query71	489	292	271	271
query72	5647	3585	3700	3585
query73	1197	742	344	344
query74	9047	9180	9097	9097
query75	3648	3148	2700	2700
query76	4308	1206	760	760
query77	637	367	278	278
query78	10074	10210	9310	9310
query79	2226	826	595	595
query80	646	518	445	445
query81	473	254	228	228
query82	687	132	93	93
query83	177	170	147	147
query84	287	97	73	73
query85	847	345	302	302
query86	407	307	279	279
query87	4409	4546	4346	4346
query88	3300	2242	2270	2242
query89	405	305	275	275
query90	1901	208	209	208
query91	143	136	105	105
query92	75	60	55	55
query93	1479	1064	580	580
query94	662	418	303	303
query95	345	269	260	260
query96	487	560	278	278
query97	3421	3425	3335	3335
query98	221	202	204	202
query99	1419	1385	1241	1241
Total cold run time: 300294 ms
Total hot run time: 193035 ms

@doris-robot

ClickBench: Total hot run time: 31.45 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 48fcb9db5b65a868a67043dd4d02867af1f599d5, data reload: false

query1	0.03	0.03	0.03
query2	0.11	0.04	0.05
query3	0.28	0.05	0.05
query4	1.61	0.08	0.08
query5	0.56	0.55	0.55
query6	1.19	0.73	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.62	0.53	0.53
query10	0.58	0.58	0.58
query11	0.27	0.13	0.13
query12	0.25	0.12	0.14
query13	0.63	0.61	0.61
query14	2.81	2.79	2.75
query15	0.99	0.87	0.86
query16	0.38	0.38	0.38
query17	1.05	1.00	1.06
query18	0.18	0.19	0.18
query19	2.14	1.87	1.88
query20	0.02	0.01	0.01
query21	15.37	0.97	0.66
query22	0.93	1.05	0.81
query23	14.72	1.52	0.75
query24	5.48	0.58	0.28
query25	0.16	0.10	0.09
query26	0.55	0.22	0.18
query27	0.08	0.08	0.09
query28	10.95	1.15	0.58
query29	12.52	4.07	3.42
query30	0.27	0.08	0.06
query31	2.84	0.62	0.42
query32	3.24	0.59	0.50
query33	3.00	3.04	3.05
query34	16.39	5.15	4.42
query35	4.53	4.47	4.57
query36	0.64	0.51	0.50
query37	0.20	0.18	0.17
query38	0.16	0.15	0.15
query39	0.05	0.04	0.05
query40	0.19	0.19	0.15
query41	0.12	0.06	0.05
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 106.27 s
Total hot run time: 31.45 s

@hubgeter
Contributor Author

run buildall

@hubgeter hubgeter marked this pull request as ready for review March 17, 2025 07:19
@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 82.88% (1075/1297)
Line Coverage: 65.88% (17760/26959)
Region Coverage: 65.19% (8745/13414)
Branch Coverage: 55.08% (4716/8562)
Coverage Report: http://coverage.selectdb-in.cc/coverage/af7202943ee8ddd46cb3621d0a01391473e3d92a_af7202943ee8ddd46cb3621d0a01391473e3d92a_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 32573 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit af7202943ee8ddd46cb3621d0a01391473e3d92a, data reload: false

------ Round 1 ----------------------------------
q1	24330	5081	5278	5081
q2	2052	313	186	186
q3	10427	1281	681	681
q4	10232	1051	528	528
q5	7840	2516	2405	2405
q6	190	166	134	134
q7	914	744	606	606
q8	9319	1354	1074	1074
q9	4953	4901	4697	4697
q10	6812	2331	1901	1901
q11	493	279	254	254
q12	357	357	219	219
q13	17775	3742	3082	3082
q14	229	237	210	210
q15	530	487	480	480
q16	646	610	574	574
q17	589	874	349	349
q18	7006	6387	6379	6379
q19	2527	981	587	587
q20	327	331	189	189
q21	2766	2207	1960	1960
q22	1058	1027	997	997
Total cold run time: 111372 ms
Total hot run time: 32573 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5310	5137	5101	5101
q2	242	334	229	229
q3	2153	2670	2241	2241
q4	1425	1846	1371	1371
q5	4247	4337	4448	4337
q6	228	171	131	131
q7	2065	1944	1795	1795
q8	2617	2561	2569	2561
q9	7376	7293	7226	7226
q10	2924	3159	2836	2836
q11	586	513	510	510
q12	735	773	636	636
q13	3527	3980	3380	3380
q14	272	298	271	271
q15	522	493	468	468
q16	639	688	666	666
q17	1177	1573	1334	1334
q18	7875	7550	7603	7550
q19	870	850	954	850
q20	1982	2045	1908	1908
q21	5352	5038	4948	4948
q22	1063	1056	1018	1018
Total cold run time: 53187 ms
Total hot run time: 51367 ms

@doris-robot

TPC-DS: Total hot run time: 191672 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit af7202943ee8ddd46cb3621d0a01391473e3d92a, data reload: false

query1	1394	1072	1026	1026
query2	6145	1884	1880	1880
query3	10994	4506	4400	4400
query4	57465	26210	23289	23289
query5	4932	501	489	489
query6	330	194	190	190
query7	4882	501	298	298
query8	321	255	246	246
query9	5503	2651	2649	2649
query10	427	307	247	247
query11	15170	14999	14857	14857
query12	161	109	107	107
query13	1032	532	400	400
query14	11095	6757	6344	6344
query15	215	219	180	180
query16	7213	648	441	441
query17	1093	721	577	577
query18	1693	407	314	314
query19	198	197	158	158
query20	129	121	127	121
query21	214	125	103	103
query22	4434	4473	4511	4473
query23	34282	33228	33160	33160
query24	5702	2426	2399	2399
query25	446	476	405	405
query26	699	275	155	155
query27	1733	514	339	339
query28	2768	2495	2486	2486
query29	595	573	433	433
query30	282	221	191	191
query31	895	892	829	829
query32	79	65	67	65
query33	476	401	312	312
query34	783	853	508	508
query35	810	863	760	760
query36	975	994	908	908
query37	123	106	78	78
query38	4175	4265	4335	4265
query39	1482	1431	1479	1431
query40	225	124	113	113
query41	58	98	51	51
query42	125	111	113	111
query43	489	516	484	484
query44	1317	814	813	813
query45	177	176	163	163
query46	843	1034	645	645
query47	1835	1902	1823	1823
query48	390	413	313	313
query49	695	523	424	424
query50	696	749	413	413
query51	4356	4434	4247	4247
query52	108	99	92	92
query53	227	277	196	196
query54	490	495	410	410
query55	76	77	80	77
query56	274	257	284	257
query57	1176	1180	1128	1128
query58	244	236	242	236
query59	2693	2762	2885	2762
query60	279	297	254	254
query61	121	126	124	124
query62	736	738	678	678
query63	231	194	194	194
query64	1877	1054	762	762
query65	4556	4439	4426	4426
query66	713	397	302	302
query67	15969	15537	15243	15243
query68	8303	813	498	498
query69	599	293	271	271
query70	1198	1130	1092	1092
query71	481	303	265	265
query72	5792	3628	3697	3628
query73	1283	742	352	352
query74	9279	9128	8703	8703
query75	3662	3131	2696	2696
query76	4289	1180	734	734
query77	603	371	274	274
query78	10210	10004	9255	9255
query79	3298	825	582	582
query80	762	529	447	447
query81	488	265	220	220
query82	713	127	94	94
query83	316	165	153	153
query84	366	97	79	79
query85	787	359	301	301
query86	387	301	312	301
query87	4410	4618	4382	4382
query88	3622	2263	2288	2263
query89	422	316	277	277
query90	1761	204	206	204
query91	140	138	111	111
query92	79	56	55	55
query93	2159	1068	571	571
query94	674	407	304	304
query95	348	267	250	250
query96	479	571	273	273
query97	3361	3407	3327	3327
query98	229	201	203	201
query99	1436	1412	1255	1255
Total cold run time: 304257 ms
Total hot run time: 191672 ms

@doris-robot

ClickBench: Total hot run time: 31.87 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit af7202943ee8ddd46cb3621d0a01391473e3d92a, data reload: false

query1	0.03	0.03	0.03
query2	0.14	0.11	0.11
query3	0.33	0.19	0.20
query4	1.60	0.19	0.20
query5	0.61	0.60	0.61
query6	1.18	0.74	0.71
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.63	0.52	0.52
query10	0.58	0.59	0.57
query11	0.25	0.13	0.13
query12	0.25	0.14	0.12
query13	0.64	0.63	0.62
query14	2.67	2.89	2.75
query15	0.99	0.88	0.87
query16	0.37	0.37	0.38
query17	1.06	1.08	1.02
query18	0.19	0.19	0.18
query19	2.01	2.00	1.81
query20	0.01	0.02	0.01
query21	15.36	0.99	0.67
query22	0.93	1.06	0.82
query23	14.70	1.65	0.80
query24	5.46	0.56	0.29
query25	0.17	0.08	0.08
query26	0.55	0.21	0.18
query27	0.08	0.08	0.08
query28	11.13	1.23	0.58
query29	12.53	4.06	3.42
query30	0.27	0.08	0.06
query31	2.82	0.65	0.43
query32	3.23	0.59	0.50
query33	3.03	3.16	3.12
query34	16.44	5.19	4.39
query35	4.47	4.52	4.46
query36	0.64	0.51	0.49
query37	0.20	0.17	0.18
query38	0.17	0.15	0.16
query39	0.05	0.04	0.04
query40	0.20	0.15	0.17
query41	0.12	0.05	0.04
query42	0.07	0.05	0.05
query43	0.05	0.06	0.04
Total cold run time: 106.28 s
Total hot run time: 31.87 s

@doris-robot

BE UT Coverage Report

Increment line coverage 17.56% (36/205) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 48.45% (12981/26794)
Line Coverage 37.91% (111249/293453)
Region Coverage 36.87% (56785/154020)
Branch Coverage 32.04% (28564/89140)

- std::vector<uint64_t>* col_attributes,
- std::string attribute) {
+ std::vector<int32_t>* col_attributes,
+ std::string attribute, bool& exist_attribute) {
Contributor

Suggested change
- std::string attribute, bool& exist_attribute) {
+ std::string& attribute, bool* exist_attribute) {

Better to use a pointer for the output parameter.
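The convention behind this suggestion is that a pointer forces `&` at the call site, so the reader can see which argument may be written to. A minimal sketch follows; `get_attribute` and its map-based input are hypothetical stand-ins rather than the signature under review, and passing the input as a `const` reference is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an attribute lookup; not the Doris signature.
// `attribute` is read-only input, so it is passed by const reference.
// `exist_attribute` is an output parameter, passed by pointer so that the
// call site has to write `&found`, which makes the mutation visible.
std::string get_attribute(const std::map<std::string, std::string>& attrs,
                          const std::string& attribute, bool* exist_attribute) {
    assert(exist_attribute != nullptr);
    auto it = attrs.find(attribute);
    *exist_attribute = (it != attrs.end());
    return *exist_attribute ? it->second : std::string();
}
```

A call then reads `get_attribute(attrs, "iceberg.id", &found)`, whereas with a `bool&` parameter the same call would give no hint that `found` can change.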

// Used in from_thrift, marking the next schema position that should be parsed
size_t _next_schema_pos;
- std::unordered_map<uint64_t, std::string> _field_id_name_mapping;
+ std::map<int32_t, std::string> _field_id_name_mapping;
Contributor

Why change to std::map?

Contributor Author

To be compatible with the TableSchemaChangeHelper interface.

return Status::OK();
}

Status TableSchemaChangeHelper::get_next_block_after(Block* block) const {
Contributor

Do we need to add a timer here to see how much this costs when a table has gone through a schema change?

Contributor Author

I don't think it's necessary. It doesn't add much performance overhead; it just replaces the column names.

}
};

class TableSchemaChangeHelper {
Contributor

This class needs a unit test.

super(schema, partitionColumns);
this.enableSchemaEvolution = enableSchemaEvolution;
if (enableSchemaEvolution) {
historySchemaCache = InternalSchemaCache.getHistoricalSchemas(hudiClient);
Contributor

I suggest fetching this historySchemaCache outside the HudiSchemaCacheValue constructor.
Keep this cache value simple.

} else {
HudiSchemaCacheValue hudiSchemaCacheValue = HudiUtils.getSchemaCacheValue(hmsTable);
if (hudiSchemaCacheValue.isEnableSchemaEvolution()) {
long commitInstantTime = Long.parseLong(FSUtils.getCommitTime(
Contributor

Calling this for each split is too expensive.

@hubgeter
Contributor Author

run buildall

@doris-robot

TeamCity cloud ut coverage result:
Function Coverage: 83.09% (1081/1301)
Line Coverage: 66.07% (18007/27253)
Region Coverage: 65.44% (8868/13551)
Branch Coverage: 55.30% (4776/8636)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1c04ffcd1ac52fa77432b3baad61bcf20be6269c_1c04ffcd1ac52fa77432b3baad61bcf20be6269c_cloud/report/index.html

@doris-robot

TPC-H: Total hot run time: 32616 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1c04ffcd1ac52fa77432b3baad61bcf20be6269c, data reload: false

------ Round 1 ----------------------------------
q1	24336	5684	5034	5034
q2	2044	309	185	185
q3	10379	1263	691	691
q4	10237	997	525	525
q5	7502	2392	2372	2372
q6	196	164	133	133
q7	919	750	619	619
q8	9317	1302	1121	1121
q9	4892	4825	4783	4783
q10	6813	2302	1905	1905
q11	480	289	270	270
q12	360	352	228	228
q13	17773	3668	3114	3114
q14	225	222	214	214
q15	538	475	494	475
q16	629	604	581	581
q17	600	874	351	351
q18	7011	6375	6402	6375
q19	1803	960	550	550
q20	307	323	193	193
q21	2774	2205	1954	1954
q22	1030	1026	943	943
Total cold run time: 110165 ms
Total hot run time: 32616 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5225	5152	5132	5132
q2	247	329	232	232
q3	2170	2662	2298	2298
q4	1406	1817	1408	1408
q5	4268	4195	4504	4195
q6	216	173	138	138
q7	2001	1932	1730	1730
q8	2639	2635	2591	2591
q9	7185	7224	7166	7166
q10	3056	3244	2801	2801
q11	584	517	503	503
q12	697	758	599	599
q13	3465	3812	3165	3165
q14	314	284	285	284
q15	506	472	457	457
q16	634	710	669	669
q17	1164	1575	1356	1356
q18	7825	7652	7472	7472
q19	805	831	861	831
q20	1955	2028	1843	1843
q21	5435	4949	4654	4654
q22	1045	1058	1001	1001
Total cold run time: 52842 ms
Total hot run time: 50525 ms

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…able. (apache#49051)

hubgeter added a commit to hubgeter/doris that referenced this pull request Jun 28, 2025
…able. (apache#49051)

morrySnow pushed a commit that referenced this pull request Jun 30, 2025
morningman pushed a commit that referenced this pull request Jul 4, 2025
…schema changes. (#51341)

### What problem does this PR solve?
Related PR: #49051

Problem Summary:

Support reading Hudi, Paimon, and Iceberg tables after the internal schema
of a struct is changed.
1. Introduce `hive_reader` to avoid confusion between the `hive` reader and
the `parquet/orc` reader.
2. Previously, reading tables after schema changes to ordinary columns
relied on renaming the columns in the block so that the parquet/orc reader
could read the right file columns in `get_next_block`. As a result, the
`hudi/iceberg/paimon reader` mixed `file column names` with
`table column names` when using the parquet/orc reader.
This PR clarifies that all calls to the `parquet/orc reader` are based on
`table column names`, and introduces `TableSchemaChangeHelper::Node` to help
the `parquet/orc reader` find the specific file columns to read.
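
As a rough illustration of the idea, a tree can mirror the table schema, with each node recording the name its column has in the data file. The sketch below assumes that shape only; `SchemaNode` and `resolve_file_path` are hypothetical names, and the actual `TableSchemaChangeHelper::Node` may be designed differently.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch; the real TableSchemaChangeHelper::Node may differ.
struct SchemaNode {
    std::string file_name;  // name of this column inside the data file
    // Children are keyed by the *table* field name, so callers only ever
    // speak in table column names.
    std::map<std::string, std::shared_ptr<SchemaNode>> children;
};

// Translates a path of table field names (outermost first) into the matching
// path of file field names. Returns false and leaves *file_path empty when a
// field on the path does not exist in the file.
bool resolve_file_path(const SchemaNode& root,
                       const std::vector<std::string>& table_path,
                       std::vector<std::string>* file_path) {
    assert(file_path != nullptr);
    file_path->clear();
    const SchemaNode* current = &root;
    for (const std::string& table_name : table_path) {
        auto it = current->children.find(table_name);
        if (it == current->children.end()) {
            file_path->clear();
            return false;
        }
        current = it->second.get();
        file_path->push_back(current->file_name);
    }
    return true;
}
```

Because the translation happens per nesting level, a renamed field inside a struct resolves the same way as a renamed top-level column.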
koarz pushed a commit to koarz/doris that referenced this pull request Jul 4, 2025
…schema changes. (apache#51341)

seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 4, 2025
…schema changes. (apache#51341)

hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 13, 2025
…schema changes. (apache#51341)

hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 15, 2025
…schema changes. (apache#51341)

dataroaring pushed a commit that referenced this pull request Aug 12, 2025
…rg.id orc file.(#49051) (#54167)

### What problem does this PR solve?
Pick #49051, but only the fix for the following crash:
```
terminate called after throwing an instance of 'std::range_error'
  what():  Key not found: iceberg.id
*** Query id: 6a93d7cdc9f44370-a40b07934a14c81b ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1753842428 (unix time) try "date -d @1753842428" if you are using GNU date ***
*** Current BE git commitID: 910c424 ***
*** SIGABRT unknown detail explain (@0x5a46f) received by PID 369775 (TID 371694 OR 0x7fad067ef640) from PID 369775; stack trace: ***
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
 1# 0x00007FB12263EBF0 in /lib64/libc.so.6
 2# __pthread_kill_implementation in /lib64/libc.so.6
 3# gsignal in /lib64/libc.so.6
 4# abort in /lib64/libc.so.6
 5# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
 6# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
 7# 0x000055C047B28EC1 in /opt/apache-doris-3.0.6.2-bin-x64/be/lib/doris_be
 8# 0x000055C047B29014 in /opt/apache-doris-3.0.6.2-bin-x64/be/lib/doris_be
 9# orc::TypeImpl::getAttributeValue(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const in /opt/apache-doris-3.0.6.2-bin-x64/be/lib/doris_be
10# doris::vectorized::OrcReader::get_schema_col_name_attribute(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::vector<unsigned long, std::allocator<unsigned long> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:332
11# doris::vectorized::IcebergOrcReader::_gen_col_name_maps(doris::vectorized::OrcReader*) at 
```
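
The trace above shows an uncaught `std::range_error` escaping from an attribute lookup when the `iceberg.id` key is missing. One defensive pattern, sketched below, is to test for the key before reading it. This is not necessarily how the picked fix works. `FakeOrcType` is a hypothetical stand-in that only mimics the throwing behaviour seen in the trace, and `try_get_attribute` is an illustrative helper name; with the real ORC C++ library one would call the corresponding methods on `orc::Type` instead (assuming it exposes a `hasAttributeKey` check alongside `getAttributeValue`).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an ORC type node: like the call in the trace,
// getAttributeValue throws std::range_error when the key is absent.
struct FakeOrcType {
    std::map<std::string, std::string> attributes;

    bool hasAttributeKey(const std::string& key) const {
        return attributes.count(key) != 0;
    }

    std::string getAttributeValue(const std::string& key) const {
        auto it = attributes.find(key);
        if (it == attributes.end()) {
            throw std::range_error("Key not found: " + key);
        }
        return it->second;
    }
};

// Returns true and fills *out when the attribute exists; returns false
// (leaving *out untouched) instead of throwing when it does not.
bool try_get_attribute(const FakeOrcType& type, const std::string& key,
                       std::string* out) {
    if (!type.hasAttributeKey(key)) {
        return false;
    }
    *out = type.getAttributeValue(key);
    return true;
}

// Small usage check covering both the present and the missing key.
void check_try_get_attribute() {
    FakeOrcType with_id;
    with_id.attributes["iceberg.id"] = "7";
    FakeOrcType without_id;

    std::string value;
    assert(try_get_attribute(with_id, "iceberg.id", &value) && value == "7");
    assert(!try_get_attribute(without_id, "iceberg.id", &value));
}
```

A caller that gets `false` can then fall back to another strategy, such as matching columns by name, instead of aborting the process.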
mrhhsg pushed a commit to mrhhsg/doris that referenced this pull request Aug 21, 2025
…rg.id orc file.(apache#49051) (apache#54167)

Pick apache#49051,
but only the fix for the following crash:
```
terminate called after throwing an instance of 'std::range_error'
  what():  Key not found: iceberg.id
*** Query id: 6a93d7cdc9f44370-a40b07934a14c81b ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1753842428 (unix time) try "date -d @1753842428" if you are using GNU date ***
*** Current BE git commitID: 910c424 ***
*** SIGABRT unknown detail explain (@0x5a46f) received by PID 369775 (TID 371694 OR 0x7fad067ef640) from PID 369775; stack trace: ***
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
 1# 0x00007FB12263EBF0 in /lib64/libc.so.6
 2# __pthread_kill_implementation in /lib64/libc.so.6
 3# gsignal in /lib64/libc.so.6
 4# abort in /lib64/libc.so.6
 5# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
 6# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
 7# 0x000055C047B28EC1 in /opt/apache-doris-3.0.6.2-bin-x64/be/lib/doris_be
 8# 0x000055C047B29014 in /opt/apache-doris-3.0.6.2-bin-x64/be/lib/doris_be
 9# orc::TypeImpl::getAttributeValue(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const in /opt/apache-doris-3.0.6.2-bin-x64/be/lib/doris_be
10# doris::vectorized::OrcReader::get_schema_col_name_attribute(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::vector<unsigned long, std::allocator<unsigned long> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/format/orc/vorc_reader.cpp:332
11# doris::vectorized::IcebergOrcReader::_gen_col_name_maps(doris::vectorized::OrcReader*) at
```
gavinchou mentioned this pull request Sep 1, 2025