@BePPPower
Contributor

Problem Summary:

Support asynchronous materialized view partition refresh feature for Hudi external tables.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@BePPPower
Contributor Author

run buildall

@Thearas
Contributor

Thearas commented Apr 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 33994 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3125e03a467e028863e77f6a629d92636745023e, data reload: false

------ Round 1 ----------------------------------
q1	26107	5040	5011	5011
q2	2074	281	177	177
q3	10393	1249	676	676
q4	10223	990	523	523
q5	7550	2365	2289	2289
q6	203	161	133	133
q7	906	736	615	615
q8	9319	1275	1052	1052
q9	6927	5116	5100	5100
q10	6805	2308	1903	1903
q11	483	291	285	285
q12	354	354	219	219
q13	17754	3601	3096	3096
q14	225	226	215	215
q15	536	486	485	485
q16	604	615	609	609
q17	617	853	378	378
q18	7450	7101	7034	7034
q19	1237	946	553	553
q20	351	342	234	234
q21	4211	3370	2443	2443
q22	1073	1027	964	964
Total cold run time: 115402 ms
Total hot run time: 33994 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5293	5234	5169	5169
q2	238	336	227	227
q3	2240	2748	2396	2396
q4	1462	1848	1480	1480
q5	4628	4447	4398	4398
q6	224	175	138	138
q7	2003	1877	1801	1801
q8	2572	2635	2521	2521
q9	7280	7062	7182	7062
q10	2973	3137	2728	2728
q11	571	519	493	493
q12	659	788	619	619
q13	3514	3920	3345	3345
q14	292	296	276	276
q15	510	480	465	465
q16	669	685	668	668
q17	1187	1581	1379	1379
q18	7682	7528	7332	7332
q19	844	797	901	797
q20	1943	1966	1851	1851
q21	5229	4869	4858	4858
q22	1049	1022	1022	1022
Total cold run time: 53062 ms
Total hot run time: 51025 ms

@doris-robot

TPC-DS: Total hot run time: 194215 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3125e03a467e028863e77f6a629d92636745023e, data reload: false

query1	1376	1071	1049	1049
query2	6316	1995	1960	1960
query3	11020	4586	4534	4534
query4	25492	23459	23528	23459
query5	4449	594	459	459
query6	296	197	192	192
query7	3993	490	285	285
query8	286	234	232	232
query9	8499	2559	2594	2559
query10	484	343	258	258
query11	15261	15116	14948	14948
query12	154	108	107	107
query13	1567	511	410	410
query14	8934	6320	6340	6320
query15	204	188	173	173
query16	7350	631	456	456
query17	1190	708	563	563
query18	2019	401	337	337
query19	191	205	190	190
query20	127	125	118	118
query21	204	124	119	119
query22	4573	4530	4255	4255
query23	34434	33987	33587	33587
query24	8557	2459	2412	2412
query25	525	463	395	395
query26	1198	276	151	151
query27	2965	509	335	335
query28	4868	2468	2448	2448
query29	764	590	440	440
query30	275	227	205	205
query31	934	900	806	806
query32	80	61	61	61
query33	551	354	329	329
query34	820	881	501	501
query35	823	866	773	773
query36	990	975	913	913
query37	122	100	77	77
query38	4275	4284	4253	4253
query39	1515	1431	1440	1431
query40	220	115	108	108
query41	53	54	52	52
query42	122	109	109	109
query43	512	522	499	499
query44	1338	834	847	834
query45	180	180	175	175
query46	854	1037	658	658
query47	1838	1893	1807	1807
query48	380	408	311	311
query49	780	500	403	403
query50	669	688	418	418
query51	4350	4295	4259	4259
query52	110	104	101	101
query53	231	258	188	188
query54	589	565	492	492
query55	87	82	77	77
query56	299	293	298	293
query57	1180	1195	1176	1176
query58	271	267	268	267
query59	2806	2927	2786	2786
query60	333	320	313	313
query61	135	176	131	131
query62	806	744	695	695
query63	220	190	187	187
query64	4343	1068	718	718
query65	4420	4428	4359	4359
query66	1066	411	315	315
query67	15961	15435	15429	15429
query68	7275	879	517	517
query69	477	309	260	260
query70	1215	1137	1063	1063
query71	410	319	301	301
query72	6003	4924	5075	4924
query73	701	663	351	351
query74	9559	9164	8828	8828
query75	3200	3178	2703	2703
query76	3202	1180	766	766
query77	477	367	276	276
query78	10023	10255	9321	9321
query79	2788	813	568	568
query80	843	505	445	445
query81	501	249	222	222
query82	744	126	94	94
query83	253	360	231	231
query84	251	103	84	84
query85	773	349	318	318
query86	409	300	296	296
query87	4463	4558	4392	4392
query88	4184	2232	2216	2216
query89	406	347	298	298
query90	1846	216	218	216
query91	147	147	122	122
query92	79	62	57	57
query93	2594	938	591	591
query94	772	420	304	304
query95	377	298	292	292
query96	496	577	275	275
query97	3171	3203	3146	3146
query98	241	213	199	199
query99	1347	1381	1279	1279
Total cold run time: 281061 ms
Total hot run time: 194215 ms

@doris-robot

ClickBench: Total hot run time: 30.93 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3125e03a467e028863e77f6a629d92636745023e, data reload: false

query1	0.04	0.04	0.02
query2	0.14	0.10	0.11
query3	0.24	0.19	0.20
query4	1.60	0.19	0.19
query5	0.58	0.57	0.59
query6	1.18	0.71	0.72
query7	0.02	0.02	0.02
query8	0.03	0.04	0.03
query9	0.59	0.53	0.50
query10	0.62	0.60	0.57
query11	0.15	0.11	0.11
query12	0.14	0.11	0.12
query13	0.63	0.60	0.60
query14	2.71	2.69	2.84
query15	0.94	0.85	0.84
query16	0.38	0.38	0.39
query17	1.03	1.00	1.03
query18	0.22	0.21	0.20
query19	1.91	1.91	1.87
query20	0.01	0.01	0.02
query21	15.35	0.88	0.52
query22	0.76	1.15	0.66
query23	14.99	1.36	0.62
query24	7.44	1.35	0.62
query25	0.47	0.18	0.16
query26	0.53	0.17	0.13
query27	0.06	0.06	0.04
query28	8.74	0.86	0.43
query29	12.54	3.93	3.31
query30	0.24	0.09	0.07
query31	2.82	0.59	0.39
query32	3.23	0.54	0.48
query33	3.02	3.07	3.04
query34	15.86	5.08	4.48
query35	4.52	4.49	4.50
query36	0.67	0.50	0.48
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.15	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.02	0.02
Total cold run time: 104.87 s
Total hot run time: 30.93 s

@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33977 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f3f2ca7a6f608a144701063a8730b43b8e9dcf19, data reload: false

------ Round 1 ----------------------------------
q1	26351	5079	5034	5034
q2	2077	285	188	188
q3	10387	1247	705	705
q4	10258	1002	523	523
q5	7540	2317	2377	2317
q6	189	162	134	134
q7	903	739	613	613
q8	9305	1258	1116	1116
q9	6828	5187	5136	5136
q10	6857	2294	1870	1870
q11	492	292	259	259
q12	361	358	216	216
q13	17792	3652	3069	3069
q14	224	225	208	208
q15	542	489	478	478
q16	450	451	405	405
q17	591	863	364	364
q18	7573	7196	7210	7196
q19	1903	964	551	551
q20	328	337	214	214
q21	3926	3388	2413	2413
q22	1054	1008	968	968
Total cold run time: 115931 ms
Total hot run time: 33977 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5141	5046	5065	5046
q2	235	327	227	227
q3	2137	2634	2304	2304
q4	1392	1786	1444	1444
q5	4418	4435	4354	4354
q6	215	187	132	132
q7	2003	1899	1767	1767
q8	2584	2548	2539	2539
q9	7289	7218	7033	7033
q10	3009	3196	2719	2719
q11	584	513	484	484
q12	672	765	596	596
q13	3516	3842	3274	3274
q14	281	300	263	263
q15	539	481	488	481
q16	482	502	460	460
q17	1163	1525	1419	1419
q18	7780	7614	7460	7460
q19	795	790	875	790
q20	1988	2060	1835	1835
q21	5222	4864	4737	4737
q22	1096	1038	991	991
Total cold run time: 52541 ms
Total hot run time: 50355 ms

@doris-robot

TPC-DS: Total hot run time: 192263 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f3f2ca7a6f608a144701063a8730b43b8e9dcf19, data reload: false

query1	1355	1102	1072	1072
query2	6270	1792	1782	1782
query3	11009	4534	4491	4491
query4	54429	25933	23436	23436
query5	5044	512	444	444
query6	319	195	179	179
query7	4895	486	281	281
query8	310	246	230	230
query9	5620	2548	2560	2548
query10	427	328	288	288
query11	15496	14969	14835	14835
query12	162	110	104	104
query13	1040	507	396	396
query14	10162	6397	6278	6278
query15	216	204	175	175
query16	7125	645	501	501
query17	1075	764	623	623
query18	1632	421	321	321
query19	203	200	182	182
query20	139	126	126	126
query21	210	131	113	113
query22	4339	4552	4370	4370
query23	34208	33584	33056	33056
query24	6769	2427	2437	2427
query25	464	491	404	404
query26	723	287	149	149
query27	2327	521	340	340
query28	3095	2158	2119	2119
query29	567	563	448	448
query30	267	237	189	189
query31	866	832	787	787
query32	77	63	62	62
query33	454	367	348	348
query34	790	868	523	523
query35	813	853	748	748
query36	947	1036	914	914
query37	122	102	75	75
query38	4224	4257	4220	4220
query39	1530	1457	1435	1435
query40	210	115	104	104
query41	51	52	50	50
query42	127	106	100	100
query43	488	518	495	495
query44	1362	831	807	807
query45	179	173	171	171
query46	848	1049	645	645
query47	1833	1913	1786	1786
query48	382	403	299	299
query49	699	518	413	413
query50	664	711	409	409
query51	4320	4290	4247	4247
query52	103	105	104	104
query53	236	262	197	197
query54	588	580	514	514
query55	86	85	81	81
query56	309	343	300	300
query57	1160	1200	1161	1161
query58	263	258	262	258
query59	2738	2684	2609	2609
query60	335	330	303	303
query61	135	126	125	125
query62	761	738	680	680
query63	227	186	190	186
query64	1815	1063	689	689
query65	4444	4323	4221	4221
query66	708	402	297	297
query67	15755	15658	15392	15392
query68	7032	884	509	509
query69	525	294	265	265
query70	1227	1116	1118	1116
query71	524	311	285	285
query72	5742	4806	4957	4806
query73	1451	657	345	345
query74	9048	8820	8826	8820
query75	3838	3209	2690	2690
query76	4300	1187	747	747
query77	628	361	298	298
query78	10013	10115	9311	9311
query79	3275	821	553	553
query80	633	591	435	435
query81	496	246	214	214
query82	484	131	96	96
query83	355	252	237	237
query84	289	96	82	82
query85	798	350	315	315
query86	378	302	287	287
query87	4498	4435	4326	4326
query88	3377	2213	2188	2188
query89	414	317	288	288
query90	1921	217	212	212
query91	142	145	115	115
query92	76	57	52	52
query93	2293	925	579	579
query94	661	396	292	292
query95	363	290	285	285
query96	483	568	272	272
query97	3201	3183	3114	3114
query98	222	207	239	207
query99	1433	1393	1260	1260
Total cold run time: 300666 ms
Total hot run time: 192263 ms

@doris-robot

ClickBench: Total hot run time: 29.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f3f2ca7a6f608a144701063a8730b43b8e9dcf19, data reload: false

query1	0.04	0.03	0.03
query2	0.13	0.10	0.11
query3	0.25	0.19	0.19
query4	1.60	0.19	0.20
query5	0.58	0.59	0.59
query6	1.20	0.72	0.70
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.52	0.51
query10	0.55	0.56	0.57
query11	0.17	0.11	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	1.14	1.20	1.16
query15	0.88	0.84	0.84
query16	0.39	0.39	0.39
query17	1.03	1.03	1.02
query18	0.21	0.20	0.20
query19	1.95	1.81	1.80
query20	0.02	0.01	0.01
query21	15.40	0.90	0.57
query22	0.75	1.14	0.68
query23	15.01	1.38	0.61
query24	7.19	1.18	0.88
query25	0.47	0.21	0.15
query26	0.49	0.16	0.14
query27	0.05	0.05	0.05
query28	10.06	0.85	0.43
query29	12.54	4.04	3.34
query30	0.26	0.09	0.07
query31	2.81	0.57	0.38
query32	3.22	0.54	0.46
query33	3.10	3.00	3.08
query34	15.71	5.12	4.50
query35	4.54	4.54	4.48
query36	0.68	0.49	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.16	0.14	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.02	0.02
Total cold run time: 104.31 s
Total hot run time: 29.69 s

…nvolved in transparent rewriting (apache#49513)

When a Hudi table takes part in transparent rewriting, the base-table
timestamp obtained by `loadSnapshot` is inconsistent with the timestamp
stored after partition refresh. The `tableSnapshot` comparison therefore
fails, and the materialized view is not hit.

Currently, the way `loadSnapshot` obtains the base-table timestamp does
not behave as expected. Further in-depth research is needed on how to
change it.

For now, the `getTableSnapshot` function always returns `0L`, indicating
that the `tableSnapshot` is always synchronized, to bypass this problem.
This is consistent with the original expectation of manual refresh.
We'll deal with the underlying issue later.
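
The workaround described in this commit message can be pictured with a minimal Java sketch. Every name below except `getTableSnapshot` (`TableSnapshotSketch`, `TimestampSnapshot`, `isSynced`) is a hypothetical stand-in chosen for illustration, and the equality check is only an assumption about how a stored snapshot is compared with the current one; this is not the Doris implementation.

```java
// Hypothetical stand-ins for illustration; these are not the Doris classes.
final class TableSnapshotSketch {

    /** A table-level snapshot marker that wraps a single timestamp. */
    static final class TimestampSnapshot {
        private final long timestamp;

        TimestampSnapshot(long timestamp) {
            this.timestamp = timestamp;
        }

        long getTimestamp() {
            return timestamp;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof TimestampSnapshot
                    && ((TimestampSnapshot) other).timestamp == timestamp;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(timestamp);
        }
    }

    /** Workaround: report a constant table snapshot instead of a loaded timestamp. */
    static TimestampSnapshot getTableSnapshot() {
        return new TimestampSnapshot(0L);
    }

    /** Assumed check: the view counts as synchronized when both snapshots are equal. */
    static boolean isSynced(TimestampSnapshot storedAtRefresh, TimestampSnapshot current) {
        return storedAtRefresh.equals(current);
    }

    public static void main(String[] args) {
        TimestampSnapshot storedAtRefresh = getTableSnapshot(); // recorded after a refresh
        TimestampSnapshot current = getTableSnapshot();         // read again at rewrite time
        System.out.println("synced = " + isSynced(storedAtRefresh, current));
    }
}
```

Because both sides of the comparison now carry the same constant, the table-level check can no longer fail on a timestamp mismatch. In this sketch that also means a table-level change would go unnoticed by the check.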
@BePPPower force-pushed the ftw-branch-hudi-mtmv branch from f3f2ca7 to a789d11 on April 21, 2025 04:29
@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34134 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a789d1112cd39f84ac87bb79c1f797f0b96edd5d, data reload: false

------ Round 1 ----------------------------------
q1	26287	5497	5030	5030
q2	2106	286	183	183
q3	10384	1273	699	699
q4	10219	1003	545	545
q5	7560	2311	2386	2311
q6	178	163	132	132
q7	916	763	620	620
q8	9324	1307	1146	1146
q9	6868	5118	5089	5089
q10	6817	2304	1923	1923
q11	473	284	291	284
q12	352	358	235	235
q13	17770	3686	3082	3082
q14	231	227	211	211
q15	536	482	482	482
q16	454	458	393	393
q17	602	861	364	364
q18	7688	7295	7186	7186
q19	1362	964	585	585
q20	326	333	225	225
q21	4342	3341	2420	2420
q22	1042	1007	989	989
Total cold run time: 115837 ms
Total hot run time: 34134 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5076	5081	5082	5081
q2	253	330	249	249
q3	2157	2619	2307	2307
q4	1396	1878	1459	1459
q5	4420	4417	4368	4368
q6	222	165	126	126
q7	1977	1905	1776	1776
q8	2641	2573	2522	2522
q9	7315	7200	7120	7120
q10	3006	3186	2732	2732
q11	594	515	487	487
q12	679	775	637	637
q13	3556	3868	3352	3352
q14	286	298	285	285
q15	529	482	474	474
q16	468	503	469	469
q17	1192	1532	1422	1422
q18	7727	7582	7493	7493
q19	805	852	983	852
q20	2035	1988	1867	1867
q21	5235	4888	4751	4751
q22	1143	1052	1051	1051
Total cold run time: 52712 ms
Total hot run time: 50880 ms

@doris-robot

TPC-DS: Total hot run time: 192911 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a789d1112cd39f84ac87bb79c1f797f0b96edd5d, data reload: false

query1	1396	1103	1049	1049
query2	6102	1774	1802	1774
query3	11029	4779	4744	4744
query4	55106	25382	23214	23214
query5	5189	532	502	502
query6	353	189	182	182
query7	4882	497	291	291
query8	316	244	241	241
query9	5710	2548	2558	2548
query10	431	319	292	292
query11	15169	14990	14805	14805
query12	167	111	106	106
query13	1056	507	389	389
query14	10114	6429	6421	6421
query15	209	198	182	182
query16	7125	648	458	458
query17	1073	726	571	571
query18	1576	406	315	315
query19	190	186	176	176
query20	137	123	122	122
query21	202	127	105	105
query22	4463	4435	4333	4333
query23	34276	33512	33454	33454
query24	6545	2439	2412	2412
query25	455	468	404	404
query26	675	282	150	150
query27	2143	487	339	339
query28	3046	2125	2127	2125
query29	587	579	437	437
query30	270	223	192	192
query31	882	873	768	768
query32	75	62	68	62
query33	452	357	344	344
query34	752	878	513	513
query35	818	862	757	757
query36	957	986	892	892
query37	114	106	72	72
query38	4204	4299	4332	4299
query39	1492	1446	1429	1429
query40	214	129	110	110
query41	53	56	53	53
query42	127	114	102	102
query43	495	497	492	492
query44	1359	822	842	822
query45	181	171	167	167
query46	830	1024	654	654
query47	1824	1863	1781	1781
query48	394	406	302	302
query49	681	500	412	412
query50	643	691	410	410
query51	4422	4313	4156	4156
query52	107	110	104	104
query53	247	270	188	188
query54	582	578	516	516
query55	81	81	86	81
query56	318	288	291	288
query57	1173	1202	1122	1122
query58	278	264	261	261
query59	2667	2737	2631	2631
query60	315	334	326	326
query61	152	153	165	153
query62	736	747	680	680
query63	225	195	209	195
query64	1624	1132	819	819
query65	4374	4195	4243	4195
query66	716	398	306	306
query67	15894	15674	15439	15439
query68	7030	883	510	510
query69	543	298	262	262
query70	1176	1115	1084	1084
query71	508	308	297	297
query72	5860	4879	4928	4879
query73	1496	689	345	345
query74	8957	8868	8990	8868
query75	3893	3225	2708	2708
query76	4243	1199	758	758
query77	599	357	286	286
query78	10150	10104	9248	9248
query79	3043	821	560	560
query80	816	499	496	496
query81	496	254	220	220
query82	525	127	97	97
query83	375	255	255	255
query84	299	98	78	78
query85	781	351	307	307
query86	381	311	278	278
query87	4419	4420	4288	4288
query88	3317	2234	2201	2201
query89	405	312	279	279
query90	1786	208	217	208
query91	138	137	110	110
query92	71	61	57	57
query93	2460	920	578	578
query94	661	413	300	300
query95	366	293	285	285
query96	474	564	269	269
query97	3167	3213	3080	3080
query98	221	207	205	205
query99	1418	1405	1281	1281
Total cold run time: 300524 ms
Total hot run time: 192911 ms

@doris-robot

ClickBench: Total hot run time: 30.08 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a789d1112cd39f84ac87bb79c1f797f0b96edd5d, data reload: false

query1	0.04	0.03	0.04
query2	0.14	0.11	0.11
query3	0.25	0.20	0.19
query4	1.61	0.18	0.20
query5	0.60	0.58	0.61
query6	1.18	0.71	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.58	0.53	0.50
query10	0.57	0.57	0.56
query11	0.16	0.12	0.11
query12	0.14	0.11	0.12
query13	0.62	0.60	0.60
query14	1.15	1.18	1.17
query15	0.88	0.84	0.86
query16	0.40	0.38	0.38
query17	1.00	1.07	1.06
query18	0.21	0.19	0.19
query19	1.91	1.82	1.77
query20	0.01	0.02	0.01
query21	15.39	0.96	0.57
query22	0.76	1.13	0.69
query23	14.99	1.41	0.66
query24	6.77	1.32	1.45
query25	0.46	0.20	0.08
query26	0.63	0.17	0.13
query27	0.06	0.05	0.04
query28	9.37	0.89	0.44
query29	12.53	3.98	3.29
query30	0.25	0.09	0.06
query31	2.82	0.61	0.40
query32	3.28	0.55	0.46
query33	2.97	3.10	3.02
query34	15.77	5.13	4.45
query35	4.52	4.52	4.52
query36	0.66	0.50	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 103.24 s
Total hot run time: 30.08 s

@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the `approved` label (Indicates a PR has been approved by one committer.) Apr 23, 2025
@morningman merged commit 7f40959 into apache:master Apr 23, 2025
26 checks passed
morningman pushed a commit that referenced this pull request Apr 29, 2025
### What problem does this PR solve?

Followup #49956

Problem Summary:

When a snapshot is specified in the query, the schema corresponding to
that snapshot should be used for parsing; otherwise, the schema of the
latest snapshot should be used.

1. When using the HMS type, you also need to initialize the executor
pool.
2. Set the size of the thread pool to be equal to the number of cores of
the current machine.
3. When no snapshot is specified, the latest schema is used.
4. When specifying a snapshot, you need to use the schema corresponding
to the snapshot.
5. When generating a scannode, save the schema information in it and no
longer obtain it from the cache, to guard against the cache being
refreshed in the meantime.
6. When refreshing the schema, you need to refresh all schemas of
related tables.
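
Items 2 to 4 above can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code from the commit: the class `SchemaResolutionSketch`, its methods, and the idea of keying schemas by a numeric commit instant are all hypothetical, and "the schema corresponding to the snapshot" is assumed here to mean the newest schema recorded at or before the requested instant.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for illustration; not the Doris implementation.
final class SchemaResolutionSketch {

    // Schemas keyed by the commit instant at which they took effect (assumption).
    private final TreeMap<Long, String> schemaByInstant = new TreeMap<>();

    void addSchema(long instant, String schema) {
        schemaByInstant.put(instant, schema);
    }

    /** No snapshot given: latest schema. Snapshot given: schema in effect at that instant. */
    String resolve(Optional<Long> snapshotInstant) {
        if (schemaByInstant.isEmpty()) {
            throw new IllegalStateException("no schema recorded");
        }
        if (!snapshotInstant.isPresent()) {
            return schemaByInstant.lastEntry().getValue();
        }
        Map.Entry<Long, String> entry = schemaByInstant.floorEntry(snapshotInstant.get());
        if (entry == null) {
            throw new IllegalArgumentException("no schema at or before " + snapshotInstant.get());
        }
        return entry.getValue();
    }

    /** A fixed pool sized to the number of cores available to the JVM. */
    static ExecutorService newExecutorPool() {
        return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    }
}
```

A caller that builds a scan node would, under the same assumptions, call `resolve` once and keep the returned schema rather than looking it up again later, which is the intent of item 5.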
morningman pushed a commit that referenced this pull request May 22, 2025
…50979)

Problem Summary:

related pr: #48172

PR #48172 changed the logic of the `beforeMTMVRefresh` method, but
PR #49956 added the old code back, so this commit deletes that code
again.
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…resh feature for Hudi external tables. (apache#49956)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
Followup apache#49956
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…pache#50979)

zddr pushed a commit to zddr/incubator-doris that referenced this pull request Jun 19, 2025
…resh feature for Hudi external tables. (apache#49956)

zddr pushed a commit to zddr/incubator-doris that referenced this pull request Jun 19, 2025
…pache#50979)

zddr pushed a commit to zddr/incubator-doris that referenced this pull request Jun 19, 2025
…resh feature for Hudi external tables. (apache#49956)

zddr pushed a commit to zddr/incubator-doris that referenced this pull request Jun 19, 2025
…pache#50979)

morrySnow pushed a commit that referenced this pull request Jun 23, 2025
Cherry-pick from
#43959
#44419
#44415
#44567
#44673
#44998
#45273
#44911
#44726
#45652
#45659
#46257
#46641
#47026
#47166
#48172
#49956
#50979

---------

Co-authored-by: James <lijibing@selectdb.com>
Co-authored-by: Tiewei Fang <fangtiewei@selectdb.com>
morningman pushed a commit to morningman/doris that referenced this pull request Jun 24, 2025
Followup apache#49956

morningman pushed a commit to morningman/doris that referenced this pull request Jun 25, 2025
Followup apache#49956

morningman pushed a commit that referenced this pull request Jun 26, 2025
…1152)

### What problem does this PR solve?

Related PR: #49956

Problem Summary:
PR #49956 introduced the concept of `HudiMvccSnapshot` to implement
asynchronous materialized view partition refresh for Hudi. That PR uses
the `LastUpdateTimestamp` of the `TablePartitionValues` in
`HudiMvccSnapshot` to obtain the Hudi schema. If the table is not
partitioned, `LastUpdateTimestamp` is always 0, so the actual Hudi
schema is not obtained. This PR follows `IcebergMvccSnapshot` and adds a
`timestamp` to `HudiMvccSnapshot` so that the correct Hudi schema is
obtained. The correct Hudi schema contains information such as the
column unique id.
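
The difference between the two lookups can be shown with a small Java sketch. All class, field, and method names below are hypothetical stand-ins that only mirror the concepts named in the commit message (`PartitionValues` for `TablePartitionValues`, `MvccSnapshot` for `HudiMvccSnapshot`); the real classes are not reproduced here.

```java
// Hypothetical stand-ins for illustration; names and fields are assumptions.
final class HudiSnapshotTimestampSketch {

    /** Partition metadata whose timestamp only advances when partitions are loaded. */
    static final class PartitionValues {
        private long lastUpdateTimestamp; // stays 0 for a non-partitioned table

        long getLastUpdateTimestamp() {
            return lastUpdateTimestamp;
        }

        void setLastUpdateTimestamp(long timestamp) {
            this.lastUpdateTimestamp = timestamp;
        }
    }

    /** Snapshot that carries its own timestamp next to the partition metadata. */
    static final class MvccSnapshot {
        private final PartitionValues partitionValues;
        private final long timestamp;

        MvccSnapshot(PartitionValues partitionValues, long timestamp) {
            this.partitionValues = partitionValues;
            this.timestamp = timestamp;
        }

        PartitionValues getPartitionValues() {
            return partitionValues;
        }

        long getTimestamp() {
            return timestamp;
        }
    }

    /** Earlier lookup: derived from partition metadata, so 0 when there are no partitions. */
    static long schemaTimestampBefore(MvccSnapshot snapshot) {
        return snapshot.getPartitionValues().getLastUpdateTimestamp();
    }

    /** Later lookup: read from the snapshot itself, independent of partitioning. */
    static long schemaTimestampAfter(MvccSnapshot snapshot) {
        return snapshot.getTimestamp();
    }
}
```

For a non-partitioned table in this sketch, `schemaTimestampBefore` returns 0 while `schemaTimestampAfter` returns the instant stored on the snapshot, which is the value a schema lookup needs.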
morningman pushed a commit to morningman/doris that referenced this pull request Jun 30, 2025
Followup apache#49956

morningman pushed a commit to morningman/doris that referenced this pull request Jun 30, 2025
Followup apache#49956

hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 2, 2025
…ache#51152)

morrySnow pushed a commit that referenced this pull request Jan 15, 2026
github-actions bot pushed a commit that referenced this pull request Jan 15, 2026
morningman pushed a commit that referenced this pull request Jan 16, 2026

Labels

approved (Indicates a PR has been approved by one committer.), dev/3.1.0-merged, reviewed

6 participants