[enhance](multi-catalog) Runtime Filter Partition Pruning for Data Lake Tables #53399
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Force-pushed from 31cfbd2 to abeecff

run buildall
Force-pushed from a28e5d9 to 28966e7

run buildall
Cloud UT Coverage Report: increment line coverage

run buildall
TPC-H: Total hot run time: 34111 ms
TPC-DS: Total hot run time: 187424 ms
ClickBench: Total hot run time: 33.38 s
FE UT Coverage Report: increment line coverage
fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/PaimonUtil.java
```java
        new String[0], partition.getPartitionValues());
hudiSplit.setTableFormatType(TableFormatType.HUDI);
if (sessionVariable.isEnableRuntimeFilterPartitionPrune()) {
    hudiSplit.setHudiPartitionValues(HudiUtils.getPartitionInfoMap(hmsTable, partition));
```
`HudiUtils.getPartitionInfoMap(hmsTable, partition)` should only be called once per partition.
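A minimal sketch of the suggested fix, assuming the extraction result can be keyed by partition path. The class and method names here are hypothetical, not the actual Doris code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: memoize the per-partition info map so that a call
// like HudiUtils.getPartitionInfoMap(hmsTable, partition) runs once per
// partition, not once per split of that partition.
public class PartitionInfoCache {
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    // 'loader' stands in for the actual extraction call,
    // e.g. p -> HudiUtils.getPartitionInfoMap(hmsTable, partition)
    public Map<String, String> get(String partitionPath,
                                   Function<String, Map<String, String>> loader) {
        // computeIfAbsent only invokes the loader on a cache miss
        return cache.computeIfAbsent(partitionPath, loader);
    }
}
```

With this shape, repeated lookups for the same partition path reuse the first result instead of re-extracting the map per split.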
This is unresolved?
run buildall

BE UT Coverage Report: increment line coverage
Force-pushed from ff87bdb to 511964b

run buildall
TPC-H: Total hot run time: 33900 ms
TPC-DS: Total hot run time: 186951 ms
ClickBench: Total hot run time: 33.55 s
run buildall

BE Regression && UT Coverage Report: increment line coverage
morningman left a comment
LGTM
PR approved by at least one committer and no changes requested.
…ke Tables (apache#53399)

follow: apache#47025

This PR implements dynamic partition pruning based on runtime filters for Iceberg, Paimon, and Hudi data lake tables, extending and enhancing the previous PR [apache#47025](apache#47025).

In PR [apache#47025](apache#47025), we implemented runtime filter-based dynamic partition pruning for Hive tables. However, due to significant differences in partition metadata formats between Iceberg, Paimon, Hudi, and traditional Hive tables, specialized adaptation and implementation are required for these data lake formats.

- During split generation in scan nodes, when `enable_runtime_filter_partition_prune` is enabled, call the corresponding partition value extraction functions
- Pass extracted partition values to the backend through the `TFileRangeDesc.data_lake_partition_values` field
- Store partition values in `Map<String, String>` format, with keys as partition column names and values as serialized partition values
- Process partition column information in `FileScanner::_generate_data_lake_partition_columns()`
- Runtime filters can then perform partition pruning based on this partition value information, avoiding scans of non-matching partition files

Dynamic partition pruning supports the following types of queries:

```sql
-- Equality queries
SELECT count(*) FROM iceberg_table
WHERE partition_col = (
    SELECT partition_col FROM iceberg_table
    GROUP BY partition_col HAVING count(*) > 0
    ORDER BY partition_col DESC LIMIT 1
);

-- IN queries
SELECT count(*) FROM paimon_table
WHERE partition_col IN (
    SELECT partition_col FROM paimon_table
    GROUP BY partition_col HAVING count(*) > 0
    ORDER BY partition_col DESC LIMIT 2
);

-- Function expression queries
SELECT count(*) FROM hudi_table
WHERE abs(partition_col) = (
    SELECT partition_col FROM hudi_table
    GROUP BY partition_col HAVING count(*) > 0
    ORDER BY partition_col DESC LIMIT 1
);
```

Partition data types supported by each format:

**Common Support**:
- **Numeric types**: INT, BIGINT, DECIMAL, FLOAT, DOUBLE, TINYINT, SMALLINT
- **String types**: STRING, VARCHAR, CHAR
- **Date/time types**: DATE, TIMESTAMP
- **Boolean type**: BOOLEAN
- **Binary types**: BINARY (except for Paimon)

**Format-specific Support**:
- **Iceberg**: additionally supports TIMESTAMP_NTZ for timezone-free timestamps
- **Paimon**: does not support BINARY as a partition key (binary partition keys currently cause issues in Spark)
- **Hudi**: based on the Hive partition format; supports all Hive-compatible types

**Notes**:
- TIME and UUID types are supported at the code level, but since Spark does not support them as partition keys, the test cases do not cover these scenarios
- If these types are used in production environments, dynamic partition pruning still works normally
… Data Lake Tables (apache#53399)" This reverts commit 8d98908.
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #53399
… type (apache#59564)

### What problem does this PR solve?

Related PR: apache#53399

Problem Summary: the `serializePartitionValue` function returns a String value, but encoding the BINARY type through a UTF-8 String corrupts the data, so the result no longer matches the original bytes.
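The corruption described above can be avoided with a byte-exact encoding. A hedged sketch of the idea, with hypothetical helper names (this is not the actual fix in apache#59564, just an illustration of the principle):

```java
import java.util.Base64;

// Sketch: arbitrary bytes cannot round-trip through new String(raw, UTF_8),
// because invalid UTF-8 sequences are replaced with U+FFFD. A binary
// partition value carried inside a Map<String, String> therefore needs a
// byte-exact text encoding such as Base64.
public class BinaryPartitionValue {
    public static String serialize(byte[] raw) {
        // Base64 maps every byte sequence to ASCII losslessly
        return Base64.getEncoder().encodeToString(raw);
    }

    public static byte[] deserialize(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

Round-tripping any byte array through these two helpers returns the original bytes, which is exactly the property a UTF-8 String conversion lacks.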
What problem does this PR solve?
follow: #47025
PR Overview
This PR implements dynamic partition pruning based on runtime filters for Iceberg, Paimon, and Hudi data lake tables, extending and enhancing the previous PR #47025.
Background
In PR #47025, we implemented runtime filter-based dynamic partition pruning for Hive tables. However, due to significant differences in partition metadata formats between Iceberg, Paimon, Hudi and traditional Hive tables, specialized adaptation and implementation are required for these data lake formats.
Main Features
1. Core Implementation
Frontend (FE) Changes
- During split generation in scan nodes, when `enable_runtime_filter_partition_prune` is enabled, call the corresponding partition value extraction functions
- Pass extracted partition values to the backend through the `TFileRangeDesc.data_lake_partition_values` field
- Store partition values in `Map<String, String>` format, with keys as partition column names and values as serialized partition values

Backend (BE) Changes

- Process partition column information in `FileScanner::_generate_data_lake_partition_columns()`
- Runtime filters can then perform partition pruning based on this partition value information, avoiding scans of non-matching partition files

2. Supported Query Types
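The `Map<String, String>` hand-off between FE and BE described above can be sketched as follows. The class and method here are illustrative, not the actual Doris implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: render typed partition values into the
// Map<String, String> shape carried by TFileRangeDesc.data_lake_partition_values,
// keyed by partition column name. The BE side re-parses each string value
// against the column's declared type.
public class DataLakePartitionValues {
    public static Map<String, String> serialize(Map<String, Object> typed) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : typed.entrySet()) {
            // String.valueOf gives a stable textual form for simple types;
            // real serialization would dispatch on the column type.
            out.put(e.getKey(), String.valueOf(e.getValue()));
        }
        return out;
    }
}
```

For example, a `LocalDate` partition value serializes as `2024-01-01` and an `Integer` bucket as `3`, which the backend can re-parse as DATE and INT respectively.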
Dynamic partition pruning supports the following types of queries:
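The three supported query shapes, restated from the commit message:

```sql
-- Equality queries
SELECT count(*) FROM iceberg_table
WHERE partition_col = (
    SELECT partition_col FROM iceberg_table
    GROUP BY partition_col HAVING count(*) > 0
    ORDER BY partition_col DESC LIMIT 1
);

-- IN queries
SELECT count(*) FROM paimon_table
WHERE partition_col IN (
    SELECT partition_col FROM paimon_table
    GROUP BY partition_col HAVING count(*) > 0
    ORDER BY partition_col DESC LIMIT 2
);

-- Function expression queries
SELECT count(*) FROM hudi_table
WHERE abs(partition_col) = (
    SELECT partition_col FROM hudi_table
    GROUP BY partition_col HAVING count(*) > 0
    ORDER BY partition_col DESC LIMIT 1
);
```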
3. Supported Data Types
Partition data types supported by each format:
Common Support:

- Numeric types: INT, BIGINT, DECIMAL, FLOAT, DOUBLE, TINYINT, SMALLINT
- String types: STRING, VARCHAR, CHAR
- Date/time types: DATE, TIMESTAMP
- Boolean type: BOOLEAN
- Binary types: BINARY (except for Paimon)

Format-specific Support:

- Iceberg: additionally supports TIMESTAMP_NTZ for timezone-free timestamps
- Paimon: does not support BINARY as a partition key (binary partition keys currently cause issues in Spark)
- Hudi: based on the Hive partition format; supports all Hive-compatible types

Notes:

- TIME and UUID types are supported at the code level, but since Spark does not support them as partition keys, the test cases do not cover these scenarios
- If these types are used in production environments, dynamic partition pruning still works normally
Release note
Implement runtime filter partition pruning for Iceberg/Paimon/Hudi data lake tables
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)