@hubgeter
Contributor

@hubgeter hubgeter commented Sep 2, 2025

bp #54591

Problem Summary:
This PR includes four changes:
1. Adds file meta cache support for ORC files.
2. Changes the file meta cache key from `file name + modification time`
to `file name + modification time / file size`, reducing the chance of
reading stale metadata.
3. Removes some unused code in the Parquet meta cache.
4. Users can check the query profile to see whether the cache was hit:
   `FileFooterHitCache`: cache hit.
   `FileFooterReadCalls`: cache miss, or the cache is disabled.

BTW: to disable the cache, set the BE config `max_external_file_meta_cache_num` to a value <= 0.
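A minimal C++ sketch of the caching behaviour described above, under stated assumptions: `FileMetaCacheSketch`, `FileFooter`, and the `read_footer` callback are hypothetical names, not Doris APIs. The summary writes the new key as `file name + modification time / file size`; this sketch simply folds all three values into the key, which may not be exactly how Doris combines them. The two counters mirror the profile counters `FileFooterHitCache` and `FileFooterReadCalls`, and a capacity <= 0 disables caching, as described for `max_external_file_meta_cache_num`. LRU eviction is chosen only for illustration.

```cpp
// Hypothetical sketch, NOT the Doris implementation: a footer/meta cache keyed
// by file name + modification time + file size, with hit/miss counters named
// after the profile counters FileFooterHitCache and FileFooterReadCalls.
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct FileFooter {  // hypothetical stand-in for parsed ORC/Parquet footer metadata
    std::string description;
};

class FileMetaCacheSketch {
public:
    // capacity <= 0 disables the cache (assumed to mirror max_external_file_meta_cache_num).
    explicit FileMetaCacheSketch(int64_t capacity) : capacity_(capacity) {}

    // A rewritten file usually changes its mtime or its size, so it maps to a
    // new key instead of reusing stale metadata.
    static std::string make_key(const std::string& path, int64_t mtime, int64_t file_size) {
        return path + "#" + std::to_string(mtime) + "#" + std::to_string(file_size);
    }

    std::shared_ptr<const FileFooter> get_or_load(const std::string& path, int64_t mtime,
                                                  int64_t file_size,
                                                  const std::function<FileFooter()>& read_footer) {
        assert(read_footer);
        if (capacity_ <= 0) {  // cache disabled: every lookup reads the footer
            ++footer_read_calls_;
            return std::make_shared<FileFooter>(read_footer());
        }
        const std::string key = make_key(path, mtime, file_size);
        auto it = index_.find(key);
        if (it != index_.end()) {  // cache hit
            ++footer_hit_cache_;
            lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
            return it->second->second;
        }
        ++footer_read_calls_;  // cache miss: read and insert
        auto footer = std::make_shared<FileFooter>(read_footer());
        lru_.emplace_front(key, footer);
        index_[key] = lru_.begin();
        if (static_cast<int64_t>(lru_.size()) > capacity_) {  // evict least recently used
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        return footer;
    }

    int64_t footer_hit_cache() const { return footer_hit_cache_; }
    int64_t footer_read_calls() const { return footer_read_calls_; }

private:
    using Entry = std::pair<std::string, std::shared_ptr<const FileFooter>>;
    int64_t capacity_;
    std::list<Entry> lru_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
    int64_t footer_hit_cache_ = 0;
    int64_t footer_read_calls_ = 0;
};
```

With a key like this, a file rewritten in place with a different size no longer matches the old entry, so its footer is re-read instead of served stale. A rewrite that changes neither value would still hit, which is why the change only reduces, rather than eliminates, the chance of reading old meta.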
@hubgeter hubgeter requested a review from morrySnow as a code owner September 2, 2025 08:30
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

hubgeter commented Sep 2, 2025

run buildall

@hubgeter
Contributor Author

hubgeter commented Sep 3, 2025

run buildall

@hubgeter
Contributor Author

hubgeter commented Sep 3, 2025

run buildall

@hubgeter
Contributor Author

hubgeter commented Sep 3, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32366 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 04144d7ffccf93aa30de56ffb9dbf4e49c1d3b13, data reload: false

------ Round 1 ----------------------------------
query	cold	hot1	hot2	best hot
q1	17566	5371	5431	5371
q2	2021	392	281	281
q3	12089	1227	736	736
q4	10557	873	443	443
q5	9756	2400	2099	2099
q6	200	163	130	130
q7	902	737	596	596
q8	9363	1418	1160	1160
q9	5179	4873	4878	4873
q10	6756	2257	1810	1810
q11	480	270	262	262
q12	326	349	207	207
q13	17791	3565	3004	3004
q14	234	229	208	208
q15	525	452	464	452
q16	417	434	362	362
q17	590	860	352	352
q18	6762	6359	6324	6324
q19	1211	942	556	556
q20	335	325	199	199
q21	2801	2103	1923	1923
q22	1070	1048	1018	1018
Total cold run time: 106931 ms
Total hot run time: 32366 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot1	hot2	best hot
q1	5495	5503	5505	5503
q2	231	325	233	233
q3	2232	2592	2320	2320
q4	1338	1797	1317	1317
q5	4400	4924	5011	4924
q6	164	163	130	130
q7	2049	1969	1819	1819
q8	2610	2824	2698	2698
q9	7277	7400	7320	7320
q10	3054	3332	2754	2754
q11	563	513	505	505
q12	722	771	619	619
q13	3488	3902	3243	3243
q14	298	307	287	287
q15	521	460	452	452
q16	469	495	428	428
q17	1247	1744	1287	1287
q18	7772	7459	7245	7245
q19	795	1165	1079	1079
q20	1997	2053	1881	1881
q21	5274	4893	4515	4515
q22	1112	1051	998	998
Total cold run time: 53108 ms
Total hot run time: 51557 ms

@doris-robot

TPC-DS: Total hot run time: 192218 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 04144d7ffccf93aa30de56ffb9dbf4e49c1d3b13, data reload: false

query	cold	hot1	hot2	best hot
query1	941	412	408	408
query2	6197	1849	1861	1849
query3	8678	203	200	200
query4	33447	23775	23389	23389
query5	3591	586	455	455
query6	303	198	189	189
query7	4220	490	315	315
query8	292	239	228	228
query9	9383	2598	2573	2573
query10	490	334	261	261
query11	18701	15614	15166	15166
query12	170	108	109	108
query13	1573	556	436	436
query14	9803	7325	7206	7206
query15	245	195	175	175
query16	8023	653	476	476
query17	1563	787	603	603
query18	2182	433	329	329
query19	227	205	168	168
query20	128	122	123	122
query21	218	125	108	108
query22	4521	4759	4455	4455
query23	34925	34038	34123	34038
query24	7402	2682	2740	2682
query25	583	501	450	450
query26	862	320	175	175
query27	2048	494	351	351
query28	5545	2232	2203	2203
query29	712	605	460	460
query30	244	205	164	164
query31	985	921	817	817
query32	84	62	64	62
query33	548	387	324	324
query34	749	867	555	555
query35	834	815	753	753
query36	1027	1094	950	950
query37	117	102	63	63
query38	4025	4009	3921	3921
query39	1540	1456	1482	1456
query40	214	124	107	107
query41	51	53	47	47
query42	124	106	104	104
query43	514	529	493	493
query44	1349	838	813	813
query45	184	191	170	170
query46	889	1048	717	717
query47	1995	2025	1885	1885
query48	424	432	362	362
query49	731	491	432	432
query50	678	683	423	423
query51	7437	7312	7203	7203
query52	103	109	99	99
query53	247	266	203	203
query54	575	579	479	479
query55	83	82	78	78
query56	288	276	243	243
query57	1286	1266	1188	1188
query58	233	218	227	218
query59	2996	3125	2998	2998
query60	295	288	270	270
query61	115	124	126	124
query62	787	743	683	683
query63	228	201	192	192
query64	3801	998	663	663
query65	3357	3281	3354	3281
query66	784	413	314	314
query67	16567	15635	15485	15485
query68	7710	842	563	563
query69	498	304	274	274
query70	1124	1146	1090	1090
query71	417	291	256	256
query72	5191	3700	3820	3700
query73	633	747	352	352
query74	10163	9224	9113	9113
query75	3749	3166	2681	2681
query76	3384	1166	782	782
query77	759	376	284	284
query78	10371	10546	9653	9653
query79	3575	846	588	588
query80	669	531	436	436
query81	488	259	219	219
query82	608	120	86	86
query83	163	170	147	147
query84	242	100	84	84
query85	770	361	300	300
query86	400	314	265	265
query87	4319	4279	4283	4279
query88	4889	2427	2412	2412
query89	408	337	296	296
query90	1770	193	192	192
query91	141	142	109	109
query92	62	55	51	51
query93	2124	917	551	551
query94	637	406	298	298
query95	335	288	273	273
query96	508	611	286	286
query97	3169	3334	3175	3175
query98	229	204	210	204
query99	1317	1377	1302	1302
Total cold run time: 294283 ms
Total hot run time: 192218 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 44.13% (79/179) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.54% (12739/27972)
Line Coverage 36.41% (113579/311953)
Region Coverage 34.03% (64998/191016)
Branch Coverage 31.07% (34120/109808)

@doris-robot

ClickBench: Total hot run time: 28.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 04144d7ffccf93aa30de56ffb9dbf4e49c1d3b13, data reload: false

query	cold	hot1	hot2
query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.07	0.06
query4	1.63	0.11	0.10
query5	0.53	0.49	0.51
query6	1.13	0.72	0.73
query7	0.03	0.02	0.02
query8	0.05	0.03	0.04
query9	0.56	0.50	0.50
query10	0.55	0.55	0.55
query11	0.16	0.10	0.11
query12	0.14	0.11	0.10
query13	0.62	0.60	0.58
query14	0.78	0.79	0.78
query15	0.85	0.84	0.82
query16	0.38	0.38	0.39
query17	1.07	1.04	1.07
query18	0.22	0.22	0.22
query19	1.77	1.74	1.86
query20	0.01	0.01	0.01
query21	15.43	0.92	0.58
query22	0.74	0.76	0.63
query23	15.11	1.39	0.55
query24	2.86	0.91	1.79
query25	0.25	0.13	0.07
query26	0.20	0.15	0.14
query27	0.06	0.05	0.04
query28	13.65	1.03	0.43
query29	12.56	3.94	3.27
query30	0.26	0.09	0.07
query31	2.81	0.59	0.39
query32	3.22	0.53	0.46
query33	2.99	3.01	3.08
query34	16.45	5.21	4.55
query35	4.58	4.58	4.61
query36	0.64	0.50	0.48
query37	0.08	0.06	0.06
query38	0.04	0.04	0.03
query39	0.04	0.02	0.03
query40	0.18	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 103.09 s
Total hot run time: 28.76 s

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 63.84% (113/177) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 76.41% (21014/27502)
Line Coverage 69.77% (216902/310902)
Region Coverage 67.72% (129861/191754)
Branch Coverage 61.24% (67550/110302)

@morningman morningman changed the title [support](orc)support orc file meta cache. (#54591) branch-3.1: [support](orc)support orc file meta cache. (#54591) Sep 4, 2025
@morrySnow morrySnow changed the title branch-3.1: [support](orc)support orc file meta cache. (#54591) branch-3.1: [support](orc)support orc file meta cache. #54591) Sep 4, 2025
@morrySnow morrySnow changed the title branch-3.1: [support](orc)support orc file meta cache. #54591) branch-3.1: [support](orc)support orc file meta cache. #54591 Sep 4, 2025
@morrySnow morrySnow merged commit c1d4af2 into apache:branch-3.1 Sep 4, 2025
20 of 22 checks passed
@morrySnow morrySnow mentioned this pull request Sep 22, 2025
4 participants