@morningman
Contributor

@morningman morningman commented Dec 10, 2025

What problem does this PR solve?

Problem Summary:

Refine several metrics in the Parquet reader profile.

  1. Rename several Statistics classes to make them more readable. (Too many structs share the name Statistics.)
  2. Add a read-page-header timer to the Parquet reader profile.
  3. Fix the invalid check logic in MergeRangeFileReader when setting the prefetch buffer size.
  4. Fix the incorrect data cache profile for external table scans.
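
To illustrate item 2, here is a minimal sketch of how a read-page-header timer could accumulate time into a statistics struct with an RAII scoped timer. It is not the code from this PR: `PageReadStats`, `ScopedNanoTimer`, and `read_page_header` are hypothetical names chosen for the example, and the byte-summing loop only stands in for real header parsing.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical statistics struct; field names are illustrative only.
struct PageReadStats {
    int64_t read_page_header_ns = 0;  // accumulated time spent reading page headers
    int64_t page_header_count = 0;    // number of page headers read
};

// RAII helper: adds the elapsed time to *target when it goes out of scope.
class ScopedNanoTimer {
public:
    explicit ScopedNanoTimer(int64_t* target)
            : _target(target), _start(std::chrono::steady_clock::now()) {}

    ~ScopedNanoTimer() {
        auto elapsed = std::chrono::steady_clock::now() - _start;
        *_target += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    }

    ScopedNanoTimer(const ScopedNanoTimer&) = delete;
    ScopedNanoTimer& operator=(const ScopedNanoTimer&) = delete;

private:
    int64_t* _target;
    std::chrono::steady_clock::time_point _start;
};

// Stand-in for header parsing: sums the bytes so there is some work to time.
int64_t read_page_header(PageReadStats* stats, const uint8_t* buf, int64_t len) {
    ScopedNanoTimer timer(&stats->read_page_header_ns);
    ++stats->page_header_count;
    int64_t checksum = 0;
    for (int64_t i = 0; i < len; ++i) {
        checksum += buf[i];
    }
    return checksum;
}
```

Because the timer writes its result in the destructor, every return path of the timed function is covered without extra bookkeeping. The accumulated counter can then be reported as one profile entry.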

After #57204 and #58785

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman force-pushed the sort_parquet_statistics branch 2 times, most recently from 1524f8e to 99e4239 Compare December 10, 2025 16:01
@morningman morningman changed the title [opt](profile) sort out parquet reader profile [fix](profile) sort out parquet reader profile Dec 10, 2025
@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 36883 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 14b5083b8258a13502f6f575632b818b8c0c2c42, data reload: false

------ Round 1 ----------------------------------
q1	17640	4314	4094	4094
q2	2032	364	244	244
q3	10169	1340	770	770
q4	10232	915	332	332
q5	7526	2142	1999	1999
q6	216	172	140	140
q7	1060	863	717	717
q8	9354	1486	1163	1163
q9	7036	5443	5410	5410
q10	6802	2394	1982	1982
q11	535	325	297	297
q12	657	718	576	576
q13	17767	3717	3015	3015
q14	282	297	285	285
q15	600	513	523	513
q16	942	953	860	860
q17	738	902	446	446
q18	7639	8045	7867	7867
q19	1408	1004	653	653
q20	410	380	254	254
q21	4792	4246	4270	4246
q22	1121	1020	1051	1020
Total cold run time: 108958 ms
Total hot run time: 36883 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4360	4340	4374	4340
q2	318	388	316	316
q3	2364	2878	2469	2469
q4	1421	1836	1437	1437
q5	4630	4313	4503	4313
q6	221	170	127	127
q7	2089	2014	1818	1818
q8	2674	2557	2615	2557
q9	7596	7467	7454	7454
q10	3166	3281	2835	2835
q11	601	529	495	495
q12	661	851	625	625
q13	3577	3734	3067	3067
q14	265	286	256	256
q15	533	489	496	489
q16	864	866	842	842
q17	1150	1351	1330	1330
q18	7510	7058	7003	7003
q19	845	808	849	808
q20	1903	1955	1799	1799
q21	4653	4243	4087	4087
q22	1049	1001	1003	1001
Total cold run time: 52450 ms
Total hot run time: 49468 ms
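
The totals in the TPC-H tables above appear to follow a simple rule: the cold total is the sum of the first timing column, and the hot total is the sum of the last column, which matches the smaller of the two middle columns on every row. The sketch below reproduces that arithmetic. The column meanings (cold run, two hot runs, best hot run) are an interpretation of the bot output rather than something the report states, and `QueryTimes`, `Totals`, and `sum_runs` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One row of the report, in milliseconds. The final "best hot" column is
// recomputed here instead of being stored.
struct QueryTimes {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

struct Totals {
    int64_t cold_ms;
    int64_t hot_ms;
};

// Sums the cold column and the per-query minimum of the two hot runs.
Totals sum_runs(const std::vector<QueryTimes>& rows) {
    Totals totals{0, 0};
    for (const auto& row : rows) {
        totals.cold_ms += row.cold_ms;
        totals.hot_ms += std::min(row.hot1_ms, row.hot2_ms);
    }
    return totals;
}
```

For the first three Round 1 rows (q1 to q3), `sum_runs` gives 29841 ms cold and 5108 ms hot. Applied to all 22 rows of Round 1, the same rule gives the reported 108958 ms and 36883 ms.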

@doris-robot

TPC-DS: Total hot run time: 181453 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 14b5083b8258a13502f6f575632b818b8c0c2c42, data reload: false

query5	5084	618	468	468
query6	356	249	232	232
query7	4218	473	285	285
query8	302	250	246	246
query9	8770	2588	2602	2588
query10	550	370	332	332
query11	15336	15302	14557	14557
query12	200	120	114	114
query13	1272	518	403	403
query14	7381	3245	3013	3013
query14_1	2911	2880	2909	2880
query15	207	198	184	184
query16	883	483	465	465
query17	1181	686	575	575
query18	2700	434	341	341
query19	221	224	208	208
query20	123	117	110	110
query21	228	137	120	120
query22	3897	4067	3935	3935
query23	16601	16244	15893	15893
query23_1	16032	16073	16069	16069
query24	7102	1644	1246	1246
query24_1	1239	1218	1255	1218
query25	551	512	420	420
query26	1252	281	160	160
query27	2747	465	300	300
query28	4442	2159	2150	2150
query29	813	551	486	486
query30	322	244	221	221
query31	799	716	639	639
query32	80	72	71	71
query33	545	341	295	295
query34	899	904	542	542
query35	779	819	734	734
query36	859	893	829	829
query37	130	89	76	76
query38	3836	3919	3889	3889
query39	743	734	707	707
query39_1	706	696	710	696
query40	228	140	121	121
query41	77	62	62	62
query42	114	110	109	109
query43	439	427	404	404
query44	1356	760	756	756
query45	195	188	181	181
query46	882	985	628	628
query47	1667	1705	1606	1606
query48	334	328	255	255
query49	634	439	381	381
query50	658	300	224	224
query51	3809	3903	3858	3858
query52	107	112	104	104
query53	324	350	296	296
query54	283	259	250	250
query55	79	77	71	71
query56	308	297	297	297
query57	1131	1161	1080	1080
query58	275	258	252	252
query59	2379	2395	2277	2277
query60	330	323	294	294
query61	172	166	179	166
query62	715	673	619	619
query63	331	301	310	301
query64	5087	1414	1128	1128
query65	4035	3995	3941	3941
query66	1396	454	319	319
query67	15191	14971	14816	14816
query68	4685	1030	767	767
query69	495	348	313	313
query70	1056	1049	1000	1000
query71	366	310	286	286
query72	6070	5008	5177	5008
query73	712	680	309	309
query74	8954	8761	8625	8625
query75	3572	3535	3172	3172
query76	3899	1128	769	769
query77	504	406	304	304
query78	9603	9538	8942	8942
query79	1010	894	631	631
query80	703	663	559	559
query81	499	270	231	231
query82	232	135	115	115
query83	268	251	259	251
query84	256	123	100	100
query85	865	498	460	460
query86	320	300	279	279
query87	4098	4105	3905	3905
query88	3137	2303	2266	2266
query89	480	419	388	388
query90	1999	167	158	158
query91	172	167	143	143
query92	73	67	64	64
query93	985	906	572	572
query94	363	320	269	269
query95	589	392	308	308
query96	588	469	216	216
query97	2582	2654	2579	2579
query98	205	195	192	192
query99	1281	1306	1226	1226
Total cold run time: 259333 ms
Total hot run time: 181453 ms

@doris-robot

ClickBench: Total hot run time: 27.29 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 14b5083b8258a13502f6f575632b818b8c0c2c42, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.09	0.09
query4	1.60	0.11	0.11
query5	0.27	0.26	0.26
query6	1.16	0.65	0.62
query7	0.03	0.03	0.03
query8	0.05	0.05	0.04
query9	0.56	0.51	0.49
query10	0.55	0.55	0.56
query11	0.15	0.12	0.11
query12	0.15	0.12	0.11
query13	0.62	0.60	0.60
query14	1.00	0.99	0.98
query15	0.82	0.79	0.81
query16	0.39	0.42	0.39
query17	1.05	1.04	1.04
query18	0.24	0.21	0.22
query19	1.93	1.82	1.77
query20	0.02	0.02	0.01
query21	15.45	0.29	0.14
query22	4.88	0.05	0.06
query23	16.01	0.28	0.11
query24	2.21	0.90	0.41
query25	0.10	0.05	0.08
query26	0.15	0.13	0.14
query27	0.06	0.06	0.05
query28	4.80	1.21	1.02
query29	12.60	3.97	3.20
query30	0.28	0.14	0.13
query31	2.86	0.62	0.40
query32	3.24	0.54	0.45
query33	2.96	2.97	3.09
query34	16.67	5.16	4.50
query35	4.55	4.56	4.53
query36	0.67	0.49	0.49
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.05	0.03	0.03
query40	0.17	0.16	0.13
query41	0.08	0.04	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 99.06 s
Total hot run time: 27.29 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 81.46% (123/151) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.35% (18721/35092)
Line Coverage 39.08% (173172/443178)
Region Coverage 33.77% (134373/397902)
Branch Coverage 34.68% (57738/166490)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 90.73% (137/151) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.22% (24840/34394)
Line Coverage 58.96% (261007/442702)
Region Coverage 53.85% (216794/402625)
Branch Coverage 55.40% (92716/167365)

Contributor

@hubgeter hubgeter left a comment

LGTM

CalvinKirs
CalvinKirs previously approved these changes Dec 15, 2025
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 15, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

gavinchou
gavinchou previously approved these changes Dec 15, 2025
@morningman morningman marked this pull request as draft December 15, 2025 07:46
[opt](parquet) refine the parquet reader profile

2

3
@morningman morningman dismissed stale reviews from gavinchou and CalvinKirs via b043aef December 16, 2025 05:01
@morningman morningman force-pushed the sort_parquet_statistics branch from 14b5083 to b043aef Compare December 16, 2025 05:01
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 16, 2025
@morningman
Contributor Author

run buildall

@morningman morningman marked this pull request as ready for review December 16, 2025 05:09
@doris-robot

TPC-H: Total hot run time: 35459 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b2164a60460901e9be6fb6f7ab06d93f496f17d, data reload: false

------ Round 1 ----------------------------------
q1	17615	4214	4113	4113
q2	2021	358	243	243
q3	10152	1379	777	777
q4	10210	830	321	321
q5	7548	2196	1931	1931
q6	187	173	136	136
q7	1021	877	704	704
q8	9356	1458	1206	1206
q9	7097	5367	5379	5367
q10	6806	2403	1954	1954
q11	529	336	303	303
q12	651	734	615	615
q13	17762	3715	3030	3030
q14	289	294	279	279
q15	623	518	515	515
q16	706	668	630	630
q17	705	751	585	585
q18	7563	7247	7156	7156
q19	1238	986	611	611
q20	405	367	246	246
q21	4226	3920	3790	3790
q22	1102	1017	947	947
Total cold run time: 107812 ms
Total hot run time: 35459 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4153	4094	4085	4085
q2	352	391	321	321
q3	2164	2701	2325	2325
q4	1349	1780	1290	1290
q5	4301	4772	4777	4772
q6	224	178	131	131
q7	2118	1987	1843	1843
q8	2680	2527	2617	2527
q9	7733	7686	7650	7650
q10	3104	3257	2819	2819
q11	609	512	491	491
q12	723	772	602	602
q13	3567	4018	3348	3348
q14	303	310	301	301
q15	543	510	497	497
q16	645	684	640	640
q17	1241	1638	1419	1419
q18	7970	7707	7754	7707
q19	930	917	892	892
q20	2017	2088	1936	1936
q21	5047	4264	4140	4140
q22	1078	1039	1000	1000
Total cold run time: 52851 ms
Total hot run time: 50736 ms

@doris-robot

TPC-DS: Total hot run time: 179031 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1b2164a60460901e9be6fb6f7ab06d93f496f17d, data reload: false

query5	5442	612	461	461
query6	331	242	216	216
query7	4226	471	268	268
query8	295	255	259	255
query9	8771	2566	2579	2566
query10	569	364	351	351
query11	15317	15252	14644	14644
query12	185	122	116	116
query13	1252	483	392	392
query14	7057	3180	2969	2969
query14_1	2853	2867	2800	2800
query15	207	195	185	185
query16	918	495	459	459
query17	1119	722	607	607
query18	2719	459	353	353
query19	229	231	212	212
query20	122	119	111	111
query21	225	143	117	117
query22	3991	3969	3920	3920
query23	16610	16376	16058	16058
query23_1	16086	16088	16068	16068
query24	7265	1675	1255	1255
query24_1	1264	1236	1250	1236
query25	577	516	440	440
query26	1247	280	165	165
query27	2725	479	330	330
query28	4442	2145	2102	2102
query29	823	606	468	468
query30	321	250	218	218
query31	819	708	602	602
query32	76	71	69	69
query33	556	346	312	312
query34	916	919	551	551
query35	781	834	731	731
query36	870	927	823	823
query37	131	97	77	77
query38	2874	2850	2756	2756
query39	765	742	718	718
query39_1	699	713	701	701
query40	230	142	127	127
query41	73	70	69	69
query42	110	111	104	104
query43	431	441	412	412
query44	1325	748	739	739
query45	197	199	184	184
query46	885	998	622	622
query47	1728	1749	1656	1656
query48	325	339	255	255
query49	642	458	363	363
query50	674	295	222	222
query51	3812	3845	3963	3845
query52	110	110	100	100
query53	316	358	297	297
query54	309	276	274	274
query55	78	75	72	72
query56	314	373	298	298
query57	1123	1136	1086	1086
query58	267	256	249	249
query59	2363	2453	2394	2394
query60	314	306	296	296
query61	158	158	151	151
query62	713	683	628	628
query63	325	290	315	290
query64	4924	1311	1002	1002
query65	4028	3988	3975	3975
query66	1379	460	320	320
query67	15223	14971	14938	14938
query68	8392	1003	725	725
query69	486	340	309	309
query70	1103	986	980	980
query71	375	310	285	285
query72	6270	5023	5164	5023
query73	705	649	305	305
query74	8824	8818	8576	8576
query75	3195	3162	2744	2744
query76	4019	1156	777	777
query77	565	397	290	290
query78	9629	9826	8949	8949
query79	1794	844	610	610
query80	732	660	558	558
query81	537	269	235	235
query82	202	128	104	104
query83	280	254	253	253
query84	255	122	100	100
query85	895	507	455	455
query86	371	297	283	283
query87	3010	3064	2876	2876
query88	3532	2300	2287	2287
query89	475	428	398	398
query90	2242	165	159	159
query91	177	169	145	145
query92	91	70	73	70
query93	2164	909	570	570
query94	477	288	283	283
query95	564	327	361	327
query96	588	488	212	212
query97	2305	2333	2243	2243
query98	221	194	194	194
query99	1254	1313	1194	1194
Total cold run time: 263889 ms
Total hot run time: 179031 ms

@doris-robot

ClickBench: Total hot run time: 27.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1b2164a60460901e9be6fb6f7ab06d93f496f17d, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.05	0.05
query3	0.26	0.09	0.09
query4	1.60	0.11	0.11
query5	0.27	0.24	0.27
query6	1.17	0.64	0.63
query7	0.03	0.02	0.03
query8	0.06	0.04	0.05
query9	0.57	0.50	0.51
query10	0.55	0.56	0.55
query11	0.16	0.10	0.11
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	0.98	0.98	0.97
query15	0.81	0.79	0.82
query16	0.39	0.41	0.39
query17	1.05	1.05	1.05
query18	0.24	0.22	0.22
query19	1.98	1.87	1.84
query20	0.02	0.01	0.01
query21	15.45	0.28	0.14
query22	4.94	0.06	0.05
query23	16.06	0.29	0.11
query24	1.90	0.37	0.17
query25	0.07	0.08	0.07
query26	0.14	0.14	0.14
query27	0.08	0.05	0.05
query28	3.12	1.24	1.04
query29	12.59	4.05	3.21
query30	0.28	0.14	0.14
query31	2.83	0.63	0.39
query32	3.24	0.56	0.46
query33	3.07	2.98	3.12
query34	16.79	5.16	4.58
query35	4.57	4.57	4.54
query36	0.67	0.51	0.48
query37	0.10	0.06	0.06
query38	0.07	0.04	0.04
query39	0.04	0.02	0.03
query40	0.18	0.14	0.14
query41	0.09	0.04	0.03
query42	0.04	0.04	0.03
query43	0.06	0.03	0.03
Total cold run time: 97.43 s
Total hot run time: 27.26 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 82.00% (123/150) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.44% (18849/35274)
Line Coverage 39.22% (174530/444987)
Region Coverage 33.82% (134979/399123)
Branch Coverage 34.77% (58106/167104)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 91.33% (137/150) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.12% (25275/34567)
Line Coverage 60.29% (267891/444373)
Region Coverage 55.81% (225337/403752)
Branch Coverage 57.04% (95785/167917)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 91.95% (137/149) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.98% (24880/34565)
Line Coverage 58.69% (260799/444359)
Region Coverage 53.45% (215795/403750)
Branch Coverage 55.06% (92454/167917)

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 17, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 1619824 into apache:master Dec 17, 2025
25 of 26 checks passed
7 participants