[Enhancement](explain)Display deleteFileNum for FileScanNode when explain verbose by hubgeter · Pull Request #60308 · apache/doris

hubgeter · 2026-01-28T09:37:18Z

What problem does this PR solve?

Problem Summary:
This PR enhances the EXPLAIN VERBOSE output for File Scan nodes by adding the following metrics:
dataFileNum=xxx, deleteFileNum=xxx, deleteSplitNum=xxx
These are especially useful for Iceberg, Paimon, and Hive ACID tables.

These metrics provide more visibility into the underlying file and split layout, helping users better tune parameters and control query performance.
Details:
dataFileNum: The number of distinct data files that need to be read.
This is not the same as the number of splits, because a single data file can be divided into multiple splits.

deleteFileNum: The number of distinct delete files that need to be read.

deleteSplitNum: Added because the relationship between data files and delete files is many-to-many:
- One data file may be associated with multiple delete files.
- One delete file may apply to multiple data files.
Using deleteSplitNum / dataSplitNum, where dataSplitNum is the number of data splits (inputSplitNum in the example below), users can estimate the average number of delete splits that need to be read per data split.
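
To make the three counters concrete, here is a minimal Python sketch of how they could be derived from a list of splits. It is not the Doris implementation: the `Split` class, the `scan_metrics` function, and the file paths are hypothetical, and it assumes that every (data split, delete file) pair counts as one delete split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One scan range: a byte range of a data file plus its attached delete files."""
    data_file: str
    start: int
    length: int
    delete_files: tuple = ()


def scan_metrics(splits):
    """Return (dataFileNum, deleteFileNum, deleteSplitNum) for a list of splits."""
    # Distinct data files: several splits may point into the same file.
    data_files = {s.data_file for s in splits}
    # Distinct delete files: one delete file may be shared by many splits.
    delete_files = {d for s in splits for d in s.delete_files}
    # Assumption: each (data split, delete file) pair is one delete split.
    delete_split_num = sum(len(s.delete_files) for s in splits)
    return len(data_files), len(delete_files), delete_split_num


# Hypothetical layout: 2 data files cut into 3 splits, all sharing one delete file.
splits = [
    Split("data/a.parquet", 4, 1000, ("delete/dv.puffin",)),
    Split("data/a.parquet", 1004, 1000, ("delete/dv.puffin",)),
    Split("data/b.parquet", 4, 500, ("delete/dv.puffin",)),
]
data_file_num, delete_file_num, delete_split_num = scan_metrics(splits)
print(f"dataFileNum={data_file_num}, deleteFileNum={delete_file_num}, "
      f"deleteSplitNum={delete_split_num}")
print(f"average delete splits per data split: {delete_split_num / len(splits):.2f}")
```

Under that assumption, a single shared delete file yields deleteFileNum=1 while deleteSplitNum grows with the number of data splits, which is why the two counters are reported separately.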

Example:

mysql> explain verbose select * from iceberg.format_v3.dv_test_1w;
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                                               |
|   OUTPUT EXPRS:                                                                                                                               |
|     id[#0]                                                                                                                                    |
|     grp[#1]                                                                                                                                   |
|     value[#2]                                                                                                                                 |
|     ts[#3]                                                                                                                                    |
|   PARTITION: RANDOM                                                                                                                           |
|                                                                                                                                               |
|   HAS_COLO_PLAN_NODE: false                                                                                                                   |
|                                                                                                                                               |
|   VRESULT SINK                                                                                                                                |
|      MYSQL_PROTOCOL                                                                                                                           |
|                                                                                                                                               |
|   0:VICEBERG_SCAN_NODE(32)                                                                                                                    |
|      table: iceberg.format_v3.dv_test_1w                                                                                                      |
|      inputSplitNum=220, totalFileSize=720774, scanRanges=220                                                                                  |
|      partition=0/0                                                                                                                            |
|      backends:                                                                                                                                |
|        1769590309070                                                                                                                          |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00004-51-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2672      |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00003-50-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2852      |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00000-47-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2894      |
|          ... other 216 files ...                                                                                                              |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00001-48-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 58397 length: 13894 |
|          dataFileNum=10, deleteFileNum=1 deleteSplitNum=220                                                                               |
|      cardinality=33334, numNodes=1                                                                                                            |
|      pushdown agg=NONE                                                                                                                        |
|      tuple ids: 0                                                                                                                             |
|                                                                                                                                               |
| Tuples:                                                                                                                                       |
| TupleDescriptor{id=0, tbl=dv_test_1w}                                                                                                         |
|   SlotDescriptor{id=0, col=id, colUniqueId=1, type=bigint, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}         |
|   SlotDescriptor{id=1, col=grp, colUniqueId=2, type=int, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}           |
|   SlotDescriptor{id=2, col=value, colUniqueId=3, type=int, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}         |
|   SlotDescriptor{id=3, col=ts, colUniqueId=4, type=datetimev2(6), nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}  |
|                                                                                                                                               |
|                                                                                                                                               |
|                                                                                                                                               |
|                                                                                                                                               |
| ========== STATISTICS ==========                                                                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
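
The ratios mentioned in the description can be read off the plan above. The following Python sketch extracts the counters from the two relevant scan-node lines and computes them; `parse_metrics` is a hypothetical helper, not part of Doris, and it assumes that inputSplitNum is the number of data splits. The regular expression does not depend on commas, so it also handles the missing separator between deleteFileNum and deleteSplitNum.

```python
import re

# Matches integer key=value pairs such as "dataFileNum=10".
METRIC = re.compile(r"(\w+)=(\d+)")


def parse_metrics(lines):
    """Collect integer key=value pairs from EXPLAIN VERBOSE lines."""
    metrics = {}
    for line in lines:
        for key, value in METRIC.findall(line):
            metrics[key] = int(value)
    return metrics


# Only the scan-node lines are passed in; other lines (for example the
# SlotDescriptor rows) contain unrelated key=value pairs such as id=0.
explain_lines = [
    "|      inputSplitNum=220, totalFileSize=720774, scanRanges=220 |",
    "|          dataFileNum=10, deleteFileNum=1 deleteSplitNum=220 |",
]
m = parse_metrics(explain_lines)
splits_per_file = m["inputSplitNum"] / m["dataFileNum"]
deletes_per_split = m["deleteSplitNum"] / m["inputSplitNum"]
print(splits_per_file, deletes_per_split)  # prints: 22.0 1.0
```

For this query, each data file is divided into 22 splits on average, and each data split reads one delete split.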

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Thearas · 2026-01-28T09:37:24Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

hubgeter · 2026-01-28T09:43:30Z

run buildall

doris-robot · 2026-01-28T10:11:47Z

TPC-H: Total hot run time: 31670 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 44287822f92867ea9a75f43236e6419c748de82e, data reload: false

------ Round 1 ----------------------------------
q1	17807	5282	5109	5109
q2	2023	307	186	186
q3	10231	1323	733	733
q4	10201	829	315	315
q5	7534	2099	1921	1921
q6	192	183	157	157
q7	859	712	623	623
q8	9287	1357	1137	1137
q9	5234	4911	4810	4810
q10	6766	1944	1562	1562
q11	511	279	275	275
q12	334	376	225	225
q13	17770	4045	3178	3178
q14	229	246	216	216
q15	907	818	814	814
q16	668	675	627	627
q17	637	771	499	499
q18	6732	6430	6290	6290
q19	1364	993	604	604
q20	387	347	226	226
q21	2610	1979	1895	1895
q22	350	310	268	268
Total cold run time: 102633 ms
Total hot run time: 31670 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5310	5284	5300	5284
q2	261	340	247	247
q3	2138	2629	2253	2253
q4	1358	1719	1272	1272
q5	4303	4170	4225	4170
q6	222	180	141	141
q7	1921	2180	1841	1841
q8	2687	2478	2458	2458
q9	7434	7384	7434	7384
q10	2871	3096	2628	2628
q11	534	471	446	446
q12	690	791	613	613
q13	3929	4447	3753	3753
q14	282	316	273	273
q15	870	836	838	836
q16	687	753	671	671
q17	1132	1299	1322	1299
q18	8112	7864	7980	7864
q19	868	843	878	843
q20	2074	2150	1984	1984
q21	4801	4470	4115	4115
q22	591	551	498	498
Total cold run time: 53075 ms
Total hot run time: 50873 ms

doris-robot · 2026-01-28T10:28:31Z

ClickBench: Total hot run time: 28.27 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 44287822f92867ea9a75f43236e6419c748de82e, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.04	0.04
query3	0.25	0.09	0.08
query4	1.60	0.12	0.11
query5	0.27	0.25	0.26
query6	1.16	0.67	0.67
query7	0.03	0.02	0.03
query8	0.05	0.04	0.03
query9	0.56	0.50	0.48
query10	0.54	0.53	0.55
query11	0.14	0.09	0.10
query12	0.15	0.10	0.10
query13	0.63	0.62	0.60
query14	1.08	1.06	1.05
query15	0.88	0.88	0.86
query16	0.39	0.41	0.39
query17	1.12	1.10	1.16
query18	0.22	0.22	0.21
query19	1.97	1.96	2.02
query20	0.02	0.02	0.02
query21	15.39	0.28	0.14
query22	5.02	0.05	0.05
query23	15.90	0.28	0.10
query24	1.47	0.34	0.31
query25	0.07	0.05	0.09
query26	0.14	0.13	0.13
query27	0.09	0.06	0.06
query28	3.43	1.17	0.96
query29	12.54	4.00	3.18
query30	0.28	0.14	0.11
query31	2.81	0.68	0.40
query32	3.23	0.60	0.50
query33	3.35	3.29	3.22
query34	16.20	5.50	4.73
query35	4.76	4.83	4.81
query36	0.65	0.50	0.49
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.04	0.03
query40	0.19	0.17	0.15
query41	0.09	0.04	0.03
query42	0.05	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 97.15 s
Total hot run time: 28.27 s

hello-stephen · 2026-01-28T10:51:17Z

FE UT Coverage Report

Increment line coverage 0.00% (0/70) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-01-28T12:17:14Z

FE Regression Coverage Report

Increment line coverage 20.00% (14/70) 🎉
Increment coverage report
Complete coverage report

github-actions · 2026-02-02T02:57:21Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-02-02T02:57:23Z

PR approved by anyone and no changes requested.

[Enhancement](explain)Display deleteFileNum for FileScanNode when explain verbose · 4428782

morningman added dev/3.1.x dev/4.0.x labels Jan 29, 2026

morningman approved these changes Feb 2, 2026

github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 2, 2026

github-actions bot added the reviewed label Feb 2, 2026

CalvinKirs approved these changes Feb 2, 2026

suxiaogang223 approved these changes Feb 2, 2026

morningman merged commit cff565f into apache:master Feb 2, 2026
35 of 36 checks passed

github-actions bot mentioned this pull request Feb 2, 2026

branch-4.0: [Enhancement](explain)Display deleteFileNum for FileScanNode when explain verbose #60308 #60437

Merged

yiguolei pushed a commit that referenced this pull request Feb 3, 2026

branch-4.0: [Enhancement](explain)Display deleteFileNum for FileScanNode when explain verbose #60308 (#60437) · 9c57150

Cherry-picked from #60308
Co-authored-by: daidai <changyuwei@selectdb.com>

yiguolei added dev/4.0.4-merged and removed dev/4.0.x labels Feb 3, 2026

morningman merged 1 commit into apache:master from hubgeter:display_delete_num

8 participants
