[Feature]: Remove the Independent DeleteFiles for the Iceberg Format Table
#1628
Closed
Labels: type:feature
Feature Requests
Description
For Iceberg Format tables, some scenarios produce DeleteFiles that are not associated with any DataFile; these are called Independent DeleteFiles. These scenarios include:
Although these Independent DeleteFiles are never matched to a DataFile during a scan and therefore do not affect read performance, they accumulate over time and put pressure on the file system.
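To make "not associated with any DataFile" concrete, here is a sketch that checks each delete file against the live data files using the delete-applicability rules from the Iceberg table spec. A position delete can only apply to a data file in the same partition whose data sequence number is less than or equal to its own. An equality delete applies to data files with a strictly smaller data sequence number in the same partition, or in any partition if it was written with an unpartitioned spec (a global delete). The record types, field names, and function names are hypothetical simplifications, not Amoro or Iceberg APIs, and the check is a plain nested loop rather than an indexed lookup.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


# Hypothetical, simplified views of manifest entries; real Iceberg metadata carries more fields.
@dataclass(frozen=True)
class DataFileInfo:
    path: str
    spec_id: int
    partition: Tuple  # partition values
    data_seq: int  # data sequence number


@dataclass(frozen=True)
class DeleteFileInfo:
    path: str
    kind: str  # "position" or "equality"
    spec_id: int
    partition: Tuple
    data_seq: int
    # Data file paths a position delete refers to, if known (None = unknown, assume any).
    referenced_paths: Optional[FrozenSet[str]] = None
    # True for an equality delete written with an unpartitioned spec (applies to all partitions).
    global_scope: bool = False


def applies_to(delete: DeleteFileInfo, data: DataFileInfo) -> bool:
    """Return True if `delete` could remove rows from `data` under the spec's rules."""
    same_partition = (delete.spec_id, delete.partition) == (data.spec_id, data.partition)
    if delete.kind == "position":
        # Same partition and data sequence number <= the delete's.
        if not same_partition or data.data_seq > delete.data_seq:
            return False
        return delete.referenced_paths is None or data.path in delete.referenced_paths
    if delete.kind == "equality":
        # Strictly older data, in the same partition unless the delete is global.
        return (same_partition or delete.global_scope) and data.data_seq < delete.data_seq
    raise ValueError(f"unknown delete kind: {delete.kind}")


def find_independent_delete_files(
    data_files: Iterable[DataFileInfo], delete_files: Iterable[DeleteFileInfo]
) -> List[DeleteFileInfo]:
    """Delete files that apply to no live data file (hypothetical helper)."""
    live = list(data_files)
    return [d for d in delete_files if not any(applies_to(d, f) for f in live)]
```

For example, given one data file with data sequence number 1, an equality delete in the same partition that also has sequence number 1 is reported as independent, because equality deletes only apply to strictly older data.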
We should remove these Independent DeleteFiles periodically.
Use case/motivation
In our scenario, after several days of testing, we found thousands of Independent DeleteFiles, including both equality-delete files and position-delete files, far outnumbering the DataFiles. Removing them can greatly reduce the total number of files on disk.
Describe the solution
I suggest removing Independent DeleteFiles during orphan file cleanup, because Independent DeleteFiles can be considered a special type of orphan file.
Of course, a table property such as `clean-independent-delete-files.enabled` (default `true`) should be introduced to control whether Independent DeleteFiles are cleaned for the table. We can further discuss whether this implementation is appropriate.
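A minimal sketch of the proposed switch, assuming the cleanup job already has the table properties, the orphan-file candidates, and a list of Independent DeleteFiles (however they are detected). The property name comes from this proposal; `property_as_bool` and `files_to_clean` are hypothetical helpers. A missing value falls back to the proposed default of `true`, and any case-insensitive `"true"` enables cleanup, similar to Java's `Boolean.parseBoolean`.

```python
from typing import Iterable, Mapping, Set

# Property name proposed in this issue (hypothetical until implemented).
CLEAN_INDEPENDENT_DELETES = "clean-independent-delete-files.enabled"


def property_as_bool(props: Mapping[str, str], key: str, default: bool) -> bool:
    """Read a boolean table property; only a case-insensitive "true" counts as enabled."""
    value = props.get(key)
    if value is None:
        return default
    return value.lower() == "true"


def files_to_clean(
    props: Mapping[str, str],
    orphan_files: Iterable[str],
    independent_delete_files: Iterable[str],
) -> Set[str]:
    """Paths targeted by one cleanup run (hypothetical helper)."""
    targets = set(orphan_files)
    # Independent DeleteFiles are added only when the table has not opted out.
    if property_as_bool(props, CLEAN_INDEPENDENT_DELETES, default=True):
        targets.update(independent_delete_files)
    return targets
```

This sketch only selects paths. Unlike true orphan files, Independent DeleteFiles are still tracked in table metadata, so a real implementation would presumably need to drop them through a table commit rather than only deleting them from storage.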
Subtasks
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct