-
Notifications
You must be signed in to change notification settings - Fork 409
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add RFC for 'Reduce the number of files used by TiFlash' (#7897)
close #7595
- Loading branch information
1 parent
93984ea
commit 278ed38
Showing
1 changed file
with
21 additions
and
0 deletions.
There are no files selected for viewing
21 changes: 21 additions & 0 deletions
21
docs/design/2023-08-04-reduce-the-number-of-files-used-by-TiFlash
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Reduce the number of files used by TiFlash | ||
|
||
- Author: [Yunyan Hong](https://github.com/hongyunyan) | ||
|
||
## Introduction | ||
|
||
This RFC introduces a new DMFile format V3, which merging the small files among DMFiles to cut down the number of inodes. | ||
|
||
## Background | ||
|
||
Currently, for each section in DMFile we generate a separate file. Specifically, for each column, it will contain `x.dat`, `x.mrk`, `x.null.dat`, `x.null.mrk` (if the data type for this column is nullable) and `x.idx` (if the column has min-max index). So when the table has many columns, each dmfile contains quite a number of files, which will lead to exhaustion of the number of inodes before filling the disk. | ||
|
||
Therefore, we consider merging the small files in DMFiles to cut down the number of inode and enhance TiFlash availability. | ||
|
||
## Detailed Desgin | ||
|
||
For these files in DMFiles, `x.idx`, `x.null.mrk`, `x.mrk` are always very small, stablely less than 4KB. Besides, when the table contains tiny data, the file sizes of `x.dat` and `x.null.dat` are also very small. Therefore, for each DMFile, we can always merge `x.idx`, `x.null.mrk` and `x.mrk` together. For `x.dat` and `x.null.dat`, we can decide whether they should be merged based on their actual file size with our min file size threshold. | ||
|
||
Furthermore, we do not merge the files of each column individually, but merge all the small files of each column collectively. In order to avoid the merged file being too large, we will set a max file size threshold. When the merged file reaches the threshold, we will close this merged file and write it into the next merged file. | ||
|
||
When upgrade the TiFlash version, we directly support the upgraded TiFlash to read the old version DMFiles and write into new version DMFiles when do compaction later. To downgrade the TiFlash version, we support a `DTTool` to support rewriting the new version DMFiles to old versions offline. |