Optimize consistency checks for deleted files #189

Merged 1 commit on Aug 27, 2020

Connor1996
Member

This PR optimizes the performance of consistency checks when a large number of files are deleted.

When running `DeleteFilesInRange` with `force_consistency_checks` enabled over a large range spanning about 16.5K SST files, we found that most of the time was spent in `LogAndApply` (the time in the red rectangle).
image

`CheckConsistencyForDeletes` traverses the whole LSM once for every deleted file to check whether that file existed in the previous version. When many files are deleted at once, as with `DeleteFilesInRange`, this repeated traversal wastes a lot of time. Even worse, it runs with the DB mutex held, so it can significantly hurt foreground write performance.

After batching the check so that the LSM is traversed only once for all deleted files, the time is greatly reduced.
image
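
To illustrate the difference between the per-file check and the batched check, here is a minimal sketch on a simplified model. It is not the code from this PR: the `Levels` alias and both function names are hypothetical, and an LSM version is assumed to be just a list of levels, each holding SST file numbers.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified model of one LSM version: levels[i] holds the
// file numbers of the SST files on level i. This is an assumption made for
// illustration, not the RocksDB data structure.
using Levels = std::vector<std::vector<uint64_t>>;

// Per-file check: every deleted file triggers its own traversal of all
// levels, so the cost grows as (deleted files) x (total files).
bool AllDeletedFilesExistPerFile(const Levels& levels,
                                 const std::vector<uint64_t>& deleted) {
  for (uint64_t number : deleted) {
    bool found = false;
    for (const auto& level : levels) {
      for (uint64_t file : level) {
        if (file == number) {
          found = true;
          break;
        }
      }
      if (found) break;
    }
    if (!found) return false;  // deleted file was not in the previous version
  }
  return true;
}

// Batched check: put all deleted file numbers into a hash set, then walk
// the levels once and erase each file that is seen. Anything left in the
// set at the end did not exist in the previous version.
bool AllDeletedFilesExistBatched(const Levels& levels,
                                 const std::vector<uint64_t>& deleted) {
  std::unordered_set<uint64_t> pending(deleted.begin(), deleted.end());
  for (const auto& level : levels) {
    for (uint64_t file : level) {
      pending.erase(file);
      if (pending.empty()) return true;  // every deleted file was found
    }
  }
  return pending.empty();
}
```

Both functions return the same answer for the same input; the batched version just replaces one traversal per deleted file with a single traversal plus hash lookups, which matters most when the work is done under a mutex.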

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996 Connor1996 changed the title improve consistency checks for deleted files Optimize consistency checks for deleted files Aug 27, 2020
@Connor1996 Connor1996 requested a review from yiwu-arbug August 27, 2020 06:21
Collaborator

@yiwu-arbug yiwu-arbug left a comment

LGTM

@Connor1996 Connor1996 merged commit c1178cd into tikv:6.4.tikv Aug 27, 2020
@Connor1996 Connor1996 deleted the optimize-check branch August 27, 2020 06:36
Connor1996 added a commit to Connor1996/rocksdb that referenced this pull request Sep 8, 2020
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
yiwu-arbug pushed a commit that referenced this pull request Sep 8, 2020
@tabokie tabokie mentioned this pull request May 9, 2022
39 tasks