Use checkpoint to implement data snapshot when full replication #205

Merged: 11 commits merged into apache:unstable on Mar 22, 2021

Conversation

@ShooterIT (Member) commented Mar 18, 2021

Motivation

Currently, the master creates a new snapshot via the RocksDB backup function when replicas ask for a full replication. Although the RocksDB backup function supports incremental backups, it still costs a lot of time, disk bandwidth, and disk space if there is no base backup or the current backup has existed for a long time, because the backup function has to copy the real data files. If kvrocks holds a huge amount of data, copying data files interferes with RocksDB reads and writes and adds latency to user requests.

As we know, the RocksDB checkpoint function hard-links data files instead of copying them, which consumes far fewer resources, so we want to use a checkpoint to implement the data snapshot for full replication and avoid affecting user requests. A minimal sketch of how such a checkpoint is created is shown below.
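
For reference, this is roughly how a checkpoint is created with the RocksDB Checkpoint API; the directory argument here is only an illustrative placeholder, not the actual path kvrocks uses.

#include <string>
#include <rocksdb/db.h>
#include <rocksdb/utilities/checkpoint.h>

// Sketch: create a checkpoint whose SST files are hard links to the live
// DB's files, so no data is copied when both sit on the same filesystem.
rocksdb::Status CreateSnapshotCheckpoint(rocksdb::DB *db, const std::string &dir) {
  rocksdb::Checkpoint *checkpoint = nullptr;
  rocksdb::Status s = rocksdb::Checkpoint::Create(db, &checkpoint);
  if (!s.ok()) return s;
  s = checkpoint->CreateCheckpoint(dir);  // e.g. "<db_dir>/checkpoint" (placeholder)
  delete checkpoint;
  return s;
}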

Solution

Master

  • Similar to full replication using backup, the master tries to create a new checkpoint if none exists when a replica asks for full replication, and then tells the replica the file name list of the checkpoint.
  • We could create a new checkpoint for every replica, but that would leave many checkpoints occupying disk space, because hard links keep deleted files alive while they are being copied. So we use one shared checkpoint; a replica still waits for the next checkpoint if the current one is too old to feed newly arriving replicas.
  • Cleaning up the checkpoint is harder, because replicas fetch the data info and the data files over different connections and threads. So I use the number of threads currently fetching files as the checkpoint's reference count: the master deletes the checkpoint if no thread has fetched files for several seconds. To prevent a slow replica from keeping the checkpoint alive for too long, the master also deletes the checkpoint once a long time (1 day) has passed since it was created. A rough sketch of this cleanup policy is shown after this list.
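
The following is only a minimal sketch of that reference-count and age based cleanup, not the actual kvrocks code; the names fetch_file_threads_, checkpoint_create_time_, checkpoint_access_time_ and the idle threshold are hypothetical.

#include <atomic>
#include <ctime>

// Hypothetical master-side state for the cleanup policy described above.
std::atomic<int> fetch_file_threads_{0};         // threads currently serving checkpoint files
std::atomic<time_t> checkpoint_create_time_{0};  // when the shared checkpoint was created
std::atomic<time_t> checkpoint_access_time_{0};  // last time a file was fetched from it

bool ShouldRemoveCheckpoint(time_t now) {
  // No replica is fetching files and the checkpoint has been idle for several seconds.
  bool idle = fetch_file_threads_.load() == 0 &&
              now - checkpoint_access_time_.load() > 30;  // placeholder for "several seconds"
  // A slow replica must not keep the checkpoint alive forever: drop it after 1 day.
  bool too_old = now - checkpoint_create_time_.load() > 24 * 60 * 60;
  return idle || too_old;
}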

Replica

  • Replicas still fetch checkpoint files as before. After fetching all files, a replica destroys its old DB, renames the synced checkpoint dir to the DB dir, and reloads this checkpoint. To avoid breaking the DB by loading a bad checkpoint, we first rename the old DB dir and restore it if the synced checkpoint is corrupt and fails to open.
  • To resume a broken transfer from already-transferred files, we store synced files in a dedicated dir (synced_checkpoint). First, because a checkpoint does not give us a file CRC to identify unique files, we must clean the synced_checkpoint dir when changing master. Second, when checking existing files against the master's new checkpoint in order to skip already-transferred files, we also need to clean up invalid files left over from the master's old checkpoint. A sketch of the swap-and-restore step is shown after this list.
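
Below is a minimal sketch of the replica-side "swap and restore on failure" idea, assuming directories can simply be renamed; the paths and the TryOpenDB helper are illustrative and not the actual kvrocks code.

#include <cstdio>   // std::rename
#include <string>

// Hypothetical stand-in for the real "reopen RocksDB at dir" logic.
static bool TryOpenDB(const std::string &dir) {
  (void)dir;
  return true;  // placeholder
}

// Swap the synced checkpoint into place; roll back to the old DB if it fails to open.
static bool LoadSyncedCheckpoint(const std::string &db_dir,
                                 const std::string &synced_dir,   // e.g. "synced_checkpoint"
                                 const std::string &backup_dir) { // e.g. db_dir + ".old" (placeholder)
  if (std::rename(db_dir.c_str(), backup_dir.c_str()) != 0) return false;  // keep old DB aside
  if (std::rename(synced_dir.c_str(), db_dir.c_str()) != 0 || !TryOpenDB(db_dir)) {
    // Synced checkpoint is missing or corrupt: put the old DB dir back.
    std::rename(db_dir.c_str(), synced_dir.c_str());
    std::rename(backup_dir.c_str(), db_dir.c_str());
    return TryOpenDB(db_dir);
  }
  return true;
}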

Compatibility

As in #200, kvrocks still supports replicating data from an old-version master if you set master-use-repl-port to yes. In detail, we keep using the old implementation when copying from an old-version master.
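
For reference, enabling that compatibility mode in the configuration file would look like the line below (only the option named in this PR is shown).

# keep the old replication behavior when the master runs an old version
master-use-repl-port yes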

@ShooterIT added the release notes and major decision (Requires project management committee consensus) labels on Mar 18, 2021
@ShooterIT ShooterIT marked this pull request as ready for review March 18, 2021 14:25
@ShooterIT ShooterIT requested review from karelrooted, git-hulk and Alfejik and removed request for karelrooted March 19, 2021 04:04
karelrooted (Contributor) previously approved these changes on Mar 19, 2021 and left a comment:

LGTM

Review comments on src/replication.cc and src/storage.cc (outdated, resolved).
git-hulk (Member) previously approved these changes on Mar 20, 2021 and left a comment:

Really good implementation! LGTM except Fail => Failed in error message.

Another potential optimization is in GetFullReplDataInfo when multiple slaves are doing a full sync at the same time.

For example:

Slave A arrives first and starts creating the checkpoint, then slave B follows closely; B would get an error while the new checkpoint is still being created. The way to fix it is to wait for the new checkpoint to finish:

// Serialize checkpoint creation so a second slave waits for the first one to finish.
{
  std::lock_guard<std::mutex> guard(checkpoint_mu_);
  if (!sync_dir.IsExists() || checkpoint.Expired()) {
    CreateCheckpoint();
  }
}

if (checkpoint.Expired()) {
  return error;  // still too old even after waiting
}
// ... get the replication data info from the checkpoint

But the current implementation is fine to me: slave B would just restart the full sync.

Two review comments on src/replication.cc (outdated, resolved).
Co-authored-by: hulk <hulk.website@gmail.com>
git-hulk (Member) previously approved these changes on Mar 20, 2021 and left a comment:

LGTM, cheers!!!

@Alfejik Alfejik merged commit 22a8c2b into apache:unstable Mar 22, 2021