-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use extra port for replication #200
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
hi @git-hulk @karelrooted Currently i just implement basic function. we still need to test its performance although it doesn't have damage. |
It should be fine to merge the PR since the master was defined as unstable now. Also, those problems weren't imported by this PR so makes a lot of sense to file a new PR. |
Currently, master creates a new snapshot by backup function of rocksdb when replicas ask for full replication. Although, rocksdb backup function can support incremental backup, it still cost much time, disk bandwidth, and disk space if there is no based backup or current backup exists for long time, and backup function really needs copy real files. If kvrocks holds a huge amount of data, copying data files would influence the writing or reading of rocksdb that results in latency of user request. As we know, the function checkpoint of rocksdb would hard link data files instead of copying files, that has minor resource consumption, so we wish using checkpoint to implement data snapshot when full replication to avoid influencing user requests. The followings are some the details of solution. * Master - As similar with full replication by using backup, master will try to create a new checkpoint if there is no checkpoint when replicas ask full replication, and tell replicas all files name list of checkpoint. - We can create a new checkpoint for every replica, but there will be many checkpoints that may occupy disk space because hard link could make deleted files existing during copying file period. So we hope to use one shared checkpoint, but replicas still wait for next checkpoint if current checkpoint is too old to satisfy feeding new coming replicas. - How to clean checkpoint, it is much difficult because replicas fetch data info and data files by different connections and threads. So i use current fetching files threads number as checkpoint reference count, master will delete checkpoint if there no thread to fetch files during several seconds. To avoid that a slow replica always keep the checkpoint alive too long time, master still delete the checkpoint if it is very long time (1 day) from creating checkpoint. * Replica - Replicas still fetch checkpoint files like before, after fetching all files, replicas will destroy old db, rename synced checkpoint dir to db dir and reload this checkpoint. To avoid loading wrong checkpoint causes db broken, we rename old db dir, and restore it if the synced checkpoint is corrupt to fail to open. - How to resume broken transfer based files. We store synced files into a specified dir (synced_checkpoint). Firstly, because we don't get file CRC by checkpoint to identify unique files, so we must clean the synced_checkpoint dir when changing master. Secondly, we need to clean up invalid files of master old checkpoint when we check existing files of master new checkpoint for skipping some transferred files. For backward compatibility, as #200, kvrocks still support to replicate data from old version master if you set 'master-use-repl-port' to yes.
As we know, in order to implement replication, kvrocks has specified replication threads that use 'port + 1' as listening port, that may make somethings easy, but it is not convenient to deploy and maintain.
In this PR, kvrocks doesn't use extra port for replication instead of processing replication requests in general work threads.
Some problems we need to care
Blocking
We can't block work threads for processing replication requests. Currently, for replication commands,
_fetch_meta
and_fetch_file
are blocked, in this commit, we process both commands in new threads.Compatibility
We provide a configuration item
master-use-repl-port
to make new version able to replicate old version master. If the masters are old version, you should setmaster-use-repl-port
to yes, otherwise you should set it to no.