Skip to content

Commit

Permalink
Fix potential deadlock when switching master frequently
Browse files Browse the repository at this point in the history
Currently, the replication thread will wait for the worker exclusive
guard stop before closing the db. But it now stops the worker from
running new commands after acquiring the worker exclusive guard, and it
might cause deadlock if switches at the same time.

The following steps will show how it may happen:

- T0: client A sent `slaveof MASTER_IP0 MASTER_PORT0`, then the
replication thread was started and waiting for the exclusive guard.

- T1: client B sent `slaveof MASTER_IP1 MASTER_PORT1` and `AddMaster`
  will stop the previous replication thread, which is waiting for the
  exclusive guard. But the exclusive guard is acquiring by the current
  thread.

And the workaround is also straightforward, just stop workers from
running new commands by enabling `is_loading_` to true before acquiring
the lock in replication thread.
  • Loading branch information
git-hulk committed Sep 1, 2024
1 parent 6c85004 commit f0724b5
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions src/server/server.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1353,14 +1353,16 @@ void Server::PrepareRestoreDB() {
// accessing, data migration task should be stopped before restoring DB
WaitNoMigrateProcessing();

// Workers will disallow to run commands which may access DB, so we should
// enable this flag to stop workers from running new commands. And wait for
// the exclusive guard to be released to guarantee no worker is running.
is_loading_ = true;

// To guarantee work threads don't access DB, we should release 'ExclusivityGuard'
// ASAP to avoid user can't receive responses for long time, because the following
// 'CloseDB' may cost much time to acquire DB mutex.
LOG(INFO) << "[server] Waiting workers for finishing executing commands...";
{
auto exclusivity = WorkExclusivityGuard();
is_loading_ = true;
}
{ auto exclusivity = WorkExclusivityGuard(); }

// Cron thread, compaction checker thread, full synchronization thread
// may always run in the background, we need to close db, so they don't actually work.
Expand Down

0 comments on commit f0724b5

Please sign in to comment.