LogID lastLogIdCanCommit = std::min(lastLogId_, req.get_committed_log_id());
CHECK_LE(committedLogId_ + 1, lastLogIdCanCommit);
if (commitLogs(wal_->iterator(committedLogId_ + 1, lastLogIdCanCommit))) {
auto code = commitLogs(wal_->iterator(committedLogId_ + 1, lastLogIdCanCommit), false);
One question: when wait == false, we ignore whether the log commits successfully or not. But why do you need to set committedLogId_ at line 1671?
L1671 only sets committedLogId_ in resp; it does not update committedLogId_ itself. When we commit successfully, we update committedLogId_ and respond with the new value (lines 1661 and 1662).
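To make the reply above concrete, here is a minimal sketch of that behaviour, not the actual code in this PR. `FollowerState`, `AppendLogResponse`, `handleLeaderCommit`, and the `tryCommit` callback are hypothetical names introduced only for illustration. The response always reports the follower's current committed id, and the stored id advances only when the commit succeeds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

using LogID = int64_t;

// Hypothetical, simplified follower state (not the real RaftPart members).
struct FollowerState {
  LogID lastLogId = 0;       // last log id present in the local WAL
  LogID committedLogId = 0;  // last log id known to be committed locally
};

// Hypothetical response type carrying only the field discussed above.
struct AppendLogResponse {
  LogID committedLogId = 0;
};

// tryCommit(first, last) is an assumed callback that returns true when
// the logs in [first, last] were committed successfully.
AppendLogResponse handleLeaderCommit(
    FollowerState& state,
    LogID leaderCommittedLogId,
    const std::function<bool(LogID, LogID)>& tryCommit) {
  // Never commit past what the follower actually holds.
  LogID lastLogIdCanCommit = std::min(state.lastLogId, leaderCommittedLogId);

  // Advance the stored id only on a successful commit.
  if (state.committedLogId < lastLogIdCanCommit &&
      tryCommit(state.committedLogId + 1, lastLogIdCanCommit)) {
    state.committedLogId = lastLogIdCanCommit;
  }

  // The response reports the current value, whether or not it moved.
  AppendLogResponse resp;
  resp.committedLogId = state.committedLogId;
  return resp;
}
```

With this shape, a failed or skipped commit leaves the follower's state untouched, and the leader simply sees the old committed id in the response.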
src/kvstore/raftex/Host.cpp
Outdated
pro.setValue(std::move(t.value()));
}
});
return promise.getFuture();
Maybe this could be moved (not sure about this)?
fixed
Our workflow for a write request looks like the following:
From the point of view of a storaged process, there are many partitions, and a request can be blocked in any phase that takes a long time to process. Previously, phases 5 and 7 were the main causes of blocking. When the load is heavy enough, all worker threads can be busy at the same time. This causes many problems, one of the most notorious being leader change.
Block reason:
IMO, only one phase should be able to block, which is the leader commit (phase 7). To achieve that, several changes are needed:
- Do not hold the raft lock when committing: this relaxes the restriction of the raft lock (we have another lock, replicatingLogs, to prevent concurrent replication).
- Follower delays commit on write stall: this keeps a follower's log commit from blocking (phase 5; the follower can commit asynchronously).
- Heartbeat refactor: process heartbeats in the event base, so that even if all worker threads are blocked, there won't be an unexpected election.
So, the heartbeat logic is:
depends on vesoft-inc/nebula-common#497
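
As an illustration of the "do not hold the raft lock when committing" item above, here is a minimal sketch under stated assumptions, not the implementation in this PR. `RaftPartSketch`, `commitUpTo`, and the `slowCommit` callback are hypothetical names. The sketch also assumes that callers are already serialized by some other mechanism (the description mentions replicatingLogs for that purpose), so only one commit is in flight at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>

using LogID = int64_t;

// Hypothetical, simplified stand-in for a raft partition.
class RaftPartSketch {
 public:
  explicit RaftPartSketch(LogID lastLogId) : lastLogId_(lastLogId) {}

  // slowCommit(first, last) is an assumed callback that may block for a
  // long time (for example during a write stall).
  bool commitUpTo(LogID target,
                  const std::function<bool(LogID, LogID)>& slowCommit) {
    LogID first = 0;
    LogID last = 0;
    {
      // Hold the lock only long enough to snapshot the range to commit.
      std::lock_guard<std::mutex> guard(raftLock_);
      first = committedLogId_ + 1;
      last = std::min(target, lastLogId_);
      if (first > last) {
        return true;  // nothing new to commit
      }
    }

    // The potentially slow commit runs WITHOUT the raft lock held, so
    // other readers of the raft state are not blocked behind it.
    if (!slowCommit(first, last)) {
      return false;
    }

    // Re-acquire the lock briefly to publish the new committed id.
    std::lock_guard<std::mutex> guard(raftLock_);
    committedLogId_ = std::max(committedLogId_, last);
    return true;
  }

  LogID committedLogId() const {
    std::lock_guard<std::mutex> guard(raftLock_);
    return committedLogId_;
  }

 private:
  mutable std::mutex raftLock_;
  LogID lastLogId_ = 0;
  LogID committedLogId_ = 0;
};
```

The trade-off is that the raft state can change while the commit is running, which is why the sketch re-checks `committedLogId_` under the lock before publishing the new value.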