Memory lock in raft #3926
Conversation
Force-pushed from cc79619 to c3683ea
Good job, generally LGTM. We could get rid of the ugly iterator finally...
src/kvstore/raftex/RaftPart.cpp
Outdated
replicatingLogs_ = false;
return;

// // Continue to process the original AppendLogsIterator if necessary
Could we check here whether the cache is empty and continue? Otherwise all logs need to wait another round.
src/kvstore/raftex/RaftPart.cpp
Outdated
if (!promiseRef.isFulfilled()) {
  promiseRef.setValue(code);
}
return MergeAbleCode::MERGE_BOTH;
How about just dropping it instead of still sending it out? It is quite easy to do now.
I am not sure we can survive this case:
- atomic op failed
- send the log out anyway
- leader change
- new leader atomic op succeeded...
ok
}
}  // namespace storage
ret.batch = encodeBatchValue(batchHolder->getBatch());
I'm considering whether we could move all log encoding into raft later: right now we need to encode here first and decode in raft again, which also introduces some extra string copies.
Some insert and push_back calls could be replaced.
src/kvstore/raftex/RaftPart.cpp
Outdated
@@ -1961,10 +1999,10 @@ bool RaftPart::checkAppendLogResult(nebula::cpp2::ErrorCode res) {
{
  std::lock_guard<std::mutex> lck(logsLock_);
  logs_.clear();
  cachingPromise_.setValue(res);
  cachingPromise_.reset();
  // cachingPromise_.setValue(res);
Do we need to set the promise in logs_ and sendingLogs_?
Force-pushed from eba1a77 to a53b365
Force-pushed from 52f9f1d to 9ab0b7e
Force-pushed from b884a87 to 329cbd0
Good job, we finally got rid of it...
@kikimo Should I merge it first or wait for the test? This is an important change, and maybe risky since a lot of code in raft has been modified.
@liuyu85cn could you write something about this PR in the release note?
Hold, don't merge before I do the test.
class AppendLogsIteratorFactory {
 public:
  AppendLogsIteratorFactory() = default;
  static void make(RaftPart::LogCache& cacheLogs, RaftPart::LogCache& sendLogs) {
Awesome.
@@ -511,6 +526,10 @@ TEST_F(RebuildIndexTest, RebuildEdgeIndexWithAppend) {
  RebuildIndexTest::env_->rebuildIndexGuard_->clear();
  writer->stop();
  sleep(1);
  for (int i = 1; i <= 5; ++i) {
    LOG(INFO) << "sleep for " << i << "s";
    sleep(1);
Why not just increase the sleep time?
Because when running this case manually (watching the execution), it can be confusing if it sleeps for more than 1 second at a time.
Force-pushed from 22003dc to 8b39a36
Force-pushed from b3f3779 to ecc05cd
Force-pushed from ecc05cd to 8599d0b
Good job!
Force-pushed from 4818e6a to 9b69cf4
I think it's a good job, do you agree?
Good job!
* init upload
* type
* address comments: remove some comments
* ??

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>
This reverts commit 4112c7d.
What type of PR is this?
What problem(s) does this PR solve?
Issue(s) number:
Description:
Let add tag / edge use atomicOp again.
Before and including Nebula 2.0, we used atomicOp to handle atomic operations,
e.g. changing a tag/edge and its index in one batch.
It works, but since we implement this by sending raft logs synchronously
(all atomic ops have to be sent separately, even if they are disjoint), it is really slow.
In 2.6.x we use a memory lock for concurrency control.
We check early (in the processor) whether a request can run.
If it can, we do the get/put as a normal log, which we can treat in batch.
If it can't, we return an error.
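The early per-key check described above can be sketched as a small memory lock: take the lock for all keys a request touches, or reject with a conflict error. This is a minimal illustration, not Nebula's actual implementation; the names `MemoryLock`, `Code::kConflict`, and the key format are assumptions.

```cpp
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of the 2.6.x-style early check: the processor tries
// to lock every key a request touches before treating it as a normal log.
enum class Code { kOk, kConflict };

class MemoryLock {
 public:
  // All-or-nothing: if any key is already held by an in-flight request,
  // nothing is locked and the caller gets kConflict (the "Conflict error").
  Code tryLock(const std::vector<std::string>& keys) {
    std::lock_guard<std::mutex> g(mu_);
    for (const auto& k : keys) {
      if (held_.count(k)) {
        return Code::kConflict;
      }
    }
    held_.insert(keys.begin(), keys.end());
    return Code::kOk;
  }

  // Release the keys once the request's log has been applied.
  void unlock(const std::vector<std::string>& keys) {
    std::lock_guard<std::mutex> g(mu_);
    for (const auto& k : keys) {
      held_.erase(k);
    }
  }

 private:
  std::mutex mu_;
  std::unordered_set<std::string> held_;
};
```

With this scheme a conflicting request fails fast and must be retried by the caller, which is exactly the behavior users complained about below.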
However, some users complained that they hit so many "Conflict error"s
that they need to retry, and they believe this slows down bulk inserts.
We explained that those conflicts have to be retried either in Nebula itself or in the client,
but it looks like they didn't agree with us.
So now we implement a hybrid mode.
We keep a memory lock in raft, just like Solution 2. We check every log to see whether it can be combined with the previous logs.
If it can, we send them in a batch.
If it can't, we treat it the atomicOp way (Solution 1).
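The hybrid merge decision described above can be sketched as follows: each incoming log is checked against the keys already claimed by the current batch; disjoint logs are merged, a conflicting log is sent alone like the old atomicOp path. The names (`Log`, `BatchBuilder`, `Decision`) are illustrative, not the PR's real types.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of the hybrid mode: merge disjoint logs into one
// batch (Solution 2), fall back to sending alone on conflict (Solution 1).
struct Log {
  std::vector<std::string> keys;  // keys this log touches
};

enum class Decision { kMerge, kSendAlone };

class BatchBuilder {
 public:
  Decision tryAppend(const Log& log) {
    // Conflicts with a log already in the batch: send it separately,
    // serialized behind the batch, like the atomicOp path.
    for (const auto& k : log.keys) {
      if (locked_.count(k)) {
        return Decision::kSendAlone;
      }
    }
    // Disjoint with everything in the batch: merge and replicate together.
    locked_.insert(log.keys.begin(), log.keys.end());
    batch_.push_back(log);
    return Decision::kMerge;
  }

  size_t size() const { return batch_.size(); }

 private:
  std::unordered_set<std::string> locked_;  // keys claimed by this batch
  std::vector<Log> batch_;
};
```

The point of the hybrid is that only genuinely conflicting logs pay the serialized cost; the common disjoint case still replicates in one batch, so callers never see "Data conflict".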
How do you solve it?
Special notes for your reviewer, ex. impact of this fix, design document, etc:
Checklist:
Tests:
Affects:
Release notes:
As described in the "Description", conflicting concurrent inserts of tags/edges will no longer report "Data conflict",
but will execute in a queue instead.