-
Notifications
You must be signed in to change notification settings - Fork 399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a single set of Progresses for ProgressSet. #108
Conversation
I don't have strong opinion on separate them. |
@siddontang Seems this doesn't slow down Raft's campaign, I baselined this against master:
|
13c079d
to
cbebfc8
Compare
I still worry the performance reduction when calculating quorum, maybe it is better to use a var to store the quorum. /cc @BusyJay |
@siddontang Ok, that needs to be done for joint consensus anyways so I'll add it to this PR. :) |
172f5df
to
33f3ed4
Compare
@siddontang PTAL, this uses |
33f3ed4
to
8154d37
Compare
I introduced another PR, #119 that builds off this since I think it is ready for merge. I can always rebase it. :) |
src/raft.rs
Outdated
@@ -593,13 +584,14 @@ impl<T: Storage> Raft<T> { | |||
pub fn maybe_commit(&mut self) -> bool { | |||
let mut mis_arr = [0; 5]; | |||
let mut mis_vec; | |||
let mis = if self.prs().voters().len() <= 5 { | |||
&mut mis_arr[..self.prs().voters().len()] | |||
let voters = self.prs().voters().count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not using num_voters here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 It's a good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use two ProgressSet
?
src/progress.rs
Outdated
), | ||
configuration: Configuration { | ||
voters: HashSet::with_capacity_and_hasher( | ||
voters + learners, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why sum them up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it's a good point!
This is so we can move around Additionally, this means that during a Joint Consensus state we can avoid having to sync changes and manage multiple As you can see in #119 the next PR removes I do not want to create a mess, or create ways for us to make mistakes. :) |
a68d4cb
to
f59f2db
Compare
572d5d5
to
7af6379
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
src/progress.rs
Outdated
} | ||
if self.learners.contains_key(&id) { | ||
pub fn insert_voter(&mut self, id: u64, mut pr: Progress) -> Result<(), Error> { | ||
if self.learner_ids().contains(&id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, the progress will not exist at all. Hence, only one query is necessary even the flag is removed.
src/progress.rs
Outdated
.union(&self.configuration.learners) | ||
.all(|v| self.progress.contains_key(v)) | ||
); | ||
assert_eq!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid the assert is not enough. What if there are 1, 2, 3
in voters and 3, 4, 5
in learners, 1, 2, 3, 4, 5, 6
in progress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I'll make it check both ways.
src/raft.rs
Outdated
@@ -244,7 +236,7 @@ impl<T: Storage> Raft<T> { | |||
raft_log, | |||
max_inflight: c.max_inflight_msgs, | |||
max_msg_size: c.max_size_per_msg, | |||
prs: Some(ProgressSet::new(peers.len(), learners.len())), | |||
prs: Some(ProgressSet::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use with_capacity
?
src/raft.rs
Outdated
@@ -268,14 +260,13 @@ impl<T: Storage> Raft<T> { | |||
tag: c.tag.to_owned(), | |||
}; | |||
for p in peers { | |||
let pr = new_progress(1, r.max_inflight); | |||
let pr = Progress::new(r.raft_log.last_index(), r.max_inflight, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right. I'm sorry. 😞
src/raft.rs
Outdated
@@ -819,7 +810,7 @@ impl<T: Storage> Raft<T> { | |||
// Only send vote request to voters. | |||
let prs = self.take_prs(); | |||
prs.voters() | |||
.keys() | |||
.map(|(k, _)| k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it's exact what voter_ids
does.
Thanks for the really great and detailed reviews @BusyJay , @overvenus . I've resolved your comments, can you take another look? |
fa35933
to
f999da2
Compare
ca9965c
to
a81ae8f
Compare
a81ae8f
to
3a7e2e4
Compare
src/progress.rs
Outdated
// Determine the correct error to return. | ||
if self.learner_ids().contains(&id) { | ||
return Err(Error::Exists(id, "learners")); | ||
} else if self.voter_ids().contains(&id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check seems redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think inserting a voter who is already a voter should be silently ignored. It is a clear sign that the user thinks the system is in a different state than it is. Performance wise this is completely non-consequential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean if it exists in progress
and not in learner_ids
, then it must be in voter_ids
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose. I didn't really think hard about it since we cannot do the progress
check in the final code anyways, so this is just interm code until #101 . I will remove it.
src/raft.rs
Outdated
if (!self.prs().voters().is_empty() || !self.prs().learners().is_empty()) | ||
&& !self.is_learner | ||
{ | ||
if (self.prs().iter().len() != 0) && !self.is_learner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
()
is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we need more assert in tests to make sure the refactor is correct.
src/raft.rs
Outdated
if let Some(progress) = self.prs().get(id) { | ||
// If progress.is_learner == learner, then it's already inserted as what it should be, return early to avoid error. | ||
if progress.is_learner == learner { | ||
info!("Ignoring redundant insert."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tag
is missing. And better includes the target id.
src/raft.rs
Outdated
self.is_learner = false; | ||
// If progress.is_learner == false, and learner == true, then it's a demotion, return early to avoid an error. | ||
if !progress.is_learner && learner { | ||
info!("Ignoring voter demotion."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
src/raft.rs
Outdated
"Adding node (learner: {}) with ID {} to peers.", | ||
learner, id | ||
); | ||
let progress = Progress::new(self.raft_log.last_index(), self.max_inflight, learner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems more readable to me to move this to L1867. Otherwise it may be confusing as L1855 overlaps the variable name in its scope. And I think it should be last_index + 1
.
src/raft.rs
Outdated
} | ||
self.is_learner = learner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I think it's only necessary to be updated when id == self.id
.
tests/integration_cases/test_raft.rs
Outdated
@@ -2730,8 +2728,13 @@ fn test_restore() { | |||
s.get_metadata().get_term() | |||
); | |||
assert_eq!( | |||
sm.prs().nodes(), | |||
s.get_metadata().get_conf_state().get_nodes() | |||
sm.prs().voter_ids().iter().cloned().collect::<HashSet<_>>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just *sm.prs().voter_ids()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it was forgotten to be changed many comments ago.
tests/integration_cases/test_raft.rs
Outdated
|
||
// remove all nodes from cluster | ||
r.remove_node(1); | ||
assert!(r.prs().nodes().is_empty()); | ||
assert_eq!(r.prs().voter_ids().len(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about r.prs().voter_ids().is_empty()
?
tests/integration_cases/test_raft.rs
Outdated
@@ -3013,8 +3022,10 @@ fn test_raft_nodes() { | |||
]; | |||
for (i, (ids, wids)) in tests.drain(..).enumerate() { | |||
let r = new_test_raft(1, ids, 10, 1, new_storage()); | |||
if r.prs().nodes() != wids { | |||
panic!("#{}: nodes = {:?}, want {:?}", i, r.prs().nodes(), wids); | |||
let voter_ids = r.prs().voter_ids().iter().cloned().collect::<HashSet<_>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use r.prs().voter_ids()
here and use *voter_ids != wids
at L3027.
tests/integration_cases/test_raft.rs
Outdated
assert_eq!(n1.prs().learner_nodes(), vec![2]); | ||
assert!(n1.prs().learners()[&2].is_learner); | ||
assert_eq!( | ||
n1.prs().learner_ids().iter().cloned().collect::<Vec<_>>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the wrong data type. To make you happy I will unwrap the value and check it directly.
tests/integration_cases/test_raft.rs
Outdated
assert_eq!(n1.prs().nodes(), vec![1]); | ||
assert_eq!(n1.prs().learner_nodes(), vec![]); | ||
assert_eq!( | ||
n1.prs().voter_ids().iter().cloned().collect::<Vec<_>>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the wrong data type. To make you happy I will unwrap the value and check it directly.
tests/integration_cases/test_raft.rs
Outdated
|
||
n1.remove_node(1); | ||
assert!(n1.prs().nodes().is_empty()); | ||
assert!(n1.prs().learner_nodes().is_empty()); | ||
assert_eq!(n1.prs().voter_ids().len(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems is_empty
is OK.
382bf6f
to
66c6aad
Compare
The motivation for this PR may be unclear until you consider future work. But, shortly, this is partially in preparation for Joint Consensus.
This is a step to allowing us to use different configurations (eg voter/learner topology) on top of the same Progress states. This will mean we can have
ProgressSet
be aware of its peer set changing, and able to have calls likeself.prs().voters()
orself.prs().learner_nodes()
always remain accurate.Rationale
Currently we store
Progress
data, which are of a non-trivial size, in twoFxHashMap
s inProgressSet
. They are holding voters and learners respectively. Duringadd_node()
,add_learner()
,remove_node()
, andpromote_learner()
we currently check sometimes multiple HashMaps, and at times need to moveProgress
values between maps. We don't need to do this.Typical use cases for
ProgressSet
include iterating over these peers (such asself.prs().voters()
), fetching individual peers (such asself.prs().voters().get(id)
), checking membership (self.prs().voters().contains_key()
), removing nodes (either role), adding voters, adding learners, promoting learners, or removing learners.Scanning the entire set is done by chaining iterators over the two hashmaps already, so storing them in a single HashMap has no penalty. Returning Iterators where possible instead of realized structures means we can possibly save on allocations (you can do
self.prs().voters().any(_)
faster now).Scanning over a specific subset (eg voter, learner) is slightly slower since the iterator returned is internally filtering out non-voter/non-learners. Since most node sets are fairly small, and the check is a simple bool, this performance change is not dramatic. (This can be possibly optimized, but I'm not sure it's worth the complexity).
Fetching individual nodes was optimized since it is always a lookup now, not possibly two lookups (in the case of getting a node without the specifier).
Removing nodes should also be faster (single lookup).
Promoting Learners now is just changing a boolean (but this will likely change later for Joint Consensus).
Future Work
In the next PR my thought is to introduce
poll
andquorum
logic toProgressSet
, instead of being inRaft
(Raft
can still proxy to them). Why? When we enter a joint consensus (#101 ) state we no longer can just check if# of votes >= quorum
. The cluster needs to account for the votes of nodes in two separate configurations. So the check for elections is# of votes >= quorum
or(# votes_old >= quorum_old) && (# votes_new >= quorum_new)
depending on the state of the system.I'd like to follow with a PR to have
ProgressSet
hold aConfState
(a vec of voters, and avec of learners in a protobuf), and this
FxHashMap<u64, Progress>
thatProgressSet
alreadyholds. This will allow us to in the future use an enum inside the
ProgressSet
:It may be wise to use a different datatype for the
ConfStates
than the protobuf though. This needs to be explored. Perhaps aFxHashMap<u64, Weak<Progress>>
pointing back into the actual Progress, so we can just follow the pointer.This will allow us to quickly pass lists of IDs to callers and compose iterators of progress with chains, mapping into the HashMap. (If we do this, the Weak/Rc idea above is more compelling perhaps).
Performance Impact
There were no major performance impacts. A few benchmarks were slightly slower or slightly faster, but no remarkable impacts. Note many tests are a small % slower due to the additional allocation involved since we're now holding a bit of extra data.