Invariant MoreUpToDateCorrectInv violated in TLA+ specification #3837
Comments
I am probably missing something here, but
suggests node 3 is running for election. But
suggests that other nodes are receiving votes!? |
So the following state means that node 1 requested a vote from node 3
Likewise,
I can update the current documentation of Lines 119 to 121 in 2a048e2
I could also add an invariant that checks that if a vote was granted (except by the candidate itself) then it must have been requested. |
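The invariant proposed above (every granted vote, other than the candidate's own, must have been requested) can be sketched in Python as a check over a single state. This is only an illustration: the names `votes_requested` and `votes_granted` and their shape (a map from candidate to a set of node ids) are assumptions made for this sketch, not the spec's actual variables or encoding.

```python
from typing import Dict, Set


def votes_granted_were_requested(
    votes_requested: Dict[int, Set[int]],
    votes_granted: Dict[int, Set[int]],
) -> bool:
    """Return True if every vote a candidate holds was requested by it.

    Both maps are hypothetical state variables for this sketch:
    candidate id -> set of node ids. A candidate's vote for itself
    is exempt, since it is never requested over the network.
    """
    for candidate, granted in votes_granted.items():
        requested = votes_requested.get(candidate, set())
        # Ignore the self-vote, then require granted to be a subset of requested.
        if not (granted - {candidate}) <= requested:
            return False
    return True


# Node 1 requested a vote from node 3 and holds votes from itself and node 3.
consistent_state = votes_granted_were_requested({1: {3}}, {1: {1, 3}})

# Node 1 holds a vote from node 2 although only node 3 requested votes.
inconsistent_state = votes_granted_were_requested({3: {1, 2}}, {1: {2}})
```

In the first state the check passes; in the second it fails, which is the kind of state the proposed invariant is meant to rule out.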
This issue seems to have been caused by a problem with the logic for determining what set of servers constitutes a quorum when a candidate is trying to become a leader whilst having an uncommitted but signed reconfiguration tx in its log.
In this particular case, node 1 becomes leader in term 3 after receiving votes from itself and node 3. However, the first entry in its log is a reconfiguration transaction which changes the set of servers to just node 2. This transaction has been committed by node 2 (and thus the configuration has changed) but node 1 does not know that yet. The full trace is here (ccf-reconfig-bug.txt) if anyone is interested.
I believe a candidate should get a quorum in each of the current configurations before becoming a leader. The TLA+ spec calculates the set of servers in all active configurations and requires a quorum across this set instead. The code is below: Lines 328 to 330 in e4362ab
This explains why node 1 was able to become a leader without a vote from node 2. This approach could cause other issues as well. For instance, if a system were switching from nodes 1, 2 and 3 to nodes 4, 5 and 6, then a new leader could be elected with only the support of nodes 3, 4, 5 and 6, leaving a majority quorum (nodes 1 and 2) in the old configuration who might not know about the new configuration. I have some thoughts about how to patch this in the spec but would like to discuss this first to work out what CCF is doing in practice. |
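To make the difference between the two quorum rules concrete, here is a minimal Python sketch. It is not the spec or the CCF implementation: the function names and the representation of configurations as sets of node ids are assumptions chosen for illustration. It compares a majority across the union of all active configurations with a majority in each active configuration, using the two scenarios described above.

```python
from typing import FrozenSet, List, Set

Config = FrozenSet[int]


def has_majority(votes: Set[int], members: Set[int]) -> bool:
    """True if strictly more than half of `members` appear in `votes`."""
    return 2 * len(votes & members) > len(members)


def quorum_across(votes: Set[int], configs: List[Config]) -> bool:
    """Majority of the union of all active configurations."""
    all_nodes: Set[int] = set().union(*configs)
    return has_majority(votes, all_nodes)


def quorum_in_each(votes: Set[int], configs: List[Config]) -> bool:
    """Majority within every active configuration separately."""
    return all(has_majority(votes, set(c)) for c in configs)


# Scenario from the trace: node 1 still sees {1, 2, 3} and the pending {2},
# and has votes from itself and node 3.
trace_configs = [frozenset({1, 2, 3}), frozenset({2})]
trace_votes = {1, 3}
trace_across = quorum_across(trace_votes, trace_configs)    # True
trace_each = quorum_in_each(trace_votes, trace_configs)     # False

# Scenario with a switch from {1, 2, 3} to {4, 5, 6} and votes from 3, 4, 5, 6.
switch_configs = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
switch_votes = {3, 4, 5, 6}
switch_across = quorum_across(switch_votes, switch_configs)  # True
switch_each = quorum_in_each(switch_votes, switch_configs)   # False
```

In both scenarios the union-based rule accepts the election while the per-configuration rule rejects it, because one of the configurations (just node 2, or nodes 1, 2 and 3) has not given a majority of its votes.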
@heidihoward the implementation does use a quorum across all active nodes:
as opposed to a quorum in each active configuration. I have not been able to find any notes about why that choice was made, but I remember it was discussed at least once. I suspect MoreUpToDateCorrectInv does not hold with the currently implemented election scheme, which is based on committable watermarks rather than committed watermarks: https://github.com/microsoft/CCF/blob/main/src/consensus/aft/raft.h#L1642 The best entry point for history on why this change was made is probably #589. |
So I think there are two separate issues here which are worth considering separately:
|
Where
Node 2 was primary, until a partition occurs.
In the each election mode, Node 2 elects itself easily and continues. Node 0 and Node 1 continuously run elections, unsuccessfully, since they can never get a majority in
In the across election mode, Node 2 also elects itself and continues. But Node 0 and Node 1 elect Node 1, which replicates its log. We think the latter is preferable, because:
Edit: while I think this example is useful in illustrating why it seems good for elections to work this way, it does not illustrate a breach of |
Ok, so it seems that:
despite:
Node 1 is emitting an |
I think at the heart of this discussion is the definition of committable. My understanding is that a committable transaction (one that is followed by a signed transaction) is not guaranteed to eventually be committed (and is thus not guaranteed to be on a majority of replicas). This differs from your claim if
I think the problem with quorums across committable configurations is that a leader can be elected without the support of a majority quorum in the current configuration. Consider the set of replicas
|
Yes, that's correct.
That claim on its own is definitely incorrect. I think it holds in the context of the example I gave, but I agree that it does not hold in general, and in particular not in the example you have given. As discussed, this has unpleasant implications for the convergence back to a commit previously advertised in at least some scenarios, like the one in my example. Clearly though, ending up with two actual leaders (not just a caretaker and a real leader) is much worse! |
Fixed by #3965 |
Describe the bug
TLC found a violation of the MoreUpToDateCorrectInv invariant when checking the current TLA+ specification in https://github.com/microsoft/CCF/blob/main/tla/raft_spec/ccfraft.tla. Note that this is a different TLA+ specification (and thus a different issue) from #3828.
To Reproduce
Use TLC to check the current spec in https://github.com/microsoft/CCF/blob/main/tla/raft_spec/MCraft.tla using a term limit of 3. As you can see from the output below, TLC took more than 2 days to find this bug (after checking 37 billion states) on a well-resourced machine, so you'll need to be patient.
Expected behavior
TLC should have completed checking without finding a safety violation.
Environment information
Host: Azure standard_HB120rs_v3
OS: Ubuntu 20.04 LTS
TLC: TLC2 Version 2.17 of 02 February 2022
Additional context
The final output & counter example provided by TLC is copied below: