Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raft persistence bug #589

Closed
achamayou opened this issue Nov 26, 2019 · 4 comments · Fixed by #1641
Closed

Raft persistence bug #589

achamayou opened this issue Nov 26, 2019 · 4 comments · Fixed by #1641
Assignees
Labels

Comments

@achamayou
Copy link
Member

Our raft implementation can lose transactions that have been advertised as committed.

  1. Primary issues signature at i
  2. Primary gather f + 1 acks for signature at i
  3. Primary commits i, advertises all transactions up to i to callers as committed
  4. Primary fails
  5. Secondary gets elected, rolls back to last commit, earlier than i because primary did not have time to advertise i as committed
  6. i is lost

This was spotted before as #519 / #520, but the fix is incorrect.

Instead of using commit_idx as their log_idx/value to roll back to, secondaries must use the last verified index (ie. the index of the last signature transaction they verified), which is the logical equivalent of log_idx with added verifiability.
Using commit_idx instead is unnecessarily strong and creates a potential loss window.

Fixing this will also mitigate #521, because the suffix committed on the primary, verified but not yet committed on the followers will be on a majority of followers and therefore part of the future ledger post-election.

@achamayou achamayou added the bug label Nov 26, 2019
@achamayou achamayou self-assigned this Nov 26, 2019
@achamayou
Copy link
Member Author

Switching to last verified index isn't quite enough, because if the failure at 4. occurs after a majority of secondaries have acked the last verified index but before they have been notified that the commit index has moved to it, they will not be able to confirm to the user exactly what has been committed on failover, even though the transactions won't be lost.

Instead they will need to re-establish consensus over the verified suffix, which is effectively commited, since they cannot tell it is.

We therefore need to modify raft further to keep track of the secondaries' views on what the commit index is, and only confirm to the user that transactions are committed once a majority of followers has seen that commit index.

On election, voting must also happen by comparing the last commit index first, and the last verified index second.

@achamayou
Copy link
Member Author

This is also insufficient, because a node post-election would not be able to use its commit index as the new commit index (it could be further than the majority). I think an additional field and round trip is necessary.

@eddyashton
Copy link
Member

Quick notes from today's long discussions around this:

We believe the 'Fancy Election Scheme' is inevitable. After winning an election as new leader of the new consensus term, you may not immediately know where every follower is, so you may not know the consensus global commit that the previous primary reached. As far as Raft is concerned, this 'leader elect' is a full primary - they are sending append entries to the other nodes to catch them up. But publicly, to the clients, they must continue to look like a backup, advertising their previous global commit - initially, this is what they last heard from the previous primary, then this is advanced as they catch up the followers. Only once this 'leader elect' has a global commit at least as high as their last signed index can they claim to be a normal primary to clients. If they follow this rule, then the advertised global commit can never move backwards (though the last-thing-a-client-heard may be briefly unavailable, and then jump forward. around elections).

In practice this 'leader-elect' phase should be very short, as sufficient nodes are caught up that an election winner can immediately transition to Primary. We can make this even faster by including the follower's current received index in vote responses (otherwise we need an AppendEntries roundtrip to find out where they are, and thus what we can advertise).

We can shrink this unavailability window even further by doing BFT-style all-to-all announcements of the follower's current indices, shrinking some of the state we need to gather to confirm the global commit index after winning an election.

@eddyashton
Copy link
Member

Reading this issue again I realise my last comment doesn't really flow from the rest. As we now have a fix for this in #1641, I'll try to summarise our work on this clearly for posterity.

The core problem is summarised correctly above:

Instead of using commit_idx as their log_idx/value to roll back to, secondaries must use the last verified index (ie. the index of the last signature transaction they verified), which is the logical equivalent of log_idx with added verifiability.
Using commit_idx instead is unnecessarily strong and creates a potential loss window.

This "last verified index" is what we're now calling last_committable_idx. This is the most recent signature transaction we have received from the primary, and will be at least as far as the last_commit_idx (the last value the primary told us is committed, which in turn often lags the true committed point they may have advertised to clients).

The initial fix seemed to be to use last_committable_idx in place of last_commit_idx, but then we believed there was an issue with this approach:

Switching to last verified index isn't quite enough, because [the new primary after winning an election] will not be able to confirm to the user exactly what has been committed on failover, even though the transactions won't be lost.

We explored some complex workarounds for this, to try and find an advertisable commit point that winnable follower would always know was committed. Eventually we decided this was impossible - there is inevitably a stage where the primary knows something the followers don't, and the new primary must rediscover this in/after an election. The fear that "they will need to re-establish consensus over the verified suffix" isn't such a big problem - the new primary is sharing some suffix of the previous primary's transactions just as they would share any other transactions from prior terms.

Our implementation of this is what we call the 'Fancy Election Scheme', where the service is potentially in a limbo state for a period after an election. The knock-on implications of this for the service, and what it can advertise as globally committed, are described in my comment above.

The fix for the core issue, that commits which were advertised as committed could be rolled back, is fixed in #1641 by using last_committable_idx during elections, rather than last_commit_idx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants