Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update snapshot commit rules #1972

Merged
merged 19 commits into from
Dec 3, 2020

Conversation

jumaffre
Copy link
Contributor

@jumaffre jumaffre commented Dec 1, 2020

A snapshot now becomes committed once there has been two commits on the snapshot evidence. Waiting for a second commit guarantees that there will be a proof (i.e. signature) of the commit of the snapshot evidence in the ledger.

As snapshot file names now include the seqno at which the snapshot evidence was committed, starting up from the latest committed snapshot on join/recovery should always succeed. This is achieved by picking the latest committed snapshot whose evidence commit seqno is included in the ledger.

This should simplify #1925 and the overall safety of the join/recover from snapshot procedure, at the cost of waiting a little longer (i.e. another round of commit) for snapshots to be committed.

Note that if the service is completely idle after the snapshot evidence has been committed, the snapshot won't actually be committed until the next write transaction is committed.

Edit: The snapshot evidence seqno is now included in the snapshot file name as soon as the snapshot is serialised to disk.

@jumaffre jumaffre requested a review from a team as a code owner December 1, 2020 17:33
@eddyashton
Copy link
Member

2 slightly meta points:

  1. I think we need to give this concept a different name than 'committed'. We already have overloaded local vs global commit (and use commit within the KV code for even earlier steps), and we want 'committed' to mean something more precise, in relation to a transaction's state within the consensus. In fact I'm not completely sure what we're waiting for here - 2 signatures? Or a signature whose global commit marker has moved past a certain point, followed by a signature whose global commit marker is past that earlier signature? "A snapshot now becomes final" or "A snapshot now becomes useable" once this state is reached, but make it clear this snapshot state is distinct from the state of any of its individual contents, or even the evidence for the snapshot which is persisted into the ledger.

  2. This seems like a major shortcoming:

Note that if the service is completely idle after the snapshot evidence has been committed, the snapshot won't actually be committed until the next write transaction is committed.

Means we can never produce a snapshot of the service's final state, with (IUIC) an arbitrarily long suffix of transactions that are in the ledger, committed, but not snapshotted?

@@ -69,9 +69,9 @@ Once a snapshot has been generated by the primary, operators can copy or mount t

To validate the snapshot a node is added from, the node first replays the transactions in the ledger following the snapshot until the proof that the snapshot was committed by the service to join is found. This process requires operators to copy the ledger suffix to the node's ledger directory. The validation procedure is generally quick and the node will automatically join the service one the snapshot has been validated. On recovery, the snapshot is automatically verified as part of the usual ledger recovery procedure.

For example, if a node is added using the ``snapshot_1000.committed_1250`` snapshot file, operators should copy the ledger files containing the sequence numbers ``1000`` to ``1250`` to the directories specified by ``--ledger-dir`` (or ``--read-only ledger-dir``). This would involve copying the ledger files following the snapshot sequence number ``1000`` until the evidence sequence number ``1250``, e.g. ``ledger_1001-1200.committed`` and ``ledger_1201-1500.committed``, to the joining node's ledger directory.
For example, if a node is added using the ``snapshot_1000.committed_1250_1300`` snapshot file, operators should copy the ledger files containing all the sequence numbers between ``1000`` to ``1300`` to the directories specified by ``--ledger-dir`` (or ``--read-only ledger-dir``). This would involve copying the ledger files following the snapshot sequence number ``1000`` until the evidence commit sequence number ``1300``, e.g. ``ledger_1001-1200.committed`` and ``ledger_1201-1500.committed``, to the joining node's ledger directory.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the new _<evidence_commit_seqno> suffix at the end of the snapshot file name.

@@ -17,23 +19,12 @@ namespace asynchost
{
private:
const std::string snapshot_dir;
const Ledger& ledger;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host snapshot manager now takes a reference to the ledger so that it can verify that the latest committed snapshot has a proof of evidence in the ledger.

@ghost
Copy link

ghost commented Dec 1, 2020

snapshot_commit_audit@16251 aka 20201203.20 vs master ewma over 50 builds from 15498 to 16236
images

@jumaffre
Copy link
Contributor Author

jumaffre commented Dec 2, 2020

@eddyashton: good points!

  1. I think we need to give this concept a different name than 'committed'. We already have overloaded local vs global commit (and use commit within the KV code for even earlier steps), and we want 'committed' to mean something more precise, in relation to a transaction's state within the consensus. In fact I'm not completely sure what we're waiting for here - 2 signatures? Or a signature whose global commit marker has moved past a certain point, followed by a signature whose global commit marker is past that earlier signature? "A snapshot now becomes final" or "A snapshot now becomes useable" once this state is reached, but make it clear this snapshot state is distinct from the state of any of its individual contents, or even the evidence for the snapshot which is persisted into the ledger.

That's true. However, from an operator's point of view, we already use "committed" for a number of things, e.g. ledger files (and we use that word loosely there, as a non-committed ledger file probably contains committed entries). We may want to change the wording in the code, but I'm not too keen on creating a new term for operators.

What we're waiting for here is a signature in a ledger which confirms that the (global) commit point passed the snapshot evidence seqno. This is what nodes that join/recover from the ledger check for to verify the validity of the ledger (explained here). We achieve this by waiting for two commits on the snapshot evidence: the first commit proves that the snapshot evidence was committed, the second commit proves that a signature that contains (at least) the first commit seqno was created.

  1. This seems like a major shortcoming:

Note that if the service is completely idle after the snapshot evidence has been committed, the snapshot won't actually be committed until the next write transaction is committed.

Means we can never produce a snapshot of the service's final state, with (IUIC) an arbitrarily long suffix of transactions that are in the ledger, committed, but not snapshotted?

This is already the case in #1925 but #1925 only is even worse: we mark snapshots are committed (i.e. usable) as soon as the evidence is committed, which means that a node may join/recover from a snapshot which isn't valid. This PR (#1972) delays the moment a snapshot is usable but we can then select early (on the host side) whether the snapshot we join/recover from will be able to be validated (provided that ledger integrity is verified).

I think this is a general limitation of our signatures: there never is a signature that validates the latest commit seqno (we only generate signatures based on time if the latest entry wasn't a signature).

@@ -170,14 +170,6 @@ def run(args):
if args.snapshot_tx_interval is not None:
test_add_node_from_snapshot(network, args, copy_ledger_read_only=True)

try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed for now as the test will succeed until #1925 is merged.

@@ -762,17 +762,71 @@ def wait_for_new_primary(self, old_primary_id, timeout_multiplier=2):
flush_info(logs, None)
raise error(f"A new primary was not elected after {timeout} seconds")

def wait_for_snapshot_committed_for(self, seqno, timeout=3):
primary, _ = self.find_primary()
def wait_for_commit_proof(self, node, seqno, timeout=3):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in this file are a little awkward but they help us to guarantee that historical queries after a node has joined from a snapshot work. In such scenario, we really want the new node to join from the latest available snapshot. If it didn't verifying the historical query would be trivial as the node would simply have received the historical ledger through classic catch up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should hopefully get simpler once snapshots can be trivially verified from a receipt over their evidence.

@achamayou achamayou merged commit 489e511 into microsoft:master Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants