
Snapshot auditability #1539

Closed
1 of 2 tasks
jumaffre opened this issue Aug 27, 2020 · 3 comments · Fixed by #1925
Comments

@jumaffre
Contributor

jumaffre commented Aug 27, 2020

Follow up from #1302

Snapshots are currently generated at regular intervals for a state that is globally committed. However, the snapshot evidence (hash of snapshot) is only committed after the snapshot has been generated. The snapshot is written to disk as soon as it is generated.

With this first implementation, the evidence for a snapshot that is already available for new joiners to resume from (i.e. an operator can copy the snapshot file and start a new joiner from it straight away) can actually be rolled back. In that case, the snapshot could not be audited, as there would be no evidence for it in the ledger.

What we should do instead is:

  • Only write the snapshot to disk once the snapshot evidence is globally committed. This means we would have to keep the serialised snapshot (plus the version of its evidence) in memory until that version is globally committed.
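The deferred-write behaviour described above could be sketched roughly as follows. This is purely illustrative Python, not CCF's actual implementation; `PendingSnapshot`, `SnapshotStore`, and the in-memory `committed` dict (standing in for files on disk) are all invented names:

```python
# Minimal sketch: hold serialised snapshots in memory until the seqno of
# their evidence transaction is globally committed, then persist them.
from dataclasses import dataclass, field

@dataclass
class PendingSnapshot:
    snapshot_seqno: int   # seqno the snapshot was taken at
    evidence_seqno: int   # seqno of the evidence (hash) transaction
    data: bytes           # serialised snapshot, kept in memory for now

@dataclass
class SnapshotStore:
    pending: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # stands in for files on disk

    def generate(self, snapshot_seqno, evidence_seqno, data):
        # Do NOT write to disk yet: the evidence could still be rolled back.
        self.pending.append(PendingSnapshot(snapshot_seqno, evidence_seqno, data))

    def commit(self, committed_seqno):
        # Flush every snapshot whose evidence is now globally committed.
        still_pending = []
        for p in self.pending:
            if p.evidence_seqno <= committed_seqno:
                self.committed[p.snapshot_seqno] = p.data  # "write to disk"
            else:
                still_pending.append(p)
        self.pending = still_pending

    def rollback(self, rollback_seqno):
        # Drop pending snapshots whose evidence was rolled back.
        self.pending = [p for p in self.pending
                        if p.evidence_seqno <= rollback_seqno]
```

A rolled-back snapshot is simply discarded before it ever reaches disk, so no unauditable snapshot file can exist.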

However, this may not be enough to guarantee that a joiner that resumed from a snapshot can join the consensus:

  • From the new node's perspective, it should not become part of the network (e.g. process app requests) until it has seen the globally committed evidence of the snapshot it has resumed from.
  • From the existing network perspective, this new node shouldn't count as part of the Raft quorum until it has confirmed that it has resumed from a trustworthy snapshot.
@jumaffre jumaffre added this to the Checkpointing milestone Aug 27, 2020
@jumaffre jumaffre added the 0.14 label Sep 16, 2020
@jumaffre
Contributor Author

On recovery, we also need to make sure that we do not roll back the snapshot evidence.

@jumaffre
Contributor Author

jumaffre commented Oct 1, 2020

Discussed further with @achamayou today. The simplest solution is probably for new joiners to be passed the snapshot and the ledger suffix until the next signature that confirms that the snapshot evidence has been globally committed. The scheme is then very similar to the existing recovery scheme:

  1. The new joiner deserialises the snapshot and the ledger suffix (in public mode) until it sees the signature that confirms that the snapshot evidence has been globally committed. It also verifies that the evidence matches the snapshot (SHA-256).
  2. It joins the new network from there and then initiates the deserialisation of the snapshot and the private ledger suffix, but this time in private mode.
  3. When the private ledger has been deserialised to the same seqno as the public store (i.e. the signature that validates the snapshot evidence), the Merkle roots of both stores are compared, the private maps are swapped into the public store, and the node becomes part of the consensus.
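The three steps above could be simulated roughly as below. The entry shapes are invented, and a simple rolling hash stands in for a real Merkle root; none of this is CCF's actual API:

```python
# Rough simulation of join-from-snapshot: replay the public suffix until a
# signature commits the evidence, then replay the private suffix to the same
# seqno and compare roots before joining consensus.
import hashlib

def rolling_root(entries):
    # Stand-in for a Merkle root: fold entry payloads into one digest.
    root = hashlib.sha256(b"snapshot").hexdigest()
    for e in entries:
        root = hashlib.sha256((root + e["payload"]).encode()).hexdigest()
    return root

def join_from_snapshot(snapshot_bytes, evidence_seqno,
                       public_suffix, private_suffix):
    # Step 1: verify the evidence digest against the snapshot (SHA-256) and
    # stop at the signature that globally commits the evidence.
    digest = hashlib.sha256(snapshot_bytes).hexdigest()
    stop = None
    for e in public_suffix:
        if e["type"] == "evidence" and e["seqno"] == evidence_seqno:
            if e["digest"] != digest:
                return False  # evidence does not match the snapshot
        if e["type"] == "signature" and e["commit_seqno"] >= evidence_seqno:
            stop = e["seqno"]
            break
    if stop is None:
        return False  # evidence never globally committed

    # Steps 2-3: replay the private suffix up to the same seqno and compare
    # roots before swapping the private maps in.
    public_root = rolling_root([e for e in public_suffix if e["seqno"] <= stop])
    private_root = rolling_root([e for e in private_suffix if e["seqno"] <= stop])
    return public_root == private_root
```

A tampered snapshot fails the digest check in step 1, and a divergent private replay fails the root comparison in step 3.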

The main drawbacks are:

  • A node cannot join from just a snapshot but also needs an arbitrarily long ledger suffix: there's no way for operators to tell in which ledger chunk the snapshot evidence is committed, although it is likely to be in the chunk that immediately follows the snapshot.
  • There is also a short period of time in which the new joiner is part of the consensus from the perspective of the existing network but hasn't yet initialised its own consensus as it is replaying private ledger entries during this time.

The main advantage is that there is no added complexity in Aft on either side (i.e. existing network or joiner) to avoid counting the new joiner in the quorum until it has seen the global commit of the snapshot evidence.

Starting from a snapshot also means that we can use the state of the store to find out which nodes' RPC addresses we could use on join (provided that the snapshot is not too outdated, e.g. that the configuration in the snapshot still overlaps with the latest configuration).
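As a toy illustration of reading join targets out of the snapshotted state (the `nodes` table layout and status values here are invented, not CCF's actual KV schema):

```python
# Hypothetical: once a joiner has deserialised a snapshot, it could read the
# node table from the snapshotted state to find addresses to join through.
def join_targets(snapshot_state):
    # snapshot_state["nodes"]: node_id -> {"rpc_address": ..., "status": ...}
    return [n["rpc_address"]
            for n in snapshot_state["nodes"].values()
            if n["status"] == "TRUSTED"]
```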

@jumaffre
Contributor Author

As discussed with @achamayou today, some more details on this:

(The following can be simplified once we can request a receipt for the evidence of the snapshot. The receipt for the evidence could probably be embedded in the snapshot file directly.)

For the new joiner, the aim is simply to verify that the snapshot it wants to join from is valid. To do so, on startup, the new joiner deserialises the snapshot in public mode, followed by the subsequent ledger entries, until it finds proof that the snapshot evidence has been globally committed (i.e. a signature entry whose commit seqno is greater than the seqno of the snapshot evidence). If successful, the node can reset its store and attempt to join the service. Otherwise, the node should refuse to start (*). When the service successfully returns the ledger secrets to the new joiner, the full snapshot can be applied to the store (existing behaviour).

On recovery, the node should apply the snapshot in public mode and, when deserialising public entries, check that the evidence for that snapshot is indeed present in the ledger. If the end of the ledger is reached without any proof that the snapshot evidence was committed, recovery should abort with an error.
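The recovery-time check could look something like the following sketch (entry shapes invented; the exception name is hypothetical):

```python
# Sketch: after applying the snapshot in public mode, scan the remaining
# public ledger entries and require committed evidence matching the snapshot
# digest before the ledger runs out; otherwise abort recovery.
import hashlib

class SnapshotNotCommitted(Exception):
    pass

def check_recovery(snapshot_bytes, public_entries):
    digest = hashlib.sha256(snapshot_bytes).hexdigest()
    evidence_seqno = None
    for entry in public_entries:
        if entry["type"] == "evidence" and entry["digest"] == digest:
            evidence_seqno = entry["seqno"]
        elif (entry["type"] == "signature"
              and evidence_seqno is not None
              and entry["commit_seqno"] >= evidence_seqno):
            return evidence_seqno  # evidence found and globally committed
    raise SnapshotNotCommitted(
        "end of ledger reached without committed snapshot evidence")
```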

(*) We could make the node auto-retry with the previous snapshot, etc. However, this would mean adding a couple of extra messages on the ring-buffer to retrieve the previous snapshot, which adds some complexity. Instead, we'll simply check that when a snapshot is selected for join/recovery, the seqno of its evidence is included in a ledger chunk that is available.
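That simpler check amounts to something like the sketch below (the chunk-range representation is an assumption for illustration):

```python
# Sketch of the check in (*): given the seqno ranges of the ledger chunks
# available on disk, only select a snapshot whose evidence seqno falls
# inside one of them.
def evidence_available(evidence_seqno, chunk_ranges):
    # chunk_ranges: list of (start_seqno, end_seqno) inclusive tuples
    return any(start <= evidence_seqno <= end
               for start, end in chunk_ranges)
```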
