Snapshots should be passed to new nodes on join #1302

jumaffre · 2020-06-16T16:46:26Z

When a new node joins a network, it should be possible for this node to initialise its store and history from a given snapshot instead of replaying all transactions since genesis.

For now, the responsibility of passing a snapshot to a new joiner is on the operator(s) who will have to copy a snapshot produced by a primary to the new node. Then, the new node can be started with cchost ... join --snapshots-dir <snapshot_dir>, which will automatically fetch the snapshot from disk and resume from there.
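
As a rough illustration of the manual step, the sketch below copies the newest file from a primary's snapshot directory into the directory the new node will be started with. The helper name `stage_latest_snapshot`, the "newest by modification time" selection rule, and the local directory layout are all assumptions made for the example, not CCF behaviour. In practice the copy would usually cross machines, for example via scp or rsync.

```python
import shutil
from pathlib import Path
from typing import Optional


def stage_latest_snapshot(primary_dir: Path, joiner_dir: Path) -> Optional[Path]:
    """Copy the most recently modified file in primary_dir into joiner_dir.

    Hypothetical helper: selecting the newest file by mtime is an assumption,
    not how CCF itself names or selects snapshots.
    """
    candidates = [p for p in primary_dir.iterdir() if p.is_file()]
    if not candidates:
        return None  # nothing to copy
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    joiner_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the destination path
    return Path(shutil.copy2(latest, joiner_dir / latest.name))
```

Once the file is staged, the joiner would be started with `--snapshots-dir` pointing at `joiner_dir`, as described above.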


jumaffre · 2020-07-21T16:44:58Z

Some clarification:

TL;DR: A joiner should only be given a snapshot if the evidence for that snapshot has been endorsed by the node that generated that snapshot.

In the current model, snapshots are generated on the primary and given to (late) joiners via consensus messages. It is still not clear whether we will re-use the existing AppendEntries message for this or create a new message type, à la the InstallSnapshot message (see the Raft paper). I lean towards the latter for now.
To provide auditability and blame, the evidence of this snapshot is emitted by the primary node. This evidence is a hash of the serialised snapshot, which is committed in a new ccf.snapshots table. See "Snapshots should be generated at regular interval" (#1301) for more detail.
A snapshot should be given to a late joiner only if the evidence for that snapshot has been endorsed via the primary's signature. In other words, a snapshot whose evidence has been applied at version N should only be served to a new joiner once a signature at version S has been emitted, with S > N.
It is still not clear whether a snapshot should be given to a late joiner only once a majority of backups have ack'ed its evidence (i.e. they have applied version N to their store). Doing so would avoid a late joiner being unable to catch up if an election occurs while it is joining. This is somewhat related to "Raft persistence bug" (#589), but it feels like the first implementation can omit this detail for now.
Upon receiving the snapshot, the late joiner should neither 1) serve reads nor 2) count as part of the consensus quorum until it has received the signature S that endorses the snapshot evidence at N.
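
The rules above can be sketched as a small model. This is only an illustration under stated assumptions: SHA-256 as the hash function, and the names `snapshot_evidence`, `is_endorsed` and `LateJoiner`, are hypothetical and do not come from the CCF code base.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def snapshot_evidence(serialised_snapshot: bytes) -> str:
    """Evidence is a hash of the serialised snapshot (SHA-256 is an assumption)."""
    return hashlib.sha256(serialised_snapshot).hexdigest()


def is_endorsed(evidence_version: int, signature_versions: List[int]) -> bool:
    """True once some signature has been emitted at a version S with S > N."""
    return any(s > evidence_version for s in signature_versions)


@dataclass
class LateJoiner:
    """Hypothetical model of a node that started from a snapshot."""

    evidence_version: int  # N: version at which the snapshot evidence was applied
    endorsed: bool = False

    def on_signature(self, signature_version: int) -> None:
        # Only a signature strictly after N endorses the evidence.
        if signature_version > self.evidence_version:
            self.endorsed = True

    def can_serve_reads(self) -> bool:
        return self.endorsed

    def counts_towards_quorum(self) -> bool:
        return self.endorsed
```

For example, with evidence applied at N = 10, a signature at version 10 does not unlock the joiner, while a signature at version 12 does.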

Edit: For now, the snapshot is passed to the new node "manually" by operators, so some of the points above may no longer apply.

achamayou · 2020-08-28T12:21:42Z

As discussed, passing or making a snapshot available to starting joiners can be handled by the operator.

jumaffre added the enhancement label Jun 16, 2020

jumaffre added this to the Snapshot milestone Jun 16, 2020

jumaffre mentioned this issue Aug 27, 2020

Snapshot auditability #1539

Closed

2 tasks

achamayou closed this as completed Aug 28, 2020
