Snapshots should be passed to new nodes on join #1302

Closed
jumaffre opened this issue Jun 16, 2020 · 2 comments

Comments

@jumaffre
Contributor

jumaffre commented Jun 16, 2020

When a new node joins a network, it should be possible for this node to initialise its store and history from a given snapshot instead of replaying all transactions since genesis.

For now, the responsibility of passing a snapshot to a new joiner lies with the operator(s), who will have to copy a snapshot produced by a primary to the new node. The new node can then be started with cchost ... join --snapshots-dir <snapshot_dir>, which will automatically fetch the snapshot from disk and resume from there.

@jumaffre jumaffre added this to the Snapshot milestone Jun 16, 2020
@jumaffre
Contributor Author

jumaffre commented Jul 21, 2020

Some clarification:

TL;DR: A joiner should only be given a snapshot if the evidence for that snapshot has been endorsed by the node that generated that snapshot.

  • In the current model, snapshots are generated on the primary and given to (late) joiners via consensus messages. It is still not clear whether we will re-use the existing AppendEntries message for this or create a new message type, à la the InstallSnapshot message (see the Raft paper). I lean towards the latter for now.
  • To provide auditability and blame, the evidence of this snapshot is emitted by the primary node. This evidence is a hash of the serialised snapshot, which is committed in a new ccf.snapshots table. See Snapshots should be generated at regular interval #1301 for more detail.
  • A snapshot should be given to a late joiner only if the evidence for that snapshot has been endorsed via the primary's signature. In other words, a snapshot whose evidence has been applied at version N should only be served to a new joiner once a signature at version S has been emitted, with S > N.
  • It is still not clear whether a snapshot should be given to a late joiner only when a majority of backups have ack'ed its evidence (i.e. they have applied version N to their store). This affects whether a late joiner can catch up if an election occurs while it is joining. This is somewhat related to Raft persistence bug #589, but it feels like the first implementation can omit this detail for now.
  • Upon receiving the snapshot, the late joiner should neither 1) serve read entries nor 2) count as part of the consensus quorum until it has received the signature S that endorses the snapshot evidence at N.
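
The endorsement rule in the points above can be sketched as a small Python model. This is only an illustration, not CCF code: `PrimaryModel`, `JoinerModel` and all method names are hypothetical, the dictionary merely stands in for the ccf.snapshots table, and SHA-256 is an assumption (the description only says "a hash"). The majority-ack question is deliberately left out.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def snapshot_evidence(serialised_snapshot: bytes) -> str:
    # Assumption: SHA-256 is used; the issue only specifies "a hash".
    return hashlib.sha256(serialised_snapshot).hexdigest()


@dataclass
class PrimaryModel:
    # Hypothetical stand-in for the ccf.snapshots table: version N -> evidence.
    snapshots_table: Dict[int, str] = field(default_factory=dict)
    # Versions at which the primary has emitted a signature.
    signature_versions: List[int] = field(default_factory=list)

    def record_evidence(self, version: int, serialised_snapshot: bytes) -> None:
        self.snapshots_table[version] = snapshot_evidence(serialised_snapshot)

    def emit_signature(self, version: int) -> None:
        self.signature_versions.append(version)

    def endorsing_signature(self, evidence_version: int) -> Optional[int]:
        # Earliest signature version S with S > N, or None if there is none yet.
        later = [s for s in self.signature_versions if s > evidence_version]
        return min(later) if later else None

    def can_serve(self, evidence_version: int) -> bool:
        # Serve only snapshots whose evidence is recorded and endorsed.
        return (
            evidence_version in self.snapshots_table
            and self.endorsing_signature(evidence_version) is not None
        )


@dataclass
class JoinerModel:
    evidence_version: int  # N, the version at which the evidence was applied
    endorsed: bool = False

    def on_signature(self, signature_version: int) -> None:
        # The joiner becomes a full participant once it sees S > N.
        if signature_version > self.evidence_version:
            self.endorsed = True

    def may_serve_reads(self) -> bool:
        return self.endorsed

    def counts_in_quorum(self) -> bool:
        return self.endorsed


primary = PrimaryModel()
primary.record_evidence(10, b"serialised snapshot bytes")
print(primary.can_serve(10))  # False: no signature after version 10 yet
primary.emit_signature(12)
print(primary.can_serve(10))  # True: signature at S = 12 > N = 10
```

In this sketch a joiner built from the version-10 snapshot would stay out of reads and quorum until `on_signature` is called with a version greater than 10.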

Edit: For now, the snapshot is passed to the new node "manually" by operators, so some of the points above may no longer apply.

@jumaffre jumaffre mentioned this issue Aug 27, 2020
2 tasks
@achamayou
Member

As discussed, passing or making a snapshot available to starting joiners can be handled by the operator.
