Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

SecretStore: administrative sessions prototypes #6605

Merged
merged 74 commits into from
Oct 2, 2017

Conversation

svyatonik
Copy link
Collaborator

@svyatonik svyatonik commented Sep 28, 2017

This is first (and largest, I hope) part of #6480 implementation (and thus made on top of it).

PR outline:

  1. added polynom1 field to SS database + upgrade. It is impossible to add shares for keys, generated in previous SS versions
  2. added --secretstore-admin-public=[PUBLIC] cli option (I suppose this is a temporary soultion - see explanation below)
  3. added 4 'administrative' sessions (i.e. they can only be started by 'administrator') - ShareAdd, ShareMove, ShareRemove && ServersSetChange
  4. added ServersSetChangeAccess consensus, as a portion of all administrative sessions
  5. added UnknownSession job for nodes to report sessions, unknown to requester
  6. fixed spontaneous failures in 2 SS tests (reported by @arkpar)
  7. fixed warnings after migration to futures version

ServersSetChangeAccess consensus description:

Every administrative session is about changing some set of nodes from old_nodes_set to new_nodes_set:
ShareAdd: old_nodes_set = current owners of same-key shares, new_nodes_set = old_nodes_set + additional nodes
ShareMove: old_nodes_set = current owners of same-key shares, new_nodes_set = old_nodes_set except nodes that lose their shares + additional nodes
ShareRemove: old_nodes_set = current owners of same-key shares, new_nodes_set = old_nodes_set except nodes that lose their shares
ServesSetChange: old_nodes_set = current set of ALL SS nodes, new_nodes_set = old_nodes_set except nodes that we are going to remove from SS
Only owners of 'administrator' key are allowed to start administrative sessions.
To make sure it is actually started by administrators, there are 3 input parameters for session:
new_nodes_set - described above
old_set_signature - signature of keccak(ordered set of old_nodes_set)
new_set_signature - signature of keccak(ordered set of new_nodes_set)
The purpose of ServersSetChangeAccess is to check if old_nodes_set && new_nodes_set are valid (in general) and that signatures are valid signatures of admin key owner.
This could be changed later to check signatures using on-chain contract or some other way (I suppose we need to discuss it during next (integration) task).
For now, admin_public is added for simplicity.

ShareAdd session description:

participants of session = all owners of given key shares + all nodes that are being added
session result = every added node owns its new key share
session steps:

  1. consensus establishing
  2. master node sends shared (among all owners) key data to new nodes
  3. every 'old' node sends its part of polynom1[0] to every 'new' node
  4. every nodes sends its own 'key stamps' to all other nodes
  5. key share is updated in database

ShareMove session description:

participants of session = all owners of given key shares + all nodes that are being added
session result = share is removed from nodes selected for removal + share is moved (it isn't regenerated - just moved) to 'new' nodes
session steps:

  1. consensus establishing
  2. master node sends move request to all nodes selected for removing
  3. every removing-node sends its own share to new node && removes share from db
  4. every new node receives share && broadcasts confirmation to all other nodes
  5. upon receiving confirmations from all new nodes, share is updated && saved to db

ShareRemove session description:

participants of session = all owners of given key shares
session result = share is removed from nodes selected for removal
session steps:

  1. consensus establishing
  2. master node sends remove request to all nodes selected for removing
  3. every removing-node removes its own share && sends removal confirmation to all other nodes
  4. upon receiving confirmations from all removing nodes, share is updated && saved to db

ServersSetChange session description:

participants of session = all SS nodes
session result = clear db on nodes being removed + shares for ALL existing sessions on nodes being added
sesion steps:

  1. consensus establishing
  2. master node requests all other nodes for sessions he is not participating (aka unknown sessions)
  3. every slave node responds with sessions id => we are able to collect Map<SessionId, Set> of unknown sessions on master
  4. for every known session (i.e. session that master participates in):
    4.1) share change plan is created = nodes to add shares for, nodes to move shares from/to, nodes to remove shares from
    4.2) share change session is started. Share change session = sequential execution of ShareAdd, then ShareMove && then ShareRemove sessions (order matters here) for single key
  5. for every unknown session:
    5.1) sub_master is selected from sessions participants
    5.2) share change session is delegated from master to this sub_master
    5.3) share change session is executed by this sub_master
    5.4) share change confirm is sent from sub_master to master
  6. upon completing all known share change sessions && receiving confirmations for all unknown share change sessions, session completion signal is sent to all slave nodes && session is completed

General administrative sessions behavior:

  1. there could be at most 1 active admin session for the same key at a moment (i.e. you cannot simultaneously add and remove shares from the same key). See also TODO for ServersSetChangeSession
  2. any disconnect/timeout for participating node will lead to session failure
  3. any error on any participating node will lead to session failure

TODOs:

  1. there's no official way to start admin sessions right now - I want to agree upon all other concepts && fix all important TODOs before allowing this
  2. when session (ShareAdd/ShareMove/ShareRemove/Decrypt/Sign) is started on node, which does not own required key share, it must be delegated to node with key share
  3. important: if any error occurs during any admin session, we could end up with partially updated databases: i.e. 50% of nodes have saved key shares where share of node1 was moved to node2, and another 50% have old version. There should be either some distributed commit protocol, or we should store every version of key_share (and negotiate version during consensus establishing in signing/decryption/... sessions)
  4. important: ShareRemove session currently requires that participants = all owners of key share. If some node is down of completely removed from SS before ShareRemove session => there's no way to remove share from this node
  5. ServersSetChange session can take a lot of time (if number of keys is large). It must be tweaked:
    5.1) report unknown sessions by chunks to avoid large messages
    5.2) inactive nodes can mark session as stalled (if haven't received session messages for too long) => this will lead to failure
  6. ServersSetChange session can start only if there are no other active sessions && no other sessions can start if there is active ServersSetChange session
  7. there's need for more tests of new admin sessions
  8. there's a lot of almost-similar code in admin_sessions/*.rs, cluster_sessions.rs && cluster.rs - will refactor this once functional part is done

@svyatonik svyatonik added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. M4-core ⛓ Core client code / Rust. labels Sep 28, 2017
@bjornwgnr bjornwgnr requested review from gavofyork and removed request for gavofyork September 28, 2017 19:15
@svyatonik svyatonik added A0-pleasereview 🤓 Pull request needs code review. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Sep 29, 2017
@arkpar
Copy link
Collaborator

arkpar commented Oct 2, 2017

needs a rebase

@svyatonik
Copy link
Collaborator Author

Of above TODOs, there's one which may need some external review/advice before fixing. It is TODO#3. Here's my thoughts on it:

  1. ShareRemove session failure can't harm the key, since worst thing that can happen in SS is losing share && reason why we have started ShareRemove session is to lose the share
  2. ShareMove session failure can't harm the key, since the worst failure that can happen is that there will be 2 different nodes with the same key share && this isn't critical (current implementations must be updated to support this case, though)
  3. ShareAdd session failure can lead to case when key is lost in current implementation, because currently share is overwrited upon ShareAdd session completion

So idea is not to overwrite share in ShareAdd, but append a new version instead. This will need special handling:

  1. during decryption/signing sessions: nodes should agree on which version to use before starting session. We can use any key version to run these sessions.
  2. during ShareMove/ShareRemove sessions - we are always moving/removing node from latest version (by moving/removing here I mean - read share from db, update it and insert new version to db)
  3. in new ShareAdd(old_nodes_set -> new_nodes_set) sessions - nodes should agree on which version of key is the latest (i.e. to which version we want to add nodes). Details:
    3.1) initialization request is broadcasted to all cluster nodes (currently - it is only sent to nodes we believe to have key shares)
    3.2) every node responds with latest version of share it have
    3.3) of these responses we select latest version (LATEST) with number of nodes >=threshold
    3.4) let's call nodes which have responded with other versions (!= LATEST) - out_of_sync_nodes
    3.5) we start ShareAdd session, but instead of adding only new nodes (new_nodes_set.except(old_nodes_set)), we are adding new nodes + all nodes from out_of_sync_nodes
    3.6) after session is completed && new version is inserted, nodes from out_of_sync_nodes remove all 'forked' versions of shares (to not to store garbage in db)
    There are still some unclear technical details here (for example - how to mark versions, what is new_nodes_set that have to be signed by administrator to start admin sessions in this case, db structure, ...), but in general I consider this as a solution for that issue.
    Suggestions/criticism is welcomed - meanwhile I'll start with other TODOs.

@arkpar
Copy link
Collaborator

arkpar commented Oct 2, 2017

I'd propose share version negotiation to be performed on startup and not on ShareAdd. This way additional negotiations on signing and ShareAdd won't be required.

@arkpar arkpar added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Oct 2, 2017
@arkpar arkpar merged commit 9a086fa into master Oct 2, 2017
@arkpar arkpar deleted the secretstore_change_server_set branch October 2, 2017 13:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants