Cryptographic authentication of validators' session keys in network #47
Comments
We might make this roughly:
We do not require this proof-of-possession for BABE currently, but it's probably good anyways. We absolutely need the self-signature by the GRANDPA key for proof-of-possession. I do not see any particular reason for either BABE or GRANDPA to sign the other's signature as described above, but we could do one or the other if we can think of any benefit. We could make the transport key flexible with a tiny Merkle tree, or hide it by making it a hash. We might ask if TLS likes some certificate format, except it does not like our key formats, so maybe no benefit.
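The proof-of-possession flow being discussed can be sketched minimally. The signature scheme below is a toy XOR stand-in rather than real BLS, and all names are invented for illustration; only the control flow (self-signature over the key, checked before trusting the bundle) reflects the idea above:

```rust
// Toy sketch: a GRANDPA key self-signs its own public key (the PoP), and
// verifiers check that signature before trusting the bundle. XOR stands in
// for real BLS signing; names are illustrative, not any real API.

struct SessionKeyBundle {
    grandpa_public: [u8; 4],
    grandpa_pop: [u8; 4], // self-signature over grandpa_public
}

// Stand-in "signing": real code would produce a BLS signature using the
// corresponding secret key, not the public key.
fn toy_sign(public: &[u8; 4], msg: &[u8; 4]) -> [u8; 4] {
    let mut out = [0u8; 4];
    for i in 0..4 {
        out[i] = public[i] ^ msg[i];
    }
    out
}

// PoP check: verify the GRANDPA key's self-signature over itself.
fn check_pop(bundle: &SessionKeyBundle) -> bool {
    toy_sign(&bundle.grandpa_public, &bundle.grandpa_public) == bundle.grandpa_pop
}
```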
We should list out the obvious concerns:
I think 4 and 5 overrule 3 right now myself, since controller account private keys must be easily accessible, even if technically air-gapped. If we want really strong forward security then Pixel sounds cool, but it increases GRANDPA verification costs by 50% and increases GRANDPA signing costs dramatically. If we want weaker forward security then we could later alter the certificate format, but doing so now complicates implementation slightly. We should not use BABE keys for 6 because they need to be registered well in advance. We could use GRANDPA keys, but GRANDPA keys still incur some changeover costs. We suggest using transport keys for this. I therefore moved the transport_public; we'd thus end up with something like:
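The concrete format referred to here is not shown in this thread, so the struct below is only a hypothetical shape consistent with the surrounding discussion (all field names invented), showing transport_public living alongside the consensus keys:

```rust
// Hypothetical layout only: these field names are invented for illustration
// and do not reflect the actual format from the discussion.
struct SessionKeyRecord {
    babe_public: Vec<u8>,      // Sr25519 key for BABE
    grandpa_public: Vec<u8>,   // BLS key for GRANDPA
    grandpa_pop: Vec<u8>,      // proof-of-possession self-signature
    transport_public: Vec<u8>, // Ed25519 transport key, rotated cheaply
}
```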
You're free to split the private keys in
In Polkadot, the important constraints boil down to: BABE Sr25519 keys should not be used in BABE's VRF, i.e. for block production, until a full BABE epoch has elapsed, meaning if a key is registered before epoch i started then the key becomes usable in epoch i+1. (We've discussed a notion of full and mini BABE epochs, in which the randomness cycle operates in mini epochs but the security analysis happens in full epochs, btw.) It's fine if BABE Sr25519 keys and GRANDPA BLS keys are used for signing messages before a full BABE epoch has elapsed, however.

As a result, we could permit validator operators to "immediately" deploy a new session key, which halts their block production until a new BABE epoch elapses, but permits them to continue with GRANDPA. This avoids scenarios in which validator operators must choose between (a) some risk that their key was compromised, and (b) the risk of being slashed for being offline. This permits validator nodes to migrate hardware, data centers, etc. without transferring session keys, which reduces validator operator error, at a small financial loss from block production. We should still support session keys being placed into a "revoked" state, in case the validator operator cannot spin up a new validator node quickly enough.

GRANDPA BLS keys should not be used by any node unless that node has first checked the proof-of-possession. Nodes should probably only check the proof-of-possession for each GRANDPA BLS key once, which means some local runtime state. It might simplify the code considerably if session keys exist prior to validator election. We caution however that election must not interact with any revocation and immediate key update transactions in ways that permit GRANDPA keys to bypass the proof-of-possession check, or permit BABE keys to be used too early for block production.
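The activation rule described above can be sketched as follows, assuming `registered_epoch` is the epoch during which the key landed on chain, so that a key registered during epoch e becomes VRF-usable in epoch e+2 once a full epoch has elapsed. The types are invented for illustration:

```rust
// Sketch of the activation rule: a registered key may sign messages
// immediately, but a full epoch must elapse before VRF / block-production
// use, so a key registered during epoch e is VRF-usable from epoch e + 2.

#[derive(PartialEq)]
enum KeyState {
    Active,
    Revoked, // operator could not spin up a replacement node in time
}

struct RegisteredKey {
    registered_epoch: u64,
    state: KeyState,
}

impl RegisteredKey {
    // Signing (GRANDPA votes, other messages) needs no waiting period.
    fn can_sign(&self, now: u64) -> bool {
        self.state == KeyState::Active && now >= self.registered_epoch
    }
    // Block production waits until a full epoch has elapsed.
    fn can_produce_blocks(&self, now: u64) -> bool {
        self.can_sign(now) && now >= self.registered_epoch + 2
    }
}
```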
As above, we still favor the GRANDPA BLS and BABE Sr25519 secret keys being written to disk so that validator nodes can go down and come back up without involving the controller key. We recommend the transport Ed25519 secret key never being saved to disk and instead being created and certified by the BABE key on node startup. If this transport Ed25519 key gets pushed to the chain on startup, then we substantially reduce the scenarios in which validator operator error results in slashable equivocation. Any key changes should receive finality in GRANDPA before taking effect.
Now that #788 is in, we have a bit more of a launching pad to do some kind of AEAD channel with peers who claim ownership of keys. @mxinden told me that the authority-discovery module is not fast enough for quick authentication of which peers we are connected to that are currently validators. There are certain messages that we only want to process if we're sure that they came from a particular validator/collator. On the flip side, there are also sometimes requests that we want to make only to certain validators, and the only way to know for sure is to do authentication. Schnorrkel added some AEAD functions we can use, probably in conjunction with AES256-GCM, which Ring supports. I'm not sure how the design here should look; I'd like to avoid emulating. However, given that this involves doing a key agreement with a schnorrkel session key (or collator key), it seems difficult to bring sentry nodes into the middle. We'd want to do this in a way where sentries can act as simple middlemen.
As I understand it, this issue is about certificates, not encryption. Authenticated encryption (AEAD) does not provide the same sort of authentication as certificates, aka signatures by one key on another key. We already have certificates on session keys by controller keys, right? AEADs take a symmetric key and show the data was encrypted with that symmetric key, which does many useful things, but AEADs only show that the symmetric key encrypted the message. Anyone who can read the message can also forge a different message. We could use AEADs to make gossip work kinda-like direct connections for Sassafras' pre-announce phase.
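A toy illustration of the forgeability point: because an AEAD authenticates with a shared symmetric key, any party able to verify (or decrypt) can equally forge. The "tag" below is a stand-in built from std's DefaultHasher, not a real AEAD; it only demonstrates the symmetry of the trust relation:

```rust
// Toy MAC standing in for an AEAD's authentication tag: both parties hold
// the same shared key, so both can produce valid tags for any message.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn tag(shared_key: u64, msg: &str) -> u64 {
    let mut h = DefaultHasher::new();
    shared_key.hash(&mut h);
    msg.hash(&mut h);
    h.finish()
}

// Verification uses the very same computation as tagging, which is exactly
// why a receiver can forge: a certificate (public-key signature) would make
// verification possible without the ability to sign.
fn verify(shared_key: u64, msg: &str, t: u64) -> bool {
    tag(shared_key, msg) == t
}
```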
Any idea why?
DHT queries are too expensive to write code like Maybe
That seems to be what you've been talking about, but it's not what I opened the issue about :) - the purpose of this issue was always to find a way to ensure that nodes have a way to know for sure if they are talking to a validator or collator with a particular key and that they are not being MITMed by another node who is talking to that actor. AEAD based on a key exchange is one way of accomplishing that, which is encryption. The other way is certificates for PeerIDs stored on the DHT.
I see! :) I'd prefer if encryption primarily happened at the transport level by correctly using Noise or TLS 1.3 with QUIC, mostly just because these transport level libraries already work hard at making this stuff relatively easy to use correctly. At the transport level, I'd expect the core issue to be that validators have sentry nodes, so the validator should certify all sentry node keys somehow. I'd previously suggested that transport level keys be certified by some consensus key, not the controller key, but actually we never thought through exactly what infrastructure people may want for sentry nodes. Is there a need to add or remove sentry nodes automatically? What are the risks? How does revocation of sentry nodes work? We should have some conversation about when validators need encryption above the transport level too. If direct connections exist then Sassafras does not necessarily require encryption, thanks to being only one hop, though it's maybe easier with encryption. I'm not averse to Sassafras using second layer encryption. We might also want this simply for when messages should not be seen by sentry nodes, but I'm not sure what fits that. What else needs encryption? cc @infinity0
Sentry nodes definitely make this harder. That's what I wanted to loop @mxinden in for, and I also spoke with @tomaka a bit about libp2p APIs yesterday. I guess what would happen is that we'd have communication happening over Noise/QUIC, but then we'd layer another
Talked with @mxinden and @tomaka today. There are a couple different techniques that are useful in different situations.
1 and 2 have a couple caveats: there aren't good ways to prove that the owner of a specific session key is the same as the owner of a historical session key. This makes forwarding trust from a previously authenticated connection to their new ID more difficult. We also can't detect if an incoming peer is a validator without making a (potentially slow) query. I'm most in favor of using options 1 & 2 in conjunction to build good discovery APIs, along with option 3 to keep live connections updated. Option 4 seems difficult to implement and also requires us to know which session a peer is on, so I would prefer to avoid it.
We can layer a whole second Noise session over, if these are not one-off messages. My little AEAD model is more targeted at one-off messages.
I doubt encrypting DHT records helps anything. You just want the DHT connection encrypted for some reason? Or you're worried about authenticating DHT data? Assuming yes, and that you want a DHT solution: if I understand, we're pulling on-chain information from the DHT so that either we can figure out who to talk to to sync the chain, or else we continue as a light client. In the second case, we need to trust the public claims in the DHT somehow. Ideally, we'd want each validator set to sign off on the next validator set, which they do in GRANDPA, but that requires reading a whole block. We could maybe put some validator set change information in the block header, so that a light client could pull only specific between-epoch relay chain block headers that track the validator set changes and have GRANDPA signatures from each set on the next. Sentry nodes can only really use a certificate issued by the validator or controller key (see my previous comment).
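The between-epoch header idea might look roughly like this, with each set's GRANDPA signatures on the next set reduced to a simple >2/3 quorum count purely for illustration (all names invented; a real light client would verify the signatures themselves):

```rust
// Hypothetical between-epoch handoff header: the current validator set
// signs off on the next set, so a light client walks these headers instead
// of reading whole blocks. Signature verification is abstracted into a
// plain count here for illustration.
struct HandoffHeader {
    next_set_id: u64,
    signatures: usize, // how many of the current set signed the next set
    set_size: usize,
}

// Walk the chain of handoff headers from a trusted starting set, accepting
// each step only with a > 2/3 quorum of the current set.
fn follow_handoffs(mut trusted_set: u64, headers: &[HandoffHeader]) -> Option<u64> {
    for h in headers {
        if h.signatures * 3 <= h.set_size * 2 {
            return None; // quorum not reached; abort
        }
        trusted_set = h.next_set_id;
    }
    Some(trusted_set)
}
```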
I'd expect these use the transport layer encryption and authentication, no? Is the issue that you want to open connections before fully authenticating nodes? And lazily obtain certificates for the long-term keys used to authenticate those connections? That's doable, but maybe hard to know you did it right.
I've mostly been saying this, yes. Yes, you can cache certificate chain checking.
I suppose PeerIds means transport keys? It's not essential to put them on chain, but the certificate chain should be checked when they get used, and that ultimately points all the way back to the chain.
Ahh! I see! We naively assumed one should never forward trust because the chain is our root of trust. We change epochs and eras all the time however, which creates this problem: In era e, we had some validator with session key X certified by controller key Z. Yet X's obligations go beyond era e. In era e+1 we have some validator with session key Y, distinct from X but also certified by controller key Z. Does Y still have the same responsibilities as X? We might let validators change some session keys faster than eras, at which point you replace eras in this with epochs or even shorter periods.

At first blush, yes, we should mostly give Y the same responsibilities as X, and simply track by the controller key that certifies the validator key, but some responsibilities like erasure coded pieces make this hard perhaps. We do not slash for lacking erasure coded pieces though, so maybe new nodes could just make some effort to obtain the most recent ones before their session key became live.

I know GRANDPA caused some headaches, but I'd expect GRANDPA rounds should run entirely within a specific session key's lifetime. If Z swaps their session key from X to Y then they should maybe keep the node with X up long enough to answer any old GRANDPA challenges, or else risk being slashed. There are two forms of this:
We care which validator it is, in both AnV and Sassafras, not just that they're some validator.
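The "track by the controller key" idea from the rotation discussion above can be sketched as a registry keyed by controller, so swapping session key X for Y leaves responsibilities attached to Z (all names invented for illustration):

```rust
// Responsibilities attach to the controller key; the current session key is
// just an indirection the controller rotates between eras.
use std::collections::HashMap;

struct Registry {
    // controller key -> currently certified session key
    current: HashMap<&'static str, &'static str>,
}

impl Registry {
    // Rotating from X to Y does not disturb anything keyed by controller:
    // lookups by controller key see the new session key transparently.
    fn rotate(&mut self, controller: &'static str, new_session: &'static str) {
        self.current.insert(controller, new_session);
    }

    fn session_of(&self, controller: &str) -> Option<&&'static str> {
        self.current.get(controller)
    }
}
```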
These proofs can only really be a certificate chain:
It's much like the X.509 standard used by HTTPS, but terminating in the chain. We rolled our own without the loop because it's normally quite short in our case, and we do not care about the same property sets as X.509, but maybe you need an extra layer to deal with transport keys, especially for sentry nodes, not sure (see my previous comment). Would it help if we abstracted certificate handling in some loop like this?
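The loop referred to here is not shown in this thread, so the following is only a hypothetical reconstruction of such a certificate-chain walk terminating at an on-chain root. The signature check is a toy XOR stand-in, and all names are invented:

```rust
// Toy certificate: `sig` stands in for a real signature by `issuer` over
// `subject` (here modeled as issuer ^ subject).
struct Cert {
    subject: u64, // key being certified
    issuer: u64,  // key that signed it
    sig: u64,
}

fn verify(c: &Cert) -> bool {
    c.sig == c.issuer ^ c.subject
}

// Walk from the leaf key toward the root, checking each link, and accept
// only if the chain terminates in a key the chain itself trusts (e.g. a
// controller key registered on chain).
fn check_chain(on_chain_root: u64, chain: &[Cert], leaf: u64) -> bool {
    let mut expected_subject = leaf;
    for cert in chain {
        if cert.subject != expected_subject || !verify(cert) {
            return false;
        }
        expected_subject = cert.issuer;
    }
    expected_subject == on_chain_root
}
```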
Currently, we just believe someone if they say they control a specific session key. We will need a cryptographic authentication mechanism for this.
paritytech/substrate#271