Requirements for Fork Detection in IBC + Accountability #424
Conversation
Co-authored-by: Zarko Milosevic <zarko@informal.systems>
> a fork, we need some assumption about a correct relayer being on a
> different branch than the handler, and we need such a relayer to
> check-in not too late. Also
> what happens if the relayer's light client is forced to roll-back
I guess this scenario assumes a fork detected by the light client. In that case it's game over, and the only thing left to do is evidence creation and submission. So we need to spec what happens in the absence of forks (normal operation), then fork detection and evidence submission once a fork is detected. But there is no continuation of normal operation after a fork is detected, as we don't support automatic recovery.
You are right. I had the impression that we might think about rolling back, but now I realize that most likely doesn't make sense.
So if a relayer detects a fork (however it does so), does it have to halt all handlers?
It should submit a proof of fork to the affected handler (if the fork is on chain A, it should send it to the IBC handler for A on chain B), and this should probably halt the handler.
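A rough sketch of what that could look like, with hypothetical types and names (the spec has not yet fixed the handler API or the exact contents of a proof of fork):

```go
// Hypothetical sketch, not the ICS-02/ICS-07 API: how an IBC handler on
// chain B might react to a proof of fork for chain A submitted by a relayer.
package handler

import "errors"

// LightBlock and ProofOfFork are placeholders; the spec still has to define
// precisely what a proof of fork for a handler contains.
type LightBlock struct {
	Height int64
	// signed header, commit, validator sets ...
}

type ProofOfFork struct {
	// Two conflicting light blocks for the same height, both verifiable
	// from a consensus state the handler already trusts.
	TrustedHeight int64
	BlockA        LightBlock
	BlockB        LightBlock
}

type Handler struct {
	frozen bool
	// consensus states indexed by height ...
}

// SubmitProofOfFork verifies the proof (the handler cannot trust the relayer)
// and, if it is valid, freezes the client so no further updates are accepted.
func (h *Handler) SubmitProofOfFork(p ProofOfFork) error {
	if p.BlockA.Height != p.BlockB.Height {
		return errors.New("blocks are not for the same height")
	}
	if !h.verifiable(p.TrustedHeight, p.BlockA) || !h.verifiable(p.TrustedHeight, p.BlockB) {
		return errors.New("proof not verifiable from a trusted consensus state")
	}
	// Both conflicting blocks verify: there is a fork on chain A. Halt.
	h.frozen = true
	return nil
}

// verifiable is a placeholder for light-client-style verification of lb
// against the consensus state stored at trustedHeight.
func (h *Handler) verifiable(trustedHeight int64, lb LightBlock) bool {
	return true
}
```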
> conflicting blocks are signed by +2/3 of the validators of that
> height, and a *light client fork* where one of the conflicting headers
> is not signed by +2/3 of the current height, but by +1/3 of the
> validators of some smaller height.
This sentence is not clear. I am not sure the distinction we make between on-chain forks and light client forks is about the voting power of faulty nodes. You can create a fork on the main chain with +1/3 of voting power. In my view it is rather the "location" of the fork that defines its type. A main chain fork is perceived by full nodes (including validators) running the Tendermint protocols; in this case we expect the full node to panic and we enter the off-chain fork accountability protocol. A light client fork, on the other hand, is detected by the light client and enters the Tendermint network sandboxed in an evidence, so we don't expect the full node to panic in this case, but rather to try to execute the on-chain fork accountability protocol triggered by this evidence. In case (due to censorship) we fail to execute the evidence within some time frame, a full node (that is aware of the evidence) should panic and resort to the off-chain fork accountability protocol. The intuition for this reasoning is that a light client fork could be a one-shot event (and therefore we try to process it), while a main chain fork is an attempt to take over the main chain.
For me, the distinction is not about the location. It is about "potential" location, if you want.
The Tendermint validation says that if +2/3 of the validators sign a block, then the block is OK, and all the protocols accept that.
If there are two blocks for the same height with +2/3 signatures, then both blocks are equally valid. Even if at the moment no full node is on an alternative branch, it might be that in the future a node that does fast sync accepts it. Then it suddenly becomes a main chain fork. So a definition based on location does not seem stable, and I am not sure it is safe to make the distinction based on "location". This is also what underlies my axiomatization in the ./detection.md draft.
However, if there is a unique chain with +2/3 signatures, then this chain is authoritative. If in addition there is a block that is signed by +1/3 of some older validator set, then the light client might accept it while skipping. However, by uniqueness, this cannot transform into a main chain fork in the future.
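To make the distinction concrete, here is a minimal sketch (hypothetical types and helpers, not an existing Tendermint API) of classifying a conflicting header by the voting power that signed it:

```go
// Sketch of the distinction above: a conflicting header is an on-chain fork
// if it carries +2/3 of the voting power of its own height, and a
// light-client fork if it "only" carries +1/3 of some older, trusted
// validator set (enough to pass skipping verification).
package detection

type ValidatorSet struct {
	TotalPower int64
	// validators ...
}

type SignedHeader struct {
	Height int64
	// header + commit ...
}

// votingPowerIn is a placeholder returning the voting power in vals that
// signed sh.
func votingPowerIn(sh SignedHeader, vals ValidatorSet) int64 { return 0 }

type ForkKind int

const (
	NoFork ForkKind = iota
	OnChainFork
	LightClientFork
)

// classify assumes `conflicting` contradicts a trusted header of the same
// height; valsAtHeight is the validator set of that height, trustedOldVals
// some older validator set the light client trusts.
func classify(conflicting SignedHeader, valsAtHeight, trustedOldVals ValidatorSet) ForkKind {
	if 3*votingPowerIn(conflicting, valsAtHeight) > 2*valsAtHeight.TotalPower {
		return OnChainFork // signed by +2/3 of the current height
	}
	if 3*votingPowerIn(conflicting, trustedOldVals) > trustedOldVals.TotalPower {
		return LightClientFork // signed by only +1/3 of an older validator set
	}
	return NoFork // not enough power to fool the light client either
}
```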
Also, I believe that if the bad guys have barely +1/3, then in order to get +2/3 they have to act in the open, that is, on the system, and it is likely that their attempt to fork the system can easily be observed/caught. This makes it unlikely to happen.
The light client attack with +1/3 they can do among themselves, without exchanging messages in the open. This might be more likely to happen.
What I can't fully grasp is the dynamics of a +1/3 attack on the main chain where a different block is created for some height h. Is it possible to keep extending the forked chain with +1/3 of voting power if, in the worst case, the block at height h leads to changing the validator set such that correct validators are excluded and only the faulty ones (+1/3) stay in?
I guess yes. If in the other block the bad guys can control the next validator set, it might even be that the system falls apart. They should not have the same correct guy in both next validator sets. But otherwise, in principle, they could convince one correct validator to be a validator in one branch and another correct validator to be a validator in the other branch. As agreement is violated, I guess this is possible.
If I remember well, there is something about the off-by-one that constrains scenarios. We elect a new valset after block h is executed, but it is only in power for block h+2. I tried quickly yesterday to come up with a worst-case scenario, but failed. But we can probably ignore this issue for now and try to figure out the rest :)
> You can create a fork on the main chain with +1/3 of voting power.

Yes, but you still need +2/3 to sign the block. For a light-client fork, that's not the case; you just need +1/3 of the old voting power to sign the block.
> What I can't fully grasp is the dynamics of a +1/3 attack on the main chain where a different block is created for some height h. Is it possible to keep extending the forked chain with +1/3 of voting power if, in the worst case, the block at height h leads to changing the validator set such that correct validators are excluded and only the faulty ones (+1/3) stay in?

So to fork a full node, the two conflicting blocks at height H must have the same NextValidatorSet (validity condition). So if you make a fork on the main chain at height H with +1/3 faulty, both blocks at H+1 will have to have the same validator set, though they can now have different NextValidatorSets. It's only at H+2 that the validator set itself can differ between the two forks. Does this mean that H+1 can't get committed and the fork stops after 1 block?
I am a bit lost in indices, so to clarify:
- To create a fork on the chain, one needs an attack on two consecutive blocks.
- On height H we have our classic scenarios (equivocation, amnesia).
- On height H+1 it basically means that the validators run two consensus instances for the same height, which in the end also boils down to double signing, although they might behave "correctly" in each instance.
Is that right?
> If I remember well, there is something about the off-by-one that constrains scenarios.

> Does this mean that H+1 can't get committed and the fork stops after 1 block?

Not sure about this, but it seems there's nothing preventing each branch of the fork from continuing to extend (to H+1 and beyond). For H+1, it will require the faulty guys to at least double sign, but by H+2, they can make the validator set change and maintain the forks without more faulty behaviour.
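A small sketch of that off-by-one constraint, using a simplified header (not the full Tendermint type):

```go
// The validator set used at height H+1 is fixed by the NextValidatorsHash of
// the blocks at H; if the two conflicting blocks at H must share it (the
// validity condition mentioned above), the branches can only use different
// validator sets from H+2 on.
package forkdynamics

type Header struct {
	Height             int64
	ValidatorsHash     []byte // hash of the set that signs this block
	NextValidatorsHash []byte // hash of the set that signs block Height+1
}

// sameNextValSet is the validity condition from the comment above: to fork a
// full node at height H, both conflicting headers commit to the same next
// validator set.
func sameNextValSet(a, b Header) bool {
	return a.Height == b.Height &&
		string(a.NextValidatorsHash) == string(b.NextValidatorsHash)
}

// firstHeightValSetsCanDiffer: if the conflicting blocks at h satisfy
// sameNextValSet, the branches share the validator set at h and at h+1, so
// the earliest height at which ValidatorsHash can differ is h+2.
func firstHeightValSetsCanDiffer(h int64) int64 { return h + 2 }
```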
> introduce different terms for:
>
> - proof of fork for the handler (basically consisting of lightblocks)
> - proof of fork for a full node (basically consisting of (fewer) lightblocks)
I like the idea of proof of work for this purpose. It is basically 1) the information the light client submits to a full node after detecting a fork, and 2) what the relayer submits to the IBC handler after detecting a fork. Proof of misbehaviour is something different, and at the moment we have been using the term evidence for this. For example, double signing is proof of misbehaviour that can be packaged into an evidence. I would therefore not talk about it here.
> I like the idea of proof of work
proof of fork. careful :)
But this terminology distinction seems super helpful.
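A sketch of what the two notions could look like as data structures (hypothetical field names; the spec will have to pin down the exact contents):

```go
// Both kinds of proof of fork are essentially bundles of light blocks. The
// handler variant must be verifiable against a consensus state the handler
// already stores; the full-node variant can be smaller because the full node
// knows its own chain.
package proofoffork

type LightBlock struct {
	Height int64
	// signed header + validator sets ...
}

// HandlerProofOfFork is what a relayer submits to an IBC handler: two
// conflicting traces of light blocks, both verifiable from the consensus
// state the handler stores at TrustedHeight.
type HandlerProofOfFork struct {
	TrustedHeight int64
	BranchA       []LightBlock
	BranchB       []LightBlock
}

// FullNodeProofOfFork is what a light client submits to a full node after
// detecting a fork; fewer light blocks are needed, e.g., only the
// conflicting branch traced back to a block the full node already has.
type FullNodeProofOfFork struct {
	TrustedHeight     int64
	ConflictingBranch []LightBlock
}
```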
> - this is the job of a full node.
>
> - might be subjective in the future: the protocol depends on what the
In Tendermint we assume that we have correct gossip paths between all correct validators. This allows information about a fork observed by one correct validator to eventually be received by all correct validators. Therefore eventually all correct validators panic and we halt the chain. On the other side, we don't make explicit assumptions regarding full nodes and connectivity between correct full nodes, so it is possible to have a full node that is being eclipsed by the adversary on the forked chain. The question is whether this changes something in the protocol design when we talk about evidence submission. The light client will anyway need to communicate the proof of fork to the primary and all witnesses, so we ensure that this information reaches correct full nodes. Note that our assumption of the light client being connected to a correct peer is actually not precise enough. We probably need to say that it is connected to a correct full node that is on the main chain.
I am also not clear about the right way to think about that. I just wanted to highlight that currently all our discussions implicitly assume that a full node that tries to collect evidence knows the "true" chain, and I am not sure we want to keep it that way. It might also be that we have to halt the system anyway in such situations...
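As a minimal sketch (hypothetical interfaces), the submission step could look like this: the light client sends the proof of fork to its primary and to all witnesses, so that at least one correct full node on the main chain receives it.

```go
package submission

// ProofOfFork is a placeholder for the conflicting light blocks collected by
// the light client.
type ProofOfFork struct{}

type FullNode interface {
	SubmitProofOfFork(p ProofOfFork) error
}

// broadcastProofOfFork sends the proof to the primary and every witness and
// reports how many peers accepted it; it is enough that one correct full
// node on the main chain receives it.
func broadcastProofOfFork(p ProofOfFork, primary FullNode, witnesses []FullNode) int {
	accepted := 0
	for _, peer := range append([]FullNode{primary}, witnesses...) {
		if err := peer.SubmitProofOfFork(p); err == nil {
			accepted++
		}
	}
	return accepted
}
```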
> ### Isolating misbehaving nodes
>
> - this is the job of a full node.
I think that we have some missing parts here: a full node should verify the proof of fork and, if it is valid, create the corresponding evidence that should be published on chain using the evidence submission protocol. There are several kinds of evidence and they don't have the same treatment on chain. Double signing and lunatic evidence are self-contained and can be processed without additional information. Evidence for amnesia requires running a challenge-response protocol (suspected validators posting their votesets on chain) and running the fork accountability procedure to catch the bad validators. We need to precisely specify the proof of fork verification logic and also how a valid proof is turned into evidence(s).
Yes, you are right. I drew a decision tree while discussing with @ebuchman but didn't add it to the draft. I overlooked that. I will add it!
Eventually the question is where this should go, because in the end this is independent of all the light client and IBC related questions and just needs a proof of fork as input.
That's right. It's actually Tendermint specific stuff.
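A sketch of the verification and classification step described in this thread (hypothetical names; the precise rules are exactly what still needs to be specified):

```go
// A full node verifies the proof of fork and, if it is valid, derives the
// kind of evidence to publish on chain. Double signing and lunatic evidence
// are self-contained; everything else falls back to the amnesia procedure
// with its challenge-response protocol.
package evidence

// ProofOfFork is a placeholder for the conflicting light blocks.
type ProofOfFork struct{}

type EvidenceKind int

const (
	Invalid       EvidenceKind = iota
	DuplicateVote              // double signing: self-contained
	Lunatic                    // header fields (e.g. AppHash) not derivable from the chain: self-contained
	Amnesia                    // requires challenge-response (publishing vote sets) + fork accountability
)

// Placeholders for the actual checks.
func verify(p ProofOfFork) bool            { return true }
func hasDuplicateVotes(p ProofOfFork) bool { return false }
func isLunatic(p ProofOfFork) bool         { return false }

// classifyProofOfFork turns a valid proof of fork into the kind of evidence
// to create.
func classifyProofOfFork(p ProofOfFork) EvidenceKind {
	if !verify(p) {
		return Invalid
	}
	if hasDuplicateVotes(p) {
		return DuplicateVote
	}
	if isLunatic(p) {
		return Lunatic
	}
	return Amnesia
}
```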
> > state of the work on IBC. Some/most of it might already exist and we
> > will just need to bring everything together.
>
> - "proof of fork for a full node" defines a clean interface between
In my view these are the components for which we need precise specs: 1) detecting forks at the IBC module, 2) proof of fork submission and verification (most probably separate protocols for the full node and the IBC module), 3) creating evidence(s) from a valid proof of fork (executed by a full node). We also need to specify 4) evidence gossip and on-chain commitment, and 5) on-chain evidence handling, with amnesia handling requiring a more complex challenge-response + fork accountability protocol execution. In terms of priority, 1), 2) and 3) come first, as 4) and 5) are in some form already specified and shouldn't be affected by 1)-3).
This appears to be in good shape for landing. Will probably be moved to spec repo but good to have on master for now.
Great work. @milosevic @josef-widder can we get a follow up issue written up with next steps?
> - a handler cannot trust the information provided by the relayer,
>   but must verify
>   (Доверя́й, но проверя́й)
Doveryáy, no proveryáy. Lol, in most of crypto it is said slightly differently: "don't trust, verify".
> - if a relayer sees a header *h* it doesn't know at a handler (`queryChainConsensusState`), the
>   relayer needs to
>   verify that header. If it cannot do it locally based on downloaded
So, according to the ICS 07 spec and the current SDK implementation, the ConsensusState only contains a validator set; it doesn't actually contain the full commit or even the header (and it probably shouldn't, that's a lot of data to store on chain). So unless this were to change, the relayer would only be able to check the app_hash and/or the validator set, which might be sufficient for detection, but it wouldn't be able to verify the signed header directly. To get the signed header, it would have to find the corresponding UpdateClient tx in the blockchain (not the state) and extract the signed header (and potentially a bisection trace) from that. I'm not sure if this is straightforward right now, but it's something we'll have to enable either through events or otherwise.
If I understand correctly, to convince the handler that there is a fork we can still submit the conflicting header. So this can stay the same.
To convince a full node that there is a fork, we need to work on extracting the signed header.
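A sketch of the check this allows today (field names loosely modeled on an ICS-07-style consensus state, simplified): the relayer compares what the handler stores at height H with the header its own light client verified for H; a mismatch triggers building a proof of fork, even though the signed header itself has to be dug out of the UpdateClient tx.

```go
package relayer

import "bytes"

// OnChainConsensusState is the (simplified) data queryable at the handler.
type OnChainConsensusState struct {
	Height             int64
	AppHash            []byte
	NextValidatorsHash []byte
}

// VerifiedHeader is what the relayer's own light client has verified for the
// same height.
type VerifiedHeader struct {
	Height             int64
	AppHash            []byte
	NextValidatorsHash []byte
}

// consistent returns false if the handler's view conflicts with the
// relayer's light client, i.e., a candidate fork has been detected.
func consistent(cs OnChainConsensusState, h VerifiedHeader) bool {
	return cs.Height == h.Height &&
		bytes.Equal(cs.AppHash, h.AppHash) &&
		bytes.Equal(cs.NextValidatorsHash, h.NextValidatorsHash)
}
```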
> - we have to specify what precisely `queryChainConsensusState`
>   returns. It cannot be the complete lightstore. Is the last header enough?
>
> - we would like to assume that every now and then (smaller than the
Do we need the relayer to check every single ConsensusState? Is it possible that the handler could be tricked onto a fork for just one height and then brought back to the main chain in such a way that if relayers don't check every ConsensusState inserted by the handler, they could actually miss this?
This is not so clear to me. What I understood from recent discussions is that the application state stores the history -- at least several (hundred) blocks back. If this is true, I believe one cannot go back from a branch without violating invariants in the application state that should show up in some checks. But this needs to be confirmed.
> check-in not too late. Also
> what happens if the relayer's light client is forced to roll-back
> its lightstore?
> Does it have to re-check all handlers?
Perhaps we should maintain a local copy of all headers from handlers, as if handlers were another witness in the light client system?
We'd basically want to know at all times that our light client conforms with those of the handlers we know about. If at any point our light client has to roll back, we should assume that means there's a fork relevant to all those handlers as well.
Basically we want to roughly ensure that our light client and those of all the handlers are in sync and in agreement, so it's easy to detect forks and make all parties aware of them.
> Basically we want to roughly ensure that our light client and those of all the handlers are in sync and in agreement, so it's easy to detect forks and make all parties aware of them.
This makes a lot of sense. Do you think performance-wise it is reasonable to maintain such a tight synchronization?
> This makes a lot of sense. Do you think performance-wise it is reasonable to maintain such a tight synchronization?
Not sure. Syncing the on-chain clients to every header a relayer has is probably excessive, but we might want relayers to download and verify every header that gets updated on-chain. I guess whenever the relayer receives an UpdateClient event from a chain, it should tell its own light client to fetch and verify that height so it can be on the lookout for forks. But the relayer should always make sure it's at least as up to date as the on-chain client.
This is exactly what I had in mind. Whenever the relayer sees an UpdateClient event, it fetches and verifies the on-chain data and inputs this info to its light client so it can verify it.
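As a minimal sketch (hypothetical interfaces, not the relayer's actual API), that loop could look like this:

```go
package relayerloop

// UpdateClientEvent is emitted whenever the on-chain client is updated.
type UpdateClientEvent struct {
	ClientID string
	Height   int64
}

type LightClient interface {
	// VerifyToHeight fetches and verifies the header at the given height,
	// bisecting from the latest trusted header if necessary.
	VerifyToHeight(height int64) error
}

// watchUpdates keeps the relayer's light client at least as up to date as the
// on-chain client and reports heights that could not be verified, i.e.,
// candidates for fork detection and proof-of-fork construction.
func watchUpdates(events <-chan UpdateClientEvent, lc LightClient, suspicious chan<- int64) {
	for ev := range events {
		if err := lc.VerifyToHeight(ev.Height); err != nil {
			suspicious <- ev.Height
		}
	}
}
```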
Opened #461 for follow up.
Following
I have collected in this document what should capture the current understanding of, and the requirements for, fork detection.
Since the result of detecting a fork is that something must be done to resolve the problem, I have also added some broader discussion about accountability and attacks. This is a complex problem with many interacting protocols and requirements; I hope this document is a start toward decomposing it.
As this is a cross-project effort between IBC and the light client (tendermint-rs), I have invited many reviewers to add their thoughts. My understanding of IBC is very limited at the moment, so I am very grateful for clarifications if I got something wrong.