Time Monotonicity Enforcement #141
Conversation
Codecov Report
@@             Coverage Diff              @@
##             main     #141        +/-  ##
===========================================
+ Coverage   65.92%   78.99%    +13.06%
===========================================
  Files         131      109        -22
  Lines        8382     6516      -1866
===========================================
- Hits         5526     5147       -379
+ Misses       2476     1009      -1467
+ Partials      380      360        -20
I've decided to keep the FrozenHeight as the height of the offending header that caused the misbehaviour. With time misbehaviour I use the larger height. Since we no longer use its specific value, it doesn't really matter so long as it's non-zero. @colin-axner argues for just making the height {0-1}, since it isn't used for anything other than a boolean value.
I figure if we're keeping it, we may as well make it meaningful. Care to weigh in, @cwgoes?
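For illustration, here is a self-contained toy showing why the stored value is effectively boolean. Height and ClientState below are simplified stand-ins invented for this sketch, not the real 02-client types:

package main

import "fmt"

// Height is a simplified stand-in for the 02-client height type.
type Height struct {
	RevisionNumber uint64
	RevisionHeight uint64
}

func (h Height) IsZero() bool {
	return h.RevisionNumber == 0 && h.RevisionHeight == 0
}

// ClientState is a stand-in holding only the field under discussion.
type ClientState struct {
	FrozenHeight Height
}

// IsFrozen is the only question ever asked of FrozenHeight, which is why a
// fixed sentinel like {0-1} works just as well as the offending height.
func (cs ClientState) IsFrozen() bool {
	return !cs.FrozenHeight.IsZero()
}

func main() {
	offending := ClientState{FrozenHeight: Height{0, 5021}} // offending header's height
	sentinel := ClientState{FrozenHeight: Height{0, 1}}     // {0-1} boolean sentinel
	fmt.Println(offending.IsFrozen(), sentinel.IsFrozen())  // true true
}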
// If client state is not frozen after clientState CheckHeaderAndUpdateState,
// then write the update state changes, and set new consensus state.
// Else the update was proof of misbehaviour and we must emit appropriate misbehaviour events.
if !newClientState.IsFrozen() {
There was too much confusion regarding the separation of responsibilities for detecting misbehaviour here: a conflicting header can be detected at this point, but time monotonicity can't. Thus, it makes more sense to make it the responsibility of client developers to do this correctly, so we have a clear separation of responsibility.
Here I just check whether the new client state is frozen and, if so, emit the appropriate events and write state.
I think the resulting code is cleaner.
I don't understand why we cache the context if it is now the full responsibility of the client developers to handle all instances of misbehaviour correctly.
Solo machines store the consensus state in the client state, so the only protection the cached context adds is against metadata writes, and it still seems confusing to me: client developers should be very aware not to write unwanted state changes for an update which is actually evidence of misbehaviour. What if a client wanted to write metadata every time it handled misbehaviour in an update client message?
We should either be as defensive as possible by assuming client developers miss checks, or we should be as explicit as possible in saying it is entirely the responsibility of the app developer. If we cache the context, then I think we might as well do the duplicate consensus state check (and return an error if a duplicate update is successful).
I'd actually prefer to be as defensive as possible. In that case, we should keep the cached context and return an error if a duplicate update occurs without the client detecting misbehaviour.
Regardless, these requirements should be clearly documented in a light_client.md under docs/. These are subtle checks that are essential for security.
> I don't understand why we cache the context if it is now the full responsibility of the client developers to handle all instances of misbehaviour correctly.

They are responsible for telling core IBC if an update was misbehaviour. They are not responsible for rolling back all state changes.

> What if a client wanted to write metadata every time it handled misbehaviour in an update client message?

This is definitely possible; I guess it's up to us what we want to enable. The downside is accidentally leaving in metadata writes after misbehaviour that we intend to write only for valid updates. I think @cwgoes can weigh in on the tradeoff between flexibility and opinionated code. I believe being opinionated here and having the ClientKeeper write metadata on valid updates makes more sense. Light client implementations are fully responsible for doing the update logic (UpdateClient will do none of that).
But the ClientKeeper will take the returned output and do all of the necessary store writes. I think that's a clean separation of responsibility.
> I'd actually prefer to be as defensive as possible. In that case, we should keep the cached context and return an error if a duplicate update occurs without the client detecting misbehaviour.

I tried doing this, and the code got super ugly because there was freezing logic in both the ClientKeeper and in tendermint's update function. It would have been much cleaner if that logic lived in just one place.
Furthermore, I think it's possible to take all the checks that must be done by every light client and put them in the ClientKeeper to minimize the possibility of light-client developer error. But I think in practice this would make things less secure if it trades off too much on separation of concerns.
Critically, it just needs to be clear to a reviewer/developer where a particular check is supposed to happen.
My proposal is that we create a very clear separation of concerns that acts as a contract between core IBC and the light client developer.
The light client implementation must give core IBC the updated client state and consensus state, and it must return a frozen client state if the update was evidence of misbehaviour.
Core IBC will in turn store the client state (and the consensus state, if the update was valid), write all callback state changes on successful updates, and emit the appropriate events.
This means there may be redundant checks happening across light clients, and some implementations may miss them. But it gives a very clear rule for what a light client implementation is responsible for. Even though this places responsibility for all misbehaviour checks on the light client, as a reader and reviewer I can analyze the light-client implementation in isolation and verify that it catches all misbehaviour and holds up its side of the bargain.
Without this, I would need to check whether the ClientKeeper and the misbehaviour handling together catch all misbehaviour, make sure they leave no gap between them, and confirm they aren't doing redundant checks. It also gets harder as time goes on to determine where a check should go, since we would need to make a subjective decision on whether a given check is universal or not.
For these reasons I think clear separation of concerns is more important than putting all universal checks (even subtle ones) in the ClientKeeper. But yes, this should absolutely be documented in light_client.md. Will do so once there's consensus on this point.
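To make the proposed contract concrete, here is a rough sketch of it as a Go interface. The method shape mirrors the CheckHeaderAndUpdateState call discussed in this thread, but the exact names and signatures are assumptions for illustration, not the actual ibc-go definitions:

package exported

import (
	"github.com/cosmos/cosmos-sdk/codec"
	sdk "github.com/cosmos/cosmos-sdk/types"
)

// Header and ConsensusState are stand-ins for the real exported interfaces.
type Header interface{}
type ConsensusState interface{}

// ClientState is the light client's side of the contract: it performs all
// validation, including every misbehaviour check (conflicting header, time
// monotonicity, etc.), and reports misbehaviour by returning a frozen client
// state. Core IBC then stores the returned states, writes callback state
// changes on valid updates, and emits the appropriate events.
type ClientState interface {
	CheckHeaderAndUpdateState(
		ctx sdk.Context, cdc codec.BinaryMarshaler, store sdk.KVStore, header Header,
	) (ClientState, ConsensusState, error)

	// IsFrozen reports whether this client was frozen for misbehaviour.
	IsFrozen() bool
}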
Regardless of whether we allow metadata writes on misbehaviour, we still want to cache so we can discard on error.
Developers shouldn't be forced to revert state themselves on error.
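For reference, a minimal sketch of the cache-and-discard pattern under discussion, built on the SDK's Context.CacheContext. The keeper methods and signatures shown are assumptions about the surrounding 02-client code, not its exact API:

// updateClientCached branches the store so that nothing written during
// CheckHeaderAndUpdateState is persisted unless we explicitly commit it.
func (k Keeper) updateClientCached(ctx sdk.Context, clientID string, header exported.Header) error {
	clientState, found := k.GetClientState(ctx, clientID)
	if !found {
		return fmt.Errorf("client %s not found", clientID)
	}

	// Writes to cacheCtx are buffered in memory until writeFn is called.
	cacheCtx, writeFn := ctx.CacheContext()

	newClientState, newConsensusState, err := clientState.CheckHeaderAndUpdateState(
		cacheCtx, k.cdc, k.ClientStore(cacheCtx, clientID), header,
	)
	if err != nil {
		return err // buffered writes (including any metadata) are discarded
	}

	k.SetClientState(ctx, clientID, newClientState)
	if !newClientState.IsFrozen() {
		// Only a valid (non-misbehaviour) update stores a new consensus state.
		k.SetClientConsensusState(ctx, clientID, header.GetHeight(), newConsensusState)
	}

	writeFn() // commit the buffered writes to the underlying store
	return nil
}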
I think you make great points.
> it must return a frozen client state if the update was evidence of misbehaviour.
I agree with this.
> As a reader and reviewer I can analyze the light-client implementation in isolation and verify that it catches all misbehaviour and holds up its side of the bargain.
I like this, and I think we can still achieve this with a duplicate check. My concern is that allowing a duplicate update at an existing height is a critical security vulnerability and I'm hesitant to let it go by when we have the capacity to do the check. This is the code I have in mind:
consState, exists := keeper.GetConsensusState()
newClientState, newConsensusState, err := CheckHeaderAndUpdateState()
if err != nil {
	return err
}

// write client state; errors returned later revert state changes
switch {
case newClientState.IsFrozen():
	// use logic you have
case exists && !reflect.DeepEqual(consState, newConsensusState):
	// light client implementation missed misbehaviour handling
	return err
default:
	// regular update code
}
I don't see why this code gets ugly. It allows light clients to fully implement misbehaviour logic without relying on 02-client, and it allows 02-client to prevent duplicate updates, which are misbehaviour.
> Regardless of whether we allow metadata writes on misbehaviour, we still want to cache so we can discard on error. Developers shouldn't be forced to revert state themselves on error.

Do you have a use case in mind where update is being called by an external module? Messages that result in errors always have their state changes reverted by baseapp. I think this is a safe assumption to make.
> Do you have a use case in mind where update is being called by an external module? Messages that result in errors always have their state changes reverted by baseapp. I think this is a safe assumption to make.

Oh yes, you're correct about this. We should only cache if we discard metadata on misbehaviour.
> My concern is that allowing a duplicate update at an existing height is a critical security vulnerability and I'm hesitant to let it go by when we have the capacity to do the check.

Here's a question: is this always true? Certainly it is a problem if unequal consensus states at the same height would allow for violation of exactly-once packet delivery guarantees or timeouts, but there could conceivably be client types which allow duplicate consensus states, just not verification at them (so they are only intermediate update points). For example, a (non-Tendermint) consensus algorithm could produce a block history containing more than one valid consensus state at the same height.
Is this a case we want to consider? There is something to be said for not overly constraining what it means for clients to be "correct", since clients implement all of the packet data / timeout / etc. verification functions anyway.
Great question! I didn't realize intermediate update points were a possibility.
In light of our discussion yesterday, I don't see the usefulness of adding this check if, in the near future, light client implementations will be responsible for getting/setting client/consensus states. In that design, light clients should definitely be aware to guard against duplicate updates which constitute misbehaviour.
Overall LGTM, nice work!
A changelog entry is needed (I keep forgetting as well).
LGTM, waiting to approve once we resolve decisions on:
- using a cached ctx (no preference)
- returning an error if the client did not handle misbehaviour (a duplicate update produces a different consensus state)

I have a preference for doing the consensus state check, but I'm willing to concede if others consider it unnecessary.
Just to refresh my memory, the goal here is backwards compatibility, right? Otherwise it seems odd to me to keep a field with a range (integer) far greater than the set of semantically meaningful values (0, non-zero). For backwards compatibility I guess there isn't too much difference between 0/1 and 0/non-zero-height, although the latter does run the risk of appearing more semantically relevant than it actually is. We also don't really want IBC users to start using this value for other kinds of data processing; if they really care about misbehaviour heights they should read them from the event logs. Those are both (weak) reasons to prefer 0/1, I think.
Excellent work! In light of our discussion yesterday, I think the current solution aligns perfectly with future changes.
@AdityaSripal I fixed the merge conflicts. Everything was straightforward except for the check in update client for whether the client is frozen. I updated the code to be semantically equivalent.

Before:

if !client.IsFrozen {
} else {
}

After:

if status := client.Status; status != exported.Frozen {
} else {
}

It is just slightly odd because we aren't checking that the client is active, just that it is not frozen.
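For context, here is a minimal sketch of a status type that would make that check read naturally. Frozen mirrors the exported.Frozen identifier in the snippet above; Active and Expired are assumed values added for illustration:

// Status captures the lifecycle state of a light client. Note that a check
// like status != exported.Frozen admits both Active and Expired clients,
// which is exactly the slight oddity noted above.
type Status string

const (
	Active  Status = "Active"
	Frozen  Status = "Frozen"
	Expired Status = "Expired"
)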
ACK modulo quick confirmation question
Description
closes: #XXXX
Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.
- Updated relevant documentation (docs/) or specification (x/<module>/spec/)
- Added relevant godoc comments.
- Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
- Re-reviewed Files changed in the Github PR explorer
- Review Codecov Report in the comment section below once CI passes