-
Notifications
You must be signed in to change notification settings - Fork 1
Reject finalized updates without a sync committee in next store period #145
Conversation
<CurrentSyncCommittee<T>>::set(<NextSyncCommittee<T>>::get()); | ||
<NextSyncCommittee<T>>::kill(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: slightly more readable
<CurrentSyncCommittee<T>>::set(<NextSyncCommittee<T>>::get()); | |
<NextSyncCommittee<T>>::kill(); | |
CurrentSyncCommittee::<T>::set(NextSyncCommittee::<T>::take()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I maybe wrong but would prefer not do this(i.e. <NextSyncCommittee<T>>::kill()
), my concern here is that reset NextSyncCommittee
is expensive which we should avoid unless abosolutely nessessary, and we only switch SyncCommittee
for sync_committee_update
(i.e. hand-over update).
Is there a way to only change the logic on relayer side to ensure signature_period == update_finalized_period == store_period
for non-handover update?
Be more specific, code here https://github.com/Snowfork/snowbridge/blob/18fe233041e34c3b14f8a23d7c07ef1b4df706ca/relayer/relays/beacon/header/header.go#L251 can we find a proper update(non-handover in 1086) to syncInterimFinalizedUpdate
with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I maybe wrong but would prefer not do this(i.e. <NextSyncCommittee>::kill()), my concern here is that reset NextSyncCommittee is expensive which we should avoid unless abosolutely nessessary.
It is absolutely necessary, otherwise the light client is in an incorrect state. For example, now CurrentSyncCommittee = period 1087 and NextSyncCommittee = period 1087, which is fundamentally incorrect and could open up weird code paths that we do not expect. It could open us up to exploits, perhaps a header that was not signed by the validator set will now be verified successfully.
Also, if we don't clear NextSyncCommittee, the next sync committee import will fail because of this check: https://github.com/Snowfork/polkadot-sdk/blob/snowbridge/bridges/snowbridge/pallets/ethereum-client/src/lib.rs#L384 This is basically what happened in prod.
NextSyncCommittee is expensive
This can only happen once every sync committee period, in any case, right?
Is there a way to only change the logic on relayer side to ensure signature_period == update_finalized_period == store_period for non-handover update?
Our relayers are permissionless, so we have to do an on-chain fix for this. Otherwise anyone else could exploit this bug in our light client and send an update for store_period + 1
and brick the bridge.
The light client allows sending an update in the next period (update for store_period + 1
) here: https://github.com/Snowfork/polkadot-sdk/blob/snowbridge/bridges/snowbridge/pallets/ethereum-client/src/lib.rs#L404 So I think the light client spec supports this and so should we.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I understand the light client is in an incorrect state we need to fix with a runtime upgrade.
What I'm trying to fix is add an else branch at verification phase to check signature_period == update_finalized_period == store_period
for non-handover update.
Because I can't find a code path to switch current_sync_committee
for non-handover update in the spec https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/light-client/sync-protocol.md#apply_light_client_update so I'm a bit hesitate to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fair enough, that's a good point.
What I'm trying to fix is add an else branch at verification phase to check signature_period == update_finalized_period == store_period for non-handover update.
What do you think the else
here should do? Something like
else if update_finalized_period == store_period + 1 {
return Error::<T>::SkippedSyncCommitteePeriod;
}
Because I can't find a code path to switch current_sync_committee for non-handover update in the spec
Me neither, but that's because the consensus spec always has a sync committee in the update, right? Imo this allows headers to be verified in the next period:
<NextSyncCommittee<T>>::get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yrong another practical implication if we disallow the non-sync committee rollover is that if we are in period 1086, and we've received a header in 1087, we won't be able to process any messages until we've received period 1087's sync committee update. Maybe that's OK, just mentioning it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the consensus spec always has a sync committee in the update
Not sure fully understand your point. Updates withthout sync committee is definetely allowed. Search is_sync_committee_update(update)
in the spec we can see logics accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are in period 1086, and we've received a header in 1087, we won't be able to process any messages until we've received period 1087's sync committee update.
Yeah, I would assume that's the expected behaviour previously before adding syncInterimFinalizedUpdate
, right? so I'm expecting we refactor the relayer logic to still respect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see! But isn't the condition to handle updates in the next period missing in the spec, then? I read through validate_light_client_update again now and I don't see it handled.
Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
// If there was no sync committee in the update, but the finalized header provided | ||
// is in the new period, we need to move the sync committee period forward. | ||
if update_finalized_period == store_period + 1 { | ||
ensure!(<NextSyncCommittee<T>>::exists(), <Error<T>>::NextSyncCommitteeUnknown); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yrong @vgeddes I added this check. One thing I dislike about the Ethereum consensus spec is that the separation between verify_update
and apply_update
makes it difficult to see what conditions are possible in apply_update
and what has already been checked in verify_update
.
I don't think it is possible for <NextSyncCommittee> to be empty here because the header validation had to pass BLS signature verification here to get to this point. I would rather err on the side of being too cautious though. Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I would prefer to move all check logics to verification phase and apply update with only actions required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds good. So are we going to disallow updates in the next period without a sync committee update, like I illustrated here: #145 (comment)
@vgeddes @alistair-singh @yrong This version is what I think is the best solution. I am not against disallowing beacon header updates in the next period though, I just think the spec supports it and it is more flexible so it is the solution I prefer (to allow updates in the next period without a sync committee update). Let me know what you think. |
@vgeddes @yrong @alistair-singh I've decided to go with the more conservative approach for now, since the potential side-effects are less. Ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
A downside to this approach is that when the sync committee rolls over, we'll have to wait for the next sync committee to be available in the finalized header update. This would mean any messages received in the last two epochs of the sync committee period would have to wait 2 additional epochs to be verified. That means at most 4 epochs waiting time (4 x 6.4 minutes) = 25.6 minutes. https://discord.com/channels/595666850260713488/595701319793377299/1240360783309967483 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
#145) * sync committee handover fix * remove comment * Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com> * add extra check * move checks around * remove extra validation * fix compiler error * reject updates in next period without sync committee * fix test * polish * Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com> --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
#145) * sync committee handover fix * remove comment * Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com> * add extra check * move checks around * remove extra validation * fix compiler error * reject updates in next period without sync committee * fix test * polish * Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com> --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
#145) * sync committee handover fix * remove comment * Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com> * add extra check * move checks around * remove extra validation * fix compiler error * reject updates in next period without sync committee * fix test * polish * Update bridges/snowbridge/pallets/ethereum-client/src/lib.rs Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com> --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: Snowfork#145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: #145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: #145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: #145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: #145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: #145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: Snowfork#145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
…c committee in next store period (paritytech#4478) While syncing Ethereum consensus updates to the Snowbridge Ethereum light client, the syncing process stalled due to error `InvalidSyncCommitteeUpdate` when importing the next sync committee for period `1087`. This bug manifested specifically because our light client checkpoint is a few weeks old (submitted to governance weeks ago) and had to catchup until a recent block. Since then, we have done thorough testing of the catchup sync process. ### Symptoms - Import next sync committee for period `1086` (essentially period `1087`). Light client store period = `1086`. - Import header in period `1087`. Light client store period = `1087`. The current and next sync committee is not updated, and is now in an outdated state. (current sync committee = `1086` and current sync committee = `1087`, where it should be current sync committee = `1087` and current sync committee = `None`) - Import next sync committee for period `1087` (essentially period `1088`) fails because the expected next sync committee's roots don't match. ### Bug The bug here is that the current and next sync committee's didn't handover when an update in the next period was received. ### Fix There are two possible fixes here: 1. Correctly handover sync committees when a header in the next period is received. 2. Reject updates in the next period until the next sync committee period is known. We opted for solution 2, which is more conservative and requires less changes. ### Polkadot-sdk versions This fix should be backported in polkadot-sdk versions 1.7 and up. Snowfork PR: Snowfork#145 --------- Co-authored-by: Vincent Geddes <117534+vgeddes@users.noreply.github.com>
This change corrects the bug where importing a finalized header in the next sync committee period, without the next sync committee, prevents any subsequent sync committee updates.
For illustration purposes:
Light client period:
1
Import header in period
2
(because next sync committee is already known)Light client period should now be
2
, but it is still1
.Current and next sync committee now references the wrong periods, and every subsequent sync committee imports fails because of this check: https://github.com/Snowfork/polkadot-sdk/blob/snowbridge/bridges/snowbridge/pallets/ethereum-client/src/lib.rs#L384
More Details
A detailed description of the issue here: https://www.notion.so/snowfork/Prod-Investigation-45a74b16488d44249de26f149dbc3ad1
Resolves: SNO-981