Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fix reversion loop #4439

Closed
eskimor opened this issue Dec 1, 2021 · 14 comments
Closed

Fix reversion loop #4439

eskimor opened this issue Dec 1, 2021 · 14 comments
Labels
I3-bug Fails to follow expected behavior. T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes.

Comments

@eskimor
Copy link
Member

eskimor commented Dec 1, 2021

Our testing on Rococo revealed, that we still might be susceptible to reversion loops, as can be witnessed, by those errors for example (and others obviously):

Seen logs:

WARN tokio-runtime-worker parachain::availability-recovery: SessionInfo is `None` at (421890, 0x1207919d23edf9d65eaf68d3a5e1a06e643777a621f04864c24bf6eee471b77d)
2021-12-01 16:20:25 | 2021-12-01 15:20:25.518 ERROR tokio-runtime-worker polkadot_node_subsystem_util: job finished with an error job="candidate-backing-job" parent_hash=0x6312…230c err=RuntimeApi(RuntimeApiError("Application(UnknownBlock(\"State already discarded for BlockId::Hash(0x6312111b71bfae0559781ed697b8f22a6be6b647826e0f089c2468e67663230c)\"))"))
-- | --
Err(Other(ClientImport("Import failed: State Database error: Too many sibling blocks inserted")))

Especially the "Too many sibling blocks inserted" suggests a reversion loop, because this can only happen if we have at least 32 heads a the same height.

@eskimor eskimor added the I3-bug Fails to follow expected behavior. label Dec 1, 2021
@bkchr
Copy link
Member

bkchr commented Dec 1, 2021

Re-orgs do not prune chain state and validators run as archive nodes, so the state should be available always 🤔

@burdges
Copy link
Contributor

burdges commented Dec 2, 2021

We're surely not currently set up to fetch old chain state during disputes, no?

If depends if adversaries could push us into demanding purged chain state: Any chain reversion caused by the approvals system also causes a slash, but not necessarily 100%, depending upon what optimizations we employ there. We should avoid an adversary guilty of 100% slashable soundness offenses from distracting us from checking their offending chain.

We've discussed blind availability recovery for that case, so we'd just assume everyone has chunks, but if not then we'll time out eventually. This complicates life for pre-availability disputes.

@eskimor
Copy link
Member Author

eskimor commented Dec 2, 2021

Re-orgs do not prune chain state and validators run as archive nodes, so the state should be available always thinking

Thanks @bkchr ! I guess it could also be a result of this error:

Err(Other(ClientImport("Import failed: State Database error: Too many sibling blocks inserted")))

@bkchr
Copy link
Member

bkchr commented Dec 2, 2021

Re-orgs do not prune chain state and validators run as archive nodes, so the state should be available always thinking

Thanks @bkchr ! I guess it could also be a result of this error:

Err(Other(ClientImport("Import failed: State Database error: Too many sibling blocks inserted")))

Yes, but that means that we have 32 blocks imported on the same height, which should not happen :P

@eskimor
Copy link
Member Author

eskimor commented Dec 2, 2021

We're surely not currently set up to fetch old chain state during disputes, no?

On chain reversion, if we still have ongoing disputes with regards to the reverted chain, then we rely on that old chain to recover the relevant session of the candidate for example.

@eskimor
Copy link
Member Author

eskimor commented Dec 2, 2021

Yes, but that means that we have 32 blocks imported on the same height, which should not happen :P

assuming we don't have a reversion loop, but it looks like we did - finding out, whether we can still have reversion loops was the whole point of that test run, so a good thing we tested. 😄

@eskimor eskimor changed the title Purge chain state more slowly Fix reversion loop Dec 2, 2021
@bkchr
Copy link
Member

bkchr commented Dec 2, 2021

One thing that came to my mind, do you may start working on a fork that you never have imported? Aka you see a dispute that references some block that you are not aware of? Or can that not happen?

@burdges
Copy link
Contributor

burdges commented Dec 2, 2021

Yes I think remote disputes should work on non-imported forks.

@eskimor
Copy link
Member Author

eskimor commented Dec 3, 2021

Yep we want to work disputes to work even for candidates that appeared on an to the node unknown fork. But the issue with the first error is different - it uses a recent head for querying the chain for the SessionInfo and that state is missing. We are using a block here that has been seen by the node, it just no longer appears to be having any state associated with it.

availability-recovery monitors ActiveLeavesUpdates and updates its live_block whenever a block appears with a greater height than already known. Thus on reversion, we won't use the new head as it has a lower height, but stick to the head of the reverted chain. Which is the correct behavior, as otherwise we would not see the correct SessionInfo for any disputes concerning the reverted chain.

@jasl
Copy link
Contributor

jasl commented Jun 8, 2022

I saw this in our testnet (build with polkadot-launch)

2022-06-08 08:17:11 [Parachain] Block import error: State Database error: Too many sibling blocks inserted
2022-06-08 08:17:11 [Parachain] 💔 Error importing block 0xa58bed7c6b43ea63d14b0b9777be84d669bfac773c23d09b4526476d9f9c1d12: consensus error: Import failed: State Database error: Too many sibling blocks inserted
2022-06-08 08:17:12 [Relaychain] ✨ Imported #36717 (0xab24…6424)
2022-06-08 08:17:12 [Relaychain] 💤 Idle (7 peers), best: #36717 (0xab24…6424), finalized #36713 (0x0962…12cf), ⬇ 3.7kiB/s ⬆ 4.2kiB/s
2022-06-08 08:17:12 [Parachain] 💤 Idle (3 peers), best: #17158 (0x2b7a…5f2d), finalized #17158 (0x2b7a…5f2d), ⬇ 1.5kiB/s ⬆ 1.0kiB/s
2022-06-08 08:17:17 [Relaychain] 💤 Idle (7 peers), best: #36717 (0xab24…6424), finalized #36714 (0x0dee…5e78), ⬇ 2.4kiB/s ⬆ 4.4kiB/s
2022-06-08 08:17:17 [Parachain] Block import error: State Database error: Too many sibling blocks inserted
2022-06-08 08:17:17 [Parachain] 💔 Error importing block 0xa58bed7c6b43ea63d14b0b9777be84d669bfac773c23d09b4526476d9f9c1d12: consensus error: Import failed: State Database error: Too many sibling blocks inserted
2022-06-08 08:17:17 [Parachain] 💤 Idle (3 peers), best: #17158 (0x2b7a…5f2d), finalized #17158 (0x2b7a…5f2d), ⬇ 0.8kiB/s ⬆ 0.2kiB/s

it seems on polkadot-v0.9.3 branch including Polkadot 0.9.3 relaychain trigger this often, I saw this 3 times in one week

@jasl
Copy link
Contributor

jasl commented Jun 8, 2022

the story looks like because of #5639 validators randomly panic, then only 1 or 2 validators still alive, then the collator will be trouble

@bkchr
Copy link
Member

bkchr commented Jun 8, 2022

@jasl your issue is on the parachain side and this is tracked here: paritytech/cumulus#432

@ordian ordian moved this to Backlog in Parachains-core Aug 16, 2022
@ordian ordian added the T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes. label Aug 16, 2022
@eskimor eskimor removed this from Parachains-core Dec 5, 2022
@tdimitrov
Copy link
Contributor

The "Too many sibling blocks inserted" was already fixed in paritytech/cumulus#432.
The "state discarded" part sounds like a duplicate of paritytech/polkadot-sdk#793.

Is there anything else we need to do for this one?

@eskimor
Copy link
Member Author

eskimor commented Apr 24, 2023

No, I think we are good.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I3-bug Fails to follow expected behavior. T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes.
Projects
No open projects
Development

No branches or pull requests

6 participants