-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the submitted solution gets reverted, we are left with nothing, and thus a heavy on-chain election
Under what conditions can the solution be reverted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! So the assumption right now is that:
- If you run OCW at the election window opening block, you always attempt to compute a solution.
- When the window stays open AND there is no submission already, you attempt to re-submit your solution.
In theory we could attempt to submit if we've found a better solution, but we can leave that as a future improvement, cause afaiu the main goal right now is to ensure that there is ANY solution to never fall back to on-chain process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How complicated would it to write a test for this? :D
Probably in the order that I'd say given the dry-run, not needed 😄 but I will give it a try before merge. |
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
…te into kiz-staking-OCW-rerun
Burnin looks well, will leave it running for another day with full logs to check everything, else we're good to go. |
I see it very likely that we will close this and just re-introduce it in the multi-phase election pallet, given that the audit has taken a while. |
Closing this since will be re-introduced by @coriolinus soon. |
This is a much smaller diff than that PR contained, but I think it contains all the essentials.
* not climate * explain the intent of the bool in the unsigned phase * remove glob imports from unsigned.rs * add OffchainRepeat parameter to ElectionProviderMultiPhase * migrate core logic from #7976 This is a much smaller diff than that PR contained, but I think it contains all the essentials. * improve formatting * fix test build failures * cause test to pass * Apply suggestions from code review Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> * collapse imports * threshold acquired directly within try_acquire_offchain_lock * add test of resubmission after interval * add test that ocw can regenerate a failed cache when resubmitting * ensure that OCW solutions are of the correct round This should help prevent stale cached solutions from persisting past the election for which they are intended. * add test of pre-dispatch round check * use `RawSolution.round` instead of redundantly externally * unpack imports Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> * rename `OFFCHAIN_HEAD_DB` -> `OFFCHAIN_LOCK` * rename `mine_call` -> `mine_checked_call` * eliminate extraneous comma * check cached call is current before submitting * remove unused consts introduced by bad merge. Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com> * resubmit when our solution beats queued solution * clear call cache if solution fails to submit * use local storage; clear on ElectionFinalized * Revert "use local storage; clear on ElectionFinalized" This reverts commit 4b46a93. * BROKEN: try to filter local events in OCW * use local storage; simplify score fetching * fix event filter * mutate storage instead of setting it * StorageValueRef::local isn't actually implemented yet * add logging for some events of interest in OCW miner * rename kill_solution -> kill_ocw_solution to avoid ambiguity * defensive err instead of unreachable given unreachable code * doc punctuation Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> * distinguish miner errors between "out of date" and "call invalid" * downgrade info logs -> debug * ensure encoded call decodes as a call * fix semantics of validation of pre-dispatch failure for wrong round * move score check within `and_then` * add test that offchain workers clear their cache after election * ensure that bad ocw submissions are not retained for resubmission * simplify fn ocw_solution_exists * add feasibility check when restoring cached solution should address https://github.com/paritytech/substrate/pull/8290/files#r617533358 restructures how the checks are sequenced, which simplifies legibility. * simplify checks again Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
* not climate * explain the intent of the bool in the unsigned phase * remove glob imports from unsigned.rs * add OffchainRepeat parameter to ElectionProviderMultiPhase * migrate core logic from paritytech#7976 This is a much smaller diff than that PR contained, but I think it contains all the essentials. * improve formatting * fix test build failures * cause test to pass * Apply suggestions from code review Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> * collapse imports * threshold acquired directly within try_acquire_offchain_lock * add test of resubmission after interval * add test that ocw can regenerate a failed cache when resubmitting * ensure that OCW solutions are of the correct round This should help prevent stale cached solutions from persisting past the election for which they are intended. * add test of pre-dispatch round check * use `RawSolution.round` instead of redundantly externally * unpack imports Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> * rename `OFFCHAIN_HEAD_DB` -> `OFFCHAIN_LOCK` * rename `mine_call` -> `mine_checked_call` * eliminate extraneous comma * check cached call is current before submitting * remove unused consts introduced by bad merge. Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com> * resubmit when our solution beats queued solution * clear call cache if solution fails to submit * use local storage; clear on ElectionFinalized * Revert "use local storage; clear on ElectionFinalized" This reverts commit 4b46a93. * BROKEN: try to filter local events in OCW * use local storage; simplify score fetching * fix event filter * mutate storage instead of setting it * StorageValueRef::local isn't actually implemented yet * add logging for some events of interest in OCW miner * rename kill_solution -> kill_ocw_solution to avoid ambiguity * defensive err instead of unreachable given unreachable code * doc punctuation Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> * distinguish miner errors between "out of date" and "call invalid" * downgrade info logs -> debug * ensure encoded call decodes as a call * fix semantics of validation of pre-dispatch failure for wrong round * move score check within `and_then` * add test that offchain workers clear their cache after election * ensure that bad ocw submissions are not retained for resubmission * simplify fn ocw_solution_exists * add feasibility check when restoring cached solution should address https://github.com/paritytech/substrate/pull/8290/files#r617533358 restructures how the checks are sequenced, which simplifies legibility. * simplify checks again Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com> Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
Fix issues such as https://github.com/paritytech/stakingops/issues/130
Our OCW runner in staking is too conservative now. It only runs once at the beginning of the election window, and only attempts to submit once. If for any reason (e.g. secondary/primary slot overlap, fork and rollback) all validators temporarily think that a solution is submitted, they will all drop their computed solution and never try again. In essence, they will attempt to submit it, which will fail because there is already a good solution on-chain, and the process ends. In this scenario, if the submitted solution gets reverted, we are left with nothing, and thus a heavy on-chain election.
This is a minimal patch to fix the situation. OCW will compute, save and submit the solution initially, and until the end of the election window periodically try to re-submitted the saved solution, if none is stored yet.
Note that I tried to make this patch minimal, as #7909 will have to reintroduce this again.
This needs a burn-in, but not a typical one. I will prepare a patch to the current polkadot release that contains this runtime change but does not bump the spec version. This is okay because there is no runtime change, and the new runtime should work with the old one. Then we can run this first on a full node, and the once we are confident, a validator node.
-l=staking=trace
is important in all stages.Polkadot dry-run branch: https://github.com/paritytech/polkadot/tree/kiz-staking-OCW-rerun-dryrun-polkadot
Created using: https://hackmd.io/MMGn9TnKSx2oganhfcQrpw