
Domains: Freeze, Unfreeze, and Prune Execution receipt of Domain through Sudo #2896

Merged: vedhavyas merged 1 commit into main from domain_freeze_unfreeze on Jul 16, 2024

Conversation

@vedhavyas (Contributor) commented on Jul 9, 2024

This PR adds new dispatches for Sudo on Consensus to:

  • Freeze a given domain
    • Once frozen, the domain does not accept any new bundles but still processes fraud proofs
  • Unfreeze a given domain
  • Prune a bad execution receipt of a frozen domain
    • This dispatch assumes that Sudo has already validated the invalidity of the bad ER.

These dispatches are generally not used, but exist in case a fraud proof does not fit into the consensus block due to incorrect weight/size limits. In that case, through social consensus, Sudo/Governance would freeze the domain for the time being, prune any bad ERs, and unfreeze the domain to restore normal execution.
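For illustration only, here is a minimal, framework-free sketch of the state machine these dispatches imply. The call names mirror the dispatches described above (`prune_domain_execution_receipt` is the name used later in this thread), but the types and storage model are simplified assumptions rather than the actual pallet code:

```rust
use std::collections::{HashMap, HashSet};

type DomainId = u32;
type ReceiptHash = [u8; 32];

/// Simplified stand-in for the consensus-chain state these calls touch.
#[derive(Default)]
struct ConsensusState {
    frozen_domains: HashSet<DomainId>,
    /// Unconfirmed execution receipts per domain (simplified storage model).
    receipts: HashMap<DomainId, HashSet<ReceiptHash>>,
}

impl ConsensusState {
    /// Sudo: freeze a domain. A frozen domain rejects new bundles, but fraud
    /// proofs against its receipts are still processed.
    fn freeze_domain(&mut self, domain_id: DomainId) {
        self.frozen_domains.insert(domain_id);
    }

    /// Sudo: unfreeze a domain, re-enabling bundle production.
    fn unfreeze_domain(&mut self, domain_id: DomainId) {
        self.frozen_domains.remove(&domain_id);
    }

    /// Sudo: prune a bad ER from a frozen domain. Assumes Sudo has already
    /// validated offchain that the receipt is invalid.
    fn prune_domain_execution_receipt(
        &mut self,
        domain_id: DomainId,
        bad_receipt: ReceiptHash,
    ) -> Result<(), &'static str> {
        if !self.frozen_domains.contains(&domain_id) {
            return Err("domain must be frozen first");
        }
        let removed = self
            .receipts
            .get_mut(&domain_id)
            .map_or(false, |set| set.remove(&bad_receipt));
        if removed { Ok(()) } else { Err("unknown receipt") }
    }

    /// Bundle admission check used on block import (simplified).
    fn accepts_bundles(&self, domain_id: DomainId) -> bool {
        !self.frozen_domains.contains(&domain_id)
    }
}
```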

Will update the spec with these changes.
Spec updates: subspace/protocol-specs#44

cc: @dariolina

Code contributor checklist:

@NingLin-P (Member)

> Prune a bad execution receipt of a frozen domain
> This dispatch assumes that Sudo has already validated the invalidity of the bad ER.

I'm not quite sure about this. If the FP can't be submitted due to:

  • weight/size limit
  • prover can't generate the FP
  • verifier rejects the FP
  • etc.

then it is a bug in the FP system that we need to fix before unfreezing the domain, while pruning the bad ER is just a temporary workaround; the issue can still happen again. Also, this puts the burden of manually validating the bad ER on Sudo while the FP system is broken. cc @jfrank-summit

@vedhavyas (Contributor, Author)

> I'm not quite sure about this. If the FP can't be submitted due to:
> • weight/size limit
> • prover can't generate the FP
> • verifier rejects the FP
> • etc.

That is the actual reason why we have this dispatch. If the fraud proof cannot be submitted for any reason, then there will be some sort of offchain process to verify the actual fraud proof validity, and then root will prune the bad ER.
This is what I assumed per the audit discussion here: https://subspacenetwork.slack.com/archives/C04RUR1P37U/p1719441632508529

Maybe @dariolina can confirm here

@dariolina (Member)

I think you're both right. The flow I imagine is:

  1. Something is wrong, sudo freezes the domain
  2. We investigate offchain what went wrong and, if needed, fix the relevant FP bugs so that the same situation does not happen again
  3. Roll out upgrades with the fix
  4. Prune the bad ER manually
  5. Unfreeze the domain for the upgrades to apply (the whole sequence is sketched below)

WDYT?
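To make the ordering concrete, here is a hypothetical driver for these five steps, reusing the toy `ConsensusState` sketch from the PR description above; the function name and placeholder arguments are assumptions, not PR code:

```rust
/// Hypothetical driver for the five steps above, built on the toy
/// `ConsensusState`, `DomainId`, and `ReceiptHash` from the earlier sketch.
fn recovery_flow(state: &mut ConsensusState, domain_id: DomainId, bad_er: ReceiptHash) {
    // 1. Something is wrong: sudo freezes the domain.
    state.freeze_domain(domain_id);
    assert!(!state.accepts_bundles(domain_id));

    // 2. and 3. happen offchain / via runtime upgrades: investigate, fix the
    //    relevant FP bugs, and roll out the upgraded runtime(s).

    // 4. Prune the bad ER manually (only allowed while the domain is frozen).
    state
        .prune_domain_execution_receipt(domain_id, bad_er)
        .expect("sudo validated this receipt offchain");

    // 5. Unfreeze so the upgrades apply from the next domain block.
    state.unfreeze_domain(domain_id);
    assert!(state.accepts_bundles(domain_id));
}
```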

@NingLin-P (Member)

I think we both agree that freezing and unfreezing domains is necessary, but the controversy is how to prune the bad ER. What I imagine is that after the fix is applied (either to the prover or the verifier), the FP flow should work again, and an FP will be generated and submitted successfully to prune the bad ER; this should happen automatically, without manually validating the FP/bad ER offchain.

@dariolina (Member)

But assume we need to do a domain runtime upgrade for the fix; then the ER in question will go out of the challenge period by the time the new code kicks in. Right?

@NingLin-P (Member)

> But assume we need to do a domain runtime upgrade for the fix

The domain runtime code that derives the bad ER will be used during FP verification, not the latest runtime code. Even if a hack is necessary for the fix, it should be part of the fix and present in the code rather than offchain, IMHO.
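The point about which runtime code the verifier picks can be made concrete with a small sketch. This lookup is an assumption about the mechanism for illustration, not the actual consensus-chain code:

```rust
/// Hypothetical sketch of the lookup being described: the verifier uses the
/// domain runtime code that was active at the block the receipt was derived
/// from, not the latest code. `upgrades` is a list of
/// (enacted_at_block, runtime_code) pairs sorted by block number.
fn runtime_code_for_receipt(receipt_block: u32, upgrades: &[(u32, Vec<u8>)]) -> &[u8] {
    upgrades
        .iter()
        .rev()
        .find(|(enacted_at, _)| *enacted_at <= receipt_block)
        .map(|(_, code)| code.as_slice())
        .expect("the genesis runtime code is always present")
}
```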

@vedhavyas (Contributor, Author)

> The domain runtime code that derives the bad ER will be used during FP verification, not the latest runtime code. Even if a hack is necessary for the fix, it should be part of the fix and present in the code rather than offchain, IMHO.

Okay, let's assume an edge-case scenario where the FP did not make it into the consensus block for multiple reasons. Malicious operators can continue to submit bad ERs while honest operators continue to submit FPs. As long as the bad ER does not go out of the challenge period, this is fine. But it is possible for the bad ER to be confirmed due to the above FP not making it into the consensus block.
If the team realises that, they would freeze the domain. Then what's next?
Do we make a temporary hack to ensure the FP lands in the consensus block? What if it was a bug in the domain runtime itself? In that case, even if we unfreeze the domain to produce bundles, there is a possibility of the bad ER being confirmed and continuing to produce new bad ERs.

At this point, governance can make a call to prune the first bad ER, upgrade the domain runtime, wait for the runtime to be enacted on the consensus chain, then proceed to unfreeze the domain. The next domain block will contain the new runtime and honest operators will proceed with domain block imports.

There could be a plethora of such scenarios, and the list might not even be exhaustive. This dispatch is essentially a way for us to ensure such guards are in place.

@NingLin-P (Member)

> If the team realises that, they would freeze the domain. Then what's next?
> Do we make a temporary hack to ensure the FP lands in the consensus block? What if it was a bug in the domain runtime itself?

What the team will do is deploy a complete fix (which contains a hack if necessary) that allows the prover to generate the FP and the verifier to accept the FP, no matter whether it is a domain runtime bug or not, to prune the existing and any new bad ERs.

> At this point, governance can make a call to prune the first bad ER, upgrade the domain runtime, wait for the runtime to be enacted on the consensus chain, then proceed to unfreeze the domain. The next domain block will contain the new runtime and honest operators will proceed with domain block imports.

I really don't see how "governance can make a call to prune the first bad ER" helps in this case:

  • If the verifier, i.e. the consensus chain, needs to use a new domain runtime to verify the FP of the bad ER (which derives from an old runtime), then the fix should be a consensus runtime upgrade that contains a hack to use the latest domain runtime code to verify the FP (perhaps for a specific range of ERs) instead of the runtime code that derived the bad ER
  • If the prover, i.e. the domain client, needs to use the new domain runtime code to generate the FP, then it just needs to wait until the runtime is upgraded, since a domain runtime upgrade is applied forcibly even if there are no bundles

After the fix is deployed, the honest operator will produce/submit an FP to prune the bad ER, and then we can unfreeze the domain. I don't see why we need to manually prune the bad ER beforehand.

> There could be a plethora of such scenarios, and the list might not even be exhaustive. This dispatch is essentially a way for us to ensure such guards are in place.

The only safeguard, from my understanding, is the ability to freeze the domain and deploy the fix; what the fix will be depends on the issue.

@NingLin-P (Member) left a comment

Still, I think we should not use prune_domain_execution_receipt during the recovery process, but as the audit team points out there is no harm in having this capability, so let's move on.

Comment on lines +2572 to +2575
```rust
pub fn max_prune_domain_execution_receipt() -> Weight {
    T::WeightInfo::handle_bad_receipt(MAX_BUNLDE_PER_BLOCK)
        .saturating_add(T::DbWeight::get().reads_writes(3, 1))
}
```
Review comment (Member):

The extrinsic weight is not just the database writes and reads; it also includes a base weight. Thus I would prefer to have proper benchmarking like submit_fraud_proof, though this is a non-blocker for this PR.
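As a rough illustration of the reviewer's point, the toy calculation below (with made-up constants, not real benchmark results) shows the base-weight term that the handwritten formula above leaves out:

```rust
/// Toy illustration: a full extrinsic weight includes a fixed base overhead
/// on top of the per-item and database costs, which is what dedicated
/// benchmarking (as done for submit_fraud_proof) would capture.
fn max_prune_weight(max_bundles_per_block: u64) -> u64 {
    const BASE_EXTRINSIC_WEIGHT: u64 = 85_000; // fixed dispatch overhead (assumed)
    const HANDLE_BAD_RECEIPT_PER_BUNDLE: u64 = 200_000; // per-bundle cost (assumed)
    const DB_READ: u64 = 25_000; // per storage read (assumed)
    const DB_WRITE: u64 = 100_000; // per storage write (assumed)
    BASE_EXTRINSIC_WEIGHT
        + HANDLE_BAD_RECEIPT_PER_BUNDLE * max_bundles_per_block
        + 3 * DB_READ // the reads_writes(3, 1) term from the snippet above
        + DB_WRITE
}
```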

@vedhavyas added this pull request to the merge queue on Jul 16, 2024
Merged via the queue into main with commit 8a1e2ed on Jul 16, 2024
9 checks passed
@vedhavyas deleted the domain_freeze_unfreeze branch on July 16, 2024 at 09:30
@vanhauser-thc added the audited label and removed the need to audit label on Aug 22, 2024