polkadot: pin one block per session #1220

ordian · 2023-08-28T14:23:27Z

Fixes #623.

Problem

When an invalid parachain block (candidate) gets backed and included, an honest validator doing approval-checking work will raise a dispute. Assuming an honest supermajority, the dispute will conclude against the candidate. That means we revert to the block right before the inclusion of the invalid candidate and transplant the results of the dispute to the next block built on top (without the inclusion). So far so good.

After this happens, we want to slash the offending backers. Normally, this would happen in the same block in which the dispute concluded. However, if the inclusion happened in a past session, this might happen in the following blocks. For past-session slashing to work, we need the state of a block in this past session in order to query the key ownership proof on the node side.
The problem is that the inclusion blocks of the candidate might be pruned as forks of the finalized chain, as you can see from the logs:

2023-09-04 15:34:43.050  INFO tokio-runtime-worker parachain::dispute-coordinator: Dispute on candidate concluded with 'invalid' result candidate_hash=0x8e0b19f6d6ef2180b7de85377c5b1ea49bfe06a2667118621bb6b4125799cc4e session=1 traceID=188808017
2023-09-04 15:34:43.777 DEBUG tokio-runtime-worker parachain::approval-voting: approved blocks 41-[11 111111111]-31
2023-09-04 15:34:44.934  INFO tokio-runtime-worker substrate: 💤 Idle (5 peers), best: #41 (0x582e…7bbc), finalized #30 (0xdd25…2f3b), ⬇ 5.6kiB/s ⬆ 5.0kiB/s
2023-09-04 15:34:45.780 DEBUG tokio-runtime-worker parachain::approval-voting: Block finalized block_hash=0xb981441ffd53e96053e5126d63df0d23832fa09811dc15be8bcb4a88df2eab49 block_number=38
2023-09-04 15:34:48.041  INFO tokio-runtime-worker substrate: ✨ Imported #42 (0x846e…bf3a)
2023-09-04 15:34:48.045  INFO tokio-runtime-worker parachain::dispute-coordinator: Processing unapplied validator slashes session_index=1 candidate_hash=0x8e0b19f6d6ef2180b7de85377c5b1ea49bfe06a2667118621bb6b4125799cc4e n_slashes=1 traceID=188808017283788890442224175552623484580
2023-09-04 15:34:48.045  WARN tokio-runtime-worker parachain::runtime-api: cannot query the runtime API version: Api called for an unknown Block: State already discarded for 0x2f095f2550a11c0889c04dad89b22419f72c297960cd9ba1891da0b9b198794e
2023-09-04 15:34:48.045 DEBUG tokio-runtime-worker parachain::dispute-coordinator: Key ownership proof not yet supported. session_index=1 candidate_hash=0x8e0b19f6d6ef2180b7de85377c5b1ea49bfe06a2667118621bb6b4125799cc4e validator_id=Public(bc646
2023-09-04 15:34:48.045  WARN tokio-runtime-worker parachain::dispute-coordinator: Could not generate key ownership proofs for 1 keys session_index=1 candidate_hash=0x8e0b19f6d6ef2180b7de85377c5b1ea49bfe06a2667118621bb6b4125799cc4e traceID=188808017283788890442224175552623484580
2023-09-04 15:34:54.034  INFO tokio-runtime-worker parachain::dispute-coordinator: Couldn't find inclusion parent for an unapplied slash
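
The failure in the logs above can be illustrated with a toy model. This is only a sketch: `ToyBackend`, its methods and the `u64` hashes are hypothetical and do not mirror the real client backend. It shows how a runtime call against a block on an abandoned fork fails once finality prunes that fork, and how pinning the block beforehand avoids this.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a block hash.
type Hash = u64;

/// Toy model of a backend that keeps block state only while the block is
/// on the finalized chain or explicitly pinned. NOT the real client backend.
#[derive(Default)]
struct ToyBackend {
    /// Blocks whose state is still available.
    states: HashSet<Hash>,
    /// Blocks that must survive pruning.
    pinned: HashSet<Hash>,
}

impl ToyBackend {
    fn import(&mut self, hash: Hash) {
        self.states.insert(hash);
    }

    fn pin(&mut self, hash: Hash) {
        self.pinned.insert(hash);
    }

    /// On finality, drop the state of every block that is neither on the
    /// finalized chain nor pinned (i.e. blocks on abandoned forks).
    fn finalize(&mut self, canonical: &[Hash]) {
        let pinned = &self.pinned;
        self.states.retain(|h| canonical.contains(h) || pinned.contains(h));
    }

    /// A runtime API call needs the state of the block it is executed at.
    fn runtime_call(&self, at: Hash) -> Result<&'static str, String> {
        if self.states.contains(&at) {
            Ok("key ownership proof")
        } else {
            Err(format!("State already discarded for {at:#x}"))
        }
    }
}
```

In this model, block `2` would be the inclusion block that the chain reverted past: after `finalize(&[1, 3])` a call at `2` fails unless `pin(2)` was called first.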

Solution

To address this problem, we pin the state of one block per session for DISPUTE_WINDOW sessions (6 at the time of writing). See 5cba653. Since we know which blocks are protected from pruning, we can guarantee that runtime API calls against those blocks will not fail.
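
A minimal sketch of the retention policy described above, under stated assumptions: `SessionPins` and `note_leaf` are hypothetical names, a plain `u64` stands in for the unpin handle, and the window is assumed to include the current session. It is not the code from the commit.

```rust
use std::collections::BTreeMap;

type SessionIndex = u32;
/// Hypothetical stand-in for a relay block hash / unpin handle.
type Hash = u64;

/// Number of sessions to cover (6 at the time of writing, per the text above).
const DISPUTE_WINDOW: SessionIndex = 6;

/// Keeps at most one pinned block per session for the most recent
/// `DISPUTE_WINDOW` sessions.
#[derive(Default)]
struct SessionPins {
    pinned: BTreeMap<SessionIndex, Hash>,
}

impl SessionPins {
    /// Called for every new leaf. Only the first block seen in a session
    /// is kept; sessions that fell out of the window are released.
    fn note_leaf(&mut self, session: SessionIndex, block: Hash) {
        self.pinned.entry(session).or_insert(block);
        if let Some(latest) = self.pinned.keys().next_back().copied() {
            // Assumption: the window includes the latest session itself.
            let earliest = latest.saturating_sub(DISPUTE_WINDOW - 1);
            // `split_off` keeps the entries with key >= `earliest`;
            // everything older is dropped (which would unpin real handles).
            self.pinned = self.pinned.split_off(&earliest);
        }
    }

    /// Some block of `session` whose state is guaranteed to be available.
    fn block_in_session(&self, session: SessionIndex) -> Option<Hash> {
        self.pinned.get(&session).copied()
    }
}
```

Keying by session rather than by inclusion block keeps the number of pinned blocks bounded by the window size, no matter how many candidates are disputed.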

Alternatives

Instead of storing (pinning) the whole state of the block, why don't we collect just key ownership proofs of backers?
Well, the problem with this approach is that it assumes the node side can predict who is going to be slashed in the runtime before it happens. The slashing logic is encapsulated in the runtime, and we don't want to leak it to the node side. Besides, there might be other storage items we want to query in the future.
Pin inclusion blocks as soon as we see f+1 votes against a candidate. The problem with this approach is described in a comment below: polkadot: pin one block per session #1220 (comment)

polkadot/node/core/chain-api/src/lib.rs

polkadot/node/core/chain-api/src/metrics.rs

eskimor

Great work @ordian !

polkadot/node/core/chain-api/src/lib.rs

eskimor · 2023-08-28T16:38:01Z

polkadot/node/core/dispute-coordinator/src/initialized.rs

+					"All good. Unpinning the inclusion blocks",
+				);
+				for (_number, hash) in inclusions {
+					ctx.send_message(ChainApiMessage::UnpinBlock(hash)).await;


There is a small issue. While we pin the block, the scraper will eventually fade out blocks that fell too far behind finality.

This might be hard to exploit, as slashing should not take too long (always true?!), but we also have an LRU that might just get full. I am wondering whether we could do better. It looks like we don't strictly need the precise relay parent to be around; a block of the same session should suffice.

With this, couldn't we make this more robust and more efficient at the same time, by making sure to always have one relay block available for every session in the dispute window and using it for any proofs, session fetches, etc.?

Technically with probabilistic finality fixed, this should not be too hard as that relay block could then always be on the canonical chain.

Interested in your thoughts. I am definitely not opposing the current solution, it should definitely get a security audit though. I feel there could be exploitable edge cases.

I definitely love the explicit block pinning!

This is a great suggestion, thank you! Currently tinkering with an implementation for it to see if there are any problems with it.

ordian · 2023-09-04T12:02:21Z

This is ready for review, but I can't seem to repro the error observed without these changes 🙈
Also need to fix the test in CI (again, can't repro locally).

ordian · 2023-09-06T13:36:22Z

polkadot/node/subsystem-test-helpers/src/lib.rs

-	}
-
-	#[test]
-	fn forward_subsystem_works() {


I had to remove this test to break a (dev-)dependency cycle. Also, in general, I am not convinced we need tests for tests.

eskimor

Awesome! This is code how I like it!

eskimor · 2023-09-06T13:29:06Z

polkadot/node/core/approval-voting/src/tests.rs

-				status: LeafStatus::Fresh,
-				span: Arc::new(jaeger::Span::Disabled),
-			},
+			fresh_leaf(*new_head, number),


Note on the side: We found this fresh/stale mechanism to be redundant. We should remove it at some point.

I've renamed it to new_leaf to account for this issue being worked on. Also with this helper function, we won't have to modify tons of files to implement this issue.

eskimor · 2023-09-06T13:49:05Z

polkadot/node/overseer/src/lib.rs

 	pub number: BlockNumber,
+	/// A handle to unpin the block on drop.
+	pub unpin_handle: UnpinHandle,


eskimor · 2023-09-06T13:52:18Z

polkadot/node/subsystem-types/src/lib.rs

+///
+/// This is useful for runtime API calls to blocks that are
+/// racing against finality, e.g. for slashing purposes.
+pub type UnpinHandle = sc_client_api::UnpinHandle<Block>;


This is awesome! No races against finality anymore! This should do away with a couple of issues you have been looking into, @tdimitrov!

All we need to do is keep the leaf alive, while we work on it.
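
The unpin-on-drop behaviour discussed in this thread can be modelled roughly as follows. This is a sketch only: `ToyUnpinHandle`, `Inner` and the channel-based "unpin worker" are assumptions for illustration and not the definition of `sc_client_api::UnpinHandle`. It shows that a block stays pinned as long as any clone of the handle (for example the one travelling with the leaf) is alive.

```rust
use std::sync::{mpsc, Arc};

/// Hypothetical stand-in for a block hash.
type Hash = u64;

/// Shared state behind all clones of a handle.
struct Inner {
    hash: Hash,
    /// Channel to a (hypothetical) worker that performs the actual unpin.
    unpin_tx: mpsc::Sender<Hash>,
}

impl Drop for Inner {
    fn drop(&mut self) {
        // Runs once, when the last clone of the handle goes away.
        // The error is ignored: the receiver may be gone during shutdown.
        let _ = self.unpin_tx.send(self.hash);
    }
}

/// Cheaply clonable handle; the block stays pinned while any clone lives.
#[derive(Clone)]
struct ToyUnpinHandle(Arc<Inner>);

impl ToyUnpinHandle {
    fn new(hash: Hash, unpin_tx: mpsc::Sender<Hash>) -> Self {
        Self(Arc::new(Inner { hash, unpin_tx }))
    }

    /// The block this handle keeps pinned.
    fn hash(&self) -> Hash {
        self.0.hash
    }
}
```

With this shape, a subsystem that needs the state of a leaf for longer only has to hold on to a clone of the handle; no explicit unpin message is required.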

eskimor · 2023-09-06T13:53:16Z

polkadot/node/subsystem-util/src/runtime/mod.rs

@@ -74,6 +75,10 @@ pub struct RuntimeInfo {
 	/// Look up cached sessions by `SessionIndex`.
 	session_info_cache: LruMap<SessionIndex, ExtendedSessionInfo>,

+	/// Unpin handle of *some* block in the session.
+	/// Only blocks pinned explicitly by `pin_block` are stored here.
+	pinned_blocks: LruMap<SessionIndex, UnpinHandle>,
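
To illustrate how a per-session map like the `pinned_blocks` field above might behave, here is a small model. It is an assumption-laden sketch: `ToyLru` is a hand-written stand-in for `LruMap`, `get_block_in_session` is a hypothetical accessor name, a `u64` stands in for the `UnpinHandle`, and the "keep the first block of a session" rule is assumed rather than taken from the diff.

```rust
use std::collections::VecDeque;

type SessionIndex = u32;
/// Hypothetical stand-in for an unpin handle.
type Hash = u64;

/// Tiny least-recently-used map (assumes `capacity > 0`).
struct ToyLru {
    capacity: usize,
    /// Front = least recently used, back = most recently used.
    entries: VecDeque<(SessionIndex, Hash)>,
}

impl ToyLru {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    /// Store a block for `session` unless one is already stored.
    /// Evicting an entry drops it; with a real handle that would unpin the block.
    fn pin_block(&mut self, session: SessionIndex, block: Hash) {
        if self.entries.iter().any(|(s, _)| *s == session) {
            return;
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back((session, block));
    }

    /// Return *some* block of `session`, marking the entry as recently used.
    fn get_block_in_session(&mut self, session: SessionIndex) -> Option<Hash> {
        let pos = self.entries.iter().position(|(s, _)| *s == session)?;
        let entry = self.entries.remove(pos)?;
        self.entries.push_back(entry);
        Some(entry.1)
    }
}
```

A caller that needs state from a past session would ask for any block of that session instead of the precise relay parent, which is the idea raised earlier in the review.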


sandreim

Great work @ordian . This is a very elegant solution.

polkadot/node/subsystem-util/src/runtime/mod.rs

polkadot/node/core/dispute-coordinator/src/initialized.rs

Polkadot-Forum · 2023-09-21T17:02:43Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/stalled-parachains-on-kusama-post-mortem/3998/1

ordian added the R0-silent (Changes should not be mentioned in any release notes), T0-node (This PR/Issue is related to the topic “node”) and T8-parachains_engineering labels Aug 28, 2023

sandreim previously approved these changes Aug 28, 2023

polkadot/node/core/chain-api/src/lib.rs (outdated, resolved)

polkadot/node/core/chain-api/src/metrics.rs (outdated, resolved)

ordian mentioned this pull request Aug 28, 2023

sc-client: expose pinning API #1219

Closed

eskimor previously approved these changes Aug 28, 2023

ordian marked this pull request as draft August 30, 2023 11:56

ordian mentioned this pull request Aug 30, 2023

Fix polkadot zombienet tests #1276

Merged

ordian added 2 commits August 31, 2023 12:03

polkadot: propagate UnpinHandle to ActiveLeafUpdate

908ab66

Also extract the leaf creation for tests into a common function.

dispute-coordinator: try pinned blocks for slashin

5cba653

ordian force-pushed the ao-polkadot-use-pinning-api branch from a3f6021 to 5cba653 Compare August 31, 2023 10:05

ordian changed the base branch from ao-expose-client-pinning-api to master August 31, 2023 10:05

ordian changed the title ~~polkadot: pin inclusion blocks until slashed~~ polkadot: pin one block per session Aug 31, 2023

ordian added 2 commits August 31, 2023 16:00

apparently 1.72 is smarter than 1.70

1c6d4b3

ordian requested review from eskimor and sandreim August 31, 2023 14:02

ordian marked this pull request as ready for review September 4, 2023 12:01

ordian commented Sep 6, 2023

eskimor approved these changes Sep 6, 2023

sandreim approved these changes Sep 6, 2023

polkadot/node/subsystem-util/src/runtime/mod.rs (outdated, resolved)

polkadot/node/core/dispute-coordinator/src/initialized.rs (resolved)

ordian added 3 commits September 7, 2023 10:22

address nits

24aa5d4

rename fresh_leaf to new_leaf

2442d49

ordian enabled auto-merge (squash) September 7, 2023 08:33

Merge 'master' again

bc0c362

* master: Forgotten `polkadot-core-primitives/std` (#1440)

fix compilation after a rename

cb91526

eskimor approved these changes Sep 7, 2023

ordian merged commit 1550388 into master Sep 7, 2023
7 checks passed

ordian deleted the ao-polkadot-use-pinning-api branch September 7, 2023 10:24

kiltbot mentioned this pull request Oct 19, 2023

[AUTOMATIC] Update Polkadot dependencies from 1.1.0 to 1.2.0 KILTprotocol/kilt-node#571

Closed

ahmadkaouk mentioned this pull request Nov 16, 2023

Update polkadot-sdk from v.1.1.0 to v1.3.0 moonbeam-foundation/moonbeam#2565

Closed

bkchr pushed a commit that referenced this pull request Apr 10, 2024

Refactor finality relay helpers (#1220)

e675b13

* refactor finality relay helper definitions * add missing doc * removed commented code * fmt * disable rustfmt for macro * move best_finalized method const to relay chain def

This was referenced Jun 5, 2024

Update polkadot-sdk from v1.7.0 to v1.11.0 moondance-labs/tanssi#573

Closed

Update polkadot-sdk from v1.10.0 to v1.11.0 moondance-labs/tanssi#577

Closed
