Change forks pruning algorithm. #3962

shamil-gadelshin · 2024-04-03T10:33:01Z

This PR changes the fork calculation and pruning algorithm to enable future block header pruning. The change is required because the previous algorithm relied on block header persistence. It follows the related discussion in #1570.

The previous code contained this comment describing the situation:

	/// Note a block height finalized, displacing all leaves with number less than the finalized
	/// block's.
	///
	/// Although it would be more technically correct to also prune out leaves at the
	/// same number as the finalized block, but with different hashes, the current behavior
	/// is simpler and our assumptions about how finalization works means that those leaves
	/// will be pruned soon afterwards anyway.
	pub fn finalize_height(&mut self, number: N) -> FinalizationOutcome<H, N> {

The previous algorithm relied on existing block headers to prune forks later. To enable block header pruning, we need to clear all obsolete forks right after block finalization, so that we do not depend on the related block headers in the future.

- Prune all possible forks after block finalizing without height limit.
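To make the description concrete, here is a minimal sketch of eager fork pruning at finalization time. It is a toy model, not the code from this PR: `Header`, `displaced_leaves`, and `sample_chain` are hypothetical names, hashes are plain `u32` values, and all headers are assumed to be available in memory.

```rust
use std::collections::HashMap;

/// Toy header with only the fields the walk needs (not the Substrate header type).
#[derive(Clone, Copy)]
pub struct Header {
    pub number: u32,
    pub parent: u32,
}

/// Returns the leaves that do not descend from `finalized`, in input order.
/// Assumes every header on the walked paths is still present in `headers`.
pub fn displaced_leaves(
    headers: &HashMap<u32, Header>,
    leaves: &[u32],
    finalized: u32,
) -> Vec<u32> {
    let finalized_number = headers[&finalized].number;
    leaves
        .iter()
        .copied()
        .filter(|&leaf| {
            // Walk back from the leaf until we are at or below the finalized height.
            let mut current = leaf;
            while headers[&current].number > finalized_number {
                current = headers[&current].parent;
            }
            // A leaf that does not land exactly on the finalized block is stale.
            // This includes leaves at the finalized height with a different hash.
            current != finalized
        })
        .collect()
}

/// Example chain (hash = label):
/// 0 - 1 - 2 - 3 - 4    canonical chain
///      \    \- 13      fork at height 3
///       \- 12          fork at height 2
pub fn sample_chain() -> HashMap<u32, Header> {
    let h = |number, parent| Header { number, parent };
    HashMap::from([
        (0, h(0, 0)),
        (1, h(1, 0)),
        (2, h(2, 1)),
        (3, h(3, 2)),
        (4, h(4, 3)),
        (13, h(3, 2)),
        (12, h(2, 1)),
    ])
}
```

With this chain, finalizing block 3 reports leaves 13 and 12 as displaced immediately, so nothing has to be looked up for them later, which is the property header pruning needs.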

Polkadot-Forum · 2024-04-03T10:46:01Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/block-header-pruning/7198/1

bkchr

While the logic is correct, it is much more costly doing this than doing the "lazy" pruning as before. If we would record the block at which a certain leaf forked off, we should probably be able to achieve the same without iterating always over all the headers from the leaf down to the finalized block?

substrate/client/api/src/leaves.rs

substrate/primitives/blockchain/src/backend.rs

Co-authored-by: Bastian Köcher <git@kchr.de>

shamil-gadelshin · 2024-04-04T08:01:02Z

> it is much more costly doing this than doing the "lazy" pruning as before

I assume that on each block finalization we traverse a block subtree that is relatively shallow and has a small number of leaves, and I see the most expensive operation here as "read header from DB (disk)" in a loop. (Am I correct here?)
a) However, this operation is cached and is likely to be accessible from the cache in the majority of header acquisitions.
b) Both the previous and the current algorithms invoke this function chain to prune blocks: note_finalized()->prune_blocks()->prune_displaced_branches()->tree_route(). tree_route is invoked for each displaced leaf and contains a similar loop that extracts header metadata (which invokes the same "read header" operation each time). Even if the new algorithm reads all the necessary headers from the DB instead of the cache (likely after a restart), the subsequent tree_route operations will be much faster. To be precise, the archive pruning mode won't invoke the prune_blocks code, but we consider the worst case anyway.

> If we would record the block at which a certain leaf forked off, we should probably be able to achieve the same without iterating always over all the headers from the leaf down to the finalized block?

I understand your suggestion as follows: on each block import operation calculate a branching block for the fork, save it to a dedicated structure, and use this information later instead of calculation.
a) Because of possible restarts, we would need to make it persistent, but that contradicts your previous guideline: Add APIs for block pruning manually #1570 (comment)
I understood that comment as "in general, we should avoid adding persistent data if we can calculate the result on the fly (similar to tree_route operation)".
b) Even if we add the persistent cached data for current leaves it seems we will update this data in a similar loop but in the opposite direction: on each new block we would check whether it's a new best and recalculate saved "fork-points" for all the leaves. However, this time we won't be able to cache disk "write" operations. Did I understand it correctly?

Summary:
The newly proposed algorithm doesn't seem to increase the operation cost much, because the header reads are mostly cached and the tree_route operation is already performed anyway.
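For comparison, the "record where each leaf forked off" idea discussed in this comment could look roughly like the sketch below. This is an in-memory illustration only and was not implemented in this PR. `ForkPoints`, `note_leaf`, `finalize`, and `tracked_leaves` are hypothetical names, and the sketch assumes each fork point is expressed relative to the chain that ends up finalized. It deliberately ignores the persistence and reorg bookkeeping concerns raised in points a) and b).

```rust
use std::collections::HashMap;

/// Hypothetical bookkeeping: for each leaf, the number of the last block
/// its branch shares with the canonical chain.
#[derive(Default)]
pub struct ForkPoints {
    by_leaf: HashMap<u32, u32>,
}

impl ForkPoints {
    /// Records (or updates) the fork point for a leaf.
    pub fn note_leaf(&mut self, leaf: u32, fork_number: u32) {
        self.by_leaf.insert(leaf, fork_number);
    }

    /// Removes and returns (sorted) every leaf whose branch left the
    /// canonical chain below `finalized_number`. No header is read.
    pub fn finalize(&mut self, finalized_number: u32) -> Vec<u32> {
        let mut displaced: Vec<u32> = self
            .by_leaf
            .iter()
            .filter(|(_, fork_number)| **fork_number < finalized_number)
            .map(|(leaf, _)| *leaf)
            .collect();
        displaced.sort_unstable();
        for leaf in &displaced {
            self.by_leaf.remove(leaf);
        }
        displaced
    }

    /// Number of leaves still tracked.
    pub fn tracked_leaves(&self) -> usize {
        self.by_leaf.len()
    }
}
```

The lookup at finalization is cheap, but every import that changes the best chain would have to revisit the stored fork points, which is the write cost questioned in point b).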

Polkadot-Forum · 2024-04-04T08:05:03Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/block-header-pruning/7198/2

bkchr · 2024-04-04T20:20:09Z

> a) However, this operation is cached and is likely to be accessible from the cache in the majority of header acquisitions.

Yes, I would assume the same. I think the best option is to ask @arkpar why he didn't do it this way from the beginning. Maybe there is something we are overlooking; otherwise I'm also fine with the approach you are proposing here.

arkpar · 2024-04-05T07:35:54Z

This was written by someone else, but I guess it was done for simplicity, as the quoted comment explains.

As for this PR, it looks good to me. I wonder if it can be optimized though. prune_displaced_branches calls sp_blockchain::tree_route for each displaced leaf anyway. Instead of traversing the header chain, it should be possible to just call tree_route once somewhere and then store or pass around the retracted path for each leaf.
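The "compute the retracted path once and pass it around" suggestion can be sketched as follows. This is a toy model under stated assumptions, not the PR's implementation: `Header`, `Displaced`, `compute_displaced`, `prune`, and `sample_chain` are hypothetical names, hashes are `u32`, and the walk over parents stands in for the real tree route computation.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Toy header (hypothetical, not the Substrate type).
#[derive(Clone, Copy)]
pub struct Header {
    pub number: u32,
    pub parent: u32,
}

/// Result computed once at finalization and handed to later pruning steps.
#[derive(Debug, Default)]
pub struct Displaced {
    /// Leaves that do not descend from the finalized block.
    pub leaves: Vec<u32>,
    /// Every block on the retracted paths of those leaves.
    pub blocks: BTreeSet<u32>,
}

pub fn compute_displaced(
    headers: &HashMap<u32, Header>,
    leaves: &[u32],
    finalized: u32,
) -> Displaced {
    // Strict ancestors of the finalized block (block 0 is the genesis).
    let mut finalized_ancestors = HashSet::new();
    let mut current = finalized;
    while headers[&current].number > 0 {
        current = headers[&current].parent;
        finalized_ancestors.insert(current);
    }

    let mut displaced = Displaced::default();
    for &leaf in leaves {
        // Collect the path from the leaf until it joins the finalized chain.
        let mut retracted = Vec::new();
        let mut current = leaf;
        while current != finalized && !finalized_ancestors.contains(&current) {
            retracted.push(current);
            current = headers[&current].parent;
        }
        // Joining below the finalized block means the whole path is stale.
        if current != finalized {
            displaced.leaves.push(leaf);
            displaced.blocks.extend(retracted);
        }
    }
    displaced
}

/// Consumer that reuses the stored paths instead of recomputing routes.
/// Returns how many headers were removed.
pub fn prune(headers: &mut HashMap<u32, Header>, displaced: &Displaced) -> usize {
    displaced
        .blocks
        .iter()
        .filter(|hash| headers.remove(*hash).is_some())
        .count()
}

/// 0 - 1 - 2 - 3 - 4, with forks 1 - 12 and 2 - 13 - 14.
pub fn sample_chain() -> HashMap<u32, Header> {
    let h = |number, parent| Header { number, parent };
    HashMap::from([
        (0, h(0, 0)), (1, h(1, 0)), (2, h(2, 1)), (3, h(3, 2)), (4, h(4, 3)),
        (12, h(2, 1)), (13, h(3, 2)), (14, h(4, 13)),
    ])
}
```

The point of the design is that `prune` never walks the chain again: the paths gathered during finalization are the only input it needs.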

shamil-gadelshin · 2024-04-05T12:39:22Z

I implemented the suggestions from @arkpar. The code seems simpler now.

`finalize_height` method doesn’t exist. It was used to determine the forks and that algorithm changed.

shamil-gadelshin · 2024-04-05T16:39:13Z

The last commit removes the tests related to the removed method: finalize_height.
The new code is tested with other tests related to pruning blocks (displaced leaves).

shamil-gadelshin · 2024-04-09T15:43:11Z

The last commits fix tests by updating the expected block where stale heads appear. However, I also changed expand_forks to use the tree_route function first. It seems to me that the function works incorrectly because it uses the hash(block_number) function to work with the canonical chain to get a parent block by number. I don't think that hash(block_number) defines its behavior in the presence of forks (it just reads from DB). Please, correct me if I'm wrong. I didn't delete the original code and left it as a fallback option, because it tries to return a partial route in cases where some blocks are missing, and tree_route doesn't do that.

shamil-gadelshin · 2024-04-16T12:28:39Z

@arkpar Do you mind looking at the expand_forks() changes?

bkchr · 2024-04-16T13:01:17Z

> I don't think that hash(block_number) defines its behavior in the presence of forks (it just reads from DB). Please, correct me if I'm wrong.

When you call hash(block_number), it returns the hash of the canonical block at block_number. If the returned hash is different from your parent, you are still on a fork from the POV of the canonical chain. This implementation is correct and also faster than tree_route, because it doesn't go up the chain again.
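A minimal sketch of the walk described here, assuming a toy chain in which a `canonical` map stands in for `hash(block_number)`. The names `Chain`, `fork_blocks`, and `sample` are hypothetical. The sketch compares each visited block with the canonical hash at its own height, which is equivalent to the parent check above, and it returns a partial route when a header is missing.

```rust
use std::collections::HashMap;

pub struct Chain {
    /// hash -> (number, parent hash)
    pub headers: HashMap<u32, (u32, u32)>,
    /// number -> canonical hash; stands in for `hash(block_number)`
    pub canonical: HashMap<u32, u32>,
}

impl Chain {
    /// Blocks from `fork_head` back to, but excluding, the first block
    /// that is part of the canonical chain.
    pub fn fork_blocks(&self, fork_head: u32) -> Vec<u32> {
        let mut route = Vec::new();
        let mut current = fork_head;
        loop {
            // A missing header ends the walk with a partial route.
            let (number, parent) = match self.headers.get(&current) {
                Some(header) => *header,
                None => break,
            };
            // Same hash as the canonical block at this height: fork ends here.
            if self.canonical.get(&number) == Some(&current) {
                break;
            }
            route.push(current);
            if number == 0 {
                break;
            }
            current = parent;
        }
        route
    }
}

/// Canonical chain 0 - 1 - 2 - 3 with a fork 1 - 12 - 13.
pub fn sample() -> Chain {
    Chain {
        headers: HashMap::from([
            (0, (0, 0)), (1, (1, 0)), (2, (2, 1)), (3, (3, 2)),
            (12, (2, 1)), (13, (3, 12)),
        ]),
        canonical: HashMap::from([(0, 0), (1, 1), (2, 2), (3, 3)]),
    }
}
```

Each step costs one header read and one lookup by number, and the walk stops as soon as it touches the canonical chain, without climbing back up the other side as a full tree route would.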

bkchr · 2024-04-16T12:38:57Z

substrate/client/api/src/leaves.rs

@@ -479,35 +436,4 @@ mod tests {
 		assert!(set.contains(10, 1_2));
 		assert!(!set.contains(10, 1_3));
 	}
-
-	#[test]
-	fn finalization_consistent_with_disk() {


Why did you remove this test?

finalize_height was removed from LeafSet. Please, let me know if you see how this test should be reimplemented differently.

substrate/client/rpc-spec-v2/src/chain_head/tests.rs

bkchr · 2024-04-16T13:01:36Z

substrate/primitives/blockchain/src/backend.rs

+			match tree_route(self, *fork_head, self.info().finalized_hash) {
+				Ok(tree_route) => {
+					for block in tree_route.retracted() {
+						expanded_forks.insert(block.hash);
+					}
+					continue
+				},
+				Err(_) => {
+					// Continue with fallback algorithm
+				},
+			}
+


Suggested change

	match tree_route(self, *fork_head, self.info().finalized_hash) {
		Ok(tree_route) => {
			for block in tree_route.retracted() {
				expanded_forks.insert(block.hash);
			}
			continue
		},
		Err(_) => {
			// Continue with fallback algorithm
		},
	}

See my comment in the PR.

substrate/primitives/blockchain/src/backend.rs

prdoc/pr_3962.prdoc

Co-authored-by: Bastian Köcher <git@kchr.de>

shamil-gadelshin · 2024-05-10T13:04:52Z

> Could you please remove one of the code paths?

Sure. I removed the old code and updated the function comment as well as its dependencies. I also updated one of the tests to counter its flaky behavior.

bkchr

Thank you!

bkchr · 2024-05-14T20:30:27Z

@shamil-gadelshin the rustdoc jobs are still failing.

## Issue

Currently, syncing parachains from scratch can lead to a very long finalization time once they reach the tip of the chain. The problem is that we try to finalize everything from 0 to the tip, which can be thousands or even millions of blocks.

We finalize sequentially and try to compute displaced branches during finalization. So for every block on the way, we compute an expensive tree route.

## Proposed Improvements

In this PR, I propose improvements that solve this situation:

- **Skip tree route calculation if `leaves().len() == 1`:** This should be enough for 90% of cases where there is only one leaf after sync.
- **Optimize finalization for long distances:** It can happen that the parachain has imported some leaf and then receives a relay chain notification with the finalized block. In that case, the previous optimization will not trigger. A second mechanism should ensure that we do not need to compute the full tree route. If the finalization distance is long, we check the lowest common ancestor of all the leaves. If it is above the to-be-finalized block, we know that there are no displaced leaves. This is fast because forks are short and close to the tip, so we can leverage the header cache.

## Alternative Approach

- The problem was introduced in #3962. Reverting that PR is another possible strategy.
- We could store, for every fork, where it begins; however, that sounds a bit more involved to me.

fixes #4614
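The two shortcuts listed under the proposed improvements can be sketched as a single pre-check. This is an illustration under stated assumptions, not the code of the follow-up PR: `Header`, `lowest_common_ancestor`, `needs_tree_route`, and `sample_chain` are hypothetical names, hashes are `u32`, and the to-be-finalized block is assumed to be an ancestor of at least one leaf.

```rust
use std::collections::HashMap;

/// Toy header (hypothetical, not the Substrate type).
#[derive(Clone, Copy)]
pub struct Header {
    pub number: u32,
    pub parent: u32,
}

/// Walks the deeper of the two blocks upwards until both meet.
fn lowest_common_ancestor(headers: &HashMap<u32, Header>, mut a: u32, mut b: u32) -> u32 {
    while a != b {
        if headers[&a].number >= headers[&b].number {
            a = headers[&a].parent;
        } else {
            b = headers[&b].parent;
        }
    }
    a
}

/// Returns `false` when the expensive displaced-branch computation can be
/// skipped for a block at height `finalized_number`.
pub fn needs_tree_route(
    headers: &HashMap<u32, Header>,
    leaves: &[u32],
    finalized_number: u32,
) -> bool {
    // Shortcut 1: with at most one leaf there is no fork to displace.
    if leaves.len() <= 1 {
        return false;
    }
    // Shortcut 2: if all leaves share an ancestor above the finalized
    // height, every fork starts after it and no leaf is displaced.
    let ancestor = leaves[1..]
        .iter()
        .fold(leaves[0], |acc, &leaf| lowest_common_ancestor(headers, acc, leaf));
    headers[&ancestor].number <= finalized_number
}

/// 0 - 1 - 2 - 3 - 4 - 5, with forks 4 - 15 and 1 - 12.
pub fn sample_chain() -> HashMap<u32, Header> {
    let h = |number, parent| Header { number, parent };
    HashMap::from([
        (0, h(0, 0)), (1, h(1, 0)), (2, h(2, 1)), (3, h(3, 2)),
        (4, h(4, 3)), (5, h(5, 4)), (15, h(5, 4)), (12, h(2, 1)),
    ])
}
```

Because forks are expected to be short and close to the tip, the ancestor search touches only a few recent headers, which is why the check stays cheap even when the finalization distance is large.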

Change forks pruning algorithm.

9decf7b

- Prune all possible forks after block finalizing without height limit.

shamil-gadelshin mentioned this pull request Apr 3, 2024

Add block header pruning. #3033

Closed

bkchr reviewed Apr 3, 2024

substrate/client/api/src/leaves.rs (outdated)

substrate/primitives/blockchain/src/backend.rs (outdated)

shamil-gadelshin and others added 2 commits April 4, 2024 15:00

Update substrate/client/api/src/leaves.rs

64094f2

Co-authored-by: Bastian Köcher <git@kchr.de>

Update substrate/primitives/blockchain/src/backend.rs

62ab0aa

Co-authored-by: Bastian Köcher <git@kchr.de>

Update fork calculation algorithm.

962b5d3

arkpar approved these changes Apr 5, 2024

Remove obsolete tests.

d325870

`finalize_height` method doesn’t exist. It was used to determine the forks and that algorithm changed.

Add pr_3962.prdoc

7e05f0d

shamil-gadelshin requested a review from andresilva as a code owner April 9, 2024 15:41

shamil-gadelshin added 3 commits April 9, 2024 22:49

Update expand_fork() function and fix test.

4aeeeb6

Update follow_report_multiple_pruned_block test.

27fcc5f

Update sc-service-test tests.

57816e9

shamil-gadelshin force-pushed the change-fork-calculation branch from 8fdcc7f to 57816e9 Compare April 9, 2024 15:49

bkchr approved these changes Apr 16, 2024

bkchr added the T0-node This PR/Issue is related to the topic “node”. label Apr 16, 2024

bkchr reviewed Apr 16, 2024

prdoc/pr_3962.prdoc

Update substrate/primitives/blockchain/src/backend.rs

6fbc756

Co-authored-by: Bastian Köcher <git@kchr.de>

github-actions bot requested review from arkpar and bkchr April 17, 2024 14:24

shamil-gadelshin added 2 commits May 10, 2024 17:56

Update expand_forks() dependencies.

e56eed6

Update flaky test.

c875bd2

bkchr approved these changes May 14, 2024

Merge branch 'master' into change-fork-calculation

bc5290c

bkchr enabled auto-merge May 14, 2024 20:22

Fix doc-comment.

7e769f6

auto-merge was automatically disabled May 15, 2024 07:09
Head branch was pushed to by a user without write access

github-actions bot requested a review from bkchr May 15, 2024 07:09

bkchr approved these changes May 15, 2024

Merge branch 'master' into change-fork-calculation

95c1d7d

bkchr enabled auto-merge May 15, 2024 07:57

bkchr added this pull request to the merge queue May 15, 2024

Merged via the queue into paritytech:master with commit 9c69bb9 May 15, 2024
146 of 150 checks passed

skunert mentioned this pull request Jun 6, 2024

finalization: Skip tree route calculation if no forks present #4721

Merged

shamil-gadelshin mentioned this pull request Jun 13, 2024

Add support of block header pruning. #4788

Open

bkchr mentioned this pull request Jun 28, 2024

Finalization hangs in 1.13 #4903

Closed

2 tasks

MOZGIII mentioned this pull request Jul 26, 2024

Finalization stalling issues 2024-07 humanode-network/humanode#1104

Closed

This was referenced Aug 21, 2024

Update polkadot-sdk from v1.11.0 to stable2407 moondance-labs/tanssi#659

Open

Update polkadot-sdk from v1.11.0 to stable2407 moonbeam-foundation/moonbeam#2912

Closed

skunert mentioned this pull request Aug 30, 2024

chainHead/fix: Report bestBlock events only for newBlock reports #5527

Merged

lexnv mentioned this pull request Sep 11, 2024

chainHead: Clarify reported order of pruned blocks paritytech/json-rpc-interface-spec#143

Closed
