Add APIs for block pruning manually #1570

Open
nazar-pc opened this issue Sep 14, 2023 · 15 comments
Labels
T0-node This PR/Issue is related to the topic “node”.

Comments

@nazar-pc
Contributor

nazar-pc commented Sep 14, 2023

As discussed in paritytech/substrate#14758, there are currently APIs to finalize blocks (Finalizer::apply_finality() and Finalizer::finalize_block()), and block pruning happens at a fixed offset from finalization, but we have a use case in Subspace where we want that offset to be dynamic.

Essentially we need an API to prune blocks manually and decouple it from block finalization (finalized blocks will be much newer than pruned blocks). The same for state.
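To make the request concrete, here is a minimal in-memory sketch of pruning decoupled from finalization. Everything in it (`ToyStore`, `prune_bodies_up_to`, `PruneError`) is hypothetical and is not an existing Substrate API. It also assumes that only finalized blocks may be pruned, which matches the "finalized blocks will be much newer than pruned blocks" expectation above.

```rust
use std::collections::BTreeMap;

/// Toy in-memory chain store. All names are hypothetical, not Substrate APIs.
#[derive(Debug, Default)]
pub struct ToyStore {
    /// Block number -> block body (just a label in this sketch).
    bodies: BTreeMap<u64, String>,
    /// Highest finalized block number.
    finalized: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PruneError {
    /// The requested target is newer than the finalized block.
    NotFinalized { target: u64, finalized: u64 },
}

impl ToyStore {
    pub fn import(&mut self, number: u64, body: &str) {
        self.bodies.insert(number, body.to_string());
    }

    /// Finalization only moves the finalized marker; it prunes nothing.
    pub fn finalize(&mut self, number: u64) {
        self.finalized = self.finalized.max(number);
    }

    /// Remove all bodies with number <= `target` and return how many were removed.
    /// Assumption of this sketch: only finalized blocks may be pruned.
    pub fn prune_bodies_up_to(&mut self, target: u64) -> Result<usize, PruneError> {
        if target > self.finalized {
            return Err(PruneError::NotFinalized { target, finalized: self.finalized });
        }
        let before = self.bodies.len();
        self.bodies.retain(|number, _| *number > target);
        Ok(before - self.bodies.len())
    }

    pub fn has_body(&self, number: u64) -> bool {
        self.bodies.contains_key(&number)
    }
}
```

The caller decides how far behind finalization the pruning target sits, so the offset is dynamic rather than fixed.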

While at it, an API to prune headers and a --headers-pruning CLI argument would both be nice to have as well, since headers are currently never pruned, which prevents a node from occupying a bounded amount of space.

@kianenigma
Contributor

This would be great. I think it is a good UX improvement to allow full nodes to run with all 3 pruning types set, and it should only occupy a more or less constant amount of space on disk.

@kianenigma kianenigma added the T0-node This PR/Issue is related to the topic “node”. label Sep 20, 2023
@nazar-pc
Contributor Author

We have a downstream issue at Subspace (autonomys/subspace#2114) and might be able to sponsor work on this if someone is interested.

@kayabaNerve

I'm specifically interested in an API where clients can issue over RPC a 'safe-to-prune' command to advance which blocks are pruned. This would allow reclaiming disk space after indexing, yet guarantees the ability to perform indexing.

If this issue is to be focused on header-pruning or a time-based prune (compared to current block count based prune), I'd be happy to open a new issue, yet this sounds close enough it may be best to simply tag in here.

(I hadn't opened an issue before because it seemed easy enough that I'd just write an impl myself at some point. I'm commenting here now because this sounds like a prerequisite I assumed would already be available but, judging by this issue's existence, isn't.)
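A rough sketch of how such a 'safe-to-prune' watermark could behave, leaving the RPC layer out entirely. `SafeToPrune`, `mark` and `prunable_up_to` are made-up names for illustration, and capping the result at the finalized block is an assumption of this sketch.

```rust
/// Hypothetical watermark advanced by an external client such as an indexer.
#[derive(Debug, Default)]
pub struct SafeToPrune {
    /// Highest block the client has declared safe to prune, if any.
    watermark: Option<u64>,
}

impl SafeToPrune {
    /// Record a 'safe-to-prune' call. Returns true if the watermark advanced.
    /// Calls that would move the watermark backwards are ignored.
    pub fn mark(&mut self, number: u64) -> bool {
        match self.watermark {
            Some(current) if number <= current => false,
            _ => {
                self.watermark = Some(number);
                true
            }
        }
    }

    /// Highest block the node may prune: never beyond the watermark and,
    /// as an assumption of this sketch, never beyond the finalized block.
    /// Returns None until the client has marked something.
    pub fn prunable_up_to(&self, finalized: u64) -> Option<u64> {
        self.watermark.map(|w| w.min(finalized))
    }
}
```

Until the client marks a block, nothing is prunable, which is what keeps indexing possible.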

@nazar-pc
Contributor Author

FYI @shamil-gadelshin on Subspace team will be working on this once he is back after holidays.

@shamil-gadelshin
Contributor

My current plan is to introduce a CLI flag similar to state and block pruning. I'm going to remove data from db, in-memory header and header metadata caches. The current fork calculation and pruning depend on the block headers for already pruned blocks so I'm going to add a temporary in-memory cache for "block headers marked for pruning" and prune them with some delay.
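The "marked for pruning, pruned with some delay" idea could look roughly like the sketch below. The type and method names are hypothetical, the delay is measured in blocks purely as an assumption, and headers are assumed to be marked in non-decreasing block order. Being in-memory only, the queue is lost when the node restarts.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory queue of headers marked for pruning.
pub struct DelayedHeaderPruner {
    /// Number of blocks to wait between marking and actual removal.
    delay: u64,
    /// (number of the header to prune, block number at which it was marked)
    marked: VecDeque<(u64, u64)>,
}

impl DelayedHeaderPruner {
    pub fn new(delay: u64) -> Self {
        Self { delay, marked: VecDeque::new() }
    }

    /// Mark a header for pruning. Assumes `marked_at` never decreases between calls.
    pub fn mark(&mut self, header_number: u64, marked_at: u64) {
        self.marked.push_back((header_number, marked_at));
    }

    /// Pop and return the headers whose delay has elapsed at block `now`.
    pub fn drain_due(&mut self, now: u64) -> Vec<u64> {
        let mut due = Vec::new();
        while let Some(&(header, marked_at)) = self.marked.front() {
            if now.saturating_sub(marked_at) < self.delay {
                break;
            }
            due.push(header);
            self.marked.pop_front();
        }
        due
    }

    /// Number of headers still waiting to be pruned.
    pub fn pending(&self) -> usize {
        self.marked.len()
    }
}
```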

@nazar-pc
Contributor Author

> The current fork calculation and pruning depend on the block headers for already pruned blocks so I'm going to add a temporary in-memory cache for "block headers marked for pruning" and prune them with some delay.

What if node restarts?

@shamil-gadelshin
Contributor

> > The current fork calculation and pruning depend on the block headers for already pruned blocks so I'm going to add a temporary in-memory cache for "block headers marked for pruning" and prune them with some delay.
>
> What if node restarts?

There are several non-exclusive options here:

  • Consider "space leak" as acceptable because of the potentially small size and rare case. We'll likely have those anyway because of some edge cases: I found a reference in the code to a warp sync as an explanation for a potential missing block header.
  • Introduce a CLI command to clean dangling blocks and block headers from DB.
  • Support persistence of the "marked for pruning headers" set.

@bkchr
Member

bkchr commented Jan 15, 2024

> FYI @shamil-gadelshin on Subspace team will be working on this once he is back after holidays.

Which code are you talking about?

>   • Consider "space leak" as acceptable because of the potentially small size and rare case. We'll likely have those anyway because of some edge cases: I found a reference in the code to a warp sync as an explanation for a potential missing block header.
>
>   • Introduce a CLI command to clean dangling blocks and block headers from DB.

Both of these options are a no go.

@nazar-pc
Contributor Author

> Which code are you talking about?

Shamil is starting with the implementation of the --headers-pruning CLI option as described above, which will be followed by a programmatic API for pruning blocks independently of finalization. The current fixed offset doesn't work for us; we need more precise control over what is pruned, and that includes pruning of headers as well.

@bkchr
Member

bkchr commented Jan 15, 2024

I mean I know what the issue is about :P I meant explicitly the stuff about determining the fork and requiring pruned blocks.

@nazar-pc
Contributor Author

/// Although it would be more technically correct to also prune out leaves at the
/// same number as the finalized block, but with different hashes, the current behavior
/// is simpler and our assumptions about how finalization works means that those leaves
/// will be pruned soon afterwards anyway.

@bkchr
Member

bkchr commented Jan 15, 2024

Ahh this code :P Yeah, I think we should either not delete/prune anything in this 1 block behind the last finalized block, or remove this requirement. But coming up with some new data structure to keep these headers around sounds weird to me.

@shamil-gadelshin
Contributor

Guys, please, have a look when you have time: #3033
@bkchr @kianenigma

@shamil-gadelshin
Contributor

A friendly ping: #3033

@bkchr @kianenigma

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/block-header-pruning/7198/1

github-merge-queue bot pushed a commit that referenced this issue May 15, 2024
This PR changes the fork calculation and pruning algorithm to enable
future block header pruning. It's required because the previous
algorithm relied on the block header persistence. It follows the
[related discussion](#1570)

The previous code contained this comment describing the situation:
```
	/// Note a block height finalized, displacing all leaves with number less than the finalized
	/// block's.
	///
	/// Although it would be more technically correct to also prune out leaves at the
	/// same number as the finalized block, but with different hashes, the current behavior
	/// is simpler and our assumptions about how finalization works means that those leaves
	/// will be pruned soon afterwards anyway.
	pub fn finalize_height(&mut self, number: N) -> FinalizationOutcome<H, N> {
```

The previous algorithm relied on the existing block headers to prune
forks later. To enable block header pruning, we need to clear all
obsolete forks right after block finalization so that we no longer
depend on the related block headers in the future.

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
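
As a rough illustration of "clear all obsolete forks right after the block finalization", the toy sketch below drops every leaf that does not descend from the finalized block at finalization time, including leaves at the same height with a different hash. It is not the code from the referenced commit: `ToyTree`, its methods and the `u32` hash stand-in are assumptions made for this example.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a real block hash type.
type Hash = u32;

/// Toy block tree. Hypothetical, not the actual leaf set implementation.
#[derive(Default)]
pub struct ToyTree {
    /// hash -> (block number, parent hash)
    headers: HashMap<Hash, (u64, Hash)>,
    /// Current chain tips.
    leaves: HashSet<Hash>,
}

impl ToyTree {
    pub fn import(&mut self, hash: Hash, number: u64, parent: Hash) {
        self.headers.insert(hash, (number, parent));
        // The parent stops being a tip once it has a child.
        self.leaves.remove(&parent);
        self.leaves.insert(hash);
    }

    /// Walk back from `hash` to its ancestor at height `number`, if reachable.
    fn ancestor_at(&self, mut hash: Hash, number: u64) -> Option<Hash> {
        loop {
            let &(n, parent) = self.headers.get(&hash)?;
            if n == number {
                return Some(hash);
            }
            if n < number {
                return None;
            }
            hash = parent;
        }
    }

    /// Finalize `finalized` and immediately displace every leaf that is not
    /// the finalized block or one of its descendants. Returns them sorted.
    pub fn finalize(&mut self, finalized: Hash) -> Vec<Hash> {
        let Some(&(number, _)) = self.headers.get(&finalized) else {
            return Vec::new();
        };
        let mut displaced: Vec<Hash> = self
            .leaves
            .iter()
            .copied()
            .filter(|&leaf| self.ancestor_at(leaf, number) != Some(finalized))
            .collect();
        displaced.sort_unstable();
        for leaf in &displaced {
            self.leaves.remove(leaf);
        }
        displaced
    }

    /// Remaining leaves, sorted for deterministic output.
    pub fn leaves(&self) -> Vec<Hash> {
        let mut leaves: Vec<Hash> = self.leaves.iter().copied().collect();
        leaves.sort_unstable();
        leaves
    }
}
```

Because the obsolete forks are gone as soon as the block is finalized, nothing needs to revisit their headers later.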
hitchhooker pushed a commit to ibp-network/polkadot-sdk that referenced this issue Jun 5, 2024
liuchengxu pushed a commit to liuchengxu/polkadot-sdk that referenced this issue Jun 19, 2024
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024
6 participants