Cleanup PVF artifact by cache limit and stale time #4662
Conversation
pub enum CleanupBy {
	// Inactive time after which artefact is deleted
	Time(Duration),
	// Max size in bytes. Reaching it older artefacts are deleted
The comment seems wrong: we delete the least used ones. At least for me, "older" usually means in relation to creation time.
Fixed
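For reference, a hedged sketch of how the corrected variant comments might read. The `Size` variant name and its payload are assumptions; only `Time(Duration)` appears in the quoted diff.

```rust
use std::time::Duration;

// Sketch only, not the merged code.
pub enum CleanupBy {
	/// Inactive time after which an artifact is deleted.
	Time(Duration),
	/// Maximum total cache size in bytes. When it is exceeded, the least
	/// recently used artifacts are deleted.
	Size(u64),
}
```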
for (k, v) in self.inner.iter() {
	if let ArtifactState::Prepared { ref path, last_time_needed, .. } = *v {
		if let Ok(metadata) = fs::metadata(path) {
What bothers me here is that we're running a (possibly large) number of synchronous blocking filesystem operations in Artifacts::prune(), which is called from the PVF host's main loop running on a non-blocking threadpool. In case of I/O problems the whole PVF host will stall. I think we should either make use of tokio::fs::metadata(), or, even better, store the artifact's size as a property of the artifact itself along with the other data, and then no filesystem access is required inside prune().
Yeah, added the size to the artifact's state.
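A minimal sketch of that idea, with names taken loosely from the quoted diff (other variants and fields omitted; this is not the merged code): record the on-disk size when the artifact is prepared, so pruning never has to hit the filesystem on the host's main loop.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::SystemTime;

// Sketch only: names are assumptions based on the quoted diff.
enum ArtifactState {
	Prepared {
		path: PathBuf,
		last_time_needed: SystemTime,
		/// On-disk size in bytes, captured once when the artifact is written.
		size: u64,
	},
}

// The total cache size is now computable without any filesystem access.
fn total_cache_size(inner: &HashMap<String, ArtifactState>) -> u64 {
	inner
		.values()
		.map(|state| match state {
			ArtifactState::Prepared { size, .. } => *size,
		})
		.sum()
}
```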
artifact_sizes.sort_by_key(|&(_, _, _, last_time_needed)| last_time_needed);

while total_size > *size_limit {
	let Some((artifact_id, path, size, _)) = artifact_sizes.pop() else { break };
	to_remove.push((artifact_id, path));
	total_size -= size;
}
A unit test to check the correctness of this behavior would definitely be appreciated :)
Added (and it saved me because the implementation was not correct)
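The behavior under test can be sketched in isolation. This is a hedged, self-contained illustration of size-limit eviction and a unit test for it, not the test that was actually added to the PR:

```rust
use std::cmp::Reverse;
use std::time::{Duration, SystemTime};

// Illustrative helper (not the real API): entries are (id, size, last_time_needed);
// evict the least recently used first until the total size fits under the limit.
fn select_for_eviction(mut entries: Vec<(u32, u64, SystemTime)>, size_limit: u64) -> Vec<u32> {
	// Sort descending by last use, so `pop()` yields the least recently used.
	entries.sort_by_key(|&(_, _, last_time_needed)| Reverse(last_time_needed));
	let mut total_size: u64 = entries.iter().map(|&(_, size, _)| size).sum();
	let mut to_remove = Vec::new();
	while total_size > size_limit {
		let Some((id, size, _)) = entries.pop() else { break };
		to_remove.push(id);
		total_size -= size;
	}
	to_remove
}

#[test]
fn evicts_least_recently_used_until_under_limit() {
	let now = SystemTime::now();
	let entries = vec![
		(1, 4, now - Duration::from_secs(300)), // least recently used
		(2, 4, now - Duration::from_secs(200)),
		(3, 4, now - Duration::from_secs(100)), // most recently used
	];
	// Limit of 5 bytes: 1 and 2 are evicted (least used first), 3 survives.
	assert_eq!(select_for_eviction(entries, 5), vec![1, 2]);
}
```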
This reverts commit a7e043b.
if now
	.duration_since(last_time_needed)
	.map(|stale_time| stale_time < cleanup_config.min_stale_time)
	.unwrap_or(false)
The docs for duration_since say: "Returns an Err if earlier is later than self, and the error contains how far from self the time is."
In case of error we should return true to break here; we don't want to delete things if time moves backwards for whatever reason.
good catch!
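As a small illustration of the point above (a sketch with assumed names, not the actual host code), `SystemTime::duration_since` returns `Err` when the stored timestamp is later than `now`, i.e. when the clock appears to have gone backwards, so `unwrap_or(true)` keeps the artifact in that case:

```rust
use std::time::{Duration, SystemTime};

fn is_used_recently(now: SystemTime, last_time_needed: SystemTime, min_stale_time: Duration) -> bool {
	now.duration_since(last_time_needed)
		.map(|stale_time| stale_time < min_stale_time)
		// On a backwards clock jump, treat the artifact as recently used
		// rather than deleting it based on a bogus timestamp.
		.unwrap_or(true)
}

fn main() {
	let now = SystemTime::now();
	let day = Duration::from_secs(24 * 60 * 60);
	// A timestamp "from the future": duration_since() is Err, so we keep it.
	assert!(is_used_recently(now, now + Duration::from_secs(60), day));
	// A genuinely stale timestamp is still reported as not recently used.
	assert!(!is_used_recently(now, now - 2 * day, day));
}
```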
Looks alright to me.
The CI pipeline was cancelled due to the failure of one of the required jobs.
Nice, thank you!
let used_recently = now
	.duration_since(last_time_needed)
	.map(|stale_time| stale_time < cleanup_config.min_stale_time)
	.unwrap_or(true);
if used_recently {
	break;
}
Hmmm 🤔 So, the cache may grow unbounded as long as all the artifacts are fresh? I mean, this mimics the current behavior, so no new attack vectors are introduced here for sure, but probably we should review the old ones... I'm not calling for any change here right now; it's just for discussion.
"the cache may grow unbounded as long as all the artifacts are fresh"
I believe it grows no more than it does now, because we use the same 24h limit.
Yes, exactly. I'm just wondering if there's some scenario where someone buys some cheap coretime leftovers and starts pushing tons of PVFs around within 24 hours to overflow validators' disk space. Sounds unlikely, just paranoia, probably.
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
/// Remove artifacts older than the given TTL or the total artifacts size limit and return id
/// and path of the removed ones.
This doc comment needs fixing: we evict LRU artifacts only if we go over the cache limit.
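A possible rewording, offered only as a suggestion (an assumption, not the merged text):

```rust
// Suggested wording:
//
// /// Remove the least recently used artifacts, but only while the total
// /// cache size exceeds the configured limit, and only if they have been
// /// unused for longer than the minimum stale time. Return the id and
// /// path of the removed ones.
```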
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* master: (29 commits)
  Append overlay optimization. (#1223)
  finalization: Skip tree route calculation if no forks present (#4721)
  Remove unncessary call remove_from_peers_set (#4742)
  add pov-recovery unit tests and support for elastic scaling (#4733)
  approval-voting: Add no shows debug information (#4726)
  Revamp the Readme of the parachain template (#4713)
  Update README.md to move the PSVM link under a "Tooling" section under the "Releases" section (#4734)
  frame/proc-macro: Refactor code for better readability (#4712)
  Contracts: update wasmi to 0.32 (#3679)
  Backport style changes from P<>K bridge to R<>W bridge (#4732)
  New reference doc for Custom RPC V2 (#4654)
  Frame Pallets: Clean a lot of test setups (#4642)
  Fix occupied core handling (#4691)
  statement-distribution: Fix false warning (#4727)
  Update the README to include a link to the Polkadot SDK Version Manager (#4718)
  Cleanup PVF artifact by cache limit and stale time (#4662)
  Update link to a latest polkadot release (#4711)
  [CI] Delete cargo-deny config (#4677)
  fix build on MacOS: bump secp256k1 and secp256k1-sys to patched versions (#4709)
  Unify dependency aliases (#4633)
  ...
Part of paritytech#4324

We don't change but rather extend the existing cleanup strategy:
- We still don't touch artifacts that have been stale for less than 24h.
- We only attempt pruning for the first time when we hit the cache limit (10 GB).
- If it somehow happens that we hit 10 GB but the least used artifact has been stale for less than 24h, we don't remove it.

---------

Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
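A minimal sketch of the defaults implied by that description. The struct and field names are assumptions, not necessarily the merged API; the values (24 hours, 10 GB) come from the description itself.

```rust
use std::time::Duration;

struct ArtifactsCleanupConfig {
	/// Artifacts used within this window are never removed.
	min_stale_time: Duration,
	/// Pruning only kicks in once the cache exceeds this many bytes.
	cache_limit: u64,
}

impl Default for ArtifactsCleanupConfig {
	fn default() -> Self {
		Self {
			min_stale_time: Duration::from_secs(24 * 60 * 60), // 24 hours
			cache_limit: 10 * 1024 * 1024 * 1024,              // 10 GB
		}
	}
}
```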