v2.0: Marks old storages as dirty and uncleaned in clean_accounts() (backport of #3737) #3747

Merged (2 commits) on Nov 25, 2024

@mergify mergify bot commented Nov 22, 2024

Problem

Copied from #3702

We do not clean up old storages.

More context: when calculating a full accounts hash, we call mark_old_slots_as_dirty() to ensure we do not miss cleaning up really old storages (i.e. ones that are more than an epoch old). But when skipping rewrites is enabled, we don't want to clean up those old storages, as they are intentionally treated as ancient append vecs. So inside mark_old_slots_as_dirty() we mark old slots as dirty only conditionally, based on the value of ancient_append_vec_offset, which should be None unless ancient append vecs are enabled.

Unfortunately, on normally running validators, we end up never marking old slots as dirty, because the ancient append vec offset is always Some. Thus we never clean up old storages.
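
To make the failure mode concrete, here is a minimal sketch of the gating described above. It is not the actual accounts-db code: the `AccountsDbSketch` type, its fields, the `i64` offset type, and the function signature are all hypothetical simplifications. Only the names mark_old_slots_as_dirty and ancient_append_vec_offset come from the description.

```rust
/// Hypothetical, heavily simplified stand-in for the accounts-db state
/// discussed above. Not the real struct.
struct AccountsDbSketch {
    /// Expected to be `None` unless ancient append vecs are enabled.
    ancient_append_vec_offset: Option<i64>,
    /// Slots whose storages a later clean pass should revisit.
    dirty_slots: Vec<u64>,
}

impl AccountsDbSketch {
    /// Marks every rooted slot below `oldest_non_ancient_slot` as dirty,
    /// but only when the offset is `None`. Returns how many were marked.
    fn mark_old_slots_as_dirty(
        &mut self,
        rooted_slots: &[u64],
        oldest_non_ancient_slot: u64,
    ) -> usize {
        if self.ancient_append_vec_offset.is_some() {
            // Old storages are assumed to be intentional ancient append
            // vecs, so nothing is marked.
            return 0;
        }
        let before = self.dirty_slots.len();
        self.dirty_slots.extend(
            rooted_slots
                .iter()
                .copied()
                .filter(|slot| *slot < oldest_non_ancient_slot),
        );
        self.dirty_slots.len() - before
    }
}
```

In this sketch, if the offset is `Some` on every validator, the early return is always taken and no old slot is ever marked, which matches the behavior reported above.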

Summary of Changes

Mark old storages as dirty, and add to the uncleaned roots list in clean_accounts().

We still check if ancient append vecs are enabled, but not with the ancient_append_vec_offset. Instead we look at the skipping rewrites feature gate and the cli arg.

By moving this marking into clean_accounts(), we also decouple it from accounts hash calculation; that coupling is no longer necessary. This also removes behavioral differences based on whether snapshots are enabled.
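
The shape of the change can be sketched as follows. This is an illustration under stated assumptions, not the merged diff: `SkipRewritesConfig`, `CleanSketch`, and all field names are hypothetical, and combining the feature gate and the CLI arg with a logical OR is an assumption, since the description only says both are consulted.

```rust
use std::collections::BTreeSet;

/// Hypothetical inputs for deciding whether old storages are kept on purpose.
struct SkipRewritesConfig {
    feature_gate_active: bool,
    cli_arg_set: bool,
}

/// Hypothetical, simplified accounts-db state. Not the real struct.
#[derive(Default)]
struct CleanSketch {
    dirty_slots: BTreeSet<u64>,
    uncleaned_roots: BTreeSet<u64>,
}

impl CleanSketch {
    /// Marks old slots as dirty and uncleaned at the start of clean,
    /// independent of any accounts hash calculation.
    /// Returns how many slots were marked.
    fn clean_accounts(
        &mut self,
        config: &SkipRewritesConfig,
        rooted_slots: &[u64],
        oldest_non_ancient_slot: u64,
    ) -> usize {
        // Assumption: either signal alone means old storages are intentional.
        let keep_old_storages = config.feature_gate_active || config.cli_arg_set;
        let mut marked = 0;
        if !keep_old_storages {
            for &slot in rooted_slots
                .iter()
                .filter(|slot| **slot < oldest_non_ancient_slot)
            {
                self.dirty_slots.insert(slot);
                self.uncleaned_roots.insert(slot);
                marked += 1;
            }
        }
        // The actual clean pass would run here; it is omitted in this sketch.
        marked
    }
}
```

Because the check no longer reads the offset, a validator with neither signal set marks its old slots on every clean in this sketch, whether or not snapshots or hash calculation are running.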

Justification to Backport

Without this fix, nodes may never clean up old account storage files, leading to eventual crashes from running out of file descriptors/mmaps. There are also general performance regressions, since these old account storage files are unexpectedly kept around forever.

Additional Testing

I started up a node running this PR, and used a snapshot containing over 800k account storage files. The node was quickly able to remove all the old storage files and resume normal behavior.

Here's a graph of the node's count of storages. It starts around 850k and quickly drops to the correct ~432k:
[Screenshot: graph of the node's storage count over time]


This is an automatic backport of pull request #3737 done by [Mergify](https://mergify.com).

(cherry picked from commit 31742ca)

# Conflicts:
#	accounts-db/src/accounts_db.rs
#	accounts-db/src/accounts_db/tests.rs
#	runtime/src/bank.rs
@mergify mergify bot added the conflicts label Nov 22, 2024
@mergify mergify bot requested a review from a team as a code owner November 22, 2024 18:04

mergify bot commented Nov 22, 2024

Cherry-pick of 31742ca has failed:

On branch mergify/bp/v2.0/pr-3737
Your branch is up to date with 'origin/v2.0'.

You are currently cherry-picking commit 31742ca61e.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   accounts-db/src/accounts_db.rs
	deleted by us:   accounts-db/src/accounts_db/tests.rs
	both modified:   runtime/src/bank.rs

no changes added to commit (use "git add" and/or "git commit -a")

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@HaoranYi HaoranYi left a comment

Lgtm

@jeffwashington jeffwashington left a comment

lgtm

@bw-solana bw-solana left a comment

lgtm

@brooksprumo brooksprumo merged commit f77014d into v2.0 Nov 25, 2024
38 checks passed
@brooksprumo brooksprumo deleted the mergify/bp/v2.0/pr-3737 branch November 25, 2024 20:28
4 participants