Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a workflow for cache garbage collection #49209

Closed
wants to merge 5 commits into from

Conversation

desrosj
Copy link
Contributor

@desrosj desrosj commented Mar 20, 2023

What?

This adds a GitHub Actions workflow that performs garbage collection on GHA cache keys that are associated with a PR once it's closed out.

Why?

Once a PR is merged and/or closed, the cache keys created within associated workflow runs are useless.

Access restrictions provide cache isolation and security by creating a logical boundary between different branches or tags. Workflow runs can restore caches created in either the current branch or the default branch (usually main). If a workflow run is triggered for a pull request, it can also restore caches created in the base branch, including base branches of forked repositories.

When a cache is created by a workflow run triggered on a pull request, the cache is created for the merge ref (refs/pull/.../merge). Because of this, the cache will have a limited scope and can only be restored by re-runs of the pull request. It cannot be restored by the base branch or other pull requests targeting that base branch. This happens any time a PR changes lock files (which are used to create cache keys in most cases), or when a cache key for the primary branch is evicted or missing.

Each repository is allotted 10 GB worth of cache storage for GHA. Once that is exceeded, GitHub will proceed to evict cache entries until total storage falls below the allowed threshold. As far as I can tell, how GHA decides which cache keys to evict is not clear. By proactively evicting caches that we know are unlikely to be used again, we can ensure more important ones remain available longer.

You can read more about caching within GitHub Actions and how this is managed in the documentation.

@desrosj desrosj self-assigned this Mar 20, 2023
@github-actions
Copy link

github-actions bot commented Mar 20, 2023

Flaky tests detected in c36ac67.
Some tests passed with failed attempts. The failures may not be related to this commit but are still reported for visibility. See the documentation for more information.

🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4490674888
📝 Reported issues:

@desrosj desrosj marked this pull request as ready for review March 21, 2023 22:18
@desrosj desrosj requested review from kevin940726 and dmsnell March 21, 2023 22:29
Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sad we need a cleanup task, but glad we're getting one 😄

.github/workflows/cleanup-caches.yml Show resolved Hide resolved
.github/workflows/cleanup-caches.yml Show resolved Hide resolved
@kevin940726
Copy link
Member

Thank you for starting this! However, I'm not sure if we even need this. Based on the doc you linked in the PR:

If you exceed the limit, GitHub will save the new cache but will begin evicting caches until the total size is less than the repository limit.

I think it's basically an LRU cache. As long as the primary cache is used frequently (which is often the latest cache on trunk, the most common branch to be based on), it should rarely be evicted. On the other hand, a cache created by a closed PR will be evicted once it's no longer used by anyone and there's a new cache in town.

What this workflow does is proactively remove potentially unused caches. While this looks harmless, I'm not so sure about the benefits it brings. I think we can still try this, but we might need a way to collect numbers of the impact.

@desrosj
Copy link
Contributor Author

desrosj commented Mar 22, 2023

This is definitely somewhat experimental. But I think it's worth trying.

I reread the documentation page linked in the PR, and hidden within the last section is a more clear answer to how GHA goes about cleaning up cache keys.

Once a repository has reached its maximum cache storage, the cache eviction policy will create space by deleting the oldest caches in the repository.

That means caches created in trunk can be deleted if the key does not change and enough PRs are opened/time passes. My thinking is that if we proactively purge cache keys that we know are no longer needed, keys we know are beneficial have a higher likelihood of sticking around.

With this in mind, we could also add another step that detects when a change to a lock file occurs and deletes any caches with the old file hash in the key. I think that's a little much for this first iteration.

I'm sad we need a cleanup task, but glad we're getting one 😄

I don't necessarily think we need a workflow for this as GitHub does clean up caches on its own. But it's likely by implementing our own cleanup, we can cache longer and more effectively. I do wish they allowed us to configure priority for cleanup, though. Something like "closed PRs first, non-main branch next, etc." instead of just "oldest key first".

I also wanted to add some testing details as I forgot to include those. To test, I waited until there were no workflows running and found a cache key associated with trunk. I deleted it, and re-ran a workflow on this PR that utilized that key. When the workflow finished, a new key was added scoped to this PR. Then I removed the types filter on this PR and pushed. That triggered the workflow and I confirmed the cache was deleted in the repository's GHA Cache management page.

@kevin940726
Copy link
Member

Once a repository has reached its maximum cache storage, the cache eviction policy will create space by deleting the oldest caches in the repository.

I read this as "oldest" meaning the least recently used cache. It's the last updated cache but not the last created cache, I could be wrong though 😅 . IIUC, this means we shouldn't need to worry about trunk's cache as it's most frequently used.

@desrosj
Copy link
Contributor Author

desrosj commented Jan 5, 2024

I'm going to close this out since it's been almost a year. It's possible this may still be worth exploring, but our workflows and the GHA service has changed a bunch, so we'll need to evaluate from the start again.

@desrosj desrosj closed this Jan 5, 2024
@desrosj desrosj deleted the try/gha-cache-garbage-collection branch January 5, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants