Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GitHub Action caching methodology #5448

Merged
merged 4 commits into from
May 2, 2023

Conversation

kamilkrzyskow
Copy link
Collaborator

@kamilkrzyskow kamilkrzyskow commented Apr 30, 2023

Hi, 👋
I've been working on adding social cards to our documentation, and I've encountered a mistake in the advertised caching methodology, then I checked this project's workflow file and it had almost the same mistake present.

- uses: actions/cache@v2
with:
key: ${{ github.ref }}
path: .cache

- name: Set up build cache
uses: actions/cache@v3
id: cache
with:
key: ${{ runner.os }}-${{ hashFiles('.cache/**') }}
path: .cache

As seen in this action run the cache is being restored:
https://github.com/squidfunk/mkdocs-material/actions/runs/4837605087/jobs/8621534809#step:4:21
but it is not being saved:
https://github.com/squidfunk/mkdocs-material/actions/runs/4837605087/jobs/8621534809#step:14:2

It is not being saved, because the cache hit occurred on the primary key. Also it will always be the same empty Linux- key because during the hash check the .cache directory shall never be present.


Simply put, the approaches described above, either constantly create a new cache on each github.refs change or create it once and constantly reuse the same cache files and constantly recreate changes.

The solution to this are restore-keys https://github.com/actions/cache/blob/main/tips-and-workarounds.md#update-a-cache
This approach works based on a cache key prefix, which will be used as a fallback if the primary key isn't found.

      - name: Set up build cache
        uses: actions/cache@v3
        id: cache
        with:
          key: mkdocs-material-${{ github.sha }}
          path: .cache
          restore-keys: |
            mkdocs-material-

I've used the github.sha since it's the commit hash, similar output to the hashFiles function.

Same issue applied to other npm caches in the workflows, but I can also understand wanting to invalidate caches frequently, therefore I didn't change them.

@squidfunk
Copy link
Owner

Thanks for the PR! To clarify:

I've used the github.sha since it's the commit hash, similar output to the hashFiles function.

I've chosen github.ref because I thought it was a stable id per branch, but well, how do you know. Using github.sha means that there's a new cache created for each commit, but by using restore keys, the latest cache is used as a fallback? Doesn't this mean that there will be more and more caches created, one per commit? Additionally, wouldn't we have caches bleeding into other branches?

@kamilkrzyskow
Copy link
Collaborator Author

kamilkrzyskow commented May 2, 2023

I've chosen github.ref because I thought it was a stable id per branch

Maybe I did something incorrectly in our project's branch. The ref would typically be refs/pulls/branch-name with the occasional .../merge suffix. We don't use tags for versioning.
Using github.sha solved the issue of the almost static primary-key.
Maybe I missed some better alternative 🤔. Perhaps date or number of the week.

Doesn't this mean that there will be more and more caches created, one per commit?

True, the "cleanest" approach requires creating the hash based on the files that will actually change. But since the .cache directory stores data from different plugins it's rather hard to setup cache fully correctly. For our project I considered creating a file containing all the file names and page titles, and then hash it, but this would only have merit for the social cards plugin, also such setup isn't really feasible for most people.
According to the docs caches are stored for up to 7 days (unused) or up to 10GB. Therefore I don't consider the always changing nature of github.sha an issue.

https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy

Additionally, wouldn't we have caches bleeding into other branches?

According to the docs the caches are bound to their respective branch / scope, and they can only restore caches from their "upper" branches. So in case of this repository a feature branch will only have read access to the master branch and write access to its own caches, and the master branch won't have access to any other cache than its own.

https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache

@squidfunk
Copy link
Owner

Thanks for the quick explainer! Sounds all good to me, let's give it a try ☺️

@squidfunk squidfunk merged commit 3e27ceb into squidfunk:master May 2, 2023
@kamilkrzyskow
Copy link
Collaborator Author

kamilkrzyskow commented May 2, 2023

@squidfunk

It seems I made a mistake reading the docs previously.

According to the docs the github.ref for merges would be refs/pull/<pr_number>/merge
I only tested it with commits previously, so the ref was too static for me 😅. Also there surely are people that only do commits and use a rebase workflow.

The restore-keys are a must it seems, unless I missed some forceSave property, but the github.ref could stay for projects with a PR & Merge workflow.

@squidfunk
Copy link
Owner

Oh, no worries. Could you create a new PR?

@kamilkrzyskow
Copy link
Collaborator Author

Sure, I will do that shortly ~1 hour.
I did some more reading on various options, and unfortunately creating a YYYY-MM-DD or #week_number would require using Linux commands to create a environmental variable.
Only a timestamp could be accessed via github.event.repository.updated_at, but this would be as random as the hash 😑

Considering the amount of people who use the project I do see that adverting github.sha might lead to higher cache storage usage on GitHub overall.
Suddenly that 10GB storage cap isn't so small when I multiply it by 5 million users 😱

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants