-
-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI(macOS): Cache micromamba environment on same week #4342
Conversation
There was caching in the original Mac runner with setup-miniconda, it was somehow lost in the migration to setup-micromamba. I don't see the advantage of re-caching every day, the environment very rarely changes. |
The problem with that is that it never expires, so no patch updates gets through at all. So you think waiting a week would make sense instead? |
Right, the original version was based on "date + hash on macos_dependencies.txt", see: |
Looks like the week number is |
Using the hash too, forces re-caching with changes of macos_dependencies.txt. |
It is already created by the action, it is only the cache key "prefix" (see the note in this section: https://github.com/mamba-org/setup-micromamba?tab=readme-ov-file#caching): Note Note that the specified cache key is only the prefix of the real cache key. |
Co-authored-by: Vaclav Petras <wenzeslaus@gmail.com>
This PR finally enables caching of micromamba environments for CI runs.
It uses the example of in README that adds the date to the cache key prefix. https://github.com/mamba-org/setup-micromamba?tab=readme-ov-file#caching
I’ve been using this on and off (without an extra key prefix) since this winter when doing repetitive CI-only work, since macOS was the fastest to finish running all tests due to the very fast cores on Apple Silicon.
With environment caching, on cache hit the setup-micromamba usually finishes in 18 seconds, instead of 2min+, so compilation starts way before Ubuntu runners even can finish installing through apt.
Why I never did the PR before: I was hoping better caching possibilities from the setup-micromamba action, but it never really improved. The way they offer to customize the cache key doesn’t play that well with partial hits (where we can match other cache keys that start with the same prefix, usually delimited by hyphens, and we fallback to more and more generic caches), and also that when there is a cache hit, the action doesn’t even try to (nor allows to) run the solving environment step, that would check if the environment cached is out of date or not. I would’ve liked that if it is out of date, that it will either use the existing environment and try to upgrade it with the missing pieces, or at worse wipe it and create it again.
As mentioned by others in the issues in their repo (or the previous action it replaced), this means that as long as their is a cache hit (since the environment file didn’t change, nor other config options that would make a different hash), the same environment is used ad vitam æternam, and since it is hit, it won’t be evicted soon.
The problem would be lessened if we used a lock file (like commonly used in the node ecosystem), but keeping the lock file updated is maybe a bit too much of a churn for us right now, and given that last time I checked, Renovate didn’t support updating these lock files.
So, a reasonable balance between simplicity of implementation and how much we tolerate old dependencies is to use the current date. The cache key doesn’t match after one day, so older caches go unused and can get evicted. The oldest outdated dependencies would be one day old. I find it short a bit, but one month (removing the day in the date) is too long, and I find that one week old is getting a bit old too if there’s an issue. Creating a string that is valid for two or three days is getting complicated…
Remember that GitHub Actions cache:
The cache entry for the micromamba environment we currently have is 540 MB, which is on the larger side. It is more difficult to have multiple copies of this 540 MB cache stored without having any impact on our existing usage of caches (one of the reasons I didn’t create this PR earlier).
The « happy path » I expect to happen is that:
In all cases, there won’t be any cache hit when the environment file (requirements file) or options of the action (the
with:
) changes, as the hash used in the key suffix would change.The less happy path would be when PRs gets commits before a run on main happened in the default branch on that day. The PR (or PRs) will each create their own 540 MB cache for that day until a first PR is merged on the default branch that day. When these PRs would be receiving new commits (even after a first merge of the day), they will use the cache created by their branches before falling back to the default branch (it won’t ever get to falling back as we don’t have partial restore keys, and both the branch and default branch cache should have the same key if no requirements change). Imagine 6 PRs before the first merge of the day, that’s 3GB right there, plus .5 GB in the main branch.
if ever this would become problematic, I could implement the solution to cleaning caches on PR merges like in the docs : https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries, or proactively schedule the creation of the cache entry at 00:00-ish UTC, plus deleting the cache of the previous day at the same moment l. But if we don’t need it yet, it would be better.