Improve caching strategy across the board of CI workflow #45289
349e55b to b3fc69b
Looks like the stash action does not work exactly as advertised :)
I think we need to set the env variable name.
Ah, head_name is derived from the GitHub ref.
Yeah, it seems that the action either hits a race condition or does not work as advertised. From: https://github.com/apache/infrastructure-actions/tree/main/stash#usage
It seems that the 409 Conflict error is not handled. I tried both with override 'false' (where it obviously fails) and with the default.
CC: @assignUser - is that the right guess?
If that's the right guess, then we might try to handle it somehow. Since we don't care which of the parallel uploads ends up stored, we can simply ignore a 409 when it happens, without retrying, because it means that someone else managed to upload the artifact in parallel and they "won".
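The "ignore 409, don't retry" idea could be sketched roughly as below. This is only an illustration of the control flow: `UploadConflict`, `upload_once`, and the `upload` callable are hypothetical names made up for this sketch, not part of the stash action or of the artifact backend API.

```python
from typing import Callable


class UploadConflict(Exception):
    """Hypothetical error standing in for a 409 Conflict from the backend."""


def upload_once(upload: Callable[[], None]) -> bool:
    """Run `upload`; return True if we uploaded, False if another job "won".

    A conflict is deliberately not retried: it means an artifact with the
    same name already exists, and any copy of it is good enough for a cache.
    Any other exception still propagates to the caller.
    """
    try:
        upload()
    except UploadConflict:
        return False
    return True
```

With this shape, a job that loses the race simply carries on and later restores whatever the winning job uploaded.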
cc: @assignUser - seems we are stressing your action to the limit :)
Wow that's wild, haven't seen that before :D But that's an issue with the artifact backend, nothing I can really change in the action itself.
Let's see... PRs might be coming :)
FYI @assignUser and @gopidesupavan -> it seems that this is a well-known "feature" of the This is even explained here: https://github.com/actions/download-artifact/blob/main/docs/MIGRATION.md#multiple-uploads-to-the-same-named-artifact as a solution. Example solution: actions/upload-artifact#478 (comment) But stash does not use the download-artifact action and "merge-multiple"... And I really do not like the "solution"... So we will have to come up with a different approach.
b3fc69b to c7041f9
OK. @assignUser and @gopidesupavan -> I think I found a solution (and actually this is a better one in general for performance, but slightly more "distributed" among the .yml files). Instead of having a clear save/restore around installation, I only do:
And I make sure to have one separate job that is a prerequisite of all other jobs ( This way we get a slightly longer bootstrap, but then all the other jobs should use the cache uploaded by the prerequisite job. Plus, the bootstrap will also use the artifact from previous runs (or from the target branch) if the corresponding pyproject.toml / pre-commit config files did not change.
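One way to picture the "reuse the artifact unless the config files changed" part is a cache key derived from the contents of those files. The sketch below is an assumption about how such a key could be computed, not what the workflow in this PR actually does; `cache_key` and the 16-character digest length are invented for illustration.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, files: Iterable[Path]) -> str:
    """Build a key that changes only when one of `files` changes.

    Files are hashed in sorted order so the key does not depend on the
    order in which the caller lists them.
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        # Mix in the file name and contents, separated by NUL bytes so that
        # adjacent fields cannot run together.
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

For example, `cache_key("pre-commit", [Path("pyproject.toml"), Path(".pre-commit-config.yaml")])` would give the prerequisite job and every dependent job the same key as long as neither file is edited, so they all resolve to the same stored artifact.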
c7041f9 to d2bafc3
OK. I think I implemented all the workarounds to not have to wait for any of the
Nice these workarounds LGTM :)
There is one more error with tar ... Pre-commit is currently stored with But K8S is already hugely impactful -> installing the venv + all the tools takes ~1 minute. Saving it takes 30 seconds, restoring 10 seconds. Since we have many K8S jobs, this will be a huge improvement.
10f19da to ed286c5
Nice. 1m35s -> 30s for pre-commit environment.
bd210d4 to 1aa9d07
We use various caches in our build. So far, because of how "standard" caching works, PRs from forks could not effectively use the cache from the main Airflow repository: caches are not shared with other repositories, so PR builds could only use the cache effectively when they were rebased and kept running from the same fork.
This PR improves the caching strategy by using the "stash" action from the ASF. Unlike `cache`, the action uses artifacts to store the cache, which makes it possible for PRs coming from forks to use a cache uploaded by `main` canary builds.
As part of this change, all the places where setup-python was used and breeze was installed afterwards were reviewed and updated to use only the breeze installation action (it already installs Python), and this action has been improved to use UV caching effectively.
Overall, this PR should decrease setup overhead for many jobs across the CI workflow.
Follow-up after apache#45266
1aa9d07 to 740f63d
We use various caches in our build. So far, because of how "standard" caching works, PRs from forks could not effectively use the cache from the main Airflow repository: caches are not shared with other repositories, so PR builds could only use the cache effectively when they were rebased and kept running from the same fork.
This PR improves the caching strategy by using the "stash" action from the ASF. Unlike `cache`, the action uses artifacts to store the cache, which makes it possible for PRs coming from forks to use a cache uploaded by `main` canary builds.
As part of this change, all the places where setup-python was used and breeze was installed afterwards were reviewed and updated to use only the breeze installation action (it already installs Python), and this action has been improved to use UV caching effectively.
Overall, this PR should decrease setup overhead for many jobs across the CI workflow.
Follow-up after #45266