Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow appears to download all Maven dependencies directly from upstream repositories for the first build for every branch (including PRs) #31

Open
dwnusbaum opened this issue Aug 12, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@dwnusbaum
Copy link

dwnusbaum commented Aug 12, 2024

Reproduction steps

Open a PR for any plugin which uses this workflow and look at the log output.

Expected Results

Dependencies are either cached to avoid repeated downloads or retrieved through a mirror to avoid hitting the upstream Maven repositories

Actual Results

From a quick look, dependencies appear to be downloaded from scratch for the first build of every PR. For example, from the first run in jenkinsci/workflow-cps-plugin#918, the run failed after 15 minutes with this download error:

[2024-08-12 00:24:53] [autobuild] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project workflow-cps: Could not resolve dependencies for project org.jenkins-ci.plugins.workflow:workflow-cps:hpi:999999-SNAPSHOT: Could not transfer artifact org.jenkins-ci.plugins:subversion:jar:tests:1256.vee91953217b_6 from/to repo.jenkins-ci.org (https://repo.jenkins-ci.org/public/): GET request of: org/jenkins-ci/plugins/subversion/1256.vee91953217b_6/subversion-1256.vee91953217b_6-tests.jar from repo.jenkins-ci.org failed

The second run downloaded all of the dependencies again (took around 15 minutes), but then succeeded. The third build did not need to download any dependencies and finished in only 3 minutes. (12 minutes faster!)

Anything else?

My analysis may not be correct, I just wanted to file this in case it is correct and the current behavior is a problem. Originally observed in jenkinsci/workflow-cps-plugin#917 (comment).

I recall some previous efforts to reduce download bandwidth on repo.jenkins-ci.org, so I filed this because it seemed like it might be undesirable that the new action, which has recently been applied to many Jenkins plugins via automatically-filed PRs, is not using either a CI-specific mirror or some kind of pre-primed cache.

It also seems undesirable for the workflow to be flaky due to download failures.

@dwnusbaum dwnusbaum added the bug Something isn't working label Aug 12, 2024
@daniel-beck
Copy link
Contributor

daniel-beck commented Aug 13, 2024

Weird. Per https://github.com/actions/setup-java/?tab=readme-ov-file#caching-packages-dependencies the cache should be branch-independent. Is this a bad interaction with what incrementals does to caches?

https://github.com/jenkinsci/workflow-cps-plugin/actions/runs/10357962690/job/28671108844 appears to be the first JSS workflow for this particular PR and it downloads a cache in "Set up Java".

@dwnusbaum
Copy link
Author

dwnusbaum commented Aug 13, 2024

Hmm yes, maybe there is something more complicated going on. I did try to check a couple of other plugins though before I filed this, for example https://github.com/jenkinsci/workflow-api-plugin/actions/runs/10344669402/job/28630557826?pr=350#step:3:29 and https://github.com/jenkinsci/workflow-job-plugin/actions/runs/10355240770/job/28662407542#step:3:29 both failed to find the Maven cache in the first run for their PRs, even though their master branches had a successful scan before the PR was opened.

https://github.com/jenkinsci/workflow-cps-plugin/actions/runs/10357962690/job/28671108844 might be unusual because the PR was open before the JSS workflow was added to the repo, so the first scan is for a commit I pushed yesterday that also merged with master.

@dwnusbaum
Copy link
Author

Oh I think I understand now. Looking at the docs you linked, it seems that the cache key is based on a hash of all files matching **/pom.xml. So the behavior I observed should only apply to any PR that changes any pom.xml file, e.g. all Dependabot PRs.

This seems less than ideal, but it's not obvious from a quick look if there is a way to improve things, e.g. by using the base branch as a starting cache for PR builds or similar. I guess the current approach approach might be intended to prevent the cache from just growing indefinitely.

@daniel-beck
Copy link
Contributor

Based on https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache cache are shared inside of a repo/network only, based on some hierarchy rules apparently to maximize cache usefulness.

The workflow could expose cache-dependency-path from https://github.com/actions/setup-java/?tab=readme-ov-file#usage, or internally set it to a fixed value or a file known to be rarely updated, thereby effectively forcing cache re-use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants