be nice to apache / avoid the tragedy of commons #39

Closed
jtnord opened this issue Oct 25, 2024 · 18 comments
Labels
bug Something isn't working

Comments


jtnord commented Oct 25, 2024

Reproduction steps

The Jenkins security-scan reusable workflow downloads Maven on every run.
This hits a community-maintained resource and is not nice.

https://en.wikipedia.org/wiki/Tragedy_of_the_commons

Just as we use our own mirror for Maven artifacts, we should be nicer to Apache when downloading Maven (and presumably to Eclipse when downloading Java).

Expected Results

We do not hit Apache servers every time the workflow runs.

Actual Results

We hit Apache servers every time the workflow runs.

Anything else?

This can also cause failures, e.g. https://github.com/jenkinsci/oic-auth-plugin/actions/runs/11518246235/job/32064586667

@jtnord jtnord added the bug Something isn't working label Oct 25, 2024
Author

jtnord commented Oct 25, 2024

Collaborator

jglick commented Oct 25, 2024

switch to https://github.com/marketplace/actions/setup-maven

No, see #36.

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows might be the easiest approach.

I wondered if we could download via HTTP, which would be cheaper for the server (since we validate the checksum anyway), but this does not work, at least for the main download site (I did not check mirrors): the server redirects to HTTPS.

$ curl -IL http://downloads.apache.org/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz
HTTP/1.1 301 Moved Permanently
Date: Fri, 25 Oct 2024 17:48:55 GMT
Server: Apache
Location: https://downloads.apache.org/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Fri, 25 Oct 2024 17:48:55 GMT
Server: Apache
Last-Modified: Sat, 17 Aug 2024 18:44:55 GMT
ETag: "8ae661-61fe578da5420"
Accept-Ranges: bytes
Content-Length: 9102945
Access-Control-Allow-Origin: *
Content-Type: application/x-gzip

Collaborator

jglick commented Oct 25, 2024

Or we could use e.g. https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.9/apache-maven-3.9.9-bin.zip; not sure about relative scalability. Or does Jenkins Artifactory still mirror Central? If so, we could use that.

Author

jtnord commented Oct 25, 2024

No, see #36.

What in particular in there? I cannot see anything about this; it all refers to setup-java.

It mentions that setup-java picks up some old Maven version, but I am talking about setup-*maven*, which allows you to specify the Maven version and uses the GitHub tool cache to avoid hammering systems (somehow).

Author

jtnord commented Oct 25, 2024

I think I may have also filed this against the wrong repo (although the same issue may well apply). I meant to file it against jenkins-security-scan, but it appears that is a cut and paste of this anyway.

Author

jtnord commented Oct 25, 2024

No, see #36.

What in particular in there? I cannot see anything about this; it all refers to setup-java.

It mentions that setup-java picks up some old Maven version, but I am talking about setup-maven, which allows you to specify the Maven version and uses the GitHub tool cache to avoid hammering systems (somehow).

e.g. (untested)

  release:
    runs-on: ubuntu-latest
    needs: [validate]
    if: needs.validate.outputs.should_release == 'true'
    steps:
      - name: Check out
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: 17
      - name: Set up Maven
        uses: stCarolas/setup-maven@v5
        with:
          maven-version: 3.9.9

Collaborator

jglick commented Oct 25, 2024

I am talking about setup-maven, which allows you to specify the Maven version

#36 (comment)


olamy commented Oct 27, 2024

Or we could use e.g. https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.9/apache-maven-3.9.9-bin.zip; not sure about relative scalability. Or does Jenkins Artifactory still mirror Central? If so, we could use that.

The scalability of https://repo.maven.apache.org/ is the same as Maven Central's, as it is just a CNAME for Maven Central.
Bear in mind it is the default URL for mvnw (Maven Wrapper) and the default URL of central in the Apache Maven distribution, so I would not be worried about scalability.
BTW this is what the stCarolas GHA uses, see https://github.com/stCarolas/setup-maven/blob/d6af6abeda15e98926a57b5aa970a96bb37f97d1/src/installer.ts#L36

Another solution could be to use mvnw instead of mvn (but this will probably require more changes in multiple places).
The first step is to run the following (GHA nodes include a default mvn):

mvn org.apache.maven.plugins:maven-wrapper-plugin:3.3.2:wrapper -Dmaven=3.9.9

Now it's possible to run mvnw.
The wrapper will download that version to ~/.m2/wrapper
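
As a rough, untested sketch of how the wrapper approach above might look as workflow steps: the `verify` goal and the step names are assumptions for illustration, not taken from the actual workflow, and the steps assume the repository has already been checked out and a JDK set up.

```yaml
steps:
  # Uses whatever mvn is preinstalled on the runner, only to generate the wrapper.
  - name: Generate Maven Wrapper pinned to 3.9.9
    run: mvn org.apache.maven.plugins:maven-wrapper-plugin:3.3.2:wrapper -Dmaven=3.9.9
  # The wrapper fetches the pinned Maven version on first use.
  # "verify" is a placeholder goal for this sketch.
  - name: Build with the pinned Maven version
    run: ./mvnw --show-version verify
```

Without a cache, the wrapper still downloads the distribution on every run, so this pins the version but does not by itself reduce the load on the download server.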
BTW it would be good to setup some cache

      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: 17
          cache: 'maven'

Or manually, to cache even the distribution
(not tested):

- name: Cache local Maven repository
  uses: actions/cache@v4
  with:
    # or at least ~/.m2/repository, which is what actions/setup-java caches when using cache
    path: ~/.m2/
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-

Collaborator

jglick commented Oct 29, 2024

it would be good to setup some cache

Please no, not for a release workflow. We want to be sure everything is built from scratch.


olamy commented Oct 30, 2024

it would be good to setup some cache

Please no, not for a release workflow. We want to be sure everything is built from scratch.

why?
The cache will contain releases only, so it holds immutable jars (such as maven-compiler-plugin 3.x etc.). As the hash is based on the pom files, every single change in a pom will invalidate the cache, even for those immutable jars.
And this cache will be per GitHub repo.

Collaborator

jglick commented Oct 30, 2024

the cache will contain releases only

mvn install can pollute that. And if something weird happens with repositories we want to be sure that every release build is doing exactly what it looks like. This is not something we want to risk trying to optimize.

Member

timja commented Nov 17, 2024

Noting jenkinsci/database-plugin#266 (comment) which is somewhat related

Collaborator

jglick commented Nov 21, 2024

#41

Author

jtnord commented Jan 3, 2025

@jglick This is not resolved and should be reopened.

Using dlcdn still means downloading on every build.

Member

timja commented Jan 3, 2025

It’ll be resolved in a week or so once GitHub updates GitHub Actions.

Collaborator

jglick commented Jan 3, 2025

Well, not automatically; we would need to revert the Maven download and switch back to using the default installation (so revert #37 etc.).

In practice I do not think this is a big deal, since we are using a scalable server now. Do we really want to go back to using whatever unknown version of Maven happens to be installed on the GHA runners on a given day? If there is some reason we really cannot use even the Apache CDNs, I would rather download from Jenkins Artifactory or something.

Author

jtnord commented Jan 7, 2025

In practice I do not think this is a big deal, since we are using a scalable server now

Regardless of the scalability, there are costs to the third party; someone eventually has to pay to host the CDN.
Even if it is contributed (like the credits / infra we have from certain providers), there is a limit to expectations (see for example the JFrog removal of Central from our repo manager).

Also https://www.sonatype.com/blog/maven-central-and-the-tragedy-of-the-commons

FWIW the binaries are available in Maven Central, which we no longer cache (which is another manifestation of the same issue),
but at least we are relying on only a single external resource.

Collaborator

jglick commented Jan 7, 2025

Well, alternate ideas are welcome. We could use GHA caches (for Maven install, not the local repo, and keyed by version) for the security scan workflow. I do not think we should use GHA caches for a release workflow.
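
A minimal, untested sketch of what such a version-keyed cache of the Maven installation could look like for the security scan workflow. The job name, the `MAVEN_VERSION` variable, the `~/maven-dist` directory, and the cache key format are assumptions made for illustration; the download URL follows the pattern shown earlier in this thread, with the checksum fetched from the main download site.

```yaml
jobs:
  security-scan:            # hypothetical job name
    runs-on: ubuntu-latest
    env:
      MAVEN_VERSION: "3.9.9"
    steps:
      # Cache only the Maven installation (not ~/.m2/repository), keyed by version.
      - name: Restore cached Maven distribution
        id: maven-cache
        uses: actions/cache@v4
        with:
          path: ~/maven-dist
          key: maven-dist-${{ env.MAVEN_VERSION }}
      # Only a cache miss reaches the Apache servers.
      - name: Download and verify Maven on cache miss
        if: steps.maven-cache.outputs.cache-hit != 'true'
        run: |
          base="maven/maven-3/${MAVEN_VERSION}/binaries/apache-maven-${MAVEN_VERSION}-bin.tar.gz"
          curl -fsSL -o maven.tar.gz "https://dlcdn.apache.org/${base}"
          curl -fsSL -o maven.tar.gz.sha512 "https://downloads.apache.org/${base}.sha512"
          # Take the first field so the check works whether or not a file name follows the hash.
          echo "$(cut -d' ' -f1 maven.tar.gz.sha512)  maven.tar.gz" | sha512sum -c -
          mkdir -p "$HOME/maven-dist"
          tar -xzf maven.tar.gz -C "$HOME/maven-dist"
      - name: Add Maven to PATH
        run: echo "$HOME/maven-dist/apache-maven-${MAVEN_VERSION}/bin" >> "$GITHUB_PATH"
```

With this shape, the download happens once per Maven version per repository cache rather than once per run, and the local repository is left uncached, which keeps it separate from the concern about release builds.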
