
Assess Artifactory bandwidth reduction options #3599

Closed
MarkEWaite opened this issue May 26, 2023 · 45 comments

MarkEWaite commented May 26, 2023

Service(s)

Artifactory

Summary

JFrog has asked us to reduce the outbound bandwidth used by https://repo.jenkins-ci.org . One of the ideas being explored is to make several of the repository mirrors private. We need to test that by announcing and executing a series of time-limited tests (brownouts) that temporarily make the repository mirrors private, and assess the impact on Jenkins developers.

The proposed sequence of repositories to make private includes:

  • jgit-cache
  • npm-cache
  • maven-repo1
  • jcenter-cache
  • nodejs-dist-cache
  • maven-repo1-cache

The list includes a mix of large and small repositories; some are known to be used for Jenkins development, while for others that usage is less clear.

Implementation plan

Announce the series of functionality reduction tests, each lasting a relatively brief period (1 hour). Announce in:

  • developers list
  • one or more chat channels

During the functionality reduction tests, we will specifically assess the impact on:

  • ci.jenkins.io jobs
  • infra.ci.jenkins.io jobs
  • developer environments
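
A cheap way to assess the impact from a developer environment is to probe an artifact path through the public virtual repository and look at the HTTP status before, during, and after the brownout window. A minimal sketch; the probe artifact path and the status-to-state mapping are illustrative assumptions, not part of the plan:

```shell
#!/usr/bin/env sh
# Hypothetical brownout smoke test: probe an artifact through the public
# virtual repository and classify the HTTP status code.
REPO_URL="${REPO_URL:-https://repo.jenkins-ci.org/public}"
ARTIFACT="junit/junit/4.13.2/junit-4.13.2.pom"  # hypothetical probe artifact

# Mirrors made private typically answer 401/403 (or 404 through a virtual repo).
classify_status() {
  case "$1" in
    200)         echo "public" ;;
    401|403|404) echo "private-or-missing" ;;
    *)           echo "unexpected" ;;
  esac
}

probe() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$REPO_URL/$ARTIFACT")
  echo "$REPO_URL/$ARTIFACT -> $code ($(classify_status "$code"))"
}
```

Running `probe` at each phase of the test window gives a quick signal of whether the change is visible to developer environments.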
MarkEWaite added the triage (incoming issues that need review) label May 26, 2023
MarkEWaite changed the title from "Announce and run a series of artifactory brownouts to assess bandwidth reduction options" to "Assess Artifactory bandwidth reduction options" May 26, 2023

timja commented May 26, 2023

Nodejs dist can't be made private, as it will break builds as far as I know.

npm is likely the same, depending on whether it's caching the npm binary or the dependencies (the binary is the one that would cause problems).

@dduportal (Contributor) commented:

Nodejs dist can't be made private, as it will break builds as far as I know.

npm is likely the same, depending on whether it's caching the npm binary or the dependencies (the binary is the one that would cause problems).

What are the usages of NPM/NodeJS in Artifactory? They're not really known by the infra team.


timja commented May 26, 2023

Every CI build for a plugin that uses node / npm and core will download it from the mirror:
https://github.com/jenkinsci/plugin-pom/blob/d749834a565493e17df4598e4d394749ff51dd00/pom.xml#L131-L132

@dduportal (Contributor) commented:

Every CI build for a plugin that uses node / npm and core will download it from the mirror: https://github.com/jenkinsci/plugin-pom/blob/d749834a565493e17df4598e4d394749ff51dd00/pom.xml#L131-L132

Thanks for the explanation @timja !

As pointed out by @lemeurherve, we should look into putting this usage behind ACP, as we do for Maven, to decrease the amount of data downloaded from JFrog.

dduportal added this to the infra-team-sync-2023-06-06 milestone May 30, 2023
dduportal removed the triage (incoming issues that need review) label May 30, 2023
@dduportal (Contributor) commented:

Timing proposal discussed during the weekly meeting:

  • jgit brownout: Friday 2 June
  • maven-repo1 brownout: 5 or 6 June

@dduportal (Contributor) commented:

Update: moved the brownout tests.

Proposal (ping @MarkEWaite @smerle33 @lemeurherve for voting +1 or -1 to this message)

  • jgit Thursday 8 June at 12h30 UTC (@MarkEWaite we can do it as we have a 1:1 at this time)
  • maven-repo1: Monday 12 June at 12h30 UTC, if (and only if) the jgit test is successful.

If we get a majority of votes, I'll open a status page and send an email to the developers.


smerle33 commented Jun 7, 2023

+1, as a voting comment rather than an emoji, in case that's needed

@dduportal (Contributor) commented:

Thanks, folks! I need a review (and approval if OK) of jenkins-infra/status#310 before I send the email.

@MarkEWaite (Author) commented:

+1 from me

@dduportal (Contributor) commented:

email thread for jgit brownout: https://groups.google.com/g/jenkinsci-dev



dduportal commented Sep 1, 2023

Update: proposed timeline for the next brownout jenkins-infra/status#370

@dduportal (Contributor) commented:

Brownout started:

  • Removed the 2 *repo1* mirrors from repo.jenkins-ci.org/public
  • Cleaned up the cache of the 6 ACP replicas (2 per cloud) and restarted the processes to avoid serving from the in-memory cache
  • Announced in the usual IM channels
  • status.jenkins.io updated
  • Added a message on ci.jenkins.io

Next steps: testing builds


dduportal commented Sep 4, 2023

Brownout is finished: closed status.jenkins.io and sent an email on the mailing list.

TL;DR: the results are really good; we only have one remaining build issue in jenkinsci/maven-hpi-plugin#529 (comment), but it does not block the plan, as it is only a matter of integration tests and settings.xml.

(edit)
Detailed report of what was done and tested during the brownout:

  • Artifactory: Removed repo1 and maven-repo1 mirror repositories from the public virtual repository

  • "ACP" (Artifact Caching Proxy) inside the Jenkins infrastructure: cleaned up the cache for each replica

    • 6 replicas: 2 per cloud provider (eks-public in AWS EKS, doks-public in DigitalOcean, and publick8s in AKS)
    • On each pod, ran rm -rf /data/nginx-cache/*, then deleted the pod to force the nginx process to restart
  • Tested the following jobs on ci.jenkins.io:

    • pipeline-library master branch

      • No error or warnings not already present on the previous builds
    • BOM master branch

      • No error or warnings not already present on the previous builds
    • ATH master branch

      • No error or warnings not already present on the previous builds
    • Plugin master branch (on jenkins-infra-test-plugin)

      • No error, no warnings
    • Plugin PR (on nexus-platform-plugin)

      • Saw the following warning but it does not seem to change the build or test behavior:
      [WARNING] Error resolving project artifact: The following artifacts could not be resolved: com.sun:tools:pom:1.8.0 (absent): Could not find artifact com.sun:tools:pom:1.8.0 in azure-proxy (https://repo.azure.jenkins.io/public/) for project com.sun:tools:jar:1.8.0
      
    • maven-hpi-plugin (the trickiest one). As per "Bump org.jenkins-ci:jenkins from 1.104 to 1.105" (jenkinsci/maven-hpi-plugin#529), the integration tests have been broken since "feat(ci.jenkins.io) add a custom Maven profile in ACP settings to handle plugin repositories" (jenkins-infra#3041), but there was no new error/warning due to the brownout actions

  • Tested the following jobs on trusted.ci.jenkins.io:

    • update center
    • RPU
    • javadoc
    • crawler
    • jenkins.io
    • core-tags-lib-generator
  • Artifactory: Added back repo1 and maven-repo1 mirror repositories in the public virtual repository

  • "ACP" (Artifact Caching Proxy) inside the Jenkins infrastructure: cleaned up the cache for each replica


dduportal commented Sep 6, 2023

Update: next steps:

  • As discussed during the 2023-09-05 weekly infra team meeting, there is no objection or blocker against definitively removing the Maven Central (repo1) mirror repositories (i.e. persisting the brownout setting)
  • JFrog confirmed that it looks good to them
  • Proposed deadline: Thursday September 7 2023 (around 08:00am UTC)
  • ⚠️ The effect on the outbound bandwidth might not be immediate, as we'll have to clean the ACP cache, which will cause a burst of downloads until it is repopulated (usually 5 to 7 days)
  • Communication to end users:
  • Operation should be:
    • Artifactory: Remove repo1 and maven-repo1 mirror repositories from the public virtual repository
    • "ACP" (Artifact Caching Proxy) inside the Jenkins infrastructure: clean up the cache for each replica
      • 6 replicas: 2 per cloud provider (eks-public in AWS EKS, doks-public in DigitalOcean, and publick8s in AKS)
      • On each pod, run rm -rf /data/nginx-cache/*, then delete the pod to force the nginx process to restart
    • Set both repo1 and maven-repo1 as private mirrors
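
The per-replica cache cleanup steps above can be sketched as a small script. The namespace and label selector below are assumptions (the real values are not stated in this issue), and setting KUBECTL=echo turns it into a dry run:

```shell
#!/usr/bin/env sh
# Sketch of the per-replica ACP cache cleanup. NAMESPACE and SELECTOR are
# hypothetical; override them as needed (and set KUBECTL=echo for a dry run).
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-artifact-caching-proxy}"   # assumption
SELECTOR="${SELECTOR:-app=artifact-caching-proxy}" # assumption

clean_acp_cache() {
  # PODS can be pre-set (e.g. for testing); otherwise list the ACP pods.
  pods="${PODS:-$($KUBECTL -n "$NAMESPACE" get pods -l "$SELECTOR" -o name)}"
  for pod in $pods; do
    # Empty the on-disk nginx cache inside the pod...
    $KUBECTL -n "$NAMESPACE" exec "$pod" -- sh -c 'rm -rf /data/nginx-cache/*'
    # ...then delete the pod so nginx restarts without its in-memory cache.
    $KUBECTL -n "$NAMESPACE" delete "$pod"
  done
}
```

Deleting the pod (rather than only wiping the directory) matches the report above: it forces the nginx process to restart so it cannot keep serving from memory.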

@dduportal (Contributor) commented:

Work on jenkinsci/maven-hpi-plugin#537 is finished and merged, thanks to the help of @basil.

It is a fix that allows running integration tests by opting out of ACP for this project (only).

Background work is needed for a proper fix (tracked in jenkinsci/maven-hpi-plugin#541).

Next step to close this issue: let's wait for @MarkEWaite's analysis of the logs provided by JFrog, and confirm with them that the new bandwidth usage is fine.
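
For that kind of log analysis, a per-repository bandwidth summary is the number that matters. A throwaway sketch, assuming a whitespace-separated access-log format with the request path in field 7 and the response size in bytes in field 10 (an illustrative assumption, not JFrog's actual log format):

```shell
#!/usr/bin/env sh
# Throwaway log-crunching sketch: sum response bytes per repository.
# Field positions ($7 = request path, $10 = bytes) are assumed for illustration.
bandwidth_by_repo() {
  awk '{
    n = split($7, p, "/")          # e.g. /artifactory/repo1-cache/... -> p[3]
    if (n >= 3) bytes[p[3]] += $10
  } END {
    for (r in bytes) printf "%s %d\n", r, bytes[r]
  }' "$@"
}
```

Piping the output through `sort -k2 -rn` would rank the repositories by outbound bytes, which is the view needed to judge whether removing the repo1 mirrors actually moved the needle.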

@MarkEWaite (Author) commented:

Done. The log file format changed, and we've decided not to spend the effort adapting to the new format. Thanks to all!
