Scaling Renovate Bot on self-hosted GitLab #13172
Replies: 5 comments 11 replies
-
We've literally just solved this issue ourselves. What we did was run a Python script to grab all the repos via the GitLab API and then spawn a child pipeline that ran an instance of Renovate against each repo individually. This has taken our runtime from >24 hrs for 12k repos down to ~4 hrs.
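A minimal sketch of that approach, assuming a `GITLAB_TOKEN` environment variable and a self-hosted instance URL; the job names, file names, and batching are illustrative, not from the original setup:

```python
# Sketch: enumerate every repo the bot can see via the GitLab API, then
# emit a child-pipeline YAML with one Renovate job per repository.
# GITLAB_URL, GITLAB_TOKEN, and all job/file names here are assumptions.
import os

import requests
import yaml

GITLAB_URL = os.environ.get("GITLAB_URL", "https://gitlab.example.com")
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}

def list_projects():
    """Page through /api/v4/projects and collect every project path."""
    projects, page = [], 1
    while True:
        resp = requests.get(
            f"{GITLAB_URL}/api/v4/projects",
            headers=HEADERS,
            params={"membership": True, "per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return projects
        projects.extend(p["path_with_namespace"] for p in batch)
        page += 1

# One job per repo; the parent pipeline then triggers this generated file
# as a child pipeline (trigger:include:artifact in GitLab CI).
pipeline = {}
for path in list_projects():
    pipeline["renovate: " + path] = {
        "image": "renovate/renovate",
        "script": [f"renovate {path}"],
    }

with open("child-pipeline.yml", "w") as f:
    yaml.safe_dump(pipeline, f)
```

At 12k repos you would likely batch several repos per job rather than one each, but the shape is the same: discovery in the parent pipeline, execution fanned out across the child pipeline's jobs.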
-
My approach is to have 2 jobs:

```yaml
.template:
  variables:
    RENOVATE_CONFIG_FILE: config.js
  image: renovate/renovate:32.0.3@sha256:6ee56f7ff58fd515e5c521c57f85284a96a32b2789335d6241307af7625a8b64

renovate-discover:
  extends: .template
  stage: discover
  script:
    - renovate --base-dir .renovate --write-discovered-repos=renovate-repos.json
  artifacts:
    paths:
      - renovate-repos.json
  only:
    - schedules

renovate-run:
  # parallel: 10 runs ten copies of this job; GitLab sets CI_NODE_INDEX
  # and CI_NODE_TOTAL in each copy, which config.js uses below.
  parallel: 10
  extends: .template
  stage: run
  services:
    - name: docker:20.10.12-dind@sha256:6f2ae4a5fd85ccf85cdd829057a34ace894d25d544e5e4d9f2e7109297fedf8d
      alias: docker
  variables:
    DOCKER_HOST: "tcp://docker:2375"
    DOCKER_TLS_CERTDIR: ""
  script:
    - renovate --base-dir .renovate
  dependencies:
    - renovate-discover
```

and at the bottom of `config.js`:

```js
const fs = require('fs');

if (fs.existsSync("renovate-repos.json")) {
  if (!("CI_NODE_INDEX" in process.env) || !("CI_NODE_TOTAL" in process.env)) {
    console.log("renovate-repos.json exists, but CI_NODE_INDEX and CI_NODE_TOTAL are not set. See https://docs.gitlab.com/ee/ci/yaml/#parallel");
    process.exit(1);
  }
  const segmentNumber = Number(process.env.CI_NODE_INDEX); // CI_NODE_INDEX is 1-indexed
  const segmentTotal = Number(process.env.CI_NODE_TOTAL);
  const allRepositories = JSON.parse(fs.readFileSync("renovate-repos.json"));
  // Round-robin assignment: repository i belongs to segment (i % segmentTotal) + 1.
  const repositories = allRepositories.filter((_, i) => (segmentNumber - 1) === (i % segmentTotal));
  module.exports.repositories = repositories;
  module.exports.autodiscover = false;
  console.log(`renovate-repos.json contains ${allRepositories.length} repositories. This is chunk number ${segmentNumber} of ${segmentTotal} total chunks. Processing ${repositories.length} repositories.`);
} else {
  module.exports.autodiscover = true;
}
```
-
This discussion was super valuable for us in working around the extremely low Bitbucket Cloud API rate limits. We're following a similar approach to the one above. Thank you 👏
-
Are you aware of an option to use the
-
I found this article quite useful: Optimizing Renovate for GitLab with 500+ Repositories
-
Hi,
we are running a self-hosted GitLab instance, and a scheduled GitLab CI job triggers the Renovate bot once per hour. We are using Renovate's "autodiscover" mode, so that other users on our GitLab server simply need to invite the bot user to their repo for the scanning to work.
We are approaching the limit of what the bot can do within one hour, because many repos have already invited the bot. While we could change the schedule to run every 2 hours instead of every hour, this would negatively impact the reaction time.
Is there any scaling mechanism we can use? We are thinking about implementing a self-made "sharding" mechanism, where we run several (hourly) scheduled jobs in parallel and subdivide the repositories. E.g. CI job no. 1 could run for the first 50% of the repos returned by `writeDiscoveredRepos`, and CI job no. 2 for the second 50%.
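A minimal sketch of that 50/50 idea in a Renovate `config.js`, assuming hypothetical `SHARD_INDEX`/`SHARD_TOTAL` environment variables set per scheduled job (they are not built-in Renovate or GitLab variables) and a repo list already written by `writeDiscoveredRepos`:

```js
// Hypothetical config.js fragment: split the discovered repo list into
// contiguous shards. SHARD_INDEX (1-based) and SHARD_TOTAL would be set
// in each scheduled pipeline's variables; the names are illustrative.
const fs = require('fs');

const shardIndex = Number(process.env.SHARD_INDEX || 1);
const shardTotal = Number(process.env.SHARD_TOTAL || 1);

const allRepos = JSON.parse(fs.readFileSync('renovate-repos.json'));
const shardSize = Math.ceil(allRepos.length / shardTotal);
const start = (shardIndex - 1) * shardSize;

module.exports = {
  autodiscover: false,
  // slice() clamps past the end, so the last shard just gets the remainder.
  repositories: allRepos.slice(start, start + shardSize),
};
```

The modulo-based filter shown earlier in this thread achieves the same split and keeps shard sizes balanced even as repos are added or removed.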