Resource packs and quotas

Fred G edited this page Sep 9, 2024 · 2 revisions

Our cluster-based infrastructure lets us prevent one of the major drawbacks of the old infrastructure: resource starvation. On the old infrastructure, one project could eat up all the resources of the machine and starve others, resulting in unpredictable build times, machine instability and unfairness. As such, we've defined resource quotas on the clustered infrastructure so that all projects get a fair share of resources.

What's a resource pack?

A resource pack is the indivisible base unit of compute cycles and memory (vCPU/RAM) that we allocate to projects for build jobs. Together, the resource packs allocated to a project form the pool of resources (its quota) available to that project for running builds at any given time. More information about how many resource packs a project can get is available on the CBI wiki page.

What about running build jobs concurrently?

First, a bit of Jenkins terminology (see Jenkins Glossary for more details):

  • The master is the central, coordinating process which stores configuration, loads plugins, and renders the various user interfaces for Jenkins.
  • An agent (formerly called slave) is typically a (virtual) machine, or container, which connects to a Jenkins master and executes tasks when directed by the master. There are 2 kinds of agents: dynamic and static. Dynamic agents are created on demand when a build job requires one. All agents on the clustered infrastructure are dynamic. Both dynamic and static agents can exist on the same master.
  • An executor is a slot for executing a build job defined by a pipeline or a job on an agent. An agent may have zero or more executors configured, which determines how many concurrent projects or pipelines are able to execute on that agent. All dynamic agents in the cluster have a single executor. Note that Jenkins masters can also have executors, but this has been considered bad practice for a long time, and masters have no executors in the clustered infrastructure.

By default, all Eclipse projects run with a master able to dynamically create 2 agents in the cluster at the same time. Each of them has a single executor, meaning that projects can run 2 jobs at the same time. The resources required by both agents combined must be less than or equal to the resource quota assigned to the project.

What is the relationship between resource quota and concurrency level?

We set the concurrency level on the cluster (i.e. the number of dynamic agents that can exist simultaneously in the cluster) to a number that depends on the resource quota your project gets. We set it to a value equal to the number of vCPU you get, because we don't think it's desirable to run a build job with less than 1 vCPU.
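The rule above can be sketched in a few lines of Python. This is only an illustration: the names `Quota` and `quota_for` are hypothetical, and the pack size of 2 vCPU and 8 GB of RAM is an assumption inferred from the worked example further down this page (2 packs giving 4 vCPU and 16 GB), so check the CBI wiki page for the current values.

```python
from dataclasses import dataclass

# Assumption: pack size inferred from this page's worked example
# (2 packs = 4 vCPU / 16 GB). The CBI wiki page is the reference.
VCPU_PER_PACK = 2
RAM_GB_PER_PACK = 8


@dataclass(frozen=True)
class Quota:
    """Total resources a project may use for builds at any given time."""
    vcpu: int
    ram_gb: int

    @property
    def concurrency_limit(self) -> int:
        # The concurrency limit is set to the number of vCPU in the quota.
        return self.vcpu


def quota_for(packs: int) -> Quota:
    """Return the quota for a given number of resource packs."""
    if packs < 1:
        raise ValueError("packs must be a positive integer")
    return Quota(vcpu=packs * VCPU_PER_PACK, ram_gb=packs * RAM_GB_PER_PACK)


if __name__ == "__main__":
    for packs in (1, 2, 3):
        q = quota_for(packs)
        print(packs, q.vcpu, q.ram_gb, q.concurrency_limit)
```

Under these assumptions, a project with a single pack gets a concurrency limit of 2, which matches the default of 2 dynamic agents described earlier.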

How do I decide how many resources a build job will get to run?

For freestyle jobs, you can't configure it. You must stick to what we've configured: freestyle jobs get 1vCPU (burst to 2vCPU) and 4GB of RAM. It is aligned with the default concurrency level we set for all (non-sponsored) projects and the resources they get.

To customize the resources for a build job, projects need to use a Jenkins pipeline. See the What is killing my build? I'm using custom containers? section to learn how to do that.

What does CPU burst mean?

When a build is scheduled, a new Jenkins agent is dynamically created in the cluster. The agent is scheduled on a physical node by Kubernetes (see Kubernetes compute resource management documentation for details). In general, Kubernetes tries to allocate agents on the least busy node. Once an agent is created on a node, it won't move until the end of the build. During the build, if there are some spare CPU cycles (i.e. cycles that have not been reserved by others) on the node, the agent will get more CPU up to the burst limit. So, while projects don't compete with each other for the requested vCPU, they compete for the CPU burst. Note that the burst should be shared fairly between projects. Overall, this means that whether a build can actually reach the upper limit of the burst depends on the global load of the cluster.

What's the priority: the resource quota or the concurrency limit?

The resource quota is the limiting factor. The concurrency limit is an upper bound. Let's take a project that has 2 resource packs (1 "free" + 1 "sponsored") as an example. It currently means that it has 4vCPU and 16GB of RAM to run its builds and has a concurrency limit of 4. It can configure its build jobs in several ways:

  • configure them all to use 2vCPU and 8GB of RAM. It means that only 2 of them can run at the same time.
  • configure them all to use 1vCPU and 4GB of RAM. Then 4 jobs can run concurrently.
  • configure them all but one to use 1vCPU/4GB RAM, the last one being a resource hog configured with 3vCPU and 12GB RAM. When small jobs are running, 4 of them can run concurrently, but when the resource hog runs, there are only enough resources left for 1 smaller build job.
  • configure them all to use 0.5vCPU and 2GB of RAM. Still, only 4 jobs can run simultaneously because the concurrency limit is 4.

Note that burst resources (a.k.a. CPU limit in Kubernetes words) have to be treated the same way. They need to be shared between jobs.
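The four configurations listed above can be checked with a small calculation. The following is a minimal sketch, not the scheduler's actual logic: the function names `max_concurrent` and `can_start` are hypothetical, and the defaults are the 4vCPU / 16GB / concurrency limit of 4 from the example project.

```python
from typing import List, Tuple

Job = Tuple[float, float]  # (requested vCPU, requested RAM in GB)


def max_concurrent(job_vcpu: float, job_ram_gb: float,
                   quota_vcpu: float = 4, quota_ram_gb: float = 16,
                   concurrency_limit: int = 4) -> int:
    """How many identical jobs fit at once: bounded by CPU, RAM and the limit."""
    by_cpu = int(quota_vcpu // job_vcpu)
    by_ram = int(quota_ram_gb // job_ram_gb)
    return min(by_cpu, by_ram, concurrency_limit)


def can_start(running: List[Job], new_job: Job,
              quota_vcpu: float = 4, quota_ram_gb: float = 16,
              concurrency_limit: int = 4) -> bool:
    """True if new_job fits next to the running jobs without exceeding the quota."""
    if len(running) >= concurrency_limit:
        return False
    used_vcpu = sum(vcpu for vcpu, _ in running)
    used_ram = sum(ram for _, ram in running)
    return (used_vcpu + new_job[0] <= quota_vcpu
            and used_ram + new_job[1] <= quota_ram_gb)


if __name__ == "__main__":
    print(max_concurrent(2, 8))    # all jobs at 2vCPU / 8GB
    print(max_concurrent(1, 4))    # all jobs at 1vCPU / 4GB
    print(max_concurrent(0.5, 2))  # capped by the concurrency limit
    hog, small = (3, 12), (1, 4)
    print(can_start([hog], small))         # one small job fits next to the hog
    print(can_start([hog, small], small))  # a second one does not
```

The sketch only models the requested resources; it does not model CPU burst, which is shared on a best-effort basis.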

Can I connect a static agent to the Jenkins instance of my project? Does it count for the concurrency limit?

Yes, projects can add as many external static agents as they want. Agents of this kind need to have an SSH port open to the internet and be signed off by the security team. Also, they do not count towards the concurrency limit (as this is specific to the dynamically provisioned agents in the cluster).

Do you do overbooking?

Of course we do. CI jobs fit very well with overbooking: not everybody needs all their resources 100% of the time. It means that when the cluster runs at capacity, no new jobs can be scheduled and jobs are queued until their requested resources are freed by other jobs. Note that our goal is to size the cluster so that it can handle peak times though. So the wait time in the queue should be minimal.

Is there a limit for how large an agent can be?

Yes. You cannot allocate more than 8vCPU and 16GiB of RAM to a single agent/container. If you do so, the agent will never start. If you need larger agents, please open a HelpDesk issue and we can discuss the alternatives.
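As a quick sanity check before committing a pipeline, the limit above can be expressed as a small helper. This is a hypothetical sketch: `agent_size_problems` is not part of any Jenkins or Kubernetes API, and it assumes that exceeding either limit (vCPU or RAM) is enough to keep the agent from starting.

```python
from typing import List

# Per-agent limits stated on this page.
MAX_AGENT_VCPU = 8
MAX_AGENT_RAM_GIB = 16


def agent_size_problems(vcpu: float, ram_gib: float) -> List[str]:
    """Return a list of reasons why the requested agent size is too large."""
    problems = []
    if vcpu > MAX_AGENT_VCPU:
        problems.append(f"{vcpu} vCPU exceeds the {MAX_AGENT_VCPU} vCPU limit")
    if ram_gib > MAX_AGENT_RAM_GIB:
        problems.append(f"{ram_gib} GiB exceeds the {MAX_AGENT_RAM_GIB} GiB limit")
    return problems


if __name__ == "__main__":
    for size in ((4, 16), (8, 32), (12, 8)):
        issues = agent_size_problems(*size)
        print(size, "OK" if not issues else "; ".join(issues))
```

An empty list means the request is within the per-agent limit; it still has to fit within the project's own resource quota.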