I looked into scaling the number of agents based on the load:
Kubernetes scaling in general:
Kubernetes offers several ways to scale pods (and the agents in them) up and down. We can either configure a custom metric for the autoscaler, or run a separate process that monitors the queue length and scales the agents accordingly.
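As a rough sketch of what the "separate process" variant would have to compute, the function below maps the queue length to a replica count. It assumes one job per agent, and the name desired_agents and the min/max limits are made up for illustration, not taken from any existing setup.

```python
def desired_agents(queue_length: int, running_jobs: int,
                   min_agents: int = 1, max_agents: int = 20) -> int:
    """Return how many agents should exist for the current load.

    Assumption: every agent runs exactly one job at a time, so we want
    one agent per queued job plus one per job that is already running.
    """
    if queue_length < 0 or running_jobs < 0:
        raise ValueError("queue_length and running_jobs must be >= 0")
    wanted = queue_length + running_jobs
    # Clamp to the configured bounds so we never scale to zero or without limit.
    return max(min_agents, min(max_agents, wanted))


if __name__ == "__main__":
    # Example: 5 queued jobs and 3 running jobs.
    print(desired_agents(queue_length=5, running_jobs=3))
```

Counting running jobs in the target keeps the scaler from asking for fewer agents than are currently busy, so it does not shrink the deployment under active builds.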
The agent process needs to react to SIGTERM as an indication to shut down. Ideally it finishes the current build and then exits.
After sending SIGTERM, Kubernetes waits for terminationGracePeriodSeconds before sending SIGKILL; this value can be configured in the deployment.
So we need to set terminationGracePeriodSeconds to at least the longest build time on a given machine (e.g. 1-2 hours).
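To illustrate the behaviour we want from an agent on SIGTERM (finish the current build, take no new ones, then exit), here is a minimal sketch. It is not how any real agent is implemented; GracefulAgent, fetch_job and run_job are hypothetical names.

```python
import signal
import threading


class GracefulAgent:
    """Toy agent loop: stops taking jobs after SIGTERM, never aborts a running one."""

    def __init__(self, fetch_job, run_job):
        self._fetch_job = fetch_job   # hypothetical: returns a job or None
        self._run_job = run_job       # hypothetical: runs one job to completion
        self._stop = threading.Event()

    def request_stop(self, signum=None, frame=None):
        # Only set a flag; the job that is currently running is not interrupted.
        self._stop.set()

    def install_handler(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, idle_wait_seconds: float = 1.0) -> int:
        """Run jobs until a stop is requested; return the number of completed jobs."""
        completed = 0
        while not self._stop.is_set():
            job = self._fetch_job()
            if job is None:
                # Nothing queued: wait a bit, but wake up early on stop.
                self._stop.wait(idle_wait_seconds)
                continue
            self._run_job(job)
            completed += 1
        return completed
```

The stop flag is only checked between jobs, which is why terminationGracePeriodSeconds has to cover a full build.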
For Jenkins swarm agents:
They do not react to SIGTERM at all.
So we would need to script this: create a preStop hook that 1) marks the agent as offline on the master so it does not get any new jobs, and 2) runs killall java once the node is no longer building anything.
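The control flow of such a preStop hook could look roughly like this. The three callables are placeholders: how to mark the node offline, check whether it is idle, and stop the Java process would still have to be worked out against the Jenkins master, so treat this as a sketch of the ordering only.

```python
import time


def drain_and_stop(mark_offline, is_idle, stop_agent,
                   poll_seconds: float = 10.0,
                   timeout_seconds: float = 7200.0,
                   sleep=time.sleep, clock=time.monotonic) -> bool:
    """Take the agent out of rotation, wait for the running build, then stop it.

    mark_offline, is_idle and stop_agent are hypothetical hooks supplied by
    the caller. Returns True if the agent was stopped after becoming idle,
    False if the timeout was reached while a build was still running.
    """
    # Step 1: make sure the master does not schedule new jobs on this node.
    mark_offline()

    # Step 2: wait until the node is not building anything.
    deadline = clock() + timeout_seconds
    while not is_idle():
        if clock() >= deadline:
            return False
        sleep(poll_seconds)

    # Step 3: now it is safe to kill the agent process (e.g. killall java).
    stop_agent()
    return True
```

The timeout should not exceed terminationGracePeriodSeconds, since Kubernetes kills the container once the grace period is over regardless of what the hook is doing.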
I would avoid the effort here and wait until we're on Buildkite, which makes things much easier.
Buildkite agents should work fine. On SIGTERM they stop accepting new jobs and exit once the current job finishes. So we just need to set a good terminationGracePeriodSeconds and we're done.
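For picking the value, a small helper like the one below could generate the deployment patch. The field path spec.template.spec.terminationGracePeriodSeconds is the standard location in a Deployment; the helper name and the 10 minute safety margin are assumptions.

```python
import json


def grace_period_patch(longest_build_minutes: int, margin_minutes: int = 10) -> dict:
    """Build a merge patch that sets the pod's termination grace period.

    Assumption: the grace period should cover the longest build plus a
    small margin, so a build that starts right before SIGTERM can finish.
    """
    if longest_build_minutes <= 0 or margin_minutes < 0:
        raise ValueError("durations must be positive")
    seconds = (longest_build_minutes + margin_minutes) * 60
    return {
        "spec": {
            "template": {
                "spec": {"terminationGracePeriodSeconds": seconds}
            }
        }
    }


if __name__ == "__main__":
    # Example: longest build takes about 2 hours.
    print(json.dumps(grace_period_patch(120)))
```

The printed JSON can be applied to the agent deployment as a merge patch, or the same value can simply be written into the deployment manifest by hand.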
The Windows agents are currently running in Docker containers on normal Windows VMs. For scaling we would have to migrate them to Kubernetes first, see "See if we can use Kubernetes for Windows" (#115).
It would be nice to automatically scale up and down the number of agents based on the length of the build queue.