-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: RunnerSet backed by StatefulSet #629
Conversation
50019b1
to
96f0a08
Compare
…r-controller got failed after the mutating webhook has been registered
…DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false
…changes when there were no changes in runnerset spec
@@ -158,20 +164,20 @@ acceptance: release/clean acceptance/pull docker-build release | |||
acceptance/run: acceptance/kind acceptance/load acceptance/setup acceptance/deploy acceptance/tests acceptance/teardown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if all the images used in acceptance/load
aren't already on your local machine then this fails. acceptance/pull
needs to be done after acceptance/kind
Makefile
Outdated
kind load docker-image quay.io/jetstack/cert-manager-controller:v1.0.4 --name ${CLUSTER} | ||
kind load docker-image quay.io/jetstack/cert-manager-cainjector:v1.0.4 --name ${CLUSTER} | ||
kind load docker-image quay.io/jetstack/cert-manager-webhook:v1.0.4 --name ${CLUSTER} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth bumping to v1.1.1 in this PR?
v1.1.1 is the last of the v1.X.X series. I run v1.1.1 on EKS and have done across multiple controller versions. v1.1.1 is fairly old at this point but we should consider bumping it but a newer major version outside of this PR so if there are issues (I don't see why there would be tbh) they are dealt with seperately to this work. I can vouch for v1.1.1 so it would be nice to bump to latest of that series in this PR seen as we have done various bumps already.
Perhaps it's worth having CERT_MANAGER_VERSION = v1.1.1
at the top and the version to be deployed pulled from that making it easier to bump next time?
Hi @mumoshu. I am testing the controller using the
I see that https://github.com/actions-runner-controller/actions-runner-controller/blob/8b90b0f0e3a4a254c096f8d9ecd8aeed0ee3c00e/controllers/runnerset_controller.go#L68 is commented out. Is that needed? |
This was my fault. I used an outdated chart (changed controller tag to canary). |
@esvirskiy Wow! Thanks a lot for testing Note that there's a few unimplemented things as explained in the pr description: I'm working for the HRA support at #647. I'll tackle the auto-recovery feature next. Scale-from/to-zero is at the lowest priority and I may skip working on it entirely, because a potential enhancement on the GitHub side can make it unnecessary. |
`HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet. It defaults to `RunnerDeployment` for backward compatibility. ``` apiVersion: actions.summerwind.dev/v1alpha1 kind: HorizontalRunnerAutoscaler metadata: name: myhra spec: scaleTargetRef: kind: RunnerSet name: myrunnerset ``` Ref #629 Ref #613 Ref #612
TL;DR; RunnerSet is a more feature-rich, flexible, easy to configure, and maintainable alternative to RunnerDeployment.
A runnerset can manage a set of "stateful" runners by combining a statefulset and an admission webhook. A statefulset is a standard Kubernetes construct that manages a set of pods and a pool of persistent volumes. We use that to manage runner pods, while using the admission webhook mutates each pod to have required environment variables and registration tokens.
It is considered to be a complete replacement to the former method of deploying a set of runners, RunnerDeployment, which also creates pods with the required environment variables and registration tokens.
Differences between RunnerSet and RunnerDeployment
The only and big functional difference between RunnerSet and RunnerDeployment is that the former has support for
volumeClaimTemplates
, which allows actions-runner-controller to manage a pool of dynamically provisioned persistent volumes. This should be useful to make certain types of actions workflows faster by utilizing per-pod-identity cache, like docker layer caches in/var/lib/docker
persistent across pod restarts.The basic usage of RunnerSet is very similar to that of RunnerDeployment.
This RunnerDeployment:
can be rewritten to:
Also note that, unlike RunnerDeployment, you can write the full StatefulSet spec inside RunnerSet. Configure the pod template however you like, and the runnerset controller reads and tweaks the pod template to create a complete runner pod spec. This makes it unnecessary to add every pod spec fields to runner spec.
How to configure your RunnerSet
You might have written a RunnerDeployment like the below with various tweaks:
In RunnerDeployment API, you have 4 things to declare in 2 places. 1 thing under
spec
and 3 things underspec.template.spec
:replicas
underspec
repository
,organization
,enterprise
,dockerdWithinRunnerContainer
, and so on underspec.template.spec
securityContext
,volumes
underspec.template.spec
resources
,dockerdContainerResources
,image
,dockerImage
, and so on underspec.template.spec
In RunnerSet API, you have 3 things to declare in 3 places:
replicas
,repository
,organization
,enterprise
, and so on underspec
spec.template.spec
spec.template.spec.containers[]
dockerdContainer*
settings in RunnerDeployment goes to thecontainers
entry whosename
isdocker
, for example.2 and 3 might be more familiar to many users and therefore it will be easy to write, as it's a standard
pod template
syntax used widely in Kubernetes Deployment, ReplicaSet, and StatefulSet.That being said, the above example can be rewritten in RunnerSet like the following:
Planned but not yet implemented
The following features are planned but not implemented. Please use RunnerDeployment for now if you need any of them.
HRA support:
The support for HorizontalRunnerAutoscaler is planned but not done yet.
Scale-from/to-zero:
Scale-from/to-zero is planned but not implemented yet.
Auto-recovery runner pods stuck while registering:
Planned but not implemented yet.
Call for help
I've already verified this to work manually using the updated helm chart and my own build of the actions-runner-controller container image. But, as a lot of changes are made to the code-base, I don't think this is tested enough.
If you want this feature to get merged at all, or get merged earlier, please test and report any problems you encounter!
Changelog
Add API types, CRDs,
controllers/runnerset_contorller.go
andcontrollers/pod_runner_token_injector.go
Added
acceptance/testdata/runnerset.yaml
and updatedacceptance/deploy.sh
. Run a deployment withUSE_RUNNERSET=1
to deploy test runners using RunnerSet instead of RunnerDeployment.Manually update
manager_role.yaml
chart template to reflect changes in kustomizeconfig/rbac/role.yaml
Update controller-gen from 0.3.0 to 0.4.1 to avoid errors generating RunnerSet CRD. The side-effect of this is that all the other CRDs upgraded to v1 API
Update controller-gen to 0.6.0 to avoid issues on runnerset.spec.template.metaata being empty
Manually update
webhook_configs.yaml
char template to reflect changes in kustomizeconfig/webhook/manifests.yaml
Upgraded controller-runtime to 0.9.0 to prevent
envtest
failing to install v1 CRDsIssues I have encountered while developing this:
Default value not allowed for in x-kubernetes-list-map-keys kubernetes-sigs/controller-tools#444 (comment)
Single-version generated CRDs cannot be installed kubernetes-sigs/controller-tools#302
Related issues
Resolves #613
Ref #612
Revival of #4