feat: add functionality to provide optional pod template #50
Conversation
Hey @nielstenboom, thank you for submitting this PR! We are currently looking at how we can use workflow inputs to help with both of these issues. I will leave this PR open as a reference, but we may take a different direction here.
No worries. We will temporarily keep running this fork internally until you have made the changes. Looking forward to the fix! 😄
Any update on a path forward on this? My use case mainly involves being able to set resources on the workflow pod. Right now I don't see an easy way to set a CPU request or bind a GPU to the pod.
Hey @MichaelHudgins, no updates yet, but it is on our backlog and will definitely be worked on in the near future.
Hi,
We run the fork internally as follows:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: thisvaluewillbeignored
spec:
  containers:
    # the runner will override the "image", "name" and "command" fields
    - image: "test/test"
      name: "thisvaluewillbeignored"
      command:
        - "these"
        - "are"
        - "overridden"
      env:
        - name: POETRY_CACHE_DIR
          value: "/ci-cache/poetry"
      volumeMounts:
        - name: ci-cache
          mountPath: /ci-cache
  volumes:
    - name: ci-cache
      persistentVolumeClaim:
        claimName: ci-cache
```
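For completeness, here is a rough sketch of how a template like this could be wired into the runner pod itself, assuming you store it in a ConfigMap and point the `ACTIONS_RUNNER_POD_TEMPLATE_PATH` variable introduced by this PR at the mounted file (the ConfigMap name and mount path below are made up for illustration):

```yaml
# hypothetical runner pod snippet; the ConfigMap name and paths are illustrative
spec:
  containers:
    - name: runner
      env:
        - name: ACTIONS_RUNNER_POD_TEMPLATE_PATH
          value: /home/runner/pod-template/template.yaml
      volumeMounts:
        - name: pod-template
          mountPath: /home/runner/pod-template
          readOnly: true
  volumes:
    - name: pod-template
      configMap:
        name: ci-pod-template
```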
Good luck!
Thanks Niels. I temporarily got around the CPU/memory resource issue another way, by setting a LimitRange on the namespace (see the sketch below). The next issue is that GKE Autopilot is very conservative about the size and quantity of nodes it launches; it starts with the minimum. When a job pod requesting a lot of CPU/memory is provisioned, what should happen is that the autoscaler adds more nodes. What actually happens instead is a sudden "OutOfCPU" error, and the job fails.
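A minimal sketch of the kind of LimitRange workaround described above (the namespace name and the numbers are placeholders, not values from this thread):

```yaml
# applies default requests/limits to containers that don't set their own
apiVersion: v1
kind: LimitRange
metadata:
  name: runner-defaults
  namespace: actions-runners
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "1"
        memory: 1Gi
      default:
        cpu: "2"
        memory: 2Gi
```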
Maybe in your case it could already work if you set the resources on the runner pods themselves, since the job pods are forced onto the exact same node (see code here)?
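If you go that route, a minimal sketch of what setting resources on the runner pod could look like with the gha-runner-scale-set Helm chart is below; the exact values layout may differ by chart version and deployment method, so treat the field names and numbers as assumptions:

```yaml
# illustrative values.yaml snippet — resources set on the runner container
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            memory: 4Gi
```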
@nielstenboom very interesting! Rather than distract too much in this pull request, I created a new discussion thread about the OutOfCPU errors.
Added some related discussion here: actions/actions-runner-controller#2592

If this doesn't move quickly, I'm planning to write a "mutating webhook admission controller" which would intercept the "-workload" pods, read the env variables (because there is no way to set labels), and reconfigure the pod spec dynamically before the pod is actually created. So, for example, this GitHub Actions job definition:

```yaml
jobs:
  # This workflow contains a single job called "build"
  build:
    container:
      image: node:14.16
      env:
        NODE_ENV: development
        RUNNER_NODE_SELECTORS: "node.kubernetes.io/instance-type: g4dn.xlarge"
        RUNNER_TOLERATIONS: "[{ key: cccis.com/karpenter, operator: Exists, effect: NoSchedule}]"
        RUNNER_LABELS: "owner: jaime, type: job"
        RUNNER_CPU_REQUESTS: "1"
        RUNNER_GPUS: "1"
```

would mutate the pod and inject fields like:

```yaml
nodeSelector:
  node.kubernetes.io/instance-type: g4dn.xlarge
tolerations:
  - key: cccis.com/karpenter
    operator: Exists
    effect: NoSchedule
resources:
  requests:
    cpu: 1
    memory: 1Gi
    nvidia.com/gpu: 1
  limits:
    memory: 2Gi
    nvidia.com/gpu: 1
```

and the others. I'm not sure I'll have time to do that, but it should not be complicated and might be the best option (because it is the simplest) for our use case.
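For reference, the registration side of such a webhook would look roughly like the sketch below; the service name, namespace, and label selector are placeholders, and the webhook server itself (which would read the env variables and patch the spec) would still need to be written:

```yaml
# hypothetical registration for a webhook that mutates workflow pods at creation time
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: runner-workflow-pod-mutator
webhooks:
  - name: pods.runner-mutator.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: runner-pod-mutator
        namespace: actions-runners
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        runner-pods: "enabled"
```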
Hey @jaimehrubiks, the work is mostly done. You can refer to this PR. Basically, we will extend the YAML job definition for you to specify the pod spec in your workflow YAML file. We just need to finalize the interface (the way you will provide a definition in a workflow). But if that implementation does not work for you, you can choose to implement it however you would like.
Nevermind, it is actually possible if you specify the variable in this specific location:
It won't be added to the pod spec if added in these other 3 locations:
(And of course "services", as those don't work on kube right now.)
Closing this one in favor of #75
Hi @nikola-jokic, as far as I can tell there is no workflow support for this yet. Is there any issue or PR I can watch to follow this? I'd like to set requests/limits per-job in the kind of way that this issue proposed.
Hi there!
This PR contains changes to support a new env variable, `ACTIONS_RUNNER_POD_TEMPLATE_PATH`, which you can use to point to a template pod spec. This template can be used to enrich the created pod with all the fields that are required. We personally use it to set a bunch of default env variables on every job pod and to mount cache volumes into it, but it can also be used to set `securityContext` or `serviceAccountName`, for example.

I've chosen to solve this using a template file because I think this will require the least amount of changes to implement upstream in https://github.com/actions/actions-runner-controller. Otherwise you would have to deal with passing the values of every field (service account, resources, mounts, etc.) separately.
It uses the lodash `mergeWith` function to merge the template with the pod spec. It concatenates lists, so you can add extra env variables or volumeMounts without overwriting the ones added by this project.
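As a made-up illustration of that list-concatenation behaviour (the runner-added entry below is a placeholder, not the actual value the hook injects):

```yaml
# template container entry:
env:
  - name: POETRY_CACHE_DIR
    value: /ci-cache/poetry
---
# runner-generated container entry (placeholder value):
env:
  - name: GITHUB_ACTIONS
    value: "true"
---
# merged result: the env lists are concatenated, not replaced
env:
  - name: POETRY_CACHE_DIR
    value: /ci-cache/poetry
  - name: GITHUB_ACTIONS
    value: "true"
```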
Please let me know what you think! I'd be happy to provide more info or write more tests if you think this is required. Thanks!
Fixes #46 & #33