feat: add functionality to provide optional pod template #50
Conversation
Hey @nielstenboom, thank you for submitting this PR! We are currently looking at how we can use workflow inputs to help with both of these issues. I will leave this PR open as a reference, but we may take a different direction here.
No worries. We will temporarily keep running this fork internally until you have made the changes. Looking forward to the fix! 😄
Any update on a path forward on this? My use case mainly involves being able to set resources on the workflow pod. Right now I don't see an easy way to set a CPU request or bind a GPU to the pod.
Hey @MichaelHudgins, no updates yet, but it is on our backlog and will definitely be worked on in the near future.
Hi,
We run the fork internally as follows:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: thisvaluewillbeignored
spec:
  containers:
    # the runner will override the "image", "name" and "command" fields
    - image: "test/test"
      name: "thisvaluewillbeignored"
      command:
        - "these"
        - "are"
        - "overridden"
      env:
        - name: POETRY_CACHE_DIR
          value: "/ci-cache/poetry"
      volumeMounts:
        - name: ci-cache
          mountPath: /ci-cache
  volumes:
    - name: ci-cache
      persistentVolumeClaim:
        claimName: ci-cache
```
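For completeness, here is a rough sketch of how a template like this could be wired into the runner pod itself, assuming you store it in a ConfigMap and point the `ACTIONS_RUNNER_POD_TEMPLATE_PATH` variable introduced by this PR at the mounted file (the ConfigMap name and mount path below are made up for illustration):

```yaml
# hypothetical runner pod snippet; the ConfigMap name and paths are illustrative
spec:
  containers:
    - name: runner
      env:
        - name: ACTIONS_RUNNER_POD_TEMPLATE_PATH
          value: /home/runner/pod-template/template.yaml
      volumeMounts:
        - name: pod-template
          mountPath: /home/runner/pod-template
          readOnly: true
  volumes:
    - name: pod-template
      configMap:
        name: ci-pod-template
```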
Good luck!
Thanks Niels. I temporarily got around the CPU/memory resource issue another way, by setting a LimitRange on the namespace (see the sketch below). The next issue is that GKE Autopilot is very conservative about the size and quantity of nodes it launches; it starts with the minimum. When a job pod requesting a lot of CPU/memory is provisioned, what should happen is that the autoscaler adds more nodes. What actually happens instead is a sudden "OutOfCPU" error, and the job fails.
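A minimal sketch of the kind of LimitRange workaround described above (the namespace name and the numbers are placeholders, not values from this thread):

```yaml
# applies default requests/limits to containers that don't set their own
apiVersion: v1
kind: LimitRange
metadata:
  name: runner-defaults
  namespace: actions-runners
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "1"
        memory: 1Gi
      default:
        cpu: "2"
        memory: 2Gi
```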
Maybe in your case it could already work if you set the resources on the runner pods themselves, since the job pods are forced onto the exact same node (see code here)?
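If you go that route, a minimal sketch of what setting resources on the runner pod could look like with the gha-runner-scale-set Helm chart is below; the exact values layout may differ by chart version and deployment method, so treat the field names and numbers as assumptions:

```yaml
# illustrative values.yaml snippet — resources set on the runner container
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            memory: 4Gi
```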
@nielstenboom very interesting! Rather than distract too much in this pull request, I created a new discussion thread about the OutOfCPU errors.
Added some related discussion here: actions/actions-runner-controller#2592

If this doesn't move quickly, I'm planning to write a "mutating webhook admission controller" which would intercept the "-workload" pods, read the env variables (because there is no way to set labels), and reconfigure the pod spec dynamically before the pod is actually created. So, for example, this GitHub Actions job definition:

```yaml
jobs:
  # This workflow contains a single job called "build"
  build:
    container:
      image: node:14.16
      env:
        NODE_ENV: development
        RUNNER_NODE_SELECTORS: "node.kubernetes.io/instance-type: g4dn.xlarge"
        RUNNER_TOLERATIONS: "[{ key: cccis.com/karpenter, operator: Exists, effect: NoSchedule}]"
        RUNNER_LABELS: "owner: jaime, type: job"
        RUNNER_CPU_REQUESTS: "1"
        RUNNER_GPUS: "1"
```

would mutate the pod and inject fields like:

```yaml
nodeSelector:
  node.kubernetes.io/instance-type: g4dn.xlarge
tolerations:
  - key: cccis.com/karpenter
    operator: Exists
    effect: NoSchedule
resources:
  requests:
    cpu: 1
    memory: 1Gi
    nvidia.com/gpu: 1
  limits:
    memory: 2Gi
    nvidia.com/gpu: 1
```

and the others. I'm not sure I'll have time to do that, but it should not be complicated and might be the best option (because it is the simplest) for our use case.
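For reference, the registration side of such a webhook would look roughly like the sketch below; the service name, namespace, and label selector are placeholders, and the webhook server itself (which would read the env variables and patch the spec) would still need to be written:

```yaml
# hypothetical registration for a webhook that mutates workflow pods at creation time
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: runner-workflow-pod-mutator
webhooks:
  - name: pods.runner-mutator.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: runner-pod-mutator
        namespace: actions-runners
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    namespaceSelector:
      matchLabels:
        runner-pods: "enabled"
```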
Hey @jaimehrubiks, the work is mostly done. You can refer to this PR. Basically, we will extend the YAML job definition for you to specify the pod spec in your workflow YAML file. We just need to finalize the interface (the way you will provide a definition in a workflow). But if that implementation does not work for you, you can choose to implement it however you would like.
Nevermind, it is actually possible if you specify the variable in this specific location:
It won't be added to the pod spec if added in these other 3 locations:
(And of course "services", as those don't work on kube right now.)
Closing this one in favor of #75
Hi @nikola-jokic, as far as I can tell there is no workflow support for this yet. Is there any issue or PR I can watch to follow this? I'd like to set requests/limits per-job in the kind of way that this issue proposed.
Hi there!
This PR contains changes to support a new env variable, `ACTIONS_RUNNER_POD_TEMPLATE_PATH`, which you can use to point to a template pod spec. This template can be used to enrich the created pod with all the fields that are required. We personally use it to set a bunch of default env variables on every job pod and to mount cache volumes into it, but it can also be used to set `securityContext` or `serviceAccountName`, for example.

I've chosen to solve this using a template file because I think this will require the least amount of changes to implement upstream in https://github.com/actions/actions-runner-controller. Otherwise you would have to deal with passing the values of every field (service account, resources, mounts, etc.) separately.
It uses the lodash `mergeWith` function to merge the template with the pod spec. It concatenates lists, so you can add extra env variables or volumeMounts without overwriting the ones added by this project.
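As a made-up illustration of that list-concatenation behaviour (the runner-added entry below is a placeholder, not the actual value the hook injects):

```yaml
# template container entry:
env:
  - name: POETRY_CACHE_DIR
    value: /ci-cache/poetry
---
# runner-generated container entry (placeholder value):
env:
  - name: GITHUB_ACTIONS
    value: "true"
---
# merged result: the env lists are concatenated, not replaced
env:
  - name: POETRY_CACHE_DIR
    value: /ci-cache/poetry
  - name: GITHUB_ACTIONS
    value: "true"
```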
Please let me know what you think! I'd be happy to provide more info or write more tests if you think this is required. Thanks!
Fixes #46 & #33