Runners are being removed for being idle before a job has had a chance to be assigned to them #4000

Open
@JohnLBergqvist

Checks

Controller Version

0.10.1 and upwards

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

To reproduce, deploy the scale sets as normal (I was using the Quickstart Guide) and begin running jobs. No changes were made to our Kubernetes cluster or to the Docker images we use for the runners before this bug began.

Describe the bug

Ephemeral runners are brought up correctly and begin advertising themselves to the repository/organisation as expected. However, if a job hasn't begun running on a runner within about 10 seconds, ARC kills it because it considers the runner idle.

While the data below refers to Windows runners, we also have Ubuntu runners where I've observed the issue, just much less frequently (around 5-10% of the time).

Describe the expected behavior

The controller should wait longer before killing runners for being idle. The fact that jobs are assigned correctly approx. 50% of the time implies that a tight threshold is being narrowly missed somewhere along the line. I can't control how long GitHub takes to recognise that a free runner has come online, but it would help if the controller waited longer than what seems to be as little as 10 seconds after creation before killing a runner for apparently being idle.
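
To make the request concrete, here is a minimal sketch of the kind of grace period I have in mind. It is purely illustrative and is not ARC's actual logic: the function name, the `has_job` flag and the 60-second value are all hypothetical. The only real datum is the "Listening for Jobs" timestamp from the runner pod logs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value chosen for illustration; this is not an existing ARC setting.
IDLE_GRACE_PERIOD = timedelta(seconds=60)


def may_remove_idle_runner(listening_since, now, has_job, grace=IDLE_GRACE_PERIOD):
    """Return True only if the runner has no job AND has been listening for at least `grace`."""
    if has_job:
        return False
    return now - listening_since >= grace


# "Listening for Jobs" timestamp taken from the runner pod logs (UTC).
listening = datetime(2025, 3, 27, 20, 27, 35, tzinfo=timezone.utc)

# A runner that has only been listening for a few seconds is left alone...
print(may_remove_idle_runner(listening, listening + timedelta(seconds=5), has_job=False))
# ...while one that has sat idle past the grace period may be removed.
print(may_remove_idle_runner(listening, listening + timedelta(seconds=90), has_job=False))
```

The point is only that idleness would be judged relative to when the runner started listening, so GitHub has a window in which to assign the job.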

Additional Context

githubConfigUrl: https://github.com/redacted
githubConfigSecret: redacted
runnerGroup: redacted
minRunners: 1
template:
  spec:
    containers:
      - name: runner
        image: redacted
        command: ["run.cmd"]
    serviceAccountName: redacted
    nodeSelector: # Ensures the pods can only run on nodes that have this label
      runner-os: windows
      iam.gke.io/gke-metadata-server-enabled: "true"
    tolerations: # Allows the pods to be scheduled on nodes that carry these taints
      - key: runners-fooding
        operator: Equal
        value: "true"
        effect: NoSchedule
      - key: node.kubernetes.io/os
        operator: Equal
        value: "windows"
        effect: NoSchedule

Controller Logs

https://gist.github.com/JohnLBergqvist/46553ba6043449e704af88f1a706228e

Runner Pod Logs

Logs: 

√ Connected to GitHub

Current runner version: '2.323.0'
2025-03-27 20:27:35Z: Listening for Jobs


Describe output

Name:             redacted-m2xmj-runner-2sb5k
Namespace:        arc-runners
Priority:         0
Service Account:  redacted
Node:             gke-49a8bb-scng/10.128.0.10
Start Time:       Thu, 27 Mar 2025 20:23:24 +0000
Labels:           actions-ephemeral-runner=True
                  actions.github.com/organization=redacted
                  actions.github.com/scale-set-name=redacted
                  actions.github.com/scale-set-namespace=arc-runners
                  app.kubernetes.io/component=runner
                  app.kubernetes.io/instance=redacted
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=redacted
                  app.kubernetes.io/part-of=gha-runner-scale-set
                  app.kubernetes.io/version=0.11.0
                  helm.sh/chart=gha-rs-0.11.0
                  pod-template-hash=79798d59cd
Annotations:      actions.github.com/patch-id: 0
                  actions.github.com/runner-group-name: Cover
                  actions.github.com/runner-scale-set-name: redacted
                  actions.github.com/runner-spec-hash: 78d4b6447
Status:           Terminating (lasts <invalid>)
Termination Grace Period:  30s
IP:               10.36.2.11
IPs:
  IP:           10.36.2.11
Controlled By:  EphemeralRunner/redacted-m2xmj-runner-2sb5k
Containers:
  runner:
    Container ID:  containerd://redacted
    Image:         redacted
    Image ID:      redacted@sha256:redacted
    Port:          <none>
    Host Port:     <none>
    Command:
      run.cmd
    State:          Running
      Started:      Thu, 27 Mar 2025 20:27:30 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     2
      memory:  10Gi
    Environment:
      ACTIONS_RUNNER_INPUT_JITCONFIG:          <set to the key 'jitToken' in secret 'redacted-m2xmj-runner-2sb5k'>  Optional: false
      GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT:  actions-runner-controller/0.11.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-clv4p (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-clv4p:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              iam.gke.io/gke-metadata-server-enabled=true
                             runner-os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/os=windows:NoSchedule
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             runners-fooding=true:NoSchedule
Events:
  Type     Reason            Age                    From                Message
  ----     ------            ----                   ----                -------
  Normal   Scheduled         4m21s                  default-scheduler   Successfully assigned arc-runners/redacted-m2xmj-runner-2sb5k to gke-49a8bb-scng
  Normal   Pulling           4m19s                  kubelet             Pulling image "redacted"
  Normal   Pulled            18s                    kubelet             Successfully pulled image "redacted" in 4m1.518s (4m1.518s including waiting). Image size: 3372778201 bytes.
  Normal   Created           18s                    kubelet             Created container: runner
  Normal   Started           15s                    kubelet             Started container runner
  Normal   Killing           5s                     kubelet             Stopping container runner
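
The "about 10 seconds" figure can be checked against the event ages above. The snippet below is a small sketch that does the arithmetic; the helper names `parse_age` and `seconds_between` are made up for this example, and the age parser only handles the simple `4m21s` / `18s` forms that appear in this table.

```python
import re


def parse_age(age: str) -> int:
    """Convert a kubectl-style age such as '4m21s' or '18s' to whole seconds."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", age)
    if not parts:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(number) * units[unit] for number, unit in parts)


# Ages copied from the Events table above (time elapsed before `kubectl describe` ran).
events = {
    "Scheduled": "4m21s",
    "Pulling": "4m19s",
    "Pulled": "18s",
    "Created": "18s",
    "Started": "15s",
    "Killing": "5s",
}


def seconds_between(earlier: str, later: str) -> int:
    """Seconds elapsed from the `earlier` event to the `later` event."""
    return parse_age(events[earlier]) - parse_age(events[later])


print(seconds_between("Started", "Killing"))  # 10
print(seconds_between("Pulling", "Pulled"))   # 241
```

So the container was stopped roughly 10 seconds after it started, whereas pulling the image alone took about four minutes (241 s by these rounded ages, in line with the 4m1.518s the kubelet reported).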

Labels: bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)
