Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.10.1 and upwards
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
To reproduce, deploy the scale sets as normal (I followed the Quickstart Guide) and begin running jobs. No change was made to our Kubernetes cluster or to the Docker images we use for the runners before this bug began.
Describe the bug
Ephemeral runners are brought up correctly and begin advertising themselves to the repository/organisation as expected. However, if a job hasn't started running on them within about 10 seconds, ARC kills the runners because it thinks they're idle.
While the data below refers to Windows runners, we also have Ubuntu runners where I've observed the issue, just much less frequently (around 5-10% of the time).
Describe the expected behavior
The controller should wait a bit longer before killing runners for being idle. The fact that jobs are assigned correctly approx. 50% of the time implies there's a tiny threshold that's being missed somewhere along the line. Unfortunately, I can't control how long GitHub takes to recognise that a free runner has come online, but it would be helpful if the controller waited longer than what seems to be as little as 10 seconds after creation before killing a runner for apparently being idle.
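To make the request concrete, here is a minimal sketch of the kind of idle grace period being asked for. It is purely illustrative and is not ARC's actual code: `RunnerState`, `should_remove_idle_runner` and `idle_grace_seconds` are hypothetical names, the 60-second default is an arbitrary placeholder, and no such setting is known to exist in the chart.

```python
from dataclasses import dataclass


@dataclass
class RunnerState:
    """Hypothetical view of a runner as a controller might see it."""
    seconds_since_listening: float  # time since the runner logged "Listening for Jobs"
    has_job: bool                   # whether a job has been assigned to it


def should_remove_idle_runner(state: RunnerState, idle_grace_seconds: float = 60.0) -> bool:
    """Treat a runner as idle only once the (assumed) grace period has elapsed."""
    if state.has_job:
        return False
    return state.seconds_since_listening >= idle_grace_seconds


# A runner that has been listening for only a few seconds is kept alive.
print(should_remove_idle_runner(RunnerState(seconds_since_listening=5, has_job=False)))    # False
# A runner that has waited past the grace period without a job may be removed.
print(should_remove_idle_runner(RunnerState(seconds_since_listening=120, has_job=False)))  # True
```

With a policy like this, a short delay on GitHub's side before a job is assigned would no longer be enough to lose the runner.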
Additional Context
githubConfigUrl: https://github.com/redacted
githubConfigSecret: redacted
runnerGroup: redacted
minRunners: 1
template:
  spec:
    containers:
      - name: runner
        image: redacted
        command: ["run.cmd"]
    serviceAccountName: redacted
    nodeSelector: # Ensures the pods can only run on nodes that have this label
      runner-os: windows
      iam.gke.io/gke-metadata-server-enabled: "true"
    tolerations: # Ensures that the pods can only run on nodes that have this taint
      - key: runners-fooding
        operator: Equal
        value: "true"
        effect: NoSchedule
      - key: node.kubernetes.io/os
        operator: Equal
        value: "windows"
        effect: NoSchedule
Controller Logs
https://gist.github.com/JohnLBergqvist/46553ba6043449e704af88f1a706228e
Runner Pod Logs
Logs:
√ Connected to GitHub
Current runner version: '2.323.0'
2025-03-27 20:27:35Z: Listening for Jobs
Describe output
Name: redacted-m2xmj-runner-2sb5k
Namespace: arc-runners
Priority: 0
Service Account: redacted
Node: gke-49a8bb-scng/10.128.0.10
Start Time: Thu, 27 Mar 2025 20:23:24 +0000
Labels: actions-ephemeral-runner=True
actions.github.com/organization=redacted
actions.github.com/scale-set-name=redacted
actions.github.com/scale-set-namespace=arc-runners
app.kubernetes.io/component=runner
app.kubernetes.io/instance=redacted
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=redacted
app.kubernetes.io/part-of=gha-runner-scale-set
app.kubernetes.io/version=0.11.0
helm.sh/chart=gha-rs-0.11.0
pod-template-hash=79798d59cd
Annotations: actions.github.com/patch-id: 0
actions.github.com/runner-group-name: Cover
actions.github.com/runner-scale-set-name: redacted
actions.github.com/runner-spec-hash: 78d4b6447
Status: Terminating (lasts <invalid>)
Termination Grace Period: 30s
IP: 10.36.2.11
IPs:
IP: 10.36.2.11
Controlled By: EphemeralRunner/redacted-m2xmj-runner-2sb5k
Containers:
runner:
Container ID: containerd://redacted
Image: redacted
Image ID: redacted@sha256:redacted
Port: <none>
Host Port: <none>
Command:
run.cmd
State: Running
Started: Thu, 27 Mar 2025 20:27:30 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 2
memory: 10Gi
Environment:
ACTIONS_RUNNER_INPUT_JITCONFIG: <set to the key 'jitToken' in secret 'redacted-m2xmj-runner-2sb5k'> Optional: false
GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT: actions-runner-controller/0.11.0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-clv4p (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-clv4p:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: iam.gke.io/gke-metadata-server-enabled=true
runner-os=windows
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/os=windows:NoSchedule
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
runners-fooding=true:NoSchedule
Events:
Type    Reason     Age    From               Message
----    ------     ----   ----               -------
Normal  Scheduled  4m21s  default-scheduler  Successfully assigned arc-runners/redacted-m2xmj-runner-2sb5k to gke-49a8bb-scng
Normal  Pulling    4m19s  kubelet            Pulling image "redacted"
Normal  Pulled     18s    kubelet            Successfully pulled image "redacted" in 4m1.518s (4m1.518s including waiting). Image size: 3372778201 bytes.
Normal  Created    18s    kubelet            Created container: runner
Normal  Started    15s    kubelet            Started container runner
Normal  Killing    5s     kubelet            Stopping container runner
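
The timing in the events above can be worked out explicitly. The following is a rough sketch that only uses the event ages, the container start time and the "Listening for Jobs" timestamp from this report. `parse_age` is a helper written for this illustration, and because event ages are rounded to whole seconds the results are approximate.

```python
import re
from datetime import datetime


def parse_age(age: str) -> int:
    """Convert a kubectl-style age such as '4m21s' or '18s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", age))


# Ages copied from the Events table (time elapsed before `kubectl describe` ran).
events = {
    "Scheduled": "4m21s",
    "Pulling": "4m19s",
    "Pulled": "18s",
    "Started": "15s",
    "Killing": "5s",
}
ages = {reason: parse_age(age) for reason, age in events.items()}

# Image pull duration: consistent with the 4m1.518s reported by the kubelet.
pull_seconds = ages["Pulling"] - ages["Pulled"]

# Time between the container starting and the kubelet being told to stop it.
lifetime_seconds = ages["Started"] - ages["Killing"]

# Container start time (describe output) vs. "Listening for Jobs" (runner log), both UTC.
started = datetime.strptime("2025-03-27 20:27:30", "%Y-%m-%d %H:%M:%S")
listening = datetime.strptime("2025-03-27 20:27:35", "%Y-%m-%d %H:%M:%S")
startup_seconds = int((listening - started).total_seconds())

# Time the runner actually spent listening before it was killed.
listen_window_seconds = lifetime_seconds - startup_seconds

print(f"image pull: {pull_seconds}s")
print(f"container lifetime before kill: {lifetime_seconds}s")
print(f"startup until listening: {startup_seconds}s")
print(f"listening before kill: {listen_window_seconds}s")
```

On these numbers the container ran for about 10 seconds in total, of which only about 5 seconds were spent listening for jobs, after an image pull of roughly 4 minutes.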