Runner-Scale-Set in Kubernetes mode fails when writing to /home/runner/_work #2890
Comments
Hello! Thank you for filing an issue. The maintainers will triage your issue shortly. In the meantime, please take a look at the troubleshooting guide for bug reports. If this is a feature request, please review our contribution guidelines.
Hey @bobertrublik, I think that you are missing the kubernetesModeWorkVolumeClaim.
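For reference, a minimal sketch of where that field sits in the scale set values; the storage class and size below are placeholders, not values from this thread:

```yaml
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: "your-storage-class"  # placeholder; use a class available in your cluster
    resources:
      requests:
        storage: 1Gi
```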
According to the values.yaml file … Just to make sure, I added your suggestion, but the error is the same.
Oh, that is right. The idea behind the customization is that if there are requirements for customization that are difficult to expand properly, for example dind with custom volume mounts, the template spec lets you specify everything yourself.
I applied your suggestion, which made better use of the Helm chart template but doesn't seem to have had any immediate effect.

values.yaml:

githubConfigUrl: "https://github.com/privaterepo"
minRunners: 1
maxRunners: 3
runnerGroup: "runners"
containerMode:
type: "kubernetes"
kubernetesModeWorkVolumeClaim:
accessModes: ["ReadWriteOnce"]
storageClassName: "zrs-delete"
resources:
requests:
storage: 4Gi
template:
spec:
securityContext:
fsGroup: 1001
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
- name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
volumeMounts:
- name: work
mountPath: /home/runner/_work
controllerServiceAccount:
namespace: github-arc
name: github-arc

autoscalingrunnerset.yaml:

apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
name: github-runners
namespace: github-runners
labels:
app.kubernetes.io/component: "autoscaling-runner-set"
helm.sh/chart: gha-rs-0.5.0
app.kubernetes.io/name: gha-rs
app.kubernetes.io/instance: github-runners
app.kubernetes.io/version: "0.5.0"
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/part-of: gha-rs
actions.github.com/scale-set-name: github-runners
actions.github.com/scale-set-namespace: github-runners
annotations:
actions.github.com/cleanup-github-secret-name: github-runners-gha-rs-github-secret
actions.github.com/cleanup-manager-role-binding: github-runners-gha-rs-manager
actions.github.com/cleanup-manager-role-name: github-runners-gha-rs-manager
actions.github.com/cleanup-kubernetes-mode-role-binding-name: github-runners-gha-rs-kube-mode
actions.github.com/cleanup-kubernetes-mode-role-name: github-runners-gha-rs-kube-mode
actions.github.com/cleanup-kubernetes-mode-service-account-name: github-runners-gha-rs-kube-mode
spec:
githubConfigUrl: https://github.com/privaterepo
githubConfigSecret: github-runners-gha-rs-github-secret
runnerGroup: runners
maxRunners: 3
minRunners: 1
template:
spec:
securityContext:
fsGroup: 1001
serviceAccountName: github-runners-gha-rs-kube-mode
containers:
- name: runner
command:
- /home/runner/run.sh
image:
ghcr.io/actions/actions-runner:latest
env:
-
name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
-
name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
-
name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
volumeMounts:
-
mountPath: /home/runner/_work
name: work
volumes:
- name: work
ephemeral:
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
storageClassName: zrs-delete
@bobertrublik Thank you for raising the issue; I am facing the same issue too, and perhaps many others are as well. @nikola-jokic Could you please help us understand where we are going wrong? Also, if the above information is not sufficient, please let us know what output you need to analyze the issue. Thank you!!
We have a policy which automatically prevents use of the user "root" and overrides it with a user with id 1234. We were able to get around this for the Kubernetes runner pod itself with:

template:
spec:
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
securityContext:
runAsUser: 1001
runAsGroup: 123

However, when using e.g. a container workflow where another pod is spawned:

jobs:
arc-runner-job:
strategy:
fail-fast: false
matrix:
job: [1, 2, 3]
runs-on: ${{ inputs.arc_name }}
container: ubuntu
services:
redis:
image: redis
ports:
- 6379:6379
steps:
- uses: actions/checkout@v3
- run: echo "Hello World!"

The checkout step fails:

##[debug]Evaluating condition for step: 'Post Run actions/checkout@v3'
##[debug]Evaluating: always()
##[debug]Evaluating always:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Run actions/checkout@v3
##[debug]Loading inputs
##[debug]Evaluating: github.repository
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'repository'
##[debug]=> 'GithubRunnerTest/arc-testing-workflows'
##[debug]Result: 'GithubRunnerTest/arc-testing-workflows'
##[debug]Evaluating: github.token
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'token'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
##[debug]Running JavaScript Action with default external tool: node16
Run '/home/runner/k8s/index.js'
##[debug]/home/runner/externals/node16/bin/node /home/runner/k8s/index.js
node:internal/fs/utils:347
throw err;
^
Error: EACCES: permission denied, open '/__w/_temp/_runner_file_commands/save_state_da0afb29-9bfa-4830-b674-26018b714d93'
    at Object.openSync (node:fs:590:3)
    at Object.writeFileSync (node:fs:2202:35)
    at Object.appendFileSync (node:fs:2264:6)
    at Object.issueFileCommand (/__w/_actions/actions/checkout/v3/dist/index.js:2950:8)
    at Object.saveState (/__w/_actions/actions/checkout/v3/dist/index.js:2867:31)
    at Object.8647 (/__w/_actions/actions/checkout/v3/dist/index.js:2326:10)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.2565 (/__w/_actions/actions/checkout/v3/dist/index.js:146:34)
    at __nccwpck_require__ (/__w/_actions/actions/checkout/v3/dist/index.js:18256:43)
    at Object.9210 (/__w/_actions/actions/checkout/v3/dist/index.js:1141:36) {
  errno: -13,
  syscall: 'open',
  code: 'EACCES',
  path: '/__w/_temp/_runner_file_commands/save_state_da0afb29-9bfa-4830-b674-26018b714d93'
}
##[debug]{"message":"command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/68de3ca0-5ec4-11ee-9966-bdeda596caf8.sh], exit code 1","details":{"causes":[{"reason":"ExitCode","message":"1"}]}}
Error: Error: failed to run script step: command terminated with non-zero exit code: error executing command [sh -e /__w/_temp/68de3ca0-5ec4-11ee-9966-bdeda596caf8.sh], exit code 1
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'RunScriptStep' did not execute successfully
##[debug] at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug] --- End of inner exception stack trace ---
##[debug] at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug] at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.RunScriptStepAsync(IExecutionContext context, ContainerInfo container, String workingDirectory, String entryPoint, String entryPointArgs, IDictionary`2 environmentVariables, String prependPath)
##[debug] at GitHub.Runner.Worker.Handlers.ContainerStepHost.ExecuteAsync(IExecutionContext context, String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, String standardInInput, CancellationToken cancellationToken)
##[debug] at GitHub.Runner.Worker.Handlers.NodeScriptActionHandler.RunAsync(ActionRunStage stage)
##[debug] at GitHub.Runner.Worker.ActionRunner.RunAsync()
##[debug] at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Post Run actions/checkout@v3

The question is: can I somehow control which user the pods created in Kubernetes mode run as?
Hey @bobertrublik, try not to specify volumes; the field under containerMode handles the work volume for you. For example:

githubConfigUrl: "https://github.com/privaterepo"
minRunners: 1
maxRunners: 3
runnerGroup: "runners"
containerMode:
type: "kubernetes"
kubernetesModeWorkVolumeClaim:
accessModes: ["ReadWriteOnce"]
storageClassName: "zrs-delete"
resources:
requests:
storage: 4Gi
template:
spec:
securityContext:
fsGroup: 1001
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
controllerServiceAccount:
namespace: github-arc
name: github-arc
Hey @Ravio1i, this is a slightly more difficult problem. One possible way you can overcome this issue is by using container hook 0.4.0. We introduced a hook extension that takes a template and modifies the default pod spec created by the hook. You can specify a securityContext there, which will be applied to the job container.
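As a minimal sketch of such a template, assuming the file lives at /home/runner/pod-template.yml and using illustrative user/group IDs (the runner is pointed at the file via the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable):

```yaml
# Assumed location: /home/runner/pod-template.yml
spec:
  securityContext:
    runAsUser: 1001   # illustrative non-root user
    runAsGroup: 123
```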
Your suggestion returns the exact same error.
Can you try to run an init container that will apply correct permissions to all files under /home/runner/_work?
Using this config to set an initContainer:

githubConfigUrl: "https://github.com/private"
minRunners: 1
maxRunners: 3
runnerGroup: "runners"
containerMode:
type: "kubernetes"
kubernetesModeWorkVolumeClaim:
accessModes: ["ReadWriteOnce"]
storageClassName: "zrs-delete"
resources:
requests:
storage: 4Gi
template:
spec:
initContainers:
- name: kube-init
image: ghcr.io/actions/actions-runner:latest
command: ["sudo", "chown", "-R", "1001:123", "/home/runner/_work"]
volumeMounts:
- name: work
mountPath: /home/runner/_work
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
- name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
controllerServiceAccount:
namespace: github-arc
name: github-arc

I get this error in the checkout step:
and this error multiple times in the runner pod logs with varying PIDs.
When setting …
I am failing to reproduce the issue. Can you please switch the storage class and see if the issue persists?
Thank you very much for the insights; I think we are almost succeeding at it. We are using 0.6.1, which according to the release notes uses hook version 0.4.0. I've created a pod template manifest, because from here it sounded like it uses a PodTemplate:

apiVersion: v1
kind: PodTemplate
metadata:
name: runner-pod-template
labels:
app: runner-pod-template
spec:
securityContext:
runAsUser: 1001
runAsGroup: 123

I've created a simple Dockerfile:

FROM ghcr.io/actions/actions-runner:latest
COPY pod-template.yml /home/runner/pod-template.yml
RUN sudo chown -R runner:runner /home/runner/pod-template.yml

As suggested here, I've used ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE:

template:
spec:
containers:
- name: runner
image: <MYPRIVATEREGISTRY>/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
- name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
value: "/home/runner/pod-template.yml"
securityContext:
runAsUser: 1001
runAsGroup: 123
imagePullSecrets:
- name: regcred

However, it still faces the same issue: it is not using the specified user.
Hey @Ravio1i, this one comes from the runner itself. We have released the hook 0.4.0, so please make sure the runner image you are using actually ships that hook version.
I see, thank you for the hint! Is there some way to see within the Docker image itself which hook version is used? E.g. I'm imagining something like a metadata file that states the version. I've extended my Dockerfile:

ARG RUNNER_CONTAINER_HOOKS_VERSION=0.4.0
RUN sudo rm -rf ./k8s
RUN curl -f -L -o runner-container-hooks.zip https://github.com/actions/runner-container-hooks/releases/download/v${RUNNER_CONTAINER_HOOKS_VERSION}/actions-runner-hooks-k8s-${RUNNER_CONTAINER_HOOKS_VERSION}.zip \
&& unzip ./runner-container-hooks.zip -d ./k8s \
&& rm runner-container-hooks.zip

I also forgot the template field in the PodTemplate:

apiVersion: v1
kind: PodTemplate
metadata:
name: runner-pod-template
labels:
app: runner-pod-template
template:
spec:
securityContext:
runAsUser: 1001
runAsGroup: 123

However, no luck. I guess I'm still missing something! It's still not using the container template, although the pod-template and the new k8s/index.js are both in place.
It was okay before, you can see an example template here.
Well, I got it working with:

metadata:
annotations:
annotated-by: "extension"
labels:
labeled-by: "extension"
spec:
securityContext:
runAsUser: 1000
runAsGroup: 1000
restartPolicy: Never

However, one downside I still need to work around is the user management of containers. When I'm using an image without the actual user (e.g. debian:bullseye, which only has the root user) I'm running into a gzip error, presumably because something similar to this happens. So I may need to tweak around with that:

gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Error: The process '/bin/tar' failed with exit code 2

While testing I found this:
metadata:
annotations:
annotated-by: "extension"
labels:
labeled-by: "extension"
spec:
securityContext:
runAsUser: 0
runAsGroup: 0
restartPolicy: Never

However, due to my Kubernetes hardening measures this is denied:

Warning  SyncError  5s (x13 over 27s)  pod-syncer  Error syncing to physical cluster: admission webhook "validation.gatekeeper.sh" denied the request: [psp-pods-allowed-user-ranges] Container job is attempting to run as disallowed user 0. Allowed runAsUser: {"rule": "MustRunAsNonRoot"}
[psp-pods-allowed-user-ranges] Container redis is attempting to run as disallowed user 0. Allowed runAsUser: {"rule": "MustRunAsNonRoot"}

So yet again, I have to find a way forward. Does the container extension support init containers which can create users? I also have an additional question about the template: which wildcard vars are there, like the following?

- name: $job # overwrites job container
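For what it's worth, a sketch of how the $job placeholder appears to be used in a hook extension template to target the job container; the user/group IDs are illustrative and this reflects my understanding of the extension rather than an authoritative spec:

```yaml
metadata:
  annotations:
    annotated-by: "extension"
spec:
  securityContext:
    runAsUser: 1001        # pod-level user/group, illustrative values
    runAsGroup: 123
  containers:
    - name: "$job"         # reserved name: merged into the job container created by the hook
      securityContext:
        runAsUser: 1001
        runAsGroup: 123
```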
Maybe related to the original issue, I tried to resolve my error from #2890 (comment) by setting fsGroup. However, when setting the following:

template:
spec:
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
securityContext:
runAsUser: 1001
runAsGroup: 123
fsGroup: 123

it disappears once applied, as if the runner-set chart is not using it.
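A side note, in case it helps: fsGroup is a pod-level PodSecurityContext field rather than a container-level one, which may be why it gets dropped here. A sketch of where it would normally sit, reusing the IDs from this thread:

```yaml
template:
  spec:
    securityContext:
      fsGroup: 123             # pod-level field
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        securityContext:       # container-level fields only
          runAsUser: 1001
          runAsGroup: 123
```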
Hey @Ravio1i,
This limitation only applies when you are using the runner itself, not when you are using it with the hook. But also, building Docker images is not supported by the container hook. It can be modified to use kaniko, for example, but it is not officially supported. As long as you are using an already built image, I would assume that you shouldn't have any issues running as any user you'd like. The hook extension should support init containers. The security context within the Helm chart is used for the runner image, not to pass that information to the hook. To deliver the hook extension, you need a file on the runner with the extension spec. Then you need to set the env ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE to point to it.
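As an illustration only, one way to deliver such a file without rebuilding the runner image is to mount it from a ConfigMap and point the env var at it; the names, paths, and ConfigMap below are assumptions, and you should check how the chart merges extra volumes with the kubernetes-mode work volume:

```yaml
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-templates/extension.yaml   # assumed mount path
        volumeMounts:
          - name: pod-templates
            mountPath: /home/runner/pod-templates
            readOnly: true
    volumes:
      - name: pod-templates
        configMap:
          name: hook-extension   # assumed ConfigMap holding extension.yaml
```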
Hello, thanks to the comments by @Ravio1i I noticed that my workflow was also being run in a container under a different user than root. I'm sorry for overlooking this and taking your time. Once I re-created the image with the root user, the workflow ran successfully.

Thank you for your help @nikola-jokic.
Thank you for the confirmation @bobertrublik. I will close this issue now since it is unrelated to ARC itself.
@nikola-jokic I'm also receiving this error. My container really does run as a non-root user, but I'm failing even before the checkout step. I'll even go as far as to say that there is no way it even pulled the image, because it's in a private repo and I haven't given it the pull secret.
|
Update: I must say, adding the init container below fixed it for me.
|
I'm wondering why this is not included as an init container in the chart. @nikola-jokic
Checks
Controller Version
0.5.0
Helm Chart Version
0.5.0
CertManager Version
1.12.1
Deployment Method
Helm
cert-manager installation
Yes, it's also used in production.
Checks
Resource Definitions
To Reproduce
I deployed the gha-runner-scale-set-controller in the standard Helm chart configuration and the gha-runner-scale-set chart with the following values; the storage is provisioned by Azure StandardSSD_ZRS.

Describe the bug

When I run a workflow on a self-hosted runner it always fails at the actions/checkout@v3 action with this error:

Looking inside the pod I see that the owner of _work is user root:

4.0K drwxrwsr-x 3 root runner 4.0K Sep 12 08:06 _work
Describe the expected behavior
The checkout action should have no issues checking out a repository by writing to /home/runner/_work/ inside a runner pod.

I found this issue in the runner repository which proposes to set user ownership to the runner user. I'm not sure how to do that and why it's necessary with a rather standard deployment of the runner scale set. I already configured fsGroup as per the troubleshooting docs.

According to this comment I'm not supposed to set containerMode when configuring the template section. However this disables the kube mode role, rolebinding and serviceaccount in the chart, creates the noPermissionServiceAccount and the runner doesn't work at all.
and the runner doesn't work at all.Whole Controller Logs
https://gist.github.com/bobertrublik/4ee34181ceda6da120bd91fd8f68754c
Whole Runner Pod Logs
https://gist.github.com/bobertrublik/d770a62c64679db5b9eab5644f0cfebc