Pod startup failing due to new startup.sh #1557

Open
4 tasks done
rich-bain opened this issue Jun 23, 2022 · 5 comments
Labels
bug Something isn't working

Comments

rich-bain commented Jun 23, 2022

Controller Version

0.24.1

Helm Chart Version

0.19.1

CertManager Version

1.8.1

Deployment Method

Other

cert-manager installation

  • Installed with Helm via Flux. Tested and working correctly with a custom domain.

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you need priority support)
  • I've read the release notes before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes
  • My actions-runner-controller version (v0.x.y) does support the feature
  • I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: default-org-runners
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      image: summerwind/actions-runner-dind # <-- was unpinned
      dockerdWithinRunnerContainer: true
      organization: <redacted>
      env: []

To Reproduce

Run on a GKE cluster with the latest image `latest@sha256:331d12b3c2bb35436f82dccfc661afd13bb093afd3461d19cc5fce74b64e896c`. The pod schedules correctly, but the `runner` container fails during startup.

Describe the bug

See textPayload key below.

{
  "textPayload": "tee: 'standard output': Bad file descriptor",    
  "insertId": "vrxebg7zrlowahpw",
  "resource": {
    "type": "k8s_container",
    ...
    }
  },
  "timestamp": "2022-06-23T03:45:09.572895611Z",
  "severity": "ERROR",
  "labels": {
       ...
  },
   ...
}

This is a k8s_container resource, which means the pod was scheduled correctly but failed during startup.

The symptom is a pod restart loop, roughly once per second.

Describe the expected behavior

The pod should start. Changing nothing except the image, going back to summerwind/actions-runner-dind:v2.293.0-ubuntu-20.04-933b0c7@sha256:635aa33ed5fc83f5df7a27986f654500fc28eeb619498888f3442a133b54258b fixes the issue.
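
Because the failure appeared only once an unpinned image was pulled again, a small guard can flag image references that carry no digest before they reach a manifest. The sketch below is only an illustration: `is_pinned` is a hypothetical helper, not part of actions-runner-controller, and it checks only for the presence of an `@sha256:` suffix, not whether the digest is valid.

```shell
#!/bin/sh
# Hypothetical helper (not part of actions-runner-controller):
# succeeds only if the image reference contains an @sha256: digest.
is_pinned() {
  case "$1" in
    *@sha256:?*) return 0 ;;
    *) return 1 ;;
  esac
}

# The two image references below are taken from this issue.
for image in \
  "summerwind/actions-runner-dind" \
  "summerwind/actions-runner-dind:v2.293.0-ubuntu-20.04-933b0c7@sha256:635aa33ed5fc83f5df7a27986f654500fc28eeb619498888f3442a133b54258b"
do
  if is_pinned "$image"; then
    echo "pinned:   $image"
  else
    echo "unpinned: $image"
  fi
done
```

A tag without a digest (for example `:latest`) counts as unpinned here, since a tag can be moved to a different image.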

Controller Logs

https://gist.github.com/rich-bain/440988ac221992e196fba6aa4faeb711

Runner Pod Logs

Unavailable due to immediate deletion.

See stackdriver log (only a single item produced):
https://gist.github.com/rich-bain/440988ac221992e196fba6aa4faeb711

Additional Context

Worked fine yesterday. I spun up some new nodes, which invalidated my Docker cache. Pinning back to the old version fixes the issue.

The issue is probably at either https://github.com/actions-runner-controller/actions-runner-controller/blob/master/runner/startup.sh#L30 or https://github.com/actions-runner-controller/actions-runner-controller/blob/master/runner/startup.sh#L45
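
For anyone trying to narrow this down, the snippet below shows one condition under which `tee` reports a bad file descriptor: its standard output is closed when it tries to write. This is an assumption about the class of failure, not a confirmed diagnosis of startup.sh, and the exact wording of the message depends on the `tee` implementation.

```shell
#!/bin/sh
# Sketch: run tee with its standard output closed and capture what it
# prints on stderr. Assumption: this mimics the failure mode, it is not
# taken from startup.sh.
status=0
# "2>&1" sends tee's stderr into the command substitution first,
# then ">&-" closes tee's stdout, so every write to it fails.
msg=$(printf 'hello\n' | tee 2>&1 >&-) || status=$?

printf 'exit status: %s\n' "$status"
printf 'stderr: %s\n' "$msg"
```

If the container runtime hands the entrypoint an unusable stdout, any `tee` call in the startup script could plausibly fail in the same way.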

rich-bain added the bug (Something isn't working) label Jun 23, 2022

@shaikatz

Seeing the same behavior. The pod logs show: tee: 'standard output': Bad file descriptor

ChrisBr commented Jun 23, 2022

@rich-bain can you show which version you pinned to?

ChrisBr commented Jun 23, 2022

Pinning the image to

image: summerwind/actions-runner-dind:v2.293.0-ubuntu-20.04-933b0c7@sha256:635aa33ed5fc83f5df7a27986f654500fc28eeb619498888f3442a133b54258b

resolved the issue for us!

kyontan commented Jun 23, 2022

We hit the same error on the new image actions-runner-dind:v2.293.0-ubuntu-20.04.

Helm chart version: 0.19.1
Environment: AWS EKS v1.22.9-eks-a64ea69 with Karpenter (containerd).

The error log is a single line:

tee: 'standard output': Bad file descriptor

We have already reverted the actions-runner-dind image version to v2.293.0-ubuntu-20.04, and everything works again.

@mattpopa

Getting the same issue; reverted to 2.293.0-ubuntu-20.04-933b0c7.
