
argo doesn't work in clusters using CRI other than Docker #3500

Closed
scr-oath opened this issue Jul 17, 2020 · 7 comments
@scr-oath
MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

Version: v2.9.3
CRI: cri-o://1.18.1
kubernetes: v1.18.2
workflow: https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
Logs: none — the error above is only from kubectl describe; the pod is still stuck trying to create its containers.


What happened:

In a cluster set up with CRI-O, there is no Docker socket, so pods created by Argo fail to start because they cannot mount the Docker socket.

What you expected to happen:

Argo shouldn't need to mount the Docker socket when the cluster uses a container runtime other than Docker.

How to reproduce it (as minimally and precisely as possible):

Create a Kubernetes cluster with CRI-O as the container runtime (https://github.com/cri-o/cri-o#getting-started), install Argo, and run the hello-world workflow.

Anything else we need to know?:

Environment:

  • Argo version:
$ argo version
argo: v2.8.1
  BuildDate: 2020-05-28T23:40:32Z
  GitCommit: 0fff4b21c21c5ff5adbb5ff62c68e67edd95d6b8
  GitTreeState: clean
  GitTag: v2.8.1
  GoVersion: go1.13.4
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version:
$ kubectl version -o yaml
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"archive", BuildDate:"2020-01-27T22:06:01Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Other debugging information (if applicable):

  • workflow result:
argo --loglevel DEBUG get <workflowname>
DEBU[0000] CLI version                                   version="{v2.9.3 2020-07-15T01:16:21Z 9407e19b3a1c61ad4043e382484fd0b6b15574f2 v2.9.3 clean go1.13.4 gc darwin/amd64}"
DEBU[0000] Client options                                opts="{{ false false}  0x2277640 0xc000103900}"
Name:                hello-world-zz9d4
Namespace:           default
ServiceAccount:      default
Status:              Running
Created:             Thu Jul 16 19:39:28 -0700 (57 seconds ago)
Started:             Thu Jul 16 19:39:28 -0700 (57 seconds ago)
Duration:            57 seconds

STEP                  TEMPLATE  PODNAME            DURATION  MESSAGE
 ◷ hello-world-zz9d4  whalesay  hello-world-zz9d4  57s       ContainerCreating
  • executor logs:
kubectl logs <failedpodname> -c init
Error from server (BadRequest): container init is not valid for pod hello-world-zz9d4
kubectl logs <failedpodname> -c wait
Error from server (BadRequest): container "wait" in pod "hello-world-zz9d4" is waiting to start: ContainerCreating
  • workflow-controller logs:
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)
time="2020-07-17T02:41:47Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Updated phase  -> Running" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Pod node hello-world-h2xhp initialized Pending" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Created pod: hello-world-h2xhp (hello-world-h2xhp)" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=16134964 workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Skipped pod hello-world-h2xhp (hello-world-h2xhp) creation: already exists" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=16134967 workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:47Z" level=info msg="Skipped pod hello-world-h2xhp (hello-world-h2xhp) creation: already exists" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:48Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:48Z" level=info msg="Updating node hello-world-h2xhp message: ContainerCreating"
time="2020-07-17T02:41:48Z" level=info msg="Skipped pod hello-world-h2xhp (hello-world-h2xhp) creation: already exists" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:48Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=16134981 workflow=hello-world-h2xhp
time="2020-07-17T02:41:48Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-h2xhp
time="2020-07-17T02:41:48Z" level=info msg="Skipped pod hello-world-h2xhp (hello-world-h2xhp) creation: already exists" namespace=default workflow=hello-world-h2xhp


Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@alexec (Contributor) commented Jul 17, 2020

We use the PNS executor, try that config?

https://argoproj.github.io/argo/workflow-executors/

@rsassPwC
I'm currently running Argo in kind, and with the docker executor I get the same error. Can I modify the kind node image so it can still use the docker executor?

@scr-oath (Author)
I've tried both kubelet and pns to no avail.

I seem to get something like this though (with pns), which is different…

 ⚠ hello-world-wz97k  whalesay  hello-world-wz97k  38s       failed to save outputs: Get https://10.253.11.82:10250/pods: x509: cannot validate certificate for 10.253.11.82 because it doesn't contain any IP SANs
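The x509 error means the kubelet's serving certificate carries no IP Subject Alternative Name, so the executor refuses to trust https://&lt;node-ip&gt;:10250. As a sketch of what a usable certificate looks like, the commands below mint a throwaway self-signed cert that does include the node IP from the error above as a SAN, then print the extension with openssl (the file paths and CN are illustrative; requires OpenSSL 1.1.1+ for -addext):

```shell
# Generate a throwaway self-signed cert that includes the node IP as a
# subjectAltName -- the property the failing kubelet cert is missing.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/kubelet-key.pem -out /tmp/kubelet-cert.pem \
  -subj "/CN=kubelet" \
  -addext "subjectAltName = IP:10.253.11.82"

# Print the SAN extension; a cert the kubelet executor will accept must
# list the node's IP address here.
openssl x509 -in /tmp/kubelet-cert.pem -noout -ext subjectAltName
```

Running the second command against a real kubelet cert (fetched with `openssl s_client -connect <node-ip>:10250`) shows whether the node's serving certificate has the required IP SAN.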

@scr-oath (Author)
Hmm… it seems this version of argo can't get past it no matter what I do - I switched some nodes to docker and back, but they're still getting the cannot-validate-certificate error; I guess I'll have to remove argo altogether and start again.

@alexec (Contributor) commented Jul 24, 2020

@scr-oath am I right in thinking that you have tried kubelet and pns? Did you manage to run without using artifacts? Did you try k8sapi?

@scr-oath (Author) commented Jul 24, 2020

Well, after reinstalling argo with the namespace installation and using pns, I was able to run hello-world, but the pipelines need, if not full-fledged artifacts, at least the ability to pass information between steps; it currently fails with:

failed to save outputs: failed to chroot to main filesystem: operation not permitted

with this snippet

  - name: get-job-nums
    script:
      image: "{{workflow.parameters.modeling-pipeline-docker-image}}:{{workflow.parameters.modeling-pipeline-version}}"
      command: [bash]
      source: |
        jq -n "[range({{workflow.parameters.parallelism}}) | tostring]" | tee /tmp/job-nums.json
    outputs:
      parameters:
      - name: job-nums
        valueFrom:
          path: /tmp/job-nums.json
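The chroot error happens when the wait sidecar cannot enter the main container's filesystem to read the output file. A workaround documented for the kubelet and k8sapi executors is to write outputs to a mounted volume rather than the image's base layer; whether it also sidesteps the PNS chroot step is an assumption here, but it is cheap to try. A sketch adapting the step above, moving the output from /tmp onto an emptyDir volume (the volume name "out" is illustrative):

```yaml
# Sketch: collect the output parameter from an emptyDir volume mount so
# the executor does not need to chroot into the main container's layers.
- name: get-job-nums
  script:
    image: "{{workflow.parameters.modeling-pipeline-docker-image}}:{{workflow.parameters.modeling-pipeline-version}}"
    command: [bash]
    source: |
      jq -n "[range({{workflow.parameters.parallelism}}) | tostring]" | tee /out/job-nums.json
    volumeMounts:
    - name: out
      mountPath: /out
  volumes:
  - name: out
    emptyDir: {}
  outputs:
    parameters:
    - name: job-nums
      valueFrom:
        path: /out/job-nums.json
```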

@stale (bot) commented Sep 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
