
Example is trying to mount hostPath for docker in docker #561

Closed
jlewi opened this issue Dec 18, 2018 · 20 comments
Labels: area/samples, kind/bug, lifecycle/stale, priority/p1

@jlewi (Contributor) commented Dec 18, 2018

A user reported this problem in this thread:
https://groups.google.com/forum/#!topic/kubeflow-discuss/5Y_7lhoQLIo

The example is failing because it is trying to mount the Docker socket via hostPath.

They are running this example:
https://github.com/kubeflow/pipelines/blob/master/samples/notebooks/Lightweight%20Python%20components%20-%20basics.ipynb

The pod spec is below; it shows that the pod is trying to mount the Docker socket. I'm guessing this is for docker-in-docker to build containers.

I'm not sure where this is coming from. The example in the notebook isn't explicitly building containers, so I'm not sure why it would need docker-in-docker.

Do Kubeflow Pipelines always use docker-in-docker?

apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: privileged
    workflows.argoproj.io/node-name: pipeline-flip-coin-xlkfl.flip
    workflows.argoproj.io/outputs: >-
      {"parameters":[{"name":"flip-output","value":"tails","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"name":"mlpipeline-ui-metadata","path":"/mlpipeline-ui-metadata.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-ui-metadata.tgz"}},{"name":"mlpipeline-metrics","path":"/mlpipeline-metrics.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-metrics.tgz"}}]}
    workflows.argoproj.io/template: >-
      {"name":"flip","inputs":{},"outputs":{"parameters":[{"name":"flip-output","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"name":"mlpipeline-ui-metadata","path":"/mlpipeline-ui-metadata.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-ui-metadata.tgz"}},{"name":"mlpipeline-metrics","path":"/mlpipeline-metrics.json","s3":{"endpoint":"minio-service.kubeflow:9000","bucket":"mlpipeline","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"runs/30850dfb-0180-11e9-bd47-063a66a580a8/pipeline-flip-coin-xlkfl-3596557372/mlpipeline-metrics.tgz"}}]},"metadata":{},"container":{"name":"","image":"python:alpine3.6","command":["sh","-c"],"args":["python
      -c \"import random; result = 'heads' if random.randint(0,1) == 0 else
      'tails'; print(result)\" | tee
      /tmp/output"],"resources":{}},"archiveLocation":{}}
  creationTimestamp: '2018-12-16T22:16:09Z'
  labels:
    workflows.argoproj.io/completed: 'true'
    workflows.argoproj.io/workflow: pipeline-flip-coin-xlkfl
  name: pipeline-flip-coin-xlkfl-3596557372
  namespace: kubeflow
  ownerReferences:
    - apiVersion: argoproj.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: Workflow
      name: pipeline-flip-coin-xlkfl
      uid: 30850dfb-0180-11e9-bd47-063a66a580a8
  resourceVersion: '14833825'
  selfLink: /api/v1/namespaces/kubeflow/pods/pipeline-flip-coin-xlkfl-3596557372
  uid: 309010c0-0180-11e9-ac4e-0abcca1e707a
spec:
  containers:
    - args:
        - >-
          python -c "import random; result = 'heads' if random.randint(0,1) == 0
          else 'tails'; print(result)" | tee /tmp/output
      command:
        - sh
        - '-c'
      image: 'python:alpine3.6'
      imagePullPolicy: IfNotPresent
      name: main
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-runner-token-wffsv
          readOnly: true
    - args:
        - wait
      command:
        - argoexec
      env:
        - name: ARGO_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
      image: 'argoproj/argoexec:v2.2.1'
      imagePullPolicy: IfNotPresent
      name: wait
      resources: {}
      securityContext:
        privileged: false
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /argo/podmetadata
          name: podmetadata
        - mountPath: /var/lib/docker
          name: docker-lib
          readOnly: true
        - mountPath: /var/run/docker.sock
          name: docker-sock
          readOnly: true
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-runner-token-wffsv
          readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
    - name: pipeline-runner-dockercfg-xpbn2
  nodeName: ip-10-0-48-147.us-east-2.compute.internal
  nodeSelector:
    node-role.kubernetes.io/compute: 'true'
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: pipeline-runner
  serviceAccountName: pipeline-runner
  terminationGracePeriodSeconds: 30
  volumes:
    - downwardAPI:
        defaultMode: 420
        items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.annotations
            path: annotations
      name: podmetadata
    - hostPath:
        path: /var/lib/docker
        type: Directory
      name: docker-lib
    - hostPath:
        path: /var/run/docker.sock
        type: Socket
      name: docker-sock
    - name: pipeline-runner-token-wffsv
      secret:
        defaultMode: 420
        secretName: pipeline-runner-token-wffsv
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2018-12-16T22:16:09Z'
      reason: PodCompleted
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2018-12-16T22:16:09Z'
      reason: PodCompleted
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2018-12-16T22:16:09Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - containerID: >-
        docker://bc66e85bce78f14247b325b421ae321b1e5bc27c14fcab4b8c27d749f7690810
      image: 'docker.io/python:alpine3.6'
      imageID: >-
        docker-pullable://docker.io/python@sha256:766a961bf699491995cc29e20958ef11fd63741ff41dcc70ec34355b39d52971
      lastState: {}
      name: main
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: >-
            docker://bc66e85bce78f14247b325b421ae321b1e5bc27c14fcab4b8c27d749f7690810
          exitCode: 0
          finishedAt: '2018-12-16T22:16:15Z'
          reason: Completed
          startedAt: '2018-12-16T22:16:15Z'
    - containerID: >-
        docker://4dcbf5229f61a04b842281b01bc102789228c7519583c33c1c62ef2324a2830e
      image: 'docker.io/argoproj/argoexec:v2.2.1'
      imageID: >-
        docker-pullable://docker.io/argoproj/argoexec@sha256:9b12553aa7dccddc88c766d3dd59f4e8758acbd82ceef9e7aedc75f09934480a
      lastState: {}
      name: wait
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: >-
            docker://4dcbf5229f61a04b842281b01bc102789228c7519583c33c1c62ef2324a2830e
          exitCode: 0
          finishedAt: '2018-12-16T22:16:16Z'
          reason: Completed
          startedAt: '2018-12-16T22:16:16Z'
  hostIP: 10.0.48.147
  phase: Succeeded
  podIP: 10.129.2.12
  qosClass: BestEffort
  startTime: '2018-12-16T22:16:09Z'
@hongye-sun (Contributor) commented:
The Docker socket is mounted by Argo so that it can use "docker cp" to copy the artifact out of the container.
https://github.com/argoproj/argo/blob/master/workflow/controller/workflowpod.go#L48

I think this is the default behavior for OpenShift. The user needs to relax the security constraint explicitly: https://docs.okd.io/latest/admin_guide/manage_scc.html#use-the-hostpath-volume-plugin
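
For readers hitting this on OpenShift, the knob the linked OKD page describes is the allowHostDirVolumePlugin field on a SecurityContextConstraints object. A minimal sketch, assuming a dedicated SCC named kfp-hostpath granted to the kubeflow/pipeline-runner service account (both names are illustrative, and the exact required fields vary by OpenShift version):

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: kfp-hostpath                     # hypothetical SCC for pipeline pods
allowHostDirVolumePlugin: true           # permits hostPath volumes such as /var/run/docker.sock
allowPrivilegedContainer: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
  - system:serviceaccount:kubeflow:pipeline-runner   # service account used by the pod above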

@jlewi (Contributor, Author) commented Dec 18, 2018

Thanks @hongye-sun. Does Pipelines depend on this behavior of copying the artifact out with docker cp? Could Pipelines instead just use a volume (e.g. emptyDir) to share data between containers (see the sketch below)?
Making the Docker socket available to the pod seems like an undesirable escalation of privileges.

/cc @ioandr @vkoukis @pdmack @jessesuen
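
For reference, the emptyDir-based alternative suggested above would look roughly like the sketch below: both containers mount the same emptyDir volume, and the sidecar reads the output file from the shared volume instead of reaching through the Docker socket. The volume name, mount path, and sidecar command are illustrative assumptions, not what KFP/Argo actually generate.

apiVersion: v1
kind: Pod
metadata:
  name: flip-coin-emptydir-sketch         # illustrative name
spec:
  restartPolicy: Never
  volumes:
    - name: outputs                        # shared scratch space replacing the docker.sock hostPath
      emptyDir: {}
  containers:
    - name: main
      image: python:alpine3.6
      command: [sh, -c]
      args:
        - >-
          python -c "import random; result = 'heads' if random.randint(0,1) == 0
          else 'tails'; print(result)" | tee /outputs/flip-output
      volumeMounts:
        - mountPath: /outputs
          name: outputs
    - name: wait                           # stand-in for the artifact-collecting sidecar
      image: argoproj/argoexec:v2.2.1
      command: [sh, -c]
      args:
        - cat /outputs/flip-output         # placeholder; a real collector would wait for main to finish
      volumeMounts:
        - mountPath: /outputs
          name: outputs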

@hongye-sun (Contributor) commented:
Yes, we rely heavily on this behavior to get component outputs and upload pipeline artifacts. Currently, Argo doesn't support other ways to copy file content out of the main container. We might consider using the k8s API to copy the file content by implementing the copy methods in Argo's k8s API executor, but that requires non-trivial work.

Does it only affect OpenShift? From a web search, I don't see that other providers (AWS and Azure) have similar issues.

/cc @Ark-kun

@hongye-sun (Contributor) commented:
A more directly relevant bug in Argo is argoproj/argo-workflows#970.
It looks like the Argo team is planning to take care of this.

@trusch commented May 2, 2019

This also breaks all workflows that should be executed on a k8s cluster that doesn't use Docker. My current use case is running Argo inside k3s, which uses containerd as the pod executor.

@Ark-kun (Contributor) commented Jun 21, 2019

We've now upgraded to Argo 2.3. AFAIK there are many improvements to different executors. Let's check whether switching the executor fixes the problem.

@sbko commented Sep 20, 2019

I'm running Kubeflow v0.6.2. Pipelines are still trying to mount hostPath:
Invalid value: "hostPath": hostPath volumes are not allowed to be used

@Ark-kun (Contributor) commented Sep 24, 2019

Pipelines are still trying to mount hostPath:

What Kubernetes environment do you use? Does this Argo sample work for you? https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml

If you're using a Docker-less environment, the first step would be to change the Argo workflow controller configuration to a non-Docker executor. See this thread: #1654
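
For anyone following this advice, the executor is selected in the Argo workflow-controller ConfigMap. A rough sketch of switching it to pns is below; the ConfigMap name matches what the KFP manifests use, but whether the key is flat or nested under a config: | block depends on the Argo/KFP version, so verify against your own install:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap     # verify the name in your deployment
  namespace: kubeflow
data:
  # Older Argo versions expect this nested inside a "config: |" multi-line value instead.
  containerRuntimeExecutor: pns            # alternatives: docker (default), kubelet, k8sapi

The workflow-controller pod typically needs to be restarted to pick up the change.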

stale bot commented Jun 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label on Jun 25, 2020
stale bot commented Jul 3, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

stale bot closed this as completed on Jul 3, 2020
@linkvt (Contributor) commented Jul 28, 2020

Hi @Ark-kun, I just had a look at this and the referenced Argo issue. Is my assumption correct that this ticket is not solved yet?

We are currently deploying KFP 1.0 and it seems that hostPath volumes are still required:

This step is in Error state with this message: pods "conditional-execution-pipeline-with-exit-handler-tnpv5-1956183255" is forbidden: unable to validate against any pod security policy: [spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]

We are using k8s 1.14 with Docker.

We were, on the other hand, able to deploy Argo directly and AFAIK only an emptyDir was required; Argo even seems to offer an option for putting the logs on a specific persistent volume, but this is not fully verified. Please ignore this last point; I mixed it up with Airflow...

Thanks in advance!

@Jeffwan (Member) commented Aug 10, 2020

/reopen

@k8s-ci-robot (Contributor) commented:
@Jeffwan: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot reopened this on Aug 10, 2020
stale bot removed the lifecycle/stale label on Aug 10, 2020
@Jeffwan (Member) commented Aug 10, 2020

The Argo version in v1.1 still has this issue. This blocks one use case on EKS: we cannot deploy Kubeflow Pipelines on EKS Fargate, since Fargate doesn't support hostPath yet.

@d34th4ck3r commented:
I am running a local cluster using kind and getting the same error. Here is what I get when I describe my pod using kubectl describe pod file-passing-pipelines-cclzh-2358551148 -n kubeflow:

Name:           file-passing-pipelines-cclzh-2358551148
Namespace:      kubeflow
Priority:       0
Node:           kind-worker/172.19.0.2
Start Time:     Mon, 24 Aug 2020 17:44:08 +0900
Labels:         pipelines.kubeflow.org/cache_enabled=true
                pipelines.kubeflow.org/cache_id=
                pipelines.kubeflow.org/metadata_context_id=1
                pipelines.kubeflow.org/metadata_execution_id=3
                workflows.argoproj.io/completed=false
                workflows.argoproj.io/workflow=file-passing-pipelines-cclzh
Annotations:    pipelines.kubeflow.org/component_ref: {}
                pipelines.kubeflow.org/component_spec:
                  {"implementation": {"container": {"args": [{"if": {"cond": {"isPresent": "start"}, "then": ["--start", {"inputValue": "start"}]}}, {"if": ...
                pipelines.kubeflow.org/execution_cache_key: f6594b8f0728df187ec4f26083654d7b147e9e512c2a0bbeb11138846e028a60
                pipelines.kubeflow.org/metadata_input_artifact_ids: []
                sidecar.istio.io/inject: false
                workflows.argoproj.io/node-name: file-passing-pipelines-cclzh.write-numbers
                workflows.argoproj.io/template:
                  {"name":"write-numbers","arguments":{},"inputs":{},"outputs":{"artifacts":[{"name":"write-numbers-numbers","path":"/tmp/outputs/numbers/da...
Status:         Pending
IP:
IPs:            <none>
Controlled By:  Workflow/file-passing-pipelines-cclzh
Containers:
  wait:
    Container ID:
    Image:         gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:  file-passing-pipelines-cclzh-2358551148 (v1:metadata.name)
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /argo/secret/mlpipeline-minio-artifact from mlpipeline-minio-artifact (ro)
      /var/run/docker.sock from docker-sock (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-vvz7g (ro)
  main:
    Container ID:
    Image:         python:3.7
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      python3
      -u
      -c
      def _make_parent_dirs_and_return_path(file_path: str):
          import os
          os.makedirs(os.path.dirname(file_path), exist_ok=True)
          return file_path

      def write_numbers(numbers_path, start = 0, count = 10):
          with open(numbers_path, 'w') as writer:
              for i in range(start, count):
                  writer.write(str(i) + '\n')

      import argparse
      _parser = argparse.ArgumentParser(prog='Write numbers', description='')
      _parser.add_argument("--start", dest="start", type=int, required=False, default=argparse.SUPPRESS)
      _parser.add_argument("--count", dest="count", type=int, required=False, default=argparse.SUPPRESS)
      _parser.add_argument("--numbers", dest="numbers_path", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)
      _parsed_args = vars(_parser.parse_args())

      _outputs = write_numbers(**_parsed_args)

    Args:
      --count
      100000
      --numbers
      /tmp/outputs/numbers/data
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-vvz7g (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  Socket
  mlpipeline-minio-artifact:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  mlpipeline-minio-artifact
    Optional:    false
  pipeline-runner-token-vvz7g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pipeline-runner-token-vvz7g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                   From                  Message
  ----     ------       ----                  ----                  -------
  Normal   Scheduled    53m                   default-scheduler     Successfully assigned kubeflow/file-passing-pipelines-cclzh-2358551148 to kind-worker
  Warning  FailedMount  47m                   kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[mlpipeline-minio-artifact pipeline-runner-token-vvz7g podmetadata docker-sock]: timed out waiting for the condition
  Warning  FailedMount  36m (x2 over 49m)     kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[pipeline-runner-token-vvz7g podmetadata docker-sock mlpipeline-minio-artifact]: timed out waiting for the condition
  Warning  FailedMount  32m (x2 over 45m)     kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[docker-sock mlpipeline-minio-artifact pipeline-runner-token-vvz7g podmetadata]: timed out waiting for the condition
  Warning  FailedMount  8m7s (x11 over 51m)   kubelet, kind-worker  Unable to attach or mount volumes: unmounted volumes=[docker-sock], unattached volumes=[podmetadata docker-sock mlpipeline-minio-artifact pipeline-runner-token-vvz7g]: timed out waiting for the condition
  Warning  FailedMount  2m24s (x33 over 53m)  kubelet, kind-worker  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

@d34th4ck3r commented:
I was able to get KFP working on kind, thanks to the comments mentioned here: #4256

stale bot commented Nov 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label on Nov 24, 2020
@davidspek (Contributor) commented:
I think I've run into this issue as well with Kubeflow 1.2 on Kubernetes 1.20 using containerd. Considering the announced deprecation of the dockershim, I think it might be a good idea to switch the on-prem kfdef to use pns for the containerRuntimeExecutor (see the patch sketch below).
#1654 (comment)
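
If the change is made in the deployment manifests rather than on a live cluster, one option is a kustomize strategic-merge patch over the same ConfigMap, under the same naming assumptions as the sketch earlier in the thread:

# patch-workflow-controller-executor.yaml (hypothetical file, referenced from patchesStrategicMerge)
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  containerRuntimeExecutor: pns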

stale bot removed the lifecycle/stale label on Jan 3, 2021
stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label on Jun 3, 2021
stale bot commented Apr 28, 2022

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

stale bot closed this as completed on Apr 28, 2022