Support for non-docker based deployments #1654

Closed
saschagrunert opened this issue Jul 23, 2019 · 63 comments · Fixed by #5273

@saschagrunert
Contributor

Do you think it would be possible to support non-docker based clusters as well? I'm currently checking out the examples and see that they want to mount the docker.sock into the container. We might achieve the same results when using crictl. WDYT?

@Ark-kun
Contributor

Ark-kun commented Jul 23, 2019

AFAIK, you can configure Argo to use other executors (e.g. k8sapi, kubelet or pns) in the configmap: https://github.com/argoproj/argo/blob/ca1d5e671519aaa9f38f5f2564eb70c138fadda7/docs/workflow-controller-configmap.yaml#L78. Then pipelines should just work.
Would you like to try it?

@Ark-kun Ark-kun self-assigned this Jul 23, 2019
@saschagrunert
Contributor Author

saschagrunert commented Jul 24, 2019

AFAIK, you can configure Argo to use other executors (e.g. k8sapi, kubelet or pns) in the configmap: https://github.com/argoproj/argo/blob/ca1d5e671519aaa9f38f5f2564eb70c138fadda7/docs/workflow-controller-configmap.yaml#L78. Then pipelines should just work.
Would you like to try it?

Thanks for the help. I edited the configmap and also restarted the workflow controller pod (which does not seem to be necessary). The config looks like this now:

> kubectl get configmap workflow-controller-configmap -o yaml
apiVersion: v1
data:
  config: |
    {
    executorImage: argoproj/argoexec:v2.3.0,
    artifactRepository:
        {
            s3: {
                bucket: mlpipeline,
                keyPrefix: artifacts,
                endpoint: minio-service.kubeflow:9000,
                insecure: true,
                accessKeySecret: {
                    name: mlpipeline-minio-artifact,
                    key: accesskey
                },
                secretKeySecret: {
                    name: mlpipeline-minio-artifact,
                    key: secretkey
                }
            },
            containerRuntimeExecutor: k8sapi
        }
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-07-22T13:56:32Z"
  labels:
    kustomize.component: argo
  name: workflow-controller-configmap
  namespace: kubeflow
  resourceVersion: "1181725"
  selfLink: /api/v1/namespaces/kubeflow/configmaps/workflow-controller-configmap
  uid: 3144d234-101f-4031-94ce-b1aa258bfafd

I also tried kubelet as the value, but it still tries to mount the docker socket when running a pipeline:

> kubectl describe pod parallel-pipeline-jdnxw-643249177
...
Events:
  Type     Reason       Age                  From                   Message
  ----     ------       ----                 ----                   -------
  Normal   Scheduled    2m26s                default-scheduler      Successfully assigned kubeflow/parallel-pipeline-jdnxw-643249177 to caasp-node-3
  Warning  FailedMount  23s                  kubelet, caasp-node-3  Unable to mount volumes for pod "parallel-pipeline-jdnxw-643249177_kubeflow(9d937151-c9e3-493a-a7b3-a0870507caa7)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"parallel-pipeline-jdnxw-643249177". list of unmounted volumes=[docker-sock]. list of unattached volumes=[podmetadata docker-sock mlpipeline-minio-artifact pipeline-runner-token-dr4dg]
  Warning  FailedMount  18s (x9 over 2m26s)  kubelet, caasp-node-3  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

The cluster runs on top of Kubernetes 1.15 and CRI-O 1.15 as container runtime. Is there anything else I can try?

@Ark-kun
Contributor

Ark-kun commented Jul 24, 2019

Your containerRuntimeExecutor is inside artifactRepository. It should be outside.

@saschagrunert
Contributor Author

Your containerRuntimeExecutor is inside artifactRepository. It should be outside.

Ah, thanks for the hint 🤦‍♂️. Now I'm encountering a different set of issues when running the example pipelines:

With pns, every single step fails with:

This step is in Error state with this message: failed to save outputs: Failed to determine pid for containerID b6f5119e85788ab25d8979841a5ff064240faeb77180ad28498624a98f0c4059: container may have exited too quickly

With kubelet and k8sapi:

invalid spec: templates.echo.outputs.artifacts.mlpipeline-ui-metadata: kubelet executor does not support outputs from base image layer. must use emptyDir

@Ark-kun
Contributor

Ark-kun commented Jul 25, 2019

With pns, every single step fails with:

You should probably look at the workflow controller logs and the Wait container logs.

kubelet executor does not support outputs from base image layer. must use emptyDir

This is inconvenient, but can you try to satisfy that requirement? Mount an emptyDir beneath the outputs path using task.add_volume and task.add_volume_mount. See

task
    .add_volume(
        k8s_client.V1Volume(
            name=volume_name,
            secret=k8s_client.V1SecretVolumeSource(
                secret_name=secret_name,
            )
        )
    )
    .add_volume_mount(
        k8s_client.V1VolumeMount(
            name=volume_name,
            mount_path=secret_volume_mount_path,
        )
    )
as a reference (it mounts a secret volume, though).
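
An emptyDir variant of that snippet might look roughly like this (just a sketch; the 'outputs' volume name and the /output mount path are placeholders for wherever your step writes its output files, and task is the ContainerOp in question):

from kubernetes import client as k8s_client

# Back the output directory with an emptyDir volume so the executor can
# collect artifacts from the shared volume instead of the base image layer.
task.add_volume(
    k8s_client.V1Volume(
        name='outputs',
        empty_dir=k8s_client.V1EmptyDirVolumeSource(),
    )
).add_volume_mount(
    k8s_client.V1VolumeMount(
        name='outputs',
        mount_path='/output',
    )
)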

@saschagrunert
Contributor Author

With pns, every single step fails with:

You should probably look at the workflow controller logs and the Wait container logs.

Okay, if I run the [Sample] Basic - Exit Handler example pipeline with pns, then the workflow-controller pod logs:

time="2019-07-26T06:57:14Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Updated phase  -> Running" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="DAG node exit-handler-zpxsx (exit-handler-zpxsx) initialized Running" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="All of node exit-handler-zpxsx.echo dependencies [] completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Created pod: exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195)" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Pod node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) initialized Pending" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="All of node exit-handler-zpxsx.exit-handler-1 dependencies [] completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="DAG node exit-handler-zpxsx.exit-handler-1 (exit-handler-zpxsx-3298695089) initialized Running" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="All of node exit-handler-zpxsx.exit-handler-1.gcs-download dependencies [] completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Created pod: exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207)" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Pod node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) initialized Pending" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:15Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:15Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) message: ContainerCreating"
time="2019-07-26T06:57:15Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) message: ContainerCreating"
time="2019-07-26T06:57:15Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:16Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:19Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:19Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) status Pending -> Running"
time="2019-07-26T06:57:19Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:20Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:20Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) status Pending -> Running"
time="2019-07-26T06:57:20Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) status Running -> Error"
time="2019-07-26T06:57:20Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) message: failed to save outputs: Failed to determine pid for containerID baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19: container may have exited too quickly"
time="2019-07-26T06:57:20Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:21Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:21Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3143064195 completed"
time="2019-07-26T06:57:24Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) status Running -> Error"
time="2019-07-26T06:57:24Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) message: failed to save outputs: Failed to determine pid for containerID f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a: container may have exited too quickly"
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx.exit-handler-1 (exit-handler-zpxsx-3298695089) phase Running -> Error" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx.exit-handler-1 (exit-handler-zpxsx-3298695089) finished: 2019-07-26 06:57:24.436064763 +0000 UTC" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Checking daemoned children of exit-handler-zpxsx-3298695089" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx (exit-handler-zpxsx) phase Running -> Error" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx (exit-handler-zpxsx) finished: 2019-07-26 06:57:24.436217138 +0000 UTC" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Checking daemoned children of exit-handler-zpxsx" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Created pod: exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955)" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Pod node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) initialized Pending" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:25Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:25Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3267705207 completed"
time="2019-07-26T06:57:25Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) message: ContainerCreating"
time="2019-07-26T06:57:25Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:25Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:26Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:26Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:26Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3267705207 completed"
time="2019-07-26T06:57:29Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:29Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) status Pending -> Running"
time="2019-07-26T06:57:29Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:29Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) status Running -> Error"
time="2019-07-26T06:57:30Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) message: failed to save outputs: Failed to determine pid for containerID 15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56: container may have exited too quickly"
time="2019-07-26T06:57:30Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Updated phase Running -> Error" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Marking workflow completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Checking daemoned children of " namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:31Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3148652955 completed"

Whereas the exit handler logs contain:

Pod 1

wait:

time="2019-07-26T06:57:17Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: exit-handler-zpxsx-3143064195, pid: 9, hasOutputs: true)"
time="2019-07-26T06:57:17Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/exit-handler-zpxsx-3143064195) with template:\n{\"name\":\"echo\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true},{\"name\":\"mlpipeline-metrics\",\"path\":\"/mlpipeline-metrics.json\",\"optional\":true}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"library/bash:4.4.23\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo \\\"$0\\\"\",\"exit!\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/exit-handler-zpxsx/exit-handler-zpxsx-3143064195\"}}}"
time="2019-07-26T06:57:17Z" level=info msg="Waiting on main container"
time="2019-07-26T06:57:17Z" level=warning msg="Polling root processes (1m0s)"
time="2019-07-26T06:57:17Z" level=info msg="pid 33: &{root 4096 2147484141 {632376424 63699721037 0x22af420} {1048751 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124237 800376805} {1564124237 632376424} {1564124237 832376878} [0 0 0]}}"
time="2019-07-26T06:57:17Z" level=info msg="Secured filehandle on /proc/33/root"
time="2019-07-26T06:57:17Z" level=info msg="containerID crio-baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19 mapped to pid 33"
time="2019-07-26T06:57:17Z" level=info msg="pid 33: &{root 4096 2147484141 {632376424 63699721037 0x22af420} {1048751 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124237 800376805} {1564124237 632376424} {1564124237 832376878} [0 0 0]}}"
time="2019-07-26T06:57:17Z" level=info msg="pid 33: &{root 4096 2147484141 {632376424 63699721037 0x22af420} {1048751 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124237 800376805} {1564124237 632376424} {1564124237 832376878} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="main container started with container ID: baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19"
time="2019-07-26T06:57:19Z" level=info msg="Starting annotations monitor"
time="2019-07-26T06:57:19Z" level=info msg="Starting deadline monitor"
time="2019-07-26T06:57:19Z" level=error msg="executor error: Failed to determine pid for containerID baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:292\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:166\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:867\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:19Z" level=info msg="No sidecars"
time="2019-07-26T06:57:19Z" level=info msg="No output parameters"
time="2019-07-26T06:57:19Z" level=info msg="Saving output artifacts"
time="2019-07-26T06:57:19Z" level=info msg="Annotations monitor stopped"
time="2019-07-26T06:57:19Z" level=info msg="Staging artifact: mlpipeline-ui-metadata"
time="2019-07-26T06:57:19Z" level=info msg="Copying /mlpipeline-ui-metadata.json from container base image layer to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-07-26T06:57:19Z" level=error msg="executor error: could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:19Z" level=info msg="Alloc=3736 TotalAlloc=11943 Sys=70590 NumGC=5 Goroutines=9"
time="2019-07-26T06:57:19Z" level=fatal msg="could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

main

exit!

Pod 2

wait:

time="2019-07-26T06:57:28Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: exit-handler-zpxsx-3148652955, pid: 8, hasOutputs: true)"
time="2019-07-26T06:57:28Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/exit-handler-zpxsx-3148652955) with template:\n{\"name\":\"echo\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true},{\"name\":\"mlpipeline-metrics\",\"path\":\"/mlpipeline-metrics.json\",\"optional\":true}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"library/bash:4.4.23\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo \\\"$0\\\"\",\"exit!\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/exit-handler-zpxsx/exit-handler-zpxsx-3148652955\"}}}"
time="2019-07-26T06:57:28Z" level=info msg="Waiting on main container"
time="2019-07-26T06:57:28Z" level=warning msg="Polling root processes (1m0s)"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 274 2147484141 {232317677 63699057718 0x22af420} {65027 96 21 16877 0 0 0 0 274 4096 0 {1563892256 947849310} {1563460918 232317677} {1563460918 232317677} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="Secured filehandle on /proc/32/root"
time="2019-07-26T06:57:29Z" level=info msg="containerID crio-15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56 mapped to pid 32"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 4096 2147484141 {480401061 63699721048 0x22af420} {1048741 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124249 60402378} {1564124248 480401061} {1564124249 92402451} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="Secured filehandle on /proc/32/root"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 4096 2147484141 {480401061 63699721048 0x22af420} {1048741 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124249 60402378} {1564124248 480401061} {1564124249 92402451} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 4096 2147484141 {480401061 63699721048 0x22af420} {1048741 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124249 60402378} {1564124248 480401061} {1564124249 92402451} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="main container started with container ID: 15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56"
time="2019-07-26T06:57:29Z" level=info msg="Starting annotations monitor"
time="2019-07-26T06:57:29Z" level=info msg="Annotations monitor stopped"
time="2019-07-26T06:57:29Z" level=error msg="executor error: Failed to determine pid for containerID 15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:292\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:166\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:867\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:29Z" level=info msg="No sidecars"
time="2019-07-26T06:57:29Z" level=info msg="No output parameters"
time="2019-07-26T06:57:29Z" level=info msg="Starting deadline monitor"
time="2019-07-26T06:57:29Z" level=info msg="Deadline monitor stopped"
time="2019-07-26T06:57:29Z" level=info msg="Saving output artifacts"
time="2019-07-26T06:57:29Z" level=info msg="Staging artifact: mlpipeline-ui-metadata"
time="2019-07-26T06:57:29Z" level=info msg="Copying /mlpipeline-ui-metadata.json from container base image layer to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-07-26T06:57:29Z" level=error msg="executor error: could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:29Z" level=info msg="Alloc=3943 TotalAlloc=11808 Sys=70334 NumGC=5 Goroutines=8"
time="2019-07-26T06:57:29Z" level=fatal msg="could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

main

exit!

Pod 3

wait:

time="2019-07-26T06:57:19Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: exit-handler-zpxsx-3267705207, pid: 8, hasOutputs: true)"
time="2019-07-26T06:57:19Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/exit-handler-zpxsx-3267705207) with template:\n{\"name\":\"gcs-download\",\"inputs\":{\"parameters\":[{\"name\":\"url\",\"value\":\"gs://ml-pipeline-playground/shakespeare1.txt\"}]},\"outputs\":{\"parameters\":[{\"name\":\"gcs-download-data\",\"valueFrom\":{\"path\":\"/tmp/results.txt\"}}],\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true},{\"name\":\"mlpipeline-metrics\",\"path\":\"/mlpipeline-metrics.json\",\"optional\":true}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"google/cloud-sdk:216.0.0\",\"command\":[\"sh\",\"-c\"],\"args\":[\"gsutil cat $0 | tee $1\",\"gs://ml-pipeline-playground/shakespeare1.txt\",\"/tmp/results.txt\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/exit-handler-zpxsx/exit-handler-zpxsx-3267705207\"}}}"
time="2019-07-26T06:57:19Z" level=info msg="Waiting on main container"
time="2019-07-26T06:57:19Z" level=warning msg="Polling root processes (1m0s)"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 274 2147484141 {232317677 63699057718 0x22af420} {65027 96 21 16877 0 0 0 0 274 4096 0 {1563892256 947849310} {1563460918 232317677} {1563460918 232317677} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="Secured filehandle on /proc/30/root"
time="2019-07-26T06:57:19Z" level=info msg="containerID crio-f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a mapped to pid 30"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 504380676} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="Secured filehandle on /proc/30/root"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 504380676} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="main container started with container ID: f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a"
time="2019-07-26T06:57:20Z" level=info msg="Starting annotations monitor"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=error msg="executor error: Failed to determine pid for containerID f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:292\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:166\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:867\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:20Z" level=info msg="No sidecars"
time="2019-07-26T06:57:20Z" level=info msg="Saving output parameters"
time="2019-07-26T06:57:20Z" level=info msg="Saving path output parameter: gcs-download-data"
time="2019-07-26T06:57:20Z" level=info msg="Copying /tmp/results.txt from base image layer"
time="2019-07-26T06:57:20Z" level=error msg="executor error: could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).GetFileContents\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:79\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:412\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:48\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:20Z" level=info msg="Alloc=3575 TotalAlloc=11754 Sys=70846 NumGC=5 Goroutines=10"
time="2019-07-26T06:57:20Z" level=info msg="Annotations monitor stopped"
time="2019-07-26T06:57:20Z" level=info msg="Starting deadline monitor"
time="2019-07-26T06:57:20Z" level=info msg="Deadline monitor stopped"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=fatal msg="could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).GetFileContents\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:79\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:412\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:48\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

main

With which he yoketh your rebellious necks Razeth your cities and subverts your towns And in a moment makes them desolate

Unfortunately I can't find anything helpful in there, do you? 🤔

kubelet executor does not support outputs from base image layer. must use emptyDir

This is inconvenient, but can you try to satisfy that requirement? Mount an emptyDir beneath the outputs path using task.add_volume and task.add_volume_mount. See

task
    .add_volume(
        k8s_client.V1Volume(
            name=volume_name,
            secret=k8s_client.V1SecretVolumeSource(
                secret_name=secret_name,
            )
        )
    )
    .add_volume_mount(
        k8s_client.V1VolumeMount(
            name=volume_name,
            mount_path=secret_volume_mount_path,
        )
    )

as a reference (it mounts a secret volume, though).

Hm, I tried to create my own pipeline, but the big question is where to mount that emptyDir. For now I have something like this, which causes the same issue as mentioned above:

#!/usr/bin/env python3

import kfp
from kfp import dsl


def echo_op(text):
    return dsl.ContainerOp(name='echo',
                           image='library/bash:4.4.23',
                           command=['sh', '-c'],
                           arguments=['echo "$0"', text])


@dsl.pipeline(name='My pipeline', description='')
def pipeline():
    from kubernetes import client as k8s_client
    echo_task = echo_op('Hello world').add_volume(
        k8s_client.V1Volume(
            name='volume',
            empty_dir=k8s_client.V1EmptyDirVolumeSource())).add_volume_mount(
                k8s_client.V1VolumeMount(name='volume', mount_path='/output'))


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline)

@Ark-kun
Contributor

Ark-kun commented Jul 26, 2019

where to mount that empty dir?

It should have been mounted to the folder where you're storing the outputs you produce. But in the last example you're not producing any, so there should have been no issues.

Ah. I forgot about the auto-added artifacts (#1422).

Can you try the following two things:

  1. First of all, try some Argo examples (e.g. https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml) to directly check the lower-level compatibility with the various execution modes.

  2. In your last example, add the following to the ContainerOp construction:

output_artifact_paths={
  'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
  'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
}

Here we override the paths for the auto-added output artifacts so that they're stored under the /output directory where you've mounted the emptyDir volume.
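
Put together with the pipeline above, the step might look roughly like this (a sketch reusing the names from that example; the pipeline.tar.gz output path is only illustrative):

#!/usr/bin/env python3

import kfp
from kfp import dsl
from kubernetes import client as k8s_client


def echo_op(text):
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "$0"', text],
        # Redirect the auto-added artifacts onto the emptyDir mounted at
        # /output so the executor does not read the base image layer.
        output_artifact_paths={
            'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
            'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
        })


@dsl.pipeline(name='My pipeline', description='')
def pipeline():
    echo_op('Hello world').add_volume(
        k8s_client.V1Volume(
            name='volume',
            empty_dir=k8s_client.V1EmptyDirVolumeSource())).add_volume_mount(
                k8s_client.V1VolumeMount(name='volume', mount_path='/output'))


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, 'pipeline.tar.gz')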

@saschagrunert
Contributor Author

Can you try the following two things:

  1. First of all, try some Argo examples (e.g. https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml) to directly check the lower-level compatibility with the various execution modes.

So I applied the example via argo apply -f, and this is the output of the pod's logs:

main:

> kubectl logs -f artifact-passing-tmv4v-2138355403 main
 _____________
< hello world >
 -------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

wait:

> kubectl logs -f artifact-passing-tmv4v-2138355403 wait
time="2019-07-29T06:59:33Z" level=info msg="Creating a K8sAPI executor"
time="2019-07-29T06:59:34Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/artifact-passing-tmv4v-2138355403) with template:\n{\"name\":\"whalesay\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"hello-art\",\"path\":\"/tmp/hello_world.txt\"}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"sleep 1; cowsay hello world | tee /tmp/hello_world.txt\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/artifact-passing-tmv4v/artifact-passing-tmv4v-2138355403\"}}}"
time="2019-07-29T06:59:34Z" level=info msg="Waiting on main container"
time="2019-07-29T06:59:34Z" level=error msg="executor error: Failed to establish pod watch: unknown (get pods)\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:78\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).waitMainContainerStart\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:885\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:856\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-29T06:59:34Z" level=info msg="No sidecars"
time="2019-07-29T06:59:34Z" level=info msg="No output parameters"
time="2019-07-29T06:59:34Z" level=info msg="Saving output artifacts"
time="2019-07-29T06:59:34Z" level=warning msg="Failed to get pod 'artifact-passing-tmv4v-2138355403': pods \"artifact-passing-tmv4v-2138355403\" is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kubeflow\""
time="2019-07-29T06:59:34Z" level=error msg="executor error: pods \"artifact-passing-tmv4v-2138355403\" is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kubeflow\"\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:71\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).getPod\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:620\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerStatus\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:702\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerID\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:719\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:220\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-29T06:59:34Z" level=info msg="Alloc=3164 TotalAlloc=9686 Sys=70846 NumGC=4 Goroutines=5"
time="2019-07-29T06:59:34Z" level=fatal msg="pods \"artifact-passing-tmv4v-2138355403\" is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kubeflow\"\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:71\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).getPod\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:620\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerStatus\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:702\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerID\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:719\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:220\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
  2. In your last example, add the following to the ContainerOp construction:
output_artifact_paths={
  'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
  'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
}

Here we override the paths for the auto-added output artifacts so that they're stored under the /output directory where you've mounted the emptyDir volume.

Alright, this seems to work now, the pipeline succeeds.

@Ark-kun
Contributor

Ark-kun commented Aug 6, 2019

Alright, this seems to work now, the pipeline succeeds.

I've looked at the Argo source code. Maybe you do not even need the emptyDir mount.
Just changing the paths might be enough.

@Ark-kun Ark-kun closed this as completed Aug 6, 2019
@Ark-kun
Contributor

Ark-kun commented Aug 6, 2019

Tracking the original issue
/reopen

@Ark-kun Ark-kun reopened this Aug 6, 2019
@Ark-kun Ark-kun assigned IronPan and unassigned Ark-kun Aug 6, 2019
@saschagrunert
Contributor Author

Alright, this seems to work now, the pipeline succeeds.

I've looked at the Argo source code. Maybe you do not even need the emptyDir mount.
Just changing the paths might be enough.

Hm, no, then I get this error message:

invalid spec: templates.analyze-data.outputs.artifacts.mlpipeline-ui-metadata: k8sapi executor does not support outputs from base image layer. must use emptyDir

@Ark-kun
Contributor

Ark-kun commented Aug 7, 2019

Hm, no, then I get this error message:

Hmm. Maybe it would work if the paths are in an existing base image dir like /tmp (though I wonder whether /tmp is part of the base image) or /home, not /output.

@saschagrunert
Contributor Author

Hm, no, then I get this error message:

Hmm. Maybe it would work if the paths are in an existing base image dir like /tmp (though I wonder whether /tmp is part of the base image) or /home, not /output.

Yes, I tried with /tmp, but had no luck either.

@knkski

knkski commented Aug 13, 2019

I also ran into this issue, and the above fix worked for me for the regular, ContainerOp-style pipelines. When I tried creating a pipeline with func_to_container_op, I also had to add in this bit, as func_to_container_op wanted to store output under /tmp/outputs:

op.add_volume(k8s_client.V1Volume(name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
op.container.add_volume_mount(k8s_client.V1VolumeMount(name='outputs', mount_path='/tmp/outputs'))
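
For context, a rough sketch of how those two calls could be applied to a step built with func_to_container_op (my_func and my_step are just placeholder names):

from kfp.components import func_to_container_op
from kubernetes import client as k8s_client


def my_func() -> str:
    return 'hello'


def my_step():
    # func_to_container_op returns a factory; calling it creates the ContainerOp.
    op = func_to_container_op(my_func)()
    # Back /tmp/outputs (where the generated component writes its outputs)
    # with an emptyDir volume so non-docker executors can collect them.
    op.add_volume(k8s_client.V1Volume(
        name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
    op.container.add_volume_mount(k8s_client.V1VolumeMount(
        name='outputs', mount_path='/tmp/outputs'))
    return op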

@saschagrunert
Contributor Author

I also ran into this issue, and the above fix worked for me for the regular, ContainerOp-style pipelines. When I tried creating a pipeline with func_to_container_op, I also had to add in this bit, as func_to_container_op wanted to store output under /tmp/outputs:

op.add_volume(k8s_client.V1Volume(name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
op.container.add_volume_mount(k8s_client.V1VolumeMount(name='outputs', mount_path='/tmp/outputs'))

Hey, I'm now running into similar issues. How do I make it work with func_to_container_op? Something like this does not work for me:

from os import path
from typing import Dict

from kfp.components import func_to_container_op
from kubernetes import client as k8s

OUT_DIR = '/tmp/outputs'
METADATA_FILE = 'mlpipeline-ui-metadata.json'
METRICS_FILE = 'mlpipeline-metrics.json'
METADATA_FILE_PATH = path.join(OUT_DIR, METADATA_FILE)
METRICS_FILE_PATH = path.join(OUT_DIR, METRICS_FILE)
BASE_IMAGE = 'my-image:latest'

def default_artifact_path() -> Dict[str, str]:
    return {
        path.splitext(METADATA_FILE)[0]: METADATA_FILE_PATH,
        path.splitext(METRICS_FILE)[0]: METRICS_FILE_PATH,
    }

def storage_op(func, *args):
    op = func_to_container_op(func, base_image=BASE_IMAGE)(*args)
    op.output_artifact_paths=default_artifact_path() # I'm not able to overwrite the artifact path here
    op.add_volume(k8s.V1Volume(name='outputs',
                               empty_dir=k8s.V1EmptyDirVolumeSource()))\
      .add_volume_mount(k8s.V1VolumeMount(name='outputs', mount_path=OUT_DIR))
    return op

@Ark-kun
Contributor

Ark-kun commented Sep 24, 2019

Good news: The 'mlpipeline-*' artifacts are no longer automatically added to every single pipeline task. (There are still some components that explicitly produce those.)

Side news: All outputs now produce artifacts.

We need to investigate how to make Argo copy the artifacts when using PNS. They should support this; otherwise it's a bug. I need to check the exact criteria for the "emptyDir" error.

BTW, what would be the easiest way to set up a temporary Docker-less Linux environment?

@saschagrunert
Contributor Author

Good news: The 'mlpipeline-*' artifacts are no longer automatically added to every single pipeline task. (There are still some components that explicitly produce those.)

Side news: All outputs now produce artifacts.

We need to investigate how to make Argo copy the artifacts when using PNS. They should support this; otherwise it's a bug. I need to check the exact criteria for the "emptyDir" error.

BTW, what would be the easiest way to set up a temporary Docker-less Linux environment?

Sounds good, thanks for the update. I guess an easy way would be to use kubeadm with a natively supported distribution like Ubuntu 18.04. Then you could use the Project Atomic PPA to install CRI-O and bootstrap the node, selecting crio.sock as the runtime endpoint.

@knkski

knkski commented Sep 24, 2019

@Ark-kun: As far as setting up a Docker-less environment, I ran into this issue while using microk8s, which uses containerd.

@Bobgy
Contributor

Bobgy commented Mar 22, 2021

/reopen
Due to #5285, we reverted the default to the docker executor for the current release.

We need to stabilize the PNS executor in preparation for the next release.

@google-oss-robot

@Bobgy: Reopened this issue.

In response to this:

/reopen
Due to #5285, we reverted the default to the docker executor for the current release.

We need to stabilize the PNS executor in preparation for the next release.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@juliusvonkohout
Member

/reopen
Due to #5285, we reverted the default to the docker executor for the current release.

We need to stabilize the PNS executor in preparation for the next release.

For the next release you should update to Argo 3.1 and use the emissary executor, which works rootless everywhere: https://argoproj.github.io/argo-workflows/workflow-executors/. I already tested it successfully with Kubeflow 1.2.

@juliusvonkohout
Member

Please upvote #5718 if you want to have a proper solution to this bug.

@Bobgy
Contributor

Bobgy commented Aug 22, 2021

Documentation: https://www.kubeflow.org/docs/components/pipelines/installation/choose-executor/
Issue: #5718

We are now recommending the emissary executor (Alpha, released in KFP 1.7.0); feedback is welcome!

@zacharymostowsky

zacharymostowsky commented Sep 28, 2021

where to mount that empty dir?

It should have been mounted to the folder where you're storing the outputs you produce. But in the last example you're not producing any, so there should have been no issues.

Ah. I forgot about the auto-added artifacts (#1422).

Can you try the following two things:

  1. First of all, try some Argo examples (e.g. https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml) to directly check the lower-level compatibility with the various execution modes.
  2. In your last example, add the following to the ContainerOp construction:
output_artifact_paths={
  'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
  'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
}

Here we override the paths for the auto-added output artifacts so that they're stored under the /output directory where you've mounted the emptyDir volume.

How would I do 2. for a functional component?

@Bobgy
Contributor

Bobgy commented Sep 28, 2021

@zacharymostowsky the instructions you read are outdated. #1654 (comment) has our current recommendation.
