Utilize pod process namespace sharing instead of docker executor #970
/cc @edlee2121 @JulienBalestra @gaganapplatix I don't think we need to spend any effort on a pure K8s API server executor implementation, since I think process namespace sharing will be the future of how we do artifact management, process waiting, and process killing. Longer term, I think the docker/kubelet executors will eventually go away if the argoexec sidecar can already access the main container's filesystem easily through process namespace sharing.
For copying artifacts, we would share process namespaces and access files via `/proc/<pid>/root`?
That’s right.
I have quickly tested process namespace sharing (now a beta feature, enabled by default) on k8s v1.12, and it works as expected for accessing another container's filesystem.
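To make that test concrete, here is a minimal Go sketch (not Argo's actual implementation) of a sidecar reading a file out of the main container's filesystem under `shareProcessNamespace: true`. The comm name `main-binary` and the artifact path are placeholders; the sidecar also needs ptrace-level access (e.g. the SYS_PTRACE capability) to open another process's `/proc/<pid>/root`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findPidByComm scans the shared /proc for a process whose command name
// matches comm. "main-binary" below is a placeholder, not an Argo convention.
func findPidByComm(comm string) (string, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return "", err
	}
	for _, e := range entries {
		data, err := os.ReadFile(filepath.Join("/proc", e.Name(), "comm"))
		if err != nil {
			continue // not a pid directory, or the process is already gone
		}
		if strings.TrimSpace(string(data)) == comm {
			return e.Name(), nil
		}
	}
	return "", fmt.Errorf("process %q not found", comm)
}

func main() {
	pid, err := findPidByComm("main-binary")
	if err != nil {
		panic(err)
	}
	// Read a file directly out of the main container's root filesystem.
	out, err := os.ReadFile(filepath.Join("/proc", pid, "root", "tmp", "artifact.txt"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", out)
}
```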
Update on this issue. After some deeper investigation, process namespace sharing does not solve the problem by itself. The crux of the issue is that, when the main process (container) exits, the filesystem of the main container goes away. In other words, the filesystem of the main container is only accessible at `/proc/<pid>/root` while the main process is still running. The following approaches were considered to work around this:

1. setns with CAP_SYS_ADMIN capability

This approach uses the setns syscall (http://man7.org/linux/man-pages/man2/setns.2.html) to enter the main container's mount namespace. With this technique, as soon as the wait sidecar starts, it immediately infers the main container's pid and performs a setns to that pid, along with a recursive mount of the main container's mounts. For those familiar with nsenter, it would be similar to the command `nsenter --target <pid> --mount`.
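A rough Go sketch of that idea, assuming the `golang.org/x/sys/unix` package (this is a sketch of the technique, not the code the comment refers to). Note that setns affects only the calling thread, and the kernel refuses to join a mount namespace from a multithreaded process, so a real implementation would do this very early or re-exec a single-threaded helper.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"

	"golang.org/x/sys/unix"
)

// enterMountNamespace joins the mount namespace of the given pid.
// Requires CAP_SYS_ADMIN, per the discussion above.
func enterMountNamespace(pid int) error {
	// setns affects only the calling OS thread, so pin the goroutine to it.
	runtime.LockOSThread()

	f, err := os.Open(fmt.Sprintf("/proc/%d/ns/mnt", pid))
	if err != nil {
		return err
	}
	defer f.Close()

	// Equivalent of: nsenter --target <pid> --mount
	return unix.Setns(int(f.Fd()), unix.CLONE_NEWNS)
}

func main() {
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(err)
	}
	if err := enterMountNamespace(pid); err != nil {
		panic(err)
	}
	// From here, this thread's view of "/" is the main container's filesystem.
}
```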
The SYS_ADMIN capability is required since it is only possible to perform the setns and bind mounts with that capability. After the main process exits, the wait process still has a file handle on the main container's filesystem, and can proceed to upload files. The obvious disadvantage of this approach is that SYS_ADMIN essentially makes the sidecar a privileged pod. For this reason, this approach would not be acceptable in secure environments. A second disadvantage is timing related: it is possible for the main container to run to completion before the pid was even available to the wait sidecar. In this situation, the wait sidecar would not be able to copy out the artifacts.

2. Overriding entrypoint of the main container with argoexec

The following technique would inject an argoexec binary into the main container. With this approach, the entrypoint of the main container is modified to be that of the argoexec binary, which would run the user's original command as a child process. The disadvantages of this approach are that (1) it is the most intrusive to the main container, and (2) it requires knowing the image's original entrypoint in order to override it.

3. chroot
Could you let us know of any further developments on finalizing an approach from the above list?
@srikumar-b yes, I'm close to finishing approach #4. |
@jessesuen Any update on this? Is there a PR for this issue?
Coming to this from the Kubeflow Pipelines community. We have phased out Docker support in IBM Kubernetes Service and are now relying on containerd. This is one blocking issue currently impacting adoption for us. Any further updates on this?
Seems like this might be the problem: https://github.com/zmhassan/argo/blob/master/workflow/controller/workflowpod.go#L52-L59. Users shouldn't be required to mount a hostPath.
Perhaps you could pass an option into your Custom Resource to disable this, as some users might be using Argo just for running workflows and might not require copying files. One option could be to have the code within the container running the job connect to S3 or some other cloud storage to store these files; even MinIO could be a drop-in option.
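As an illustration of that drop-in MinIO option, a hedged sketch using the `minio-go` client; the endpoint, bucket, credentials, and file paths below are all placeholders:

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Connect to an S3-compatible store (here MinIO) from inside the job
	// container itself, so no executor-side artifact copying is needed.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Upload the artifact produced by this job before the container exits.
	_, err = client.FPutObject(context.Background(), "my-bucket",
		"artifacts/output.txt", "/tmp/output.txt", minio.PutObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
}
```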
Do you know how this option can be passed, or whether it exists at all?
Fixed. The K8s, Kubelet, and PNS executors will no longer mount docker.sock. The docker executor still needs docker.sock access even if not copying out artifacts, in order to perform a `docker wait` on the main container.
Thanks @jessesuen - can we now use either the K8s or the PNS executor and get the same functional behaviour from Argo, if not the same scalability?
The main difference is that the K8s executor puts some additional polling load on the k8s API server while waiting for container completion, whereas PNS polls the local Linux kernel (procfs) for container completion. Also, PNS requires K8s v1.12 for process namespace sharing support without the feature gate. I'm copying my pro/con list from the other issue.
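To make the contrast concrete, here is an illustrative sketch of what "polling the local linux kernel (procfs)" amounts to; the pid is a placeholder for the main container's pid discovered through the shared process namespace, and the real executor is more careful than this:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForExit polls procfs until /proc/<pid> disappears, i.e. until the
// main container's process has exited. This touches only the local kernel,
// putting no load on the K8s API server.
func waitForExit(pid int) {
	for {
		if _, err := os.Stat(fmt.Sprintf("/proc/%d", pid)); os.IsNotExist(err) {
			return
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	waitForExit(1234) // placeholder pid of the main container's process
	fmt.Println("main container exited; artifacts can now be collected")
}
```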
Is this a BUG REPORT or FEATURE REQUEST?: FEATURE REQUEST
What happened:
Currently, artifact saving is performed through the docker executor, which implements artifact copying via a `docker cp` command to copy out artifacts from the main container. The problem with this approach is that it requires mounting the host's `docker.sock`, which is insecure and unacceptable in some secure environments.

In K8s, there is an (alpha) feature to share the process namespace, and with it the filesystem, between containers in a pod:
https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/

Instead of utilizing the docker executor, we could simply create the pod spec with `shareProcessNamespace: true` and access the filesystem of the main container to copy files directly. Similarly, the actual waiting on and killing of the process would only need to be done via a normal kill command from the argoexec sidecar, as opposed to a `docker kill`.

Using process namespace sharing provides an ideal solution which addresses many of the security and scalability concerns with our current docker, kubelet, and k8s API server approaches.

NOTE that this is an alpha feature and needs a feature gate (`--feature-gates=PodShareProcessNamespace=true`) configured for it to be enabled.
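For illustration, a sketch of the proposed pod shape using the `k8s.io/api/core/v1` types (image names are placeholders, not the controller's actual pod construction); the key field is `ShareProcessNamespace`:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
)

// workflowPodSpec sketches a workflow pod in which the argoexec wait
// sidecar shares a process namespace with the main container.
func workflowPodSpec() corev1.PodSpec {
	share := true
	return corev1.PodSpec{
		// Alpha in K8s 1.10 behind --feature-gates=PodShareProcessNamespace=true;
		// beta and enabled by default as of 1.12.
		ShareProcessNamespace: &share,
		Containers: []corev1.Container{
			{Name: "main", Image: "my-workflow-image:latest"},
			{Name: "wait", Image: "argoproj/argoexec:latest"},
		},
	}
}

func main() {
	_ = workflowPodSpec()
}
```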