
Volumes Instead of Sidecars for the Artifact Repository #1024

Open · vicaire opened this issue Sep 28, 2018 · 6 comments

Labels: area/artifacts S3/GCP/OSS/Git/HDFS etc · type/feature Feature request

vicaire commented Sep 28, 2018

FEATURE REQUEST: Volumes Instead of Sidecars to upload/download data to the default Artifact Repository

Hi, I was wondering why Argo decided to use a sidecar to download/upload data to GCS/S3/etc when using the Default Artifact Repository.

Did we consider using the Volume abstraction in Kubernetes? It looks like there are volume types for many kinds of storage, and they would make it easy to add a new kind of storage for the Default Artifact Repository by implementing a new kind of volume.

https://ai.intel.com/kubernetes-volume-controller-kvc-data-management-tailored-for-machine-learning-workloads-in-kubernetes/

https://kubernetes.io/docs/concepts/storage/volumes/
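
For context on what a volume-based flow could look like: Argo already supports per-workflow volumes via `volumeClaimTemplates`, so a sketch of passing a file between two steps through such a volume might be as follows. All names, images, and sizes here are illustrative, not taken from this thread.

```yaml
# Sketch: pass data between steps through a shared per-workflow volume
# instead of uploading/downloading artifacts to GCS/S3.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volume-passing-
spec:
  entrypoint: main
  volumeClaimTemplates:            # a PVC created for each workflow run
  - metadata:
      name: shared-work
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  templates:
  - name: main
    steps:
    - - name: produce
        template: produce
    - - name: consume
        template: consume
  - name: produce
    container:
      image: alpine:3.18
      command: [sh, -c]
      args: ["echo hello > /work/a.csv"]
      volumeMounts:
      - name: shared-work
        mountPath: /work
  - name: consume
    container:
      image: alpine:3.18
      command: [sh, -c]
      args: ["cat /work/a.csv"]
      volumeMounts:
      - name: shared-work
        mountPath: /work
```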


vicaire commented Sep 28, 2018

/cc @jlewi


wookasz commented Oct 13, 2018

If I understand your question correctly, the sidecar ensures that specified files are stored to a specific location in the artifact repository, and that specific files are fetched to a specific location in the container. Without a sidecar this could not be done through configuration alone; it would be up to the step logic.

For example, if step 1 writes a.csv, b.csv, and tmp.csv to /output/ in the container, we may only want a.csv and b.csv stored as artifacts. Step 2 may only require b.csv. Furthermore, the step 2 container may expect the input file to be named input.csv, so a rename is required. The sidecar does this without requiring the step to perform that logic.

It would also be possible for steps to modify/delete the artifact of another step. That would undermine what I believe to be a key feature of any workflow/pipeline manager: data provenance.
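
To make the selection and renaming concrete, here is a rough sketch of how the sidecar-based artifact mechanism expresses this declaratively. Template names, images, and the shell commands are placeholders for illustration.

```yaml
# Sketch: step-1 writes a.csv, b.csv, and tmp.csv, but only a.csv and b.csv
# are declared as output artifacts; step-2 receives b.csv at the path it
# expects (/input/input.csv), i.e. renamed, without any step-side logic.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-rename-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: first
        template: step-1
    - - name: second
        template: step-2
        arguments:
          artifacts:
          - name: input-csv
            from: "{{steps.first.outputs.artifacts.b-csv}}"
  - name: step-1
    container:
      image: alpine:3.18
      command: [sh, -c]
      args: ["mkdir -p /output && echo a > /output/a.csv && echo b > /output/b.csv && echo t > /output/tmp.csv"]
    outputs:
      artifacts:
      - name: a-csv
        path: /output/a.csv
      - name: b-csv
        path: /output/b.csv        # tmp.csv is not declared, so it is not stored
  - name: step-2
    inputs:
      artifacts:
      - name: input-csv
        path: /input/input.csv     # fetched and renamed to the expected path
    container:
      image: alpine:3.18
      command: [sh, -c]
      args: ["cat /input/input.csv"]
```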


Ark-kun commented Oct 15, 2018

> It would also be possible for steps to modify/delete the artifact of another step.

The inputs volume can be mounted in read-only mode.

> the sidecar ensures that specified files are stored to a specific location in the artifact repository, and that specific files are fetched to a specific location in the container. Without a sidecar this could not be done through configuration alone.

You can mount any inputs/outputs volume subpath to any container location. For example, for task3, which uses artifacts from task1 and task2 (see the sketch after this comment):

- Mount <repository>/workflow1/task1/outputs/output1/ to /io/inputs/input1/ in read-only mode
- Mount <repository>/workflow1/task2/outputs/output1/ to /io/inputs/input2/ in read-only mode
- Mount <repository>/workflow1/task3/outputs/output1/ to /io/outputs/output2/ for writing

> container may expect the input file to be named input.csv

Ideally, containers should only use paths received from command-line arguments.
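
For reference, a rough sketch of the mount layout described above, assuming the repository were exposed as a single hypothetical PersistentVolumeClaim named artifact-repo; subPath and readOnly are standard Kubernetes volumeMount fields.

```yaml
# Sketch: task3's pod mounts task1's and task2's output subpaths read-only
# and its own output subpath read-write, all from one repository volume.
apiVersion: v1
kind: Pod
metadata:
  name: task3
spec:
  containers:
  - name: main
    image: alpine:3.18
    command: ["sh", "-c", "ls /io/inputs/input1 /io/inputs/input2 && touch /io/outputs/output2/result"]
    volumeMounts:
    - name: artifact-repo
      subPath: workflow1/task1/outputs/output1
      mountPath: /io/inputs/input1
      readOnly: true
    - name: artifact-repo
      subPath: workflow1/task2/outputs/output1
      mountPath: /io/inputs/input2
      readOnly: true
    - name: artifact-repo
      subPath: workflow1/task3/outputs/output1
      mountPath: /io/outputs/output2
  volumes:
  - name: artifact-repo
    persistentVolumeClaim:
      claimName: artifact-repo     # hypothetical claim backing the repository
```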


vicaire commented Feb 6, 2019

@wookasz

Let me reformulate a bit. Instead of the artifact repository being GCS/S3/a Minio server, would it be possible to have an option to store the data in a volume?

Given the large number of volume implementations (NFS, GCP Cloud Filestore, etc.), it seems this would support a large number of use cases beyond object stores.
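
As a sketch of that option, a workflow could reference an existing claim (for example one backed by NFS or Filestore, which allow ReadWriteMany) instead of an object-store bucket. The claim name and image below are illustrative.

```yaml
# Sketch: mount a pre-existing, NFS-backed PVC into a step so data lands on
# file storage rather than in an object store.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: nfs-volume-
spec:
  entrypoint: main
  volumes:
  - name: artifacts
    persistentVolumeClaim:
      claimName: nfs-artifacts     # an existing ReadWriteMany claim
  templates:
  - name: main
    container:
      image: alpine:3.18
      command: [sh, -c]
      args: ["echo result > /artifacts/result.txt"]
      volumeMounts:
      - name: artifacts
        mountPath: /artifacts
```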


vicaire commented Feb 6, 2019

@IronPan
@paveldournov


vicaire commented Feb 8, 2019

@hongye-sun

@jessesuen jessesuen added the type/feature Feature request label Apr 19, 2019
@alexec alexec added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Feb 7, 2022