import*: fully support to-cache/remote transfers #5623

Closed
jorgeorpinel opened this issue Mar 15, 2021 · 3 comments
Labels: A: data-sync (Related to dvc get/fetch/import/pull/push), enhancement (Enhances DVC), feature request (Requesting a new feature)

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 15, 2021

Context: iterative/dvc.org#2302 (review)

Currently only add --out/--to-remote can transfer data in chunks from an external location directly to either the cache (e.g. an external one) or remote storage, bypassing the local file system entirely.

import-url has --to-remote as well, but apparently if you simply import-url something while using an external cache, there's no chunking: even if the cache is set up on another drive, the data is first downloaded in full to the local drive (which can fail if there isn't enough space). So there's no straight-to-cache transfer there (rel #4520). Users could reasonably expect this, since add supports it (then again, imports are utility commands, so maybe it isn't very important).

Furthermore, plain import doesn't support any of this, and import is less a utility command than a core feature (package-like data management). So should it also support to-cache and --to-remote transfers?


The docs change after this could start by (partially) reverting iterative/dvc.org@c393212.

jorgeorpinel added the enhancement and feature request labels Mar 15, 2021
@jorgeorpinel
Contributor Author

Cc @isidentical. Thanks

@shcheklein
Member

Furthermore, plain import doesn't support any of this

This is expected, I think, for --to-remote at least. For import, we expect the data to already be in some DVC remote storage and we don't want to duplicate it, so there's no reason to do that; it should be the same as running dvc import --no-exec.

As for --to-cache (or --out these days?), same here: I think dvc import brings the data into the cache first and links it afterwards.

if you just import-url something into an external cache

What exactly do you mean by this? (Just to make sure we're on the same page, since there are many overloaded terms in this sentence.)

If I understand the question right, could it be that you don't have the cache link type set to symlink, for example (so that it first caches the data and then makes a copy, as expected)?
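The "cache first, then link" behavior mentioned above can be sketched in Python. This is a simplified illustration, not DVC's checkout code: the function name link_from_cache and its fallback order are hypothetical. DVC's cache.type setting does accept reflink, hardlink, symlink and copy; reflink is left out here because the standard library has no portable API for it. The point is that with a symlink the workspace entry costs no extra space, while with copy the data is stored twice.

```python
import os
import shutil


def link_from_cache(cache_path: str, workspace_path: str,
                    link_types=("symlink", "copy")) -> str:
    """Hypothetical helper: expose a cached file in the workspace.

    Tries each link type in order and returns the one that succeeded.
    """
    # Refuse to overwrite, so a failed symlink never silently turns into a copy
    if os.path.lexists(workspace_path):
        raise FileExistsError(workspace_path)
    for link_type in link_types:
        try:
            if link_type == "symlink":
                # No extra space used: the workspace entry points at the cache
                os.symlink(os.path.abspath(cache_path), workspace_path)
            elif link_type == "hardlink":
                # Same inode; only works within one filesystem
                os.link(cache_path, workspace_path)
            elif link_type == "copy":
                # Full copy: the data now occupies space twice
                shutil.copyfile(cache_path, workspace_path)
            else:
                raise ValueError(f"unknown link type: {link_type}")
            return link_type
        except OSError:
            # e.g. symlinks not permitted, or hardlink across devices;
            # fall back to the next type in the list
            continue
    raise OSError(f"none of {link_types} worked for {workspace_path}")
```

With link_types=("copy",) the behavior matches what the comment above describes as "caches it and then makes a copy".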

@jorgeorpinel
Contributor Author

jorgeorpinel commented Mar 16, 2021

For import we expect data to be in some DVC remote storage ... so no reason to do that

True, @shcheklein! So I guess this is only about import-url not chunking "transfers" to an external cache. I've updated the issue description.

what do you exactly mean by this?

add /ext/path -o local/path copies /ext/path to the (external) cache in chunks, so the local FS doesn't need enough space for all the data. import-url /ext/data, on the other hand, seems to download everything locally first (to some tmp dir, I guess) and then move it to the cache (per iterative/dvc.org#2302 (comment), unless I misunderstood; cc @isidentical).
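The difference between the two transfer strategies described above can be sketched in Python. This is a simplified illustration, not DVC's actual transfer code: the function names, the 1 MiB chunk size, and the MD5-keyed two-level cache layout are assumptions chosen for the example. stream_to_cache writes into the cache one chunk at a time, while download_then_move needs a staging directory with room for the whole file before anything reaches the cache.

```python
import hashlib
import os
import shutil
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value


def _cache_path(cache_dir: str, digest: str) -> str:
    # Assumed content-addressed layout: first two hex chars as a subdirectory
    return os.path.join(cache_dir, digest[:2], digest[2:])


def stream_to_cache(src_path: str, cache_dir: str) -> str:
    """Chunked transfer: hash and write one chunk at a time into cache_dir.

    The temporary file is created inside cache_dir, so no space is needed
    on any other drive, and at most CHUNK_SIZE bytes are held in memory.
    """
    os.makedirs(cache_dir, exist_ok=True)
    md5 = hashlib.md5()
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with open(src_path, "rb") as src, os.fdopen(fd, "wb") as dst:
        while chunk := src.read(CHUNK_SIZE):
            md5.update(chunk)
            dst.write(chunk)
    digest = md5.hexdigest()
    final = _cache_path(cache_dir, digest)
    os.makedirs(os.path.dirname(final), exist_ok=True)
    os.replace(tmp_path, final)  # same filesystem, so just a rename
    return digest


def download_then_move(src_path: str, cache_dir: str, staging_dir: str) -> str:
    """Staged transfer: copy everything to staging_dir, then move to cache.

    staging_dir (e.g. a local tmp dir) must have room for the whole file;
    if it is on a different drive than cache_dir, the move is a second copy.
    """
    staged = os.path.join(staging_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, staged)
    md5 = hashlib.md5()
    with open(staged, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            md5.update(chunk)
    digest = md5.hexdigest()
    final = _cache_path(cache_dir, digest)
    os.makedirs(os.path.dirname(final), exist_ok=True)
    shutil.move(staged, final)
    return digest
```

Both functions end up with the same cache entry; they differ only in where the bytes pass through on the way, which is exactly what matters when the local drive is too small for the data.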

--to-cache (or --out these days?)

P.S. There was never a --to-cache flag 🤓; I was confused by this before too.

daavoo added the A: data-sync label Mar 1, 2022
mattseddon closed this as not planned Mar 4, 2024