Description
Follow up to #5198 (comment), #5301, and iterative/dvc.org#2172 (comment):
Question
`add --to-remote` is a bit strange because normally `add` doesn't move target data; it tracks it in place (analogous to `git add`). But `--to-remote` implies that external data will be moved into the workspace at some point: we skip that step for now and instead "pre-push" (transfer) the data to remote storage (for a later `pull`/`fetch`).
As of now, `add --to-remote` has a similar result to `get-url` + `add` + `push` + `remove`, `gc`. So OK, maybe it's nice to have a shortcut to all that, but we already have `import-url (--to-remote)` to achieve the same.
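To make the comparison concrete, here is a rough dry-run sketch that only prints the commands involved. The URL and output name are hypothetical placeholders, the arguments of each step are my assumptions, and `--external` is shown only because it is currently required alongside `add --to-remote`.

```shell
#!/bin/sh
# Dry-run sketch: every function only prints the dvc commands it stands for.
# URL and OUT are hypothetical placeholders, not taken from a real project.
URL="s3://example-bucket/data.csv"
OUT="data.csv"

# Long-hand sequence (arguments are assumptions).
manual_sequence() {
    echo "dvc get-url $URL $OUT"   # download the data into the workspace
    echo "dvc add $OUT"            # track it in place, creating $OUT.dvc
    echo "dvc push $OUT.dvc"       # upload the cached copy to remote storage
    # The sequence also ends with remove and gc to drop the local copies;
    # their exact arguments are not spelled out, so they are omitted here.
}

# Shortcut under discussion: transfer straight to remote storage,
# with no dependency recorded in the .dvc file.
add_to_remote() {
    echo "dvc add --external --to-remote $URL"
}

# Existing alternative: same transfer, but the source URL is recorded
# as a dependency in the .dvc file.
import_to_remote() {
    echo "dvc import-url --to-remote $URL"
}

manual_sequence
add_to_remote
import_to_remote
```

Either shortcut leaves a .dvc file in the workspace, so the data can be brought in later with `dvc pull`.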
The only difference vs. importing is that the data source is not recorded as a dependency in the .dvc file, so you can't `update` it or `unfreeze` + `repro` it. However, I don't see any use cases where you would want to prevent the .dvc file from having this `dep`, as you can simply never `update` or `unfreeze` it.
TLDR: I think `import-url --to-remote` is enough and what we should recommend for these situations. And `add --to-remote` breaks the Git analogy. Cc @dberenbaum
Improvement
- But if we keep it, an improvement would be to NOT require the `--external` flag with it (cc @isidentical). This saves the user from typing a flag that is always needed, and it also makes sense since the data is not actually being treated as external, in the sense that it won't be tracked/controlled in its original location (requiring external cache, etc.).
- Finish or close "Example for `dvc add --to-remote`" (dvc.org#2172) when this is decided.