add: --to-remote needed? OR --external needed? #5445

@jorgeorpinel

Follow up to #5198 (comment), #5301, and iterative/dvc.org#2172 (comment):

Question

add --to-remote is a bit strange because add normally doesn't move the target data; it tracks it in place (analogous to git add). But --to-remote implies that external data will be moved into the workspace at some point: we skip that step for now and instead "pre-push" (transfer) the data to remote storage, for a later pull/fetch.

As of now, add --to-remote has a similar result to get-url + add + push + remove, gc. So OK, maybe it's nice to have a shortcut for all that, but we already have import-url (--to-remote) to achieve the same.
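
For comparison, the three routes mentioned above can be sketched as shell functions. This is only an illustration: the URL, the output name, and the function names are placeholders that do not appear in the issue, and the final cache cleanup ("gc") is left out because the exact commands depend on the setup.

```shell
#!/bin/sh
# Placeholder values for illustration only; not taken from the issue.
SRC_URL="https://example.com/data.xml"
OUT="data.xml"

# Long form: download into the workspace, track, upload, then drop the local copy.
long_form() {
    dvc get-url "$SRC_URL" "$OUT"   # download the data into the workspace
    dvc add "$OUT"                  # track it; writes a .dvc file next to the data
    dvc push                        # upload the cached data to remote storage
    rm -f "$OUT"                    # remove the workspace copy
    # Clearing the local cache (the "gc" step) is omitted here on purpose.
}

# Shortcut under discussion: transfer straight to remote storage, no dependency recorded.
# Per the issue, --external also had to be passed alongside --to-remote at the time.
add_to_remote() {
    dvc add --to-remote "$SRC_URL"
}

# Alternative suggested in the issue: same transfer, but the source URL is kept as a dependency.
import_to_remote() {
    dvc import-url --to-remote "$SRC_URL"
}
```

Calling long_form, add_to_remote, or import_to_remote inside a DVC repository with a default remote configured would run the corresponding route; nothing is executed just by sourcing the script.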

The only difference vs. importing is that the data source is not recorded as a dependency in the .dvc file, so you can't update it or unfreeze+repro it. However, I don't see any use case where you would want to keep this dep out of the .dvc file, since you can simply never update or unfreeze it.
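
As a rough sketch of what the recorded dependency enables, these are the two follow-up routes available for a .dvc file created by import-url. The file name and function names are placeholders, not taken from the issue; for a .dvc file created by add there is no recorded source for these commands to re-fetch from.

```shell
#!/bin/sh
# Placeholder .dvc file name for illustration only.
DVC_FILE="data.xml.dvc"

# Re-download from the recorded source if it has changed (imports only).
update_import() {
    dvc update "$DVC_FILE"
}

# Alternative route: unfreeze the import, then reproduce it.
unfreeze_and_repro() {
    dvc unfreeze "$DVC_FILE"
    dvc repro "$DVC_FILE"
}
```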

TLDR: I think import-url --to-remote is enough and what we should recommend for these situations. And add --to-remote breaks the Git analogy. Cc @dberenbaum

Improvement

  • But if we keep it, an improvement would be to NOT require the --external flag with it (cc @isidentical). This saves the user from typing a flag that is always needed, and it also makes sense because the data is not actually treated as external: it won't be tracked/controlled in its original location (which would require an external cache, etc.).

Labels

discussion (requires active participation to reach a conclusion), enhancement (Enhances DVC), product: VSCode (Integration with VSCode extension)
