specify remote when adding file to dvc repo #9352
Comments
@christian-steinmeyer Do you have any interest in trying to contribute it? |
I've had a first look at how the arguments are parsed and where the add method of the repo is defined, but I don't know what mechanism creates the actual files and how their content is defined. |
I think it should work similarly to The Line 269 in 787c72e
@iterative/dvc WDYT? |
I think it should be better passed to: Lines 259 to 266 in 787c72e
And pass it all the way down to Lines 316 to 335 in 787c72e
|
Sounds good, @daavoo. My only point about |
I am a little lost without actually going into the code, but shouldn't |
I think it is preserved and it's tested here: Lines 846 to 880 in b814531
|
Hi. So shall we add this? |
I got as far as this 😅
Index: dvc/repo/add.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/dvc/repo/add.py b/dvc/repo/add.py
--- a/dvc/repo/add.py (revision b814531c84071823d83194af6f91d5718fc26976)
+++ b/dvc/repo/add.py (date 1682085738793)
@@ -72,9 +72,7 @@
invalid_opt = "--external option"
else:
message = "{option} can't be used without --to-remote"
- if kwargs.get("remote"):
- invalid_opt = "--remote"
- elif kwargs.get("jobs"):
+ if kwargs.get("jobs"):
invalid_opt = "--jobs"
if invalid_opt is not None:
@@ -255,6 +253,7 @@
path, wdir, out = resolve_paths(
repo, target, always_local=transfer and not kwargs.get("out")
)
+ remote = kwargs.get("remote")
stage = repo.stage.create(
single_stage=True,
@@ -263,6 +262,7 @@
wdir=wdir,
outs=[out],
external=external,
+ remote=remote
)
out_obj = stage.outs[0]
Not sure where it would need to go next. |
Have you pushed these changes to a branch? |
No, but feel free to just apply the git patch I posted |
As noted in the original description, this might be misleading when used with |
It'd be great if there were some option to achieve this goal with pure dvc (i.e., without any other tool). For my personal use case, two steps (one Any idea I have to add it do I'm slightly surprised that when I use |
Agreed, and I think it has come up before in discussion. @efiop What do you think about it? |
I agree, it makes sense with |
What do you think about a separate command? dvc out --set-remote <remote> <out> Or, a generic command to update dvcfiles/dvc.yaml file.
|
@skshetry That is an option, but I don't think it is worth investing in right now. As mentioned above, it is just a yaml and one could use yq or any other tool to modify it to their liking. If this becomes a recurring request or we need to modify dvc files in scripts often, then yeah, some purpose-built command for modifying dvcfiles according to schema would make sense to spend time on. |
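For readers who want to script the "it is just a yaml" approach, here is a minimal sketch in Python. It assumes the tracking file has already been parsed into a dict by a YAML library of your choice, and that `remote` belongs on the individual entries under `outs`, as described in the dvc docs referenced in the issue description. The function name `set_output_remote` and the placeholder hash are hypothetical and not part of dvc.

```python
from typing import Any, Dict, Optional


def set_output_remote(
    dvcfile: Dict[str, Any], remote: str, path: Optional[str] = None
) -> int:
    """Set `remote` on outputs of a parsed .dvc file.

    If `path` is given, only the output with that path is changed.
    Returns the number of outputs that were updated.
    """
    updated = 0
    for out in dvcfile.get("outs", []):
        if path is None or out.get("path") == path:
            out["remote"] = remote
            updated += 1
    return updated


# Hypothetical parsed content of some.file.dvc (the hash is a placeholder).
tracking = {
    "outs": [
        {"md5": "0123456789abcdef0123456789abcdef", "path": "some.file"},
    ]
}
count = set_output_remote(tracking, "some_remote")
```

After this call, the dict can be dumped back to some.file.dvc with the same YAML library that loaded it. Working on the parsed structure rather than on raw text keeps the existing fields of each output untouched.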
Slightly off-topic: Do you have usage data / case studies of how end users handle multiple remotes? We, for one, recently started a project that has some smaller data dependencies for development and a big data dependency for e.g. training a model. I.e., not everyone checking out the repo needs all the data. We "solved" this by using several dvc remotes, but as we now have to specify a remote for each file / stage or else they would get pushed to both remotes, that doesn't feel like the intended use case. I'm asking to gain some insight into whether we are the outsiders or perhaps the norm. If the latter should be the case, perhaps it would make sense to revisit the way dvc handles multiple remotes, which I believe to be connected to @efiop's point about the With our use cases in mind, this behavior would be a bit more intuitive to me. But perhaps I'm just missing other use cases. I understand if you find this too derailing from the original topic and too broad a change from the current behavior to address in this thread. |
@christian-steinmeyer It might help to know that There is some related discussion in #8298 in case that seems more directly relevant to your questions. |
Currently, one can add a file to dvc via
dvc add some.file
which will create a tracking file some.file.dvc. In this tracking file, one can add remote: some_remote to the fields to tell dvc which remote to synchronize with (cf. docs). I would like to be able to do that from the command line via some argument to dvc add. The command already has the argument -r/--remote (although it is currently only allowed in combination with --to-remote). It'd be great if it could be used for this purpose.
What I'm suggesting in code:
dvc add test.txt --remote foo
should yield a file with this content