Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement straight-to-remote functionality #5198

Merged
merged 36 commits into from
Jan 19, 2021

Conversation

isidentical
Copy link
Contributor

@isidentical isidentical commented Jan 4, 2021

Part of #4520

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Internal Tracker

  • If the provider supports it, show the total size in the progress bar (maybe with another patch, since this PR is too big already)

@isidentical isidentical changed the title Export to remote [WIP]: Implement straight-to-remote functionality Jan 4, 2021
@isidentical isidentical force-pushed the export-to-remote branch 4 times, most recently from 8ab4626 to 7a89af6 Compare January 11, 2021 13:22
@efiop
Copy link
Contributor

efiop commented Jan 12, 2021

For the record: while we are reviewing the technical part, let's start creating docs for this, to make sure we are on the same page in terms of CLI semantics.

dvc/repo/imp_url.py Outdated Show resolved Hide resolved
dvc/command/add.py Outdated Show resolved Hide resolved
"--to-remote",
action="store_true",
default=False,
help="Download it directly to the remote",
Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like

Suggested change
help="Download it directly to the remote",
help="Track an external target, but don't move it into the "
"workspace, nor cache it. Push it to remote storage "
"instead.",

May need formatting

This comment was marked as off-topic.

Comment on lines +92 to +95
parser.add_argument(
"-r",
"--remote",
help="Remote storage to download to",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should definitely be included in --to-remote (let it accept a remote name as argument) to simplify the UI. add --to-remote --remote looks fort of repetitive anyway.

Copy link
Contributor Author

@isidentical isidentical Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually thought about it and decided for the sake of consistency with other commands who works on the remote, it would better to just split it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, in most cases it'll be just --to-remote, right?

also, can an option in argparse be made to optionally accept a value - in this case we would have to make --to-remote [remote-name] where remote-name is optional (default remote if not specified)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, in most cases it'll be just --to-remote, right?

yes

also, can an option in argparse be made to optionally accept a value - in this case we would have to make --to-remote [remote-name] where remote-name is optional (default remote if not specified)

yes, the implementation is possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the implementation is possible.

then, I guess, I don't have a strong preference I think. --remote is good for consistency, but I would not say it looks critical to me, --to-remote already reminds it a lot.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, in most cases it'll be just --to-remote, right?

I'm not sure that most users employ default remotes. Do we know this?

I see the consistency argument but it's not that big a deal, and forcing the exact flag name causes other issues e.g. just add --remote is meaningless (and forbidden, I think, or it it silently ignored?) + its an extra flag to doc/explain and maintain πŸ˜‹

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that most users employ default remotes. Do we know this?

that's why we have default remotes at all. if we are not sure about this we should reconsider the whole logic we work with remotes then

add --remote is meaningless

agreed, I would not do this. may be keep only --to-remote <optional-name>

@jorgeorpinel jorgeorpinel marked this pull request as ready for review January 17, 2021 22:18
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 17, 2021

Oops I meant to resolve my #5198 (review) above and accidentally I clicked on "Ready for review". Unfortunately you can't go back to draft status... Sorry!

@isidentical
Copy link
Contributor Author

isidentical commented Jan 18, 2021

No worries, eventually I was going to also do that to get @efiop 's reviews!

@efiop efiop changed the title [WIP]: Implement straight-to-remote functionality Implement straight-to-remote functionality Jan 19, 2021
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff! πŸ”₯

Work from this PR is needed for other PRs (e.g. overlapping with vdir functionality and even erepo refactor), so merging this as-is for now. Let's address the CLI and other leftovers on top.

@efiop efiop merged commit 51ada6d into iterative:master Jan 19, 2021
local_tree = self.repo.cache.local.tree
local_info = local_tree.path_info / tmp_fname()

from_tree.download(from_info, local_info)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not fit on the local filesystem πŸ™ We could download a chunk and upload a chunk though and then assemble them. Or could just forbid transfering files as a whole for now.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 23, 2021

Working on iterative/dvc.org/pull/2091, I'm assuming the concerns left above will be addressedconsidered in a follow-up PR. Specifically:

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Feb 5, 2021

Another thing to consider (BTW I meant considered/discussed, not necessarily addressed as in implemented):

Similar to combining --to-remote and --remote for simplicity/usability, should --external (in add) be implied with this? (applies to --to-cache too) As of now add --to-remote /ext/path fails if I don't also add --external:

Adding...
ERROR: Output(s) outside of DVC project: /ext/path. See <https://dvc.org/doc/user-guide/managing-external-data> for more info.

Cc @dberenbaum

@jorgeorpinel
Copy link
Contributor

p.s. on that error message (unrelated to this issue, sorry) should it give a HINT to retry with --external?

@efiop
Copy link
Contributor

efiop commented Feb 5, 2021

@jorgeorpinel Good catch with --external! @isidentical Could you please adjust that? πŸ™

@isidentical
Copy link
Contributor Author

I believe this already works with straigh-to-cache when the given output is within the repo, dvc add /some/stuff -o stuff. Though for the case of dvc add --to-remote, wouldn't it make sense to keep it with explicit --external since the actions related to that file in the future would affect paths outside of the project?

@jorgeorpinel
Copy link
Contributor

I haven't tested --to-cache yet so not sure about the first part of your comment for now, @isidentical.

And sorry, I didn't understand the relationship between requiring explicit --external with --to-remote to the fact that the data is external (affects external paths). Conceptually though, I don't think we consider consider it "outside of the project" once it's added. In fact we call it an "extended workspace" i.e. DVC "owns" that external location (they become links to the cache and no other system should try to control the same location).

Anyway, my main rationale is that saving users from unnecessarily typing a flag that is always required is good for the UX (similar as with merging --to-remote and --remote).

@isidentical
Copy link
Contributor Author

Would you create a ticket for the things in your mind (--external, --to-remote + --remote etc) @jorgeorpinel, it is hard to debate these stuff under a merged PR.

@jorgeorpinel
Copy link
Contributor

Yeah let me do that because actually I think I'm still confused here πŸ˜…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants