Allow restriction dvc pull to targets #2706

Closed
Titousensei opened this issue Oct 31, 2019 · 5 comments
Labels
enhancement (Enhances DVC), feature request (Requesting a new feature), help wanted, p3-nice-to-have (It should be done this or next sprint)

Comments

@Titousensei

We have a big monorepo into which all projects are eventually merged. We plan to keep the DVC files related to a model in the same directory, and we will have many such directories. In that case, running dvc pull without arguments will start pulling every file cached in remote storage (potentially terabytes of data), right? If a user runs dvc pull <target> instead, she will only download the cached files needed by that target, which is what we really want users to do.

It would be nice to have a way to prevent dvc pull from running without a target, for example a config option such as pull.disable_global.
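
Until something like this exists, a team could approximate the behavior with a thin wrapper script. The following is only a sketch under stated assumptions: `build_pull_command` and `guarded_pull` are hypothetical helper names, not part of DVC, and the wrapper relies only on `dvc pull` accepting targets and the `--recursive` (`-R`) flag.

```python
import subprocess
from typing import List, Sequence


def build_pull_command(targets: Sequence[str], recursive: bool = False) -> List[str]:
    """Build the argv list for `dvc pull`, refusing a pull with no targets.

    Hypothetical helper: it mimics the requested restriction outside of DVC.
    """
    # Drop empty or whitespace-only entries so "" does not count as a target.
    cleaned = [t for t in targets if t and t.strip()]
    if not cleaned:
        raise ValueError("refusing to run `dvc pull` without a target")

    cmd = ["dvc", "pull"]
    if recursive:
        # Pull every DVC file found under the given directories.
        cmd.append("--recursive")
    cmd.extend(cleaned)
    return cmd


def guarded_pull(targets: Sequence[str], recursive: bool = False) -> None:
    """Run the guarded command; raises CalledProcessError if dvc fails."""
    subprocess.run(build_pull_command(targets, recursive), check=True)
```

Calling `guarded_pull(["models/foo"], recursive=True)` would run `dvc pull --recursive models/foo`, where `models/foo` is a made-up directory name, while `guarded_pull([])` raises before DVC is ever invoked. This does not stop anyone from calling `dvc pull` directly, so it is a convention rather than an enforcement mechanism.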

@shcheklein
Member

shcheklein commented Oct 31, 2019

Context: https://discordapp.com/channels/485586884165107732/563406153334128681/639582276388388870

Initial answer on Discord:

We don't have an option like this, unfortunately, but the request makes total sense to me. Let's create a feature request for this, please. I would also add that if there are TBs of data, she will see that the download is taking too long and will be able to interrupt it without any consequences or changes in the workspace. It also looks like you would benefit from using dvc pull -R.

@shcheklein added the enhancement and feature request labels on Oct 31, 2019
@Suor
Contributor

Suor commented Nov 1, 2019

I think letting the user interrupt the process is an OK solution, unless there is a significant delay before it becomes obvious that lots of things are about to be pulled.

@dmpetrov
Member

dmpetrov commented Feb 3, 2020

It is quite dangerous to change a command's behavior (the syntax of the command, in this case) through a config option. This should be done only in exceptional cases.

For this particular case, managing multiple DVC projects inside a single monorepo might be a better solution, with additional benefits. See #2349. The implementation is coming soon in #3257.

I’m closing the issue in favor of #2349. Please provide more details if I missed something and multiple DVC repos won’t solve the issue.

@Suor
Contributor

Suor commented Feb 3, 2020

@dmpetrov #2349 looks way more dangerous and way heavier.

@dmpetrov
Member

dmpetrov commented Feb 3, 2020

@Suor it is more general and it should not be heavy. Otherwise, we will end up with many ways of doing the same thing.

5 participants