dvc pull --glob tries to pull dvcignored files and glob pattern is only applied to files that already exist #5864

Open
kevinhaybach opened this issue Apr 22, 2021 · 3 comments
Labels
A: data-sync (Related to dvc get/fetch/import/pull/push)
bug (Did we break something?)
p2-medium (Medium priority, should be done, but less important)

Comments

@kevinhaybach

As mentioned in the title, I have found two issues:

  1. dvc pull --glob "*.json" tries to pull dvcignored files
  2. dvc pull --glob "*.json" ignores the pattern and pulls everything when the matching files are not yet in the workspace

Explanation 1:
Let's assume I have a folder with two JSON files: file_1.json and file_2.json.
file_1.json is listed in .dvcignore.

I run the command: dvc pull --glob "*.json"

The output is:
"ERROR: failed to pull data from the cloud - 'file_1.json' does not exist as an output or a stage name in 'dvc.yaml': 'dvc.yaml' does not exist"

Explanation 2:
Let's assume I have a folder with one JSON file and one PNG file, but the data files themselves are NOT in the current directory yet; only their .dvc files are.

I run the command: dvc pull --glob "*.json"

I would expect that only the .json file is pulled, but the .png file is pulled as well.

I got the following explanation:
"The problem is that the glob pattern is only applied to files that are already in your local workspace (so it works for pulling updated versions but not for pulling new files). If you haven't pulled anything yet, it will return an empty list of pattern matches, and then DVC falls back to the default "pull everything" behavior"

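To make the quoted explanation concrete, here is a minimal Python sketch of the behaviour as described. It is not DVC's actual code: the function names and the file names data.json and image.png are hypothetical, and the "empty target list means pull everything" fallback is modelled only from the explanation above. It contrasts matching the pattern against files present in the workspace with matching it against the tracked outputs.

```python
from fnmatch import fnmatch

# Hypothetical model of the behaviour described in this issue; not DVC internals.


def match_workspace(pattern, workspace_files):
    """Match the glob only against files that exist locally (reported behaviour)."""
    return [name for name in workspace_files if fnmatch(name, pattern)]


def match_outputs(pattern, tracked_outputs):
    """Match the glob against tracked outputs, whether or not they exist locally."""
    return [name for name in tracked_outputs if fnmatch(name, pattern)]


def files_to_pull(targets, tracked_outputs):
    """An empty target list is treated as 'no filter', so everything is pulled."""
    return list(targets) if targets else list(tracked_outputs)


# Assumed setup: only the .dvc files are present, the data files are not.
tracked_outputs = ["data.json", "image.png"]
workspace_files = ["data.json.dvc", "image.png.dvc"]

# "*.json" does not match "data.json.dvc", so the target list is empty
# and the fallback pulls both files.
reported = files_to_pull(match_workspace("*.json", workspace_files), tracked_outputs)

# Matching against the tracked outputs selects only the JSON file.
expected = files_to_pull(match_outputs("*.json", tracked_outputs), tracked_outputs)

print(reported)
print(expected)
```

In this model the first result contains both files and the second contains only data.json, which is the difference between what I observe and what I would expect.
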
@skshetry
Member

After #5273 is merged, we'll be able to glob actual outputs instead of the workspace.

@pared added the labels bug (Did we break something?) and p2-medium (Medium priority, should be done, but less important) on Jul 14, 2021
@pared
Contributor

pared commented Jul 14, 2021

#5273 was ultimately not merged, so it seems this issue remains.


5 participants