dvc add error using wild match operator in windows? #4705

Closed
karajan1001 opened this issue Oct 12, 2020 · 7 comments
Labels: awaiting response, feature request, good first issue, help wanted, P: windows, p3-nice-to-have

Comments

Contributor

karajan1001 commented Oct 12, 2020

Bug Report

[Two screenshots attached in the original issue; images not preserved]

dvc add fails when I use a wildcard pattern. It appends a .dvc suffix to the pattern itself instead of to the matching files.

Please provide information about your setup

Output of dvc version:

$ dvc version
Platform: Python 3.8.3 on Windows-10-10.0.18362-SP0
Supports: http, https, ssh
Cache types: hardlink
Cache directory: NTFS on D:
Workspace directory: NTFS on D:
Repo: dvc, git

Additional Information (if any):

If applicable, please also provide a --verbose output of the command, eg: dvc add --verbose.

Contributor

efiop commented Oct 14, 2020

@karajan1001 What does that error say, btw? And also could you provide verbose output, please?

So far it looks like there might be something about the way your shell is evaluating the pattern. It actually looks like it is not expanding the wildcards (including ?) at all and is passing them as-is to dvc. Dvc itself doesn't support wildcard patterns at all; we simply rely on your shell to expand them and pass the resulting list to dvc add. E.g. in bash, dvc add *.mov would actually result in dvc add 1.mov 2.mov ... 10.mov, so dvc gets the list of files and not the original pattern.
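To illustrate the point about who does the expansion, here is a minimal sketch (not DVC code) that starts a child process without any shell in between and reports the arguments the child actually received. The helper name argv_seen_by_child is made up for this example. Because no shell is involved, the pattern arrives unexpanded, which matches what seems to be happening in the report above.

```python
import json
import subprocess
import sys


def argv_seen_by_child(*args):
    """Run a child Python process WITHOUT a shell and return the
    command-line arguments that the child process received."""
    # The child simply echoes its own arguments back as JSON.
    child_code = "import sys, json; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # No shell sits between us and the child, so nothing expands "*.mov".
    print(argv_seen_by_child("*.mov"))
```

A shell such as bash would replace the pattern with the matching file names before the program starts, which is why the same command can behave differently depending on where it is run.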

efiop added the awaiting response label Oct 14, 2020
Contributor Author

karajan1001 commented Oct 15, 2020

Yes, obviously DVC didn't get the correct file list. This might be an issue with (1) the environment or (2) a package DVC relies on, rather than with DVC itself. But:

  1. Git didn't suffer from the same issue in the same directory.
  2. Wildcards are widely used in daily work, so some other DVC commands may have the same problem.

Contributor

efiop commented Oct 18, 2020

@karajan1001 Great point about git. Git indeed supports some globbing natively, so to match it we also need to pass the targets through glob.glob. The use case is limited to shells that don't support globbing natively (I'm surprised PS didn't do that, maybe I'm missing something), so it is pretty limited 🙁

@karajan1001
Contributor Author

@efiop I tested on my computer: PowerShell didn't expand the patterns.

[Two screenshots attached in the original comment; images not preserved]

According to Stack Overflow, we have to implement wildcard expansion ourselves.
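As a rough idea of what "implementing it ourselves" could look like, here is a minimal sketch using Python's standard glob module. The function name expand_targets is hypothetical and is not taken from DVC. One assumption made here: a pattern that matches nothing is passed through unchanged, so the usual "file not found" error can still be reported later.

```python
import glob


def expand_targets(targets):
    """Expand shell-style wildcards in each target.

    Targets that match nothing are kept as-is so that a later step
    can report them as missing (an assumption of this sketch).
    """
    expanded = []
    for target in targets:
        # Sort the matches so the resulting order is deterministic.
        matches = sorted(glob.glob(target))
        expanded.extend(matches if matches else [target])
    return expanded


if __name__ == "__main__":
    # Prints the matching files in the current directory, or the
    # original pattern if nothing matches.
    print(expand_targets(["*.mov"]))
```

One caveat: a real file whose name contains characters such as [ or ? would be treated as a pattern here, so an actual implementation would probably need an escape hatch for literal names.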

Contributor

efiop commented Oct 19, 2020

@karajan1001 Thanks for the research! 🙏 So we indeed need to pass targets through glob.glob to implement that. We could start by doing just that in dvc/repo/add.py, but there might be a better way to do it everywhere. Obviously we could add a custom argparse action that would pass the targets through glob.glob, but it seems more fitting to implement it at the API level (dvc/repo/) instead of just the CLI (dvc/command/).
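For reference, the CLI-level option mentioned above might look roughly like the sketch below. It is illustrative only: the GlobTargets class, the build_parser function and the "add-sketch" program name are all hypothetical and not part of DVC. It uses the standard argparse.Action hook to run each positional target through glob.glob, keeping unmatched patterns unchanged.

```python
import argparse
import glob


class GlobTargets(argparse.Action):
    """Hypothetical argparse action that expands wildcards in targets."""

    def __call__(self, parser, namespace, values, option_string=None):
        expanded = []
        for value in values:
            # Keep the original value when the pattern matches nothing.
            matches = sorted(glob.glob(value))
            expanded.extend(matches or [value])
        setattr(namespace, self.dest, expanded)


def build_parser():
    """Build a tiny stand-in parser with a single 'targets' argument."""
    parser = argparse.ArgumentParser(prog="add-sketch")
    parser.add_argument("targets", nargs="+", action=GlobTargets)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.targets)
```

The drawback is the one already noted: this only helps callers that go through the command line, whereas doing the expansion at the API level would also cover code that calls the repo functions directly.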

Related #4419

efiop added the feature request, good first issue, help wanted, P: windows and p3-nice-to-have labels Oct 19, 2020

gamis commented Oct 27, 2020

+1 on this feature!

Contributor

efiop commented Nov 1, 2020

Closing in favor of #4816
