add support for wildcard/patterns #4816
Comments
Based on my experience I'd assign the priorities like this:
But we need to agree on a common pattern format (how the pattern is reflected in dvc-files) before implementing even the first step.
Regarding the first step
support for dir entries will simply require treating existing Line 403 in 6a9ab9c
Regular glob patterns are clearer than the proposed date/counter selectors; those need some research on existing solutions. So this is a multilayer ticket with a lot of special cases.
Related #4419.
I will be taking a stab at implementing the first step for this issue.
* api: add support for simple wildcards
  Related to #4816.
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
* api: make wildcard interpretation optional
  Adds a new argument for the add command `glob` that is disabled by default and when enabled it passes the input targets through glob filtering.
  Related: #4816
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
* Update dvc/repo/add.py
* Update dvc/repo/add.py
* Update dvc/repo/add.py
* Update dvc/repo/add.py
* Update dvc/repo/add.py
  Co-authored-by: Ruslan Kuprieiev <kupruser@gmail.com>
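The opt-in behavior described in the commit message above could look roughly like the following. This is a minimal sketch, not the code merged into DVC: the function name `expand_targets` and the `use_glob` parameter are hypothetical, and only the standard-library `glob` module is assumed.

```python
from glob import glob


def expand_targets(targets, use_glob=False):
    """Return targets unchanged, or expanded through glob when enabled.

    Hypothetical helper for illustration; not DVC's actual API.
    """
    if not use_glob:
        # Default: wildcards are treated as literal path characters.
        return list(targets)
    expanded = []
    for target in targets:
        # recursive=True lets "**" match any number of nested directories.
        # Sorting makes the result order deterministic.
        expanded.extend(sorted(glob(target, recursive=True)))
    return expanded
```

Keeping expansion behind a flag means existing targets that happen to contain `*` or `?` keep their literal meaning. Note that Python's `glob` module does not expand brace alternatives such as `{dogs,cats}`, so the brace pattern listed in the issue description would need separate handling.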
Related #4912
@jorgeorpinel #4864 is only about
I can continue adding this functionality for all commands, if that's alright.
@ju0gri Thanks for looking into it! 🙏
Question: We've introduced the Is the option temporary, expecting to make this the default behavior at some point? Otherwise I think we may need a better term, as discussed in #4976 (comment), and even more so now that I see patterns 3 (iterator) and 4 (date), which I think aren't covered by "glob". Thanks
* api: add glob option for pull command
  Related: #4816
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
* api: add globbing utility function
  Related: #4816
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
* api: use utility function for pull command
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
* Update dvc/utils/__init__.py
  Co-authored-by: Ruslan Kuprieiev <kupruser@gmail.com>
* api: add globbing option for pushing
  Related: #4816
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
* api: use utility function for push command
  Signed-off-by: Ioana Grigoropol <ioana.grigoropol@gmail.com>
Hi, can we include the discussion about wildcards in stage output and dependency definitions (in
Seconding @jorgeorpinel on this, there is some new demand for wildcards on dvc stage outputs.
Sometimes only a subset of files is needed when the user runs `import` or `pull` on a data directory. It would be convenient to define a file pattern for an import.
From https://discuss.dvc.org/t/working-with-a-small-subset-of-remote-data/541
Related: #4705, #4815
Patterns to implement:
dvc pull cats-dogs/data/train/dogs/*.img
dvc pull cats-dogs/data/train/{dogs,cats}/???.img
dvc pull cats-dogs/data/train/**/*.img
dvc pull cats-dogs/data/train/dogs/%C.img?counter=1:100
dvc pull users/%Y/%m/%d/users.csv?startdate=2020-09-01,enddate=now,ignoremissing
The first three patterns should use regular Unix file syntax, while the last two require a special language to define the pattern; we need to find good examples.
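To make the last two proposals more concrete, here is one possible reading of the counter and date selectors. The issue does not define their semantics, so everything below is an assumption: the function names are hypothetical, both ranges are treated as inclusive, the counter is not zero-padded, and parsing of the `?key=value` suffix is left out.

```python
from datetime import date, timedelta


def expand_counter(template, start, stop):
    """Replace the %C placeholder with every integer in [start, stop].

    Assumption: bounds are inclusive and numbers are not zero-padded.
    """
    return [template.replace("%C", str(i)) for i in range(start, stop + 1)]


def expand_dates(template, start, end):
    """Format a strftime-style template once per day in [start, end].

    Assumption: one path per calendar day, both bounds inclusive.
    An end date before the start date yields an empty list.
    """
    days = (end - start).days
    return [
        (start + timedelta(days=n)).strftime(template)
        for n in range(days + 1)
    ]


# Example usage with paths shaped like the ones in the issue.
counter_paths = expand_counter("cats-dogs/data/train/dogs/%C.img", 1, 3)
date_paths = expand_dates(
    "users/%Y/%m/%d/users.csv", date(2020, 8, 31), date(2020, 9, 1)
)
```

Both helpers produce a plain list of candidate paths, so an option like `ignoremissing` could be applied afterwards by filtering out paths that do not exist.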