pull: pull all parameterized stages (--group) #7462
Comments
Thanks, good suggestion. For now, you can use the following method. |
Yes, but the problem there is just as you stated: IF you already have them in the workspace. |
Seems like I should know this 😅 , but would someone mind explaining why |
|
For glob, it only matches the targets.
--glob: Allows targets containing shell-style wildcards.
Besides this, a big limitation was mentioned before. For example, for a stage
$ dvc repro --glob mode?.* # can match `model.pkl`
Data and pipelines are up to date.
$ dvc repro --glob mode??.* # the two `?` here cannot match `model.pkl`
zsh: no matches found: mode??.*
$ dvc repro --glob trai* # cannot match the stage name `train`
zsh: no matches found: trai*
$ rm model.pkl # if we delete the file `model.pkl`
$ dvc repro --glob mode?.* # cannot match it anymore
zsh: no matches found: mode?.* |
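The wildcard comments in the transcript above can be checked in isolation. This is a minimal sketch that uses Python's standard `fnmatch` module as a stand-in for shell-style wildcard matching; whether DVC applies exactly these semantics internally is an assumption, and the names `model.pkl` and `train` are taken from the transcript.

```python
from fnmatch import fnmatchcase

# `?` matches exactly one character, `*` matches any run of characters.
one_char = fnmatchcase("model.pkl", "mode?.*")    # "l" fills the single `?`
two_chars = fnmatchcase("model.pkl", "mode??.*")  # only one character precedes "."
stage_name = fnmatchcase("train", "trai*")        # `*` matches the trailing "n"

print(one_char)    # True
print(two_chars)   # False
print(stage_name)  # True
```

Note that `trai*` does match the string `train` as a pattern. The `zsh: no matches found` lines are printed by the shell, not by this matching step.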
Hm, I was getting confused because of behavior without wildcards:
It seems without wildcards that |
@dberenbaum, the last one is coming from your shell, you have to single-quote it to avoid wildcard expansion by the shell. |
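To illustrate the shell behavior described here, the sketch below imitates filename expansion with Python's `glob` module in a temporary directory. The helper `shell_expand` is hypothetical and only approximates what a shell does with an unquoted pattern: when a file matches, the command receives the file name instead of the pattern; when nothing matches, zsh by default aborts with `no matches found` before the command runs.

```python
import glob
import os
import tempfile


def shell_expand(pattern, cwd):
    """Return the file names an unquoted pattern would expand to in `cwd`.

    Hypothetical helper: it imitates shell filename expansion with glob.
    """
    hits = glob.glob(os.path.join(glob.escape(cwd), pattern))
    return sorted(os.path.basename(hit) for hit in hits)


with tempfile.TemporaryDirectory() as workspace:
    model = os.path.join(workspace, "model.pkl")
    open(model, "w").close()
    with_file = shell_expand("mode?.*", workspace)

    os.remove(model)
    without_file = shell_expand("mode?.*", workspace)

print(with_file)     # ['model.pkl']
print(without_file)  # []
```

Quoting the pattern skips this expansion step entirely, so the command receives the wildcard text itself.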
Isn't the same true for the example from @karajan1001? |
I think so, we need to add a
$ dvc repro --glob "trai*"
'data/data.xml.dvc' didn't change, skipping
Stage 'prepare' didn't change, skipping
Stage 'featurize' didn't change, skipping
Stage 'train' is cached - skipping run, checking out outputs
Use `dvc push` to send your updates to remote storage.
$ dvc repro "trai*"
ERROR: Stage 'trai*' not found inside 'dvc.yaml' file
So, @sjawhar, you can try this method:
$ rm raw*.cln
$ dvc pull --glob "raw*.cln"
A raw1.cln
A raw2.cln
2 files added |
@karajan1001 I see that
|
@karajan1001 that only appears to work for you because it's in fact pulling all stages. You can see this by setting a breakpoint here: Lines 13 to 34 in 0797147
No targets are matched, so |
@dberenbaum |
So, can I go ahead and implement? I see the issue was tagged as a bug, so should I instead change how the |
Related #7323 |
@sjawhar We discussed this one, and you managed to find multiple issues with the globbing features, which need a lot of work 😅. Addressing a group should be handled automatically as part of #7323, so simply
Globbing outputs generally should not depend on what is checked out in the workspace. Otherwise, I agree it's not that useful for commands other than
There's also confusion because |
We use parametrized stages in our repos a lot (foreach/do). It's a hassle to have to dvc pull stage@parameter for each parameter. It would be great if we could add a --group flag to fetch, pull, and checkout that would run for all parametrized versions of that stage. I thought --glob would do this, but after digging into the code I can see that's not what we want.
I called it --glob by taking the name of the kwarg from StageLoad.load_all(), which is the method called under the hood by these various commands: dvc pull > Repo.pull > Repo.fetch > Repo.used_objs > Index.used_objs > StageLoad.collect_granular > stage._collect_specific_target > stage._maybe_collect_from_dvc_yaml > StageLoad.load_all(). 😅
If there are no objections or if someone isn't already working on it, I'm happy to contribute. I'd just pass the optional accept_group: bool = False down through that call chain.
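The selection rule behind the proposed flag can be sketched roughly as follows. This is not DVC code: `select_group` is a hypothetical helper, and it assumes only that parametrized stages are addressed as `<group>@<parameter>`, as in the `stage@parameter` form mentioned above. The stage names are made up for illustration.

```python
from typing import Iterable, List


def select_group(stage_names: Iterable[str], group: str) -> List[str]:
    """Return the stage names that belong to `group`.

    Hypothetical helper, not DVC code. It assumes parametrized stages
    are addressed as `<group>@<parameter>`.
    """
    prefix = group + "@"
    return [name for name in stage_names
            if name == group or name.startswith(prefix)]


# Made-up stage names for illustration.
names = ["prepare", "train@small", "train@large", "trainer@x"]
selected = select_group(names, "train")
print(selected)  # ['train@small', 'train@large']
```

Matching on the exact group name plus the `@` separator, rather than on a wildcard such as `train*`, keeps an unrelated stage like `trainer@x` out of the result.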