Outputs as target supporting for dvc status #4433

Merged: 4 commits, Aug 25, 2020
25 changes: 13 additions & 12 deletions dvc/repo/status.py
@@ -10,31 +10,32 @@
 logger = logging.getLogger(__name__)


-def _joint_status(stages):
+def _joint_status(pairs):
     status_info = {}

-    for stage in stages:
+    for stage, filter_info in pairs:
         if stage.frozen and not stage.is_repo_import:
             logger.warning(
                 "{} is frozen. Its dependencies are"
                 " not going to be shown in the status output.".format(stage)
             )

-        status_info.update(stage.status(check_updates=True))
+        if not filter_info:
+            status_info.update(stage.status(check_updates=True))
+        else:
+            for out in stage.filter_outs(filter_info):
+                status_info.update(out.status())
Comment on lines +22 to +26
Member

Should we push this to Stage::status() or Stage::_status_outs?

Member

Also, we should keep {stage_name: stage_status} format, to make it in line with --show-json.

Contributor Author

@karajan1001, Aug 25, 2020

> Thanks @karajan1001 for the PR. Looks good to me, although, have a few suggestions.

@skshetry, thanks for your suggestions.

> Also, we should keep {stage_name: stage_status} format, to make it in line with --show-json.

Here, I just follow the format of dvc status -c [outputs]. Besides, if we show

"alice_bob":
        "changed outs":
                "alice": "modified"

we give the misleading impression that "alice" is the only output of stage "alice_bob". Here we need to put more emphasis on outputs than on stages.

> Should we push this to Stage::status() or Stage::_status_outs?

The result has a different format from what Stage::status() and Stage::_status_outs currently return, so we can't reuse them here.

Maybe we can discuss the output format more in #2180?

Member

@karajan1001, better to be consistent here. If we need better CLI output, we can handle that in dvc/commands. But, let's wait for others, let's see what they'll say.

> It has a different format from the current Stage::status() and Stage::_status_outs now

Regarding status, we could add filter_info to it so that you could just do

Suggested change
-        if not filter_info:
-            status_info.update(stage.status(check_updates=True))
-        else:
-            for out in stage.filter_outs(filter_info):
-                status_info.update(out.status())
+        status_info.update(stage.status(filter_outs=filter_info))

But this would make sense if we made the result consistent.
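
A minimal sketch of what this suggestion might look like, using hypothetical `ToyStage` and `ToyOut` classes rather than DVC's real stage and output classes. The `filter_info` parameter on `status()` and the exact result shape are assumptions for illustration; only the `{stage_name: stage_status}` idea comes from the thread.

```python
from pathlib import PurePosixPath


class ToyOut:
    """Hypothetical stand-in for a DVC output; not the real class."""

    def __init__(self, path, changed):
        self.path = PurePosixPath(path)
        self.changed = changed

    def status(self):
        # Report only outputs that differ from what was recorded.
        return {str(self.path): "modified"} if self.changed else {}


class ToyStage:
    """Hypothetical stand-in for a DVC stage; not the real class."""

    def __init__(self, name, outs):
        self.name = name
        self.outs = outs

    def filter_outs(self, filter_info):
        if filter_info is None:
            return list(self.outs)
        target = PurePosixPath(filter_info)
        # Keep outputs that equal the target or contain it.
        return [
            o for o in self.outs
            if target == o.path or o.path in target.parents
        ]

    def status(self, filter_info=None):
        changed = {}
        for out in self.filter_outs(filter_info):
            changed.update(out.status())
        # Same {stage_name: [entries]} shape with or without a filter.
        return {self.name: [{"changed outs": changed}]} if changed else {}


stage = ToyStage("alice_bob", [ToyOut("alice", True), ToyOut("bob", True)])
full = stage.status()
only_alice = stage.status(filter_info="alice")
```

With this shape the caller would no longer branch on `filter_info`: both calls return a dictionary keyed by stage name, and the filtered call simply lists fewer changed outputs.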

Contributor Author

@karajan1001, Aug 25, 2020

> @karajan1001, better to be consistent here. If we need better CLI output, we can handle that in dvc/commands. But, let's wait for others, let's see what they'll say.
>
> > It has a different format from the current Stage::status() and Stage::_status_outs now
>
> Regarding status, we could add filter_info to it so that you could just do
>
> But this would make sense if we made the result consistent.

@skshetry
The problem is that dvc/commands doesn't know whether the targets are stages or outputs.

Actually, I think the most elegant way is for collect_granular to return a list of outputs, stages, or files that all share the same .status() API, which returns the result. But then we would need another object, FilePathSlot, at a finer granularity than outputs, to get rid of filter_info.
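
The shared `.status()` idea could be sketched roughly as follows. Everything here (`HasStatus`, `ToyOutput`, `ToyStage`, `joint_status`) is hypothetical and only illustrates the proposal; it is not how DVC's `collect_granular` behaves, and the finer-grained `FilePathSlot` object is left out.

```python
from typing import Dict, List, Protocol


class HasStatus(Protocol):
    """Hypothetical interface: anything that can report its own status."""

    def status(self) -> Dict[str, object]: ...


class ToyOutput:
    def __init__(self, path: str, state: str):
        self.path, self.state = path, state

    def status(self) -> Dict[str, object]:
        # An empty state means the output is unchanged.
        return {self.path: self.state} if self.state else {}


class ToyStage:
    def __init__(self, name: str, outs: List[ToyOutput]):
        self.name, self.outs = name, outs

    def status(self) -> Dict[str, object]:
        changed: Dict[str, object] = {}
        for out in self.outs:
            changed.update(out.status())
        return {self.name: [{"changed outs": changed}]} if changed else {}


def joint_status(items: List[HasStatus]) -> Dict[str, object]:
    # The caller does not need to know whether an item is a stage or an output.
    result: Dict[str, object] = {}
    for item in items:
        result.update(item.status())
    return result


alice = ToyOutput("alice", "modified")
bob = ToyOutput("bob", "")
by_output = joint_status([alice])
by_stage = joint_status([ToyStage("alice_bob", [alice, bob])])
```

Note that in this sketch the two kinds of target still produce differently shaped results, which is exactly the consistency question raised by the reviewers.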

Contributor

I would agree with @skshetry here,

"alice_bob":
        "changed outs":
                "alice": "modified"

In that case we inform the user that only one out has changed, but that does not mean that this particular stage has only this output. I would say that the primary function of status is to inform about changes, and there is no need to list all of a stage's outputs if they did not change.

Contributor Author

@karajan1001, Aug 26, 2020

> I would agree with @skshetry here,
>
> "alice_bob":
>         "changed outs":
>                 "alice": "modified"
>
> In that case we inform the user that only one out has changed, but that does not mean that this particular stage has only this output. I would say that the primary function of status is to inform about changes, and there is no need to list all of a stage's outputs if they did not change.

@pared Thank you.
How about status -c? Does it also need to follow this format, or should it just keep the current one?

Contributor

@karajan1001 This is a good question.
In the case of our current test, if we commit the modified alice, we will get "Data and pipelines up to date" on dvc status and "new: alice" on dvc status -c. As this issue is about filtering the status, I would keep the output of dvc status -c as is and create an issue discussing whether we should have a consistent output format for both dvc status and dvc status -c.


     return status_info


 def _local_status(self, targets=None, with_deps=False, recursive=False):
-    if targets:
-        stages = cat(
-            self.collect(t, with_deps=with_deps, recursive=recursive)
-            for t in targets
-        )
-    else:
-        stages = self.collect(None, with_deps=with_deps, recursive=recursive)
+    targets = targets or [None]
+    pairs = cat(
+        self.collect_granular(t, with_deps=with_deps, recursive=recursive)
+        for t in targets
+    )

-    return _joint_status(stages)
+    return _joint_status(pairs)


 def _cloud_status(
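
The `targets or [None]` change above folds the "no targets" and "some targets" branches into one path. A small self-contained sketch of that pattern, where `toy_collect_granular` is a hypothetical collector (not DVC's `collect_granular`) and `itertools.chain.from_iterable` stands in for the `cat` helper used in the diff:

```python
from itertools import chain


def toy_collect_granular(target):
    """Hypothetical collector returning (stage_name, filter_info) pairs."""
    stage_outs = {"alice_bob": ["alice", "bob"]}
    if target is None:
        # No target: every stage, unfiltered.
        return [(name, None) for name in stage_outs]
    if target in stage_outs:
        # Stage target: the whole stage, unfiltered.
        return [(target, None)]
    # Output target: the owning stage plus the path to filter by.
    return [
        (name, target)
        for name, outs in stage_outs.items()
        if target in outs
    ]


def collect_pairs(targets=None):
    # A missing or empty target list becomes a single "collect everything" request.
    targets = targets or [None]
    # Flatten the per-target lists into one list of pairs.
    return list(chain.from_iterable(toy_collect_granular(t) for t in targets))


all_pairs = collect_pairs()
stage_pairs = collect_pairs(["alice_bob"])
out_pairs = collect_pairs(["alice"])
```

Here `all_pairs` and `stage_pairs` carry `None` as the filter, so a consumer like `_joint_status` would take the whole-stage branch, while `out_pairs` carries `"alice"` and would take the filtered branch.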
8 changes: 4 additions & 4 deletions dvc/stage/__init__.py
@@ -442,7 +442,7 @@ def run(self, dry=False, no_commit=False, force=False, run_cache=True):
             if not no_commit:
                 self.commit()

-    def _filter_outs(self, path_info):
+    def filter_outs(self, path_info):
         def _func(o):
             return path_info.isin_or_eq(o.path_info)

@@ -451,7 +451,7 @@ def _func(o):
     @rwlocked(write=["outs"])
     def checkout(self, **kwargs):
         stats = defaultdict(list)
-        for out in self._filter_outs(kwargs.get("filter_info")):
+        for out in self.filter_outs(kwargs.get("filter_info")):
             key, outs = self._checkout(out, **kwargs)
             if key:
                 stats[key].extend(outs)
@@ -526,14 +526,14 @@ def outs_cached(self):
     def get_all_files_number(self, filter_info=None):
         return sum(
             out.get_files_number(filter_info)
-            for out in self._filter_outs(filter_info)
+            for out in self.filter_outs(filter_info)
         )

     def get_used_cache(self, *args, **kwargs):
         from dvc.cache import NamedCache

         cache = NamedCache()
-        for out in self._filter_outs(kwargs.get("filter_info")):
+        for out in self.filter_outs(kwargs.get("filter_info")):
             cache.update(out.get_used_cache(*args, **kwargs))

         return cache
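
For readers unfamiliar with the predicate in `filter_outs`, here is a rough sketch of the "is in or equal to" check using `pathlib` instead of DVC's path classes. The `isin_or_eq` and `filter_outs` functions below are illustrative re-implementations, and the behaviour for a missing target is an assumption, since that part of the method is outside the diff.

```python
from pathlib import PurePosixPath


def isin_or_eq(path, other):
    """True if `path` equals `other` or lies somewhere beneath it."""
    return path == other or other in path.parents


def filter_outs(out_paths, target=None):
    # Assumption: with no target, every output passes through unfiltered.
    if target is None:
        return list(out_paths)
    # Keep each output that is the target itself or a directory containing it.
    return [o for o in out_paths if isin_or_eq(target, o)]


outs = [PurePosixPath("data"), PurePosixPath("model.pkl")]
nested = filter_outs(outs, PurePosixPath("data/raw/train.csv"))
exact = filter_outs(outs, PurePosixPath("model.pkl"))
unrelated = filter_outs(outs, PurePosixPath("other"))
```

The direction of the check matters: the target is tested against each output, so a file inside a tracked directory selects that directory's output, while a target outside every output selects nothing.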
19 changes: 19 additions & 0 deletions tests/func/test_status.py
@@ -118,3 +118,22 @@ def test_status_recursive(tmp_dir, dvc):
             }
         ],
     }
+
+
+def test_status_outputs(tmp_dir, dvc):
+    tmp_dir.dvc_gen({"foo": "foo", "bar": "bar"})
+    dvc.run(
+        outs=["alice", "bob"],
+        deps=["foo", "bar"],
+        cmd="echo alice>alice && echo bob>bob",
+        name="alice_bob",
+    )
+    tmp_dir.gen({"alice": "new alice", "bob": "new bob"})
+
+    assert dvc.status(targets=["alice_bob"]) == {
+        "alice_bob": [
+            {"changed outs": {"alice": "modified", "bob": "modified"}}
+        ]
+    }
+
+    assert dvc.status(targets=["alice"]) == {"alice": "modified"}