diff: consider including modified git-tracked files #3385
Comments
I'd vote 2. 3 if specified via a |
I'd vote for 2. as well, but maybe also 3. if Other possibly unnecessary comments
This seems like the tricky case: the default behavior, where
I think it's actually about [ DVC-tracked but (Git) uncommitted changes in current workspace ] -minus-
|
Now we have: All the options (probably excluding (1)) might be very convenient in different cases. The major question is what should be used as the default. If we include Git file changes in the default output, then we need to provide clear marks in the output showing what is a data file/dir and what is not. If the default is (0), then it might be OK not to supply this mark (in JSON format it might still be valuable). |
I'd vote for (2) if we come up with a good formatting for Git/Data files. Otherwise - (0). |
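To make the "clear marks" idea concrete, here is a minimal sketch of one possible formatting. It is not how dvc diff works; the function name mark_changes, the "data"/"git" labels, and the input shape (a dict with "added", "modified" and "deleted" lists of {"path": ...} entries, loosely modelled on a JSON-style diff) are all assumptions made for illustration.

```python
from __future__ import annotations


def mark_changes(dvc_diff: dict, git_files: list[str]) -> list[tuple[str, str, str]]:
    """Merge data changes and Git-tracked changes into (kind, state, path) rows.

    `dvc_diff` is assumed (hypothetically) to look like
    {"added": [{"path": ...}], "modified": [...], "deleted": [...]}.
    `git_files` is a plain list of changed Git-tracked paths.
    """
    rows = []
    # Data entries come first, grouped by state.
    for state in ("added", "modified", "deleted"):
        for entry in dvc_diff.get(state, []):
            rows.append(("data", state, entry["path"]))

    # Git-tracked files are appended with a distinct mark, skipping any
    # path that was already reported as a data change.
    data_paths = {path for _, _, path in rows}
    for path in git_files:
        if path not in data_paths:
            rows.append(("git", "changed", path))
    return rows


def format_rows(rows: list[tuple[str, str, str]]) -> str:
    """Render one row per line, prefixing each with its kind mark."""
    return "\n".join(f"[{kind}] {state} {path}" for kind, state, path in rows)
```

The same rows could be dumped as JSON with the kind kept as a field, which covers the case where the mark is only valuable in machine-readable output.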
@dmpetrov 0 - meaning no changes to metric files (no matter cached or not)? |
It's supposed to work for changed data files including metrics, right? |
@dmpetrov I'm just trying to understand what is the difference between 0 and 1 then. |
0 is the current behavior |
@dmpetrov yep, that's what I meant by 1 :) Maybe I missed something important about the current behavior, but I didn't mean to change it - 1 is the current one, at least the way I understood it initially. |
But it is not part of the "data" part of dvc, it is part of the pipelines, for which |
What is the "data" part of DVC though? Is a metric (non-cached) file part of it? Will new config files be?
One of the biggest differences between them is their purpose, which also dictates their CLI interface, for example. status is a summary of a default diff (workspace against last commit), to put it simply. But with diff you can compare (and even show as a summary) arbitrary objects. My point is that even for git they might overlap but have specific needs and different interfaces, outputs, etc. I do see what you want to achieve with this - keep |
Closing as stale. Doesn't seem like there is any demand for this after vscode found a workaround. |
If the purpose of dvc diff is to take a glance at what has changed in my iteration vs HEAD, or to compare two experiments, it would be really convenient to include the output of git diff --name-only in the output.
E.g. when I run dvc diff baseline-experiment bigrams-experiment, I get output like:
It's fine, but it can be misleading or not very informative. In this specific case there is also a change in the script train.py itself, which is an essential part of the pipeline. I think it would make sense to see its name here.
Any thoughts @iterative/engineering?
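As a rough illustration of the Git side of this request, the sketch below collects the Git-tracked files that differ between two revisions by calling git diff --name-only. The helper names parse_name_only and git_changed_files are made up for this example and are not part of DVC; how the result would be merged into the dvc diff output is left open.

```python
from __future__ import annotations

import subprocess


def parse_name_only(output: str) -> list[str]:
    """Split the text printed by `git diff --name-only` into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def git_changed_files(rev_a: str, rev_b: str) -> list[str]:
    """Return Git-tracked files that differ between two revisions.

    Must be run inside a Git repository; raises CalledProcessError
    if git exits with a non-zero status (e.g. an unknown revision).
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", rev_a, rev_b],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_only(result.stdout)
```

For the example above, git_changed_files("baseline-experiment", "bigrams-experiment") would be expected to include train.py if the script changed between the two experiments.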
UPDATE: Giving this a second thought, I see that it's not even about DVC-tracked (cached) files vs Git-tracked files, but about DVC outputs vs outputs+dependencies vs all files (DVC and Git).
So, three options: