-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove
: output file/dir by path
#2357
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @daavoo ! That's an important change that we never had time to start making.
From reading this update (and not taking a look at the DVC PR) the way end users would read - I have a few questions. Mostly, it's not clear what semantics this operation has. I put some comments in the PR, it would be great if we could go and improve the docs/terminology.
Thanks for the review, I realized that I need to change many more parts of the doc page in order to better explain the change. |
## Example: remove a single output file by name | ||
|
||
Imagine the same <abbr>workspace</abbr> as before but now we have multiple | ||
`outs` for the `train` stage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Example: remove a single output file by name | |
Imagine the same <abbr>workspace</abbr> as before but now we have multiple | |
`outs` for the `train` stage: | |
## Example: remove a stage output by file name | |
Imagine the we have multiple <abbr>outputs</abbr> for a `train` stage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why focus the example on stage outputs? What about making it about tracked data in general? Showing both removing an added dataset (dir), as well as a stage output (a model), for example. And/or update a previous example to mention the alternative to targeting .dvc files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are totally right, I was too focused on stage outputs because that was the "use case" described in the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happens. And we can go with this for now (just needs some fixes as indicated by Ivan) and I'll change it later. Unless you don't mind reworking it in a more general form along the way 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documenting this change is being definitely more challenging than implementing it xD.
@jorgeorpinel After reading iterative/dvc#2575 (comment) , I'm thinking that maybe we should remove the example dvc remove data.csv.dvc
in favor of dvc remove data.csv
, given that the behavior is now the same but new users might don't be aware of the concept of .dvc
files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing section ## Example: remove a tracked file
kind of overlaps in content with the separate page # How to Stop Tracking Data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @daavoo, I can wrap this up once iterative/dvc#5757 is merged.
Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
After trying to better explain the semantics I have now the feeling that the underlying problem might be that the current |
/logs | ||
``` | ||
|
||
The <abbr>output</abbr> file is actually being removed from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daavoo besides removing it from the .gitignore
, does it mean that we have pretty much rm -f
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently. Is that a bug @shcheklein ? It feels like it should be smart enough to edit dvc.yaml in this case.
@@ -30,7 +30,7 @@ corresponding `.gitignore` entry). The data file is now no longer being tracked | |||
after this: | |||
|
|||
```dvc | |||
$ dvc remove data.csv.dvc | |||
$ dvc remove data.csv | |||
|
|||
$ git status | |||
Untracked files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it won't be untracked, right? it will become missing?
Note that, when using stage name as target, the actual <abbr>output</abbr> files | ||
or directories of the stage (`outs` field) are not removed by this command, | ||
unless the `--outs` option is used which will remove **all** of them. | ||
Alternatively, you can the names of individual <abbr>output</abbr> files or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in this case we don't do anything to the .dvc or dvc.yaml?
this mixed semantics confuses me a bit to be honest
with some targets we touch dvc.yaml, but not data ... with some targets we touch data, but not DVC files
also, --outs
that is not clear how it should be have when file/directory path is provided
It feels to me that even if we merge this (and the DVC core one), we'll need to get back to the whiteboard pretty soon with this dvc remove
that we are long overdue cc @efiop @dberenbaum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, not sure if this has enough value to do this now at least - it seems it's pretty much a replacement to rm -rf
(+git command to remove .gitignore), unless I'm missing something? @daavoo what is your use case that you had in mind?
sorry, that we didn't get back to you initially in the DVC core ticket (or may be I missed the discussion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, not sure if this has enough value to do this now at least - it seems it's pretty much a replacement to
rm -rf
(+git command to remove .gitignore), unless I'm missing something? @daavoo what is your use case that you had in mind?sorry, that we didn't get back to you initially in the DVC core ticket (or may be I missed the discussion).
I agree with the problem of mixed semantics @shcheklein . Sadly I don't think I'm in position to explain why
(an intended use case) but rather how
(what the new behavior is); I just picked a Good First Issue
ticket to force myself to dive into the source code 😅 .
I think the original use cases were:
- remove one of many stage outputs: misleading error message dvc#4497
- ui: add a way to undo
dvc add
easily dvc#2575 (comment)
But maybe I misinterpreted the use cases along the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry for the lack of clarity here after you've already put in so much work @daavoo.
but in this case we don't do anything to the .dvc or dvc.yaml?
I just tested, and it looks like .dvc is removed with this command but dvc.yaml won't be touched. Is that right @daavoo?
There might be enough straightforward utility here to merge the PR if it is limited to making dvc remove data.xml
equivalent to dvc remove data.xml.dvc
to support iterative/dvc#2575 (comment).
The rest might be valuable, too, but it seems like the semantics need to be straightened out in iterative/dvc#5791 first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you consider that we need to take a step back and put this and the dvc P.R. on pause or even discarding them in favor of waiting to find the right semantics, I would not mind it at all.
I wanted to dive into the source code and your thoughtful development process and I did; don't feel obligated to try to merge this just because of the work i already put.
> `dvc remove` doesn't remove files from the DVC <abbr>cache</abbr> or | ||
> [remote storage](/doc/command-reference/remote). Use `dvc gc` for that. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be great to have an example (maybe just extending the one below) that shows how to remove the data from the cache, although I don't know that current gc functionality is sufficient to really be useful in this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy edits on the examples (will commit these)
## Example: remove a stage and its output | ||
## Example: remove a specific stage output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need this example AND the next one ("remove specific data" from .dvc files)
We can take this over once iterative/dvc#5757 is done. @daavoo if you don't mind just reminding us when/if that happens. Thanks! |
@jorgeorpinel Thanks for taking a look! I think this PR is on pause pending iterative/dvc#5791 (see #2357 (comment)). |
remove
: output file/dir by path
Closing this as stale for now, let's get back to this when core repo is merged. |
Docs for: iterative/dvc#5757