From cdf1bde1b2fa0dfe61f5eadd36e9d58336152f18 Mon Sep 17 00:00:00 2001 From: David de la Iglesia Castro Date: Tue, 6 Apr 2021 09:14:13 +0200 Subject: [PATCH 1/8] remove: Add example about removing single output file by name --- content/docs/command-reference/remove.md | 39 +++++++++++++++++++++--- 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index 34253fbb24..380e4f6384 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -9,7 +9,7 @@ optionally delete them). usage: dvc remove [-h] [-q | -v] [--outs] targets [targets ...] positional arguments: - targets stages (found in dvc.yaml) or .dvc files to remove. + targets stages (found in dvc.yaml), .dvc files or name of output files to remove. ``` ## Description @@ -81,7 +81,7 @@ the workspace: ```yaml train: - cmd: python train.py data.py + cmd: python train.py data.csv deps: - data.csv - train.py @@ -91,7 +91,7 @@ train: ```dvc $ ls -dvc.lock dvc.yaml foo.csv foo.csv.dvc model train.py +dvc.lock dvc.yaml data.csv data.csv.dvc model train.py ``` Using `dvc remove` on the stage name will remove that entry from `dvc.yaml`, and @@ -101,7 +101,38 @@ deleted (just the `model` file in this example): ```dvc $ dvc remove train --outs $ ls -dvc.lock dvc.yaml foo.csv foo.csv.dvc train.py +dvc.lock dvc.yaml data.csv data.csv.dvc train.py ``` > Notice that the dependencies (`data.csv` and `train.py`) are not deleted. 
+ +## Example: remove a single output file by name + +Imagine the same workspace as before but now we have multiple +`outs` for the `train` stage: + +```yaml +train: + cmd: python train.py data.csv + deps: + - data.csv + - train.py + outs: + - logs + - model.h5 +``` + +```dvc +$ ls +dvc.lock dvc.yaml foo.csv foo.csv.dvc logs model.h5 train.py +``` + +Using `dvc remove` you can remove a specific file using it's output name: + +```dvc +$ dvc remove model.h5 +$ ls +dvc.lock dvc.yaml data.csv data.csv.dvc logs train.py +``` + +> Notice that the other outputs (`logs`) are not deleted. From 405dcf30cfa19cc3a1b220ec8bfa637116c0480c Mon Sep 17 00:00:00 2001 From: David de la Iglesia Castro Date: Tue, 6 Apr 2021 09:37:55 +0200 Subject: [PATCH 2/8] Update remove.md --- content/docs/command-reference/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index 380e4f6384..cbf53d41d9 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -124,7 +124,7 @@ train: ```dvc $ ls -dvc.lock dvc.yaml foo.csv foo.csv.dvc logs model.h5 train.py +dvc.lock dvc.yaml data.csv data.csv.dvc logs model.h5 train.py ``` Using `dvc remove` you can remove a specific file using it's output name: From 63d864eed5705ccff90e44fbc02dd6c7f08bcb07 Mon Sep 17 00:00:00 2001 From: David de la Iglesia Castro Date: Fri, 9 Apr 2021 09:11:25 +0200 Subject: [PATCH 3/8] Update content/docs/command-reference/remove.md Co-authored-by: Jorge Orpinel --- content/docs/command-reference/remove.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index cbf53d41d9..6ec40213bc 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -9,7 +9,8 @@ optionally delete them). 
usage: dvc remove [-h] [-q | -v] [--outs] targets [targets ...] positional arguments: - targets stages (found in dvc.yaml), .dvc files or name of output files to remove. + targets Tracked files/directories, stage names (found in + dvc.yaml), or .dvc files to remove. ``` ## Description From d47b688983b936a72a27f52dbdb800693376a4a4 Mon Sep 17 00:00:00 2001 From: David de la Iglesia Castro Date: Tue, 13 Apr 2021 10:40:52 +0200 Subject: [PATCH 4/8] Update stop-tracking-data --- content/docs/user-guide/how-to/stop-tracking-data.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/how-to/stop-tracking-data.md b/content/docs/user-guide/how-to/stop-tracking-data.md index 4f85b2befb..8184f87ffc 100644 --- a/content/docs/user-guide/how-to/stop-tracking-data.md +++ b/content/docs/user-guide/how-to/stop-tracking-data.md @@ -30,7 +30,7 @@ corresponding `.gitignore` entry). The data file is now no longer being tracked after this: ```dvc -$ dvc remove data.csv.dvc +$ dvc remove data.csv $ git status Untracked files: From a4921ad197822ca8f5811677e1bd965dab5842f4 Mon Sep 17 00:00:00 2001 From: David de la Iglesia Castro Date: Tue, 13 Apr 2021 11:54:20 +0200 Subject: [PATCH 5/8] Update semantics --- content/docs/command-reference/remove.md | 107 +++++++++++------------ 1 file changed, 53 insertions(+), 54 deletions(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index 6ec40213bc..95e91605f6 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -15,22 +15,24 @@ positional arguments: ## Description -Safely removes `.dvc` files or stages from `dvc.yaml`. This includes deleting -the corresponding `.gitignore` entries (based on the `outs` fields removed). +Safely removes tracked files/directories, stage names (found in `dvc.yaml`), or +`.dvc` files. This includes deleting the corresponding `.gitignore` entries. 
> `dvc remove` doesn't remove files from the DVC cache or > [remote storage](/doc/command-reference/remote). Use `dvc gc` for that. -It takes one or more stage names (see `-n` option of `dvc run`) or `.dvc` file -names as `targets`. +It takes one or more stage names (see `-n` option of `dvc run`), `.dvc` file +names or tracked files/directories as `targets`. If there are no stages left in `dvc.yaml` after the removal, then both `dvc.yaml` and `dvc.lock` are deleted. `.gitignore` is also deleted if there are no more entries left in it. -Note that the actual output files or directories of the stage -(`outs` field) are not removed by this command, unless the `--outs` option is -used. +Note that, when using a stage name as target, the actual output files +or directories of the stage (`outs` field) are not removed by this command, +unless the `--outs` option is used, which will remove **all** of them. +Alternatively, you can use the names of individual output files or +directories of a stage as `targets`. 💡 Refer to [Undo Adding Data](/doc/user-guide/how-to/stop-tracking-data) to see how it helps replace data that is tracked by DVC. @@ -48,34 +50,7 @@ how it helps replace data that is tracked by DVC. - `-v`, `--verbose` - displays detailed tracing information. -## Example: remove a .dvc file - -Let's imagine we have `foo.csv` and `bar.csv` files, that are already -[tracked](/doc/command-reference/add) by DVC: - -```dvc -$ ls -bar.csv bar.csv.dvc foo.csv foo.csv.dvc -$ cat .gitignore -/foo.csv -/bar.csv -``` - -This removes `foo.csv.dvc` and double checks that its entry is gone from -`.gitignore`: - -```dvc -$ dvc remove foo.csv.dvc - -$ ls -bar.csv bar.csv.dvc foo.csv -$ cat .gitignore -/bar.csv -``` - -> The same procedure applies to tracked directories. 
- -## Example: remove a stage and its output +## Example target: stage name Let's imagine we have a `train` stage in `dvc.yaml`, and corresponding files in the workspace: @@ -87,53 +62,77 @@ train: - data.csv - train.py outs: - - model + - logs + - model.h5 ``` ```dvc $ ls -dvc.lock dvc.yaml data.csv data.csv.dvc model train.py +dvc.lock dvc.yaml data.csv data.csv.dvc model.h5 logs train.py + +$ cat .gitignore +/data.csv +/model.h5 +/logs ``` Using `dvc remove` on the stage name will remove that entry from `dvc.yaml`, and its outputs from `.gitignore`. With the `--outs` option, its outputs are also -deleted (just the `model` file in this example): +deleted (`logs` and `model.h5` in this example): ```dvc $ dvc remove train --outs + $ ls dvc.lock dvc.yaml data.csv data.csv.dvc train.py + +$ cat .gitignore +/data.csv ``` > Notice that the dependencies (`data.csv` and `train.py`) are not deleted. -## Example: remove a single output file by name +## Example target: tracked files/directories -Imagine the same workspace as before but now we have multiple -`outs` for the `train` stage: +Let's imagine we have the same initial workspace as before: -```yaml -train: - cmd: python train.py data.csv - deps: - - data.csv - - train.py - outs: - - logs - - model.h5 +```dvc +$ ls +dvc.lock dvc.yaml data.csv data.csv.dvc model.h5 logs train.py + +$ cat .gitignore +/data.csv +/model.h5 +/logs ``` +Using `dvc remove` on a tracked file name will remove the corresponding `.dvc` +file and `gitignore` entry: + ```dvc +$ dvc remove data.csv + $ ls -dvc.lock dvc.yaml data.csv data.csv.dvc logs model.h5 train.py +dvc.lock dvc.yaml data.csv model.h5 logs train.py + +$ cat .gitignore +/model.h5 +/logs ``` -Using `dvc remove` you can remove a specific file using it's output name: +> The same procedure applies to tracked directories. 
+ +In addition, `dvc remove` can also be used on individual output +files or directories of a stage.: ```dvc $ dvc remove model.h5 + $ ls -dvc.lock dvc.yaml data.csv data.csv.dvc logs train.py +dvc.lock dvc.yaml data.csv logs train.py + +$ cat .gitignore +/logs ``` -> Notice that the other outputs (`logs`) are not deleted. +> Note than in this case the file is being removed from the workspace. From 895cf55e4885f1bb86a5343a25b9786373a1a274 Mon Sep 17 00:00:00 2001 From: David de la Iglesia Castro Date: Tue, 13 Apr 2021 12:04:44 +0200 Subject: [PATCH 6/8] Add stage output files/directories --- content/docs/command-reference/remove.md | 64 +++++++++++++++++++----- 1 file changed, 51 insertions(+), 13 deletions(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index 95e91605f6..684a2db8c0 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -92,9 +92,20 @@ $ cat .gitignore > Notice that the dependencies (`data.csv` and `train.py`) are not deleted. -## Example target: tracked files/directories +## Example target: stage output files/directories -Let's imagine we have the same initial workspace as before: +Assuming we have the same initial workspace as before: + +```yaml +train: + cmd: python train.py data.csv + deps: + - data.csv + - train.py + outs: + - logs + - model.h5 +``` ```dvc $ ls @@ -106,33 +117,60 @@ $ cat .gitignore /logs ``` -Using `dvc remove` on a tracked file name will remove the corresponding `.dvc` -file and `gitignore` entry: +`dvc remove` can also be used on **individual** output files or +directories of a stage: ```dvc -$ dvc remove data.csv +$ dvc remove model.h5 $ ls -dvc.lock dvc.yaml data.csv model.h5 logs train.py +dvc.lock dvc.yaml data.csv data.csv.dvc logs train.py $ cat .gitignore -/model.h5 +/data.csv /logs ``` -> The same procedure applies to tracked directories. 
+The output file is actually being removed from the +workspace and `.gitignore` but `dvc.yaml` is not being updated. -In addition, `dvc remove` can also be used on individual output -files or directories of a stage.: +## Example target: tracked files/directories + +Assuming we have the same initial workspace as before: + +```yaml +train: + cmd: python train.py data.csv + deps: + - data.csv + - train.py + outs: + - logs + - model.h5 +``` ```dvc -$ dvc remove model.h5 +$ ls +dvc.lock dvc.yaml data.csv data.csv.dvc model.h5 logs train.py + +$ cat .gitignore +/data.csv +/model.h5 +/logs +``` + +Using `dvc remove` on a tracked file name will remove the corresponding `.dvc` +file and `.gitignore` entry: + +```dvc +$ dvc remove data.csv $ ls -dvc.lock dvc.yaml data.csv logs train.py +dvc.lock dvc.yaml data.csv model.h5 logs train.py $ cat .gitignore +/model.h5 /logs ``` -> Note than in this case the file is being removed from the workspace. +> The same procedure applies to tracked directories. From ba1bdf5801e358f8bcefbd6c8c81ade50188eafa Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 27 Apr 2021 17:32:56 -0500 Subject: [PATCH 7/8] Update content/docs/command-reference/remove.md --- content/docs/command-reference/remove.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index 684a2db8c0..6a71988e19 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -15,8 +15,8 @@ positional arguments: ## Description -Safely removes tracked files/directories, stage names (found in `dvc.yaml`), or -`.dvc` files. This includes deleting the corresponding `.gitignore` entries. +Safely removes tracked data (by file name, stage name, or `.dvc` file path). +This includes deleting the corresponding `.gitignore` entries. > `dvc remove` doesn't remove files from the DVC cache or > [remote storage](/doc/command-reference/remote). 
Use `dvc gc` for that. From 1daeb5fbb1a0083394e8c74002497d3d78b621fe Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 27 Apr 2021 18:03:43 -0500 Subject: [PATCH 8/8] Apply suggestions from code review --- content/docs/command-reference/remove.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/content/docs/command-reference/remove.md b/content/docs/command-reference/remove.md index 6a71988e19..d031690c2d 100644 --- a/content/docs/command-reference/remove.md +++ b/content/docs/command-reference/remove.md @@ -50,7 +50,7 @@ how it helps replace data that is tracked by DVC. - `-v`, `--verbose` - displays detailed tracing information. -## Example target: stage name +## Example: remove stage outputs Let's imagine we have a `train` stage in `dvc.yaml`, and corresponding files in the workspace: @@ -76,9 +76,9 @@ $ cat .gitignore /logs ``` -Using `dvc remove` on the stage name will remove that entry from `dvc.yaml`, and -its outputs from `.gitignore`. With the `--outs` option, its outputs are also -deleted (`logs` and `model.h5` in this example): +Using `dvc remove` on the stage name will remove the stage from `dvc.yaml`, and +the corresponding entries from `.gitignore`. With the `--outs` option, the actual +files and directories are deleted too (`logs/` and `model.h5` in this example): ```dvc $ dvc remove train --outs @@ -92,7 +92,7 @@ $ cat .gitignore > Notice that the dependencies (`data.csv` and `train.py`) are not deleted. 
-## Example target: stage output files/directories +## Example: remove a specific stage output Assuming we have the same initial workspace as before: @@ -117,8 +117,8 @@ $ cat .gitignore /logs ``` -`dvc remove` can also be used on **individual** output files or -directories of a stage: +`dvc remove` can also be used on **individual** outputs of a +stage (by file name): ```dvc $ dvc remove model.h5 @@ -131,10 +131,10 @@ $ cat .gitignore /logs ``` -The output file is actually being removed from the -workspace and `.gitignore` but `dvc.yaml` is not being updated. +The `model.h5` file is removed from the workspace and `.gitignore`, +but note that `dvc.yaml` is not updated. -## Example target: tracked files/directories +## Example: remove specific data Assuming we have the same initial workspace as before: