Merge pull request #2077 from iterative/1.0-updates
dvc: still pending 1.x updates (3)
jorgeorpinel authored Jan 7, 2021
2 parents e486358 + 8065a36 commit 275a3ec
Showing 22 changed files with 113 additions and 80 deletions.
8 changes: 4 additions & 4 deletions content/docs/command-reference/add.md
@@ -76,10 +76,10 @@ A `dvc add` target can be either a file or a directory. In the latter case, a
`.dvc` file is created for the top of the hierarchy (with default name
`<dir_name>.dvc`).

Every file inside is stored in the cache (unless the `--no-commit` option is
used), but DVC does not produce individual `.dvc` files for each file in the
entire tree. Instead, the single `.dvc` file references a special JSON file in
the cache (with `.dir` extension), that in turn points to the added files.
Every file in the dir is cached normally (unless the `--no-commit` option is
used), but DVC does not produce individual `.dvc` files for each one. Instead,
the single `.dvc` file references a special JSON file in the cache (with `.dir`
extension), that in turn points to the added files.
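The directory-tracking scheme above can be sketched in Python: hash each file, then collect the results into one JSON manifest that plays the role of the `.dir` file. This is a minimal illustration, not DVC's code. The `md5`/`relpath` entry layout is modeled on DVC 1.x `.dir` files, the helper names (`file_md5`, `build_dir_manifest`, `manifest_hash`) are hypothetical, and DVC's own hashing (for example, its handling of text files) may produce different digests.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_dir_manifest(directory: Path) -> list:
    """List every file under `directory` with its MD5 and POSIX-style relative path.

    Assumption: the {"md5", "relpath"} layout mimics DVC 1.x `.dir` files;
    this is an illustration, not DVC's implementation.
    """
    entries = [
        {"md5": file_md5(p), "relpath": p.relative_to(directory).as_posix()}
        for p in directory.rglob("*")
        if p.is_file()
    ]
    # Sorting by path makes the manifest (and its hash) independent of walk order.
    return sorted(entries, key=lambda e: e["relpath"])


def manifest_hash(manifest: list) -> str:
    """Hash the serialized manifest; the `.dir` suffix marks a directory entry."""
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest() + ".dir"
```

For example, `json.dumps(build_dir_manifest(Path("data")), indent=2)` renders the manifest for a hypothetical `data/` directory; a single `.dvc` file would then only need to record the manifest's hash.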

> Refer to
> [Structure of cache directory](/doc/user-guide/dvc-internals#structure-of-the-cache-directory)
11 changes: 4 additions & 7 deletions content/docs/command-reference/cache/index.md
@@ -15,15 +15,12 @@ positional arguments:

## Description

The DVC Cache is where your data files, models, etc. (anything you want to
version with DVC) are actually stored. The data files and directories visible in
the <abbr>workspace</abbr> are links\* to (or copies of) the ones in cache.
Learn more about it's
[structure](/doc/user-guide/dvc-internals#structure-of-the-cache-directory).
Tracked files and directories visible in the <abbr>workspace</abbr> are links\*
to the ones in the project's <abbr>cache</abbr>.

> \* Refer to
> \* Or copies. Refer to
> [File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
> for more information on file links on different platforms.
> for more information on supported linking on different platforms.
For cache configuration options, refer to `dvc config cache`.
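Because the cache is content-addressed, a file's location inside it follows from its hash. A minimal sketch, assuming the DVC 1.x layout in which the first two hex characters of the MD5 name a subdirectory and the remaining 30 name the file (directory manifests keep a `.dir` suffix). `cache_path` is a hypothetical helper, not part of DVC's API.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def cache_path(md5: str, cache_dir: str = ".dvc/cache") -> Path:
    """Map a hash (optionally ending in '.dir') to its assumed cache location.

    Hypothetical helper modeled on the DVC 1.x layout: the first two hex
    characters name a subdirectory, the remaining 30 name the file.
    """
    suffix = ".dir" if md5.endswith(".dir") else ""
    digest = md5[: len(md5) - len(suffix)].lower()
    if len(digest) != 32 or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"not an MD5 hex digest: {md5!r}")
    return Path(cache_dir) / digest[:2] / (digest[2:] + suffix)
```

Splitting on the first two characters keeps any single cache subdirectory from holding every object, which is the usual motivation for this kind of fan-out.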

2 changes: 1 addition & 1 deletion content/docs/command-reference/commit.md
@@ -122,7 +122,7 @@ $ pip install -r src/requirements.txt
Download the precomputed data using:

```dvc
$ dvc pull --all-branches --all-tags
$ dvc pull -aT
```

</details>
10 changes: 4 additions & 6 deletions content/docs/command-reference/config.md
@@ -131,11 +131,8 @@ remote. See `dvc remote` for more information.

### cache

A DVC project <abbr>cache</abbr> is the hidden storage (by default located in
the `.dvc/cache` directory) for files that are tracked by DVC, and their
different versions. (See `dvc cache` and
[DVC Files and Directories](/doc/user-guide/dvc-internals#structure-of-the-cache-directory)
for more details.) This section contains the following options:
This section contains the following options, which affect the project's
<abbr>cache</abbr>:

- `cache.dir` - set/unset cache directory location. A correct value is either an
absolute path, or a path **relative to the config file location**. The default
@@ -279,7 +276,8 @@ or to a relative path (resolved from `./.dvc/`):

```dvc
$ dvc config cache.dir ../../mycache
$ dvc pull -q
$ dvc pull
$ ls ../mycache
2f/
```
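The relative-path rule for `cache.dir` (resolved from `./.dvc/`, where the config file lives) comes down to plain path arithmetic. A minimal Python sketch, assuming the project root is the current directory; `resolve_cache_dir` is a hypothetical helper used only for illustration.

```python
import os


def resolve_cache_dir(cache_dir: str, config_dir: str = ".dvc") -> str:
    """Resolve a `cache.dir` value relative to the config file's directory.

    Absolute paths are only normalized; relative ones are joined to
    `config_dir` first. Hypothetical helper for illustration only.
    """
    if os.path.isabs(cache_dir):
        return os.path.normpath(cache_dir)
    return os.path.normpath(os.path.join(config_dir, cache_dir))
```

Here `resolve_cache_dir("../../mycache")` normalizes `.dvc/../../mycache` to `../mycache` (on POSIX), which is why the example above lists `../mycache` from the project root.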
1 change: 1 addition & 0 deletions content/docs/command-reference/import.md
@@ -146,6 +146,7 @@ but it also creates an import stage (`.dvc` file) with a link to the data source

```yaml
md5: 7de90e7de7b432ad972095bc1f2ec0f8
frozen: true
wdir: .
deps:
- path: data/data.xml
2 changes: 1 addition & 1 deletion content/docs/command-reference/install.md
@@ -160,7 +160,7 @@ $ pip install -r src/requirements.txt
Download the precomputed data using:

```dvc
$ dvc pull --all-branches --all-tags
$ dvc pull -aT
```

</details>
28 changes: 28 additions & 0 deletions content/docs/command-reference/list.md
@@ -124,3 +124,31 @@ images/dvc-logo-outlines.png.dvc
images/owl_sticker.png
...
```

## Example: Create an archive of your DVC project

Just like you can use `git archive` to make a quick bundle (ZIP) file of the
current code, `dvc list` can be easily complemented with simple archive tools to
bundle the current data files in the project.

For example, here's a TAR archive of the entire <abbr>workspace</abbr>
(Linux/GNU):

```dvc
$ dvc list . -R | tar -cvf project.tar -T -
```

Or separate ZIP archives of code and DVC-tracked data (POSIX terminal with
`zip`):

```dvc
$ git archive -o code.zip HEAD
$ dvc list . -R --dvc-only | zip -@ data.zip
```

ZIP alternative for [POSIX on Windows](/doc/user-guide/running-dvc-on-windows)
(Python installed):

```dvc
$ dvc list . -R --dvc-only | xargs python -m zipfile -c data.zip
```
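`xargs` may split a long file list across several `python -m zipfile -c` invocations, and each one recreates `data.zip` from scratch. A single-process alternative is a short script built on the standard `zipfile` module that reads paths from standard input. This is a minimal sketch; the script name `zip_from_stdin.py` is hypothetical.

```python
import sys
import zipfile


def zip_paths(paths, archive: str) -> int:
    """Write every listed path into `archive` and return how many were added."""
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for line in paths:
            path = line.rstrip("\r\n")
            if path:  # skip blank lines
                zf.write(path)
                count += 1
    return count


if __name__ == "__main__":
    # Usage: dvc list . -R --dvc-only | python zip_from_stdin.py data.zip
    if len(sys.argv) != 2:
        sys.exit("usage: zip_from_stdin.py ARCHIVE < file-list")
    added = zip_paths(sys.stdin, sys.argv[1])
    print(f"added {added} files to {sys.argv[1]}", file=sys.stderr)
```

Opening the archive once means the whole list ends up in a single ZIP regardless of its length.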
4 changes: 2 additions & 2 deletions content/docs/command-reference/metrics/show.md
@@ -5,8 +5,8 @@ Print [metrics](/doc/command-reference/metrics), with optional formatting.
## Synopsis

```usage
usage: dvc metrics show [-h] [-q | -v] [-a] [-T] [--all-commits] [-R]
[--show-json] [--show-md]
usage: dvc metrics show [-h] [-q | -v] [-a] [-T] [--all-commits]
[--show-json] [--show-md] [-R]
[targets [targets ...]]
positional arguments:
10 changes: 5 additions & 5 deletions content/docs/command-reference/params/index.md
@@ -98,7 +98,7 @@ params `lr`, `layers`, and `epochs` from the params file above. Full paths
should be used to specify `layers` and `epochs` from the `train` group:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p lr,train.epochs,train.layers \
python train.py
```
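Full paths such as `train.epochs` address keys nested inside the `train` group of the params file. A minimal sketch of that lookup over an already-parsed params dictionary (YAML parsing is left out to stay within the standard library); `get_param` and the sample values are hypothetical.

```python
from typing import Any, Mapping


def get_param(params: Mapping[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'train.epochs' through nested mappings."""
    node: Any = params
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            raise KeyError(f"param {dotted!r} not found (missing {key!r})")
        node = node[key]
    return node


# Hypothetical contents of a parsed params file with a `train` group.
params = {"lr": 0.0041, "train": {"epochs": 70, "layers": 9}}

for name in "lr,train.epochs,train.layers".split(","):
    print(name, "=", get_param(params, name))
```

Referencing a whole group, such as `train`, corresponds to stopping the walk early and returning the nested mapping itself.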
@@ -143,7 +143,7 @@ Alternatively, the entire group of parameters `train` can be referenced, instead
of specifying each of the params separately:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
```
@@ -160,7 +160,7 @@ Note that this file name can be redefined using a prefix in the `-p` argument of
`dvc run`. In our case:

```dvc
$ dvc run -n train -d logs/ -o users.csv \
$ dvc run -n train -d train.py -d logs/ -o users.csv -f \
-p parse_params.yaml:threshold,classes_num \
python train.py
```
@@ -203,7 +203,7 @@ The following [stage](/doc/command-reference/run) depends on params `BOOL`,
`INT`, as well as `TrainConfig`'s `EPOCHS` and `layers`:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \
python train.py
```
@@ -248,7 +248,7 @@ can be referenced
supported), instead of the parameters in it:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TestConfig \
python train.py
```
7 changes: 4 additions & 3 deletions content/docs/command-reference/plots/modify.md
@@ -101,9 +101,10 @@ Note, a new field _y_ was added to `dvc.yaml` file for the plot. Please do not
forget to commit the change in Git if the modification needs to be preserved.

```yaml
- logs.csv:
    cache: false
    y: accuracy
plots:
  - logs.csv:
      cache: false
      y: accuracy
```
Changing the plot `title` and `x-label`:
3 changes: 2 additions & 1 deletion content/docs/command-reference/pull.md
@@ -208,7 +208,8 @@ to retrieve part of the data?
```dvc
$ dvc pull --with-deps featurize
... Use the partial update, then pull the remaining data:
# Use the partial update...
# Then pull the remaining data:
$ dvc pull
Everything is up to date.
7 changes: 3 additions & 4 deletions content/docs/command-reference/push.md
@@ -134,7 +134,7 @@ the default remote:
$ dvc push
```
Push <abbr>outputs</abbr> of a specific `.dvc` file only:
Push files related to a specific `.dvc` file only:

```dvc
$ dvc push data.zip.dvc
@@ -165,12 +165,11 @@ want to upload part of the data?
```dvc
$ dvc push --with-deps test-posts
... Do some work based on the partial update
# Do some work based on the partial update...
# Then push the rest of the data:
$ dvc push --with-deps matrix-train
... Push the rest of the data
$ dvc status --cloud
Cache and remote 'r1' are in sync.
```
16 changes: 8 additions & 8 deletions content/docs/command-reference/repro.md
@@ -24,15 +24,15 @@ positional arguments:

Provides a way to regenerate data pipeline results, by restoring the dependency
graph (a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) implicitly
defined by the stages listed in `dvc.yaml`. The commands defined in these stages
are then be executed in the correct order.
defined by the stages listed in `dvc.yaml` files. The commands defined in these
stages are then executed in the correct order.

For stages with multiple commands (having a list or a multiline string in the
`cmd` field), commands are run one after the other in the order they are
defined. The failure of any command halts the remaining stage execution and
raises an error.

> Pipeline stages are defined in a `dvc.yaml` file (either manually or by using
> Pipeline stages are defined in `dvc.yaml` (either manually or by using
> `dvc run`) while initial data dependencies can be registered with `dvc add`.
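The ordering and failure behavior described above can be approximated in a few lines: sort the stages topologically by their dependencies, then run each stage's commands in order and stop at the first failure. This is a minimal sketch assuming Python 3.9+ (`graphlib`) and hypothetical `stages`/`needs` structures (commands given as a list). It is not how `dvc repro` is implemented and skips DVC's change detection entirely.

```python
import subprocess
from graphlib import TopologicalSorter


def shell(cmd: str) -> int:
    """Run `cmd` through the system shell and return its exit status."""
    return subprocess.run(cmd, shell=True).returncode


def run_pipeline(stages, needs, run=shell):
    """Run each stage's commands in dependency order, stopping at the first failure.

    stages: stage name -> list of command strings (hypothetical structure)
    needs:  stage name -> set of upstream stage names it depends on
    run:    callable that executes one command and returns its exit code
    Returns the stage names in the order they completed.
    """
    # Every stage is a node; stages without upstream dependencies get an empty set.
    graph = {name: set(needs.get(name, ())) for name in stages}
    completed = []
    # static_order() yields upstream stages before the stages that need them
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        for cmd in stages.get(name, ()):
            code = run(cmd)
            if code != 0:
                # Halt the remaining commands and stages, as the text describes.
                raise RuntimeError(f"stage {name!r}: {cmd!r} exited with {code}")
        completed.append(name)
    return completed
```

Injecting `run` keeps the ordering logic testable without spawning processes, for instance by passing a callable that only records the commands it receives.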
This command is similar to [Make](https://www.gnu.org/software/make/) in
@@ -187,7 +187,7 @@ up-to-date and only execute the final stage.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if all
stages are up to date or if all stages are successfully executed, otherwise
exit with 1. The commands defined in the stage are free to write output
exit with 1. The command defined in the stage is free to write output
regardless of this flag.

- `-v`, `--verbose` - displays detailed tracing information.
@@ -270,8 +270,8 @@ If we now run `dvc repro`, we should see this:
```dvc
$ dvc repro
Stage 'filter' didn't change, skipping
Running stage 'count':
> python process.py numbers.txt > count.txt
Running stage 'count' with command:
python process.py numbers.txt > count.txt
Updating lock file 'dvc.lock'
```

@@ -307,8 +307,8 @@ of only the target (`count`) and following stages (none in this case):

```dvc
$ dvc repro --downstream count
Running stage 'count':
> python process.py numbers.txt > count.txt
Running stage 'count' with command:
python process.py numbers.txt > count.txt
Updating lock file 'dvc.lock'
```

5 changes: 4 additions & 1 deletion content/docs/command-reference/root.md
@@ -27,11 +27,14 @@ Use this command to build fixed paths to dependencies, files, or stage

- `-v`, `--verbose` - displays detailed tracing information.

## Example: Basic output
## Examples

Basic demonstration:

```dvc
$ dvc root
.
$ mkdir subdir
$ cd subdir
$ dvc root
20 changes: 10 additions & 10 deletions content/docs/command-reference/run.md
@@ -73,7 +73,7 @@ so on (see `dvc dag`). This graph can be restored by DVC later to modify or

```dvc
$ dvc run -n printer -d write.sh -o pages ./write.sh
$ dvc run -n scanner -d read.sh -d pages -o signed.pdf ./read.sh
$ dvc run -n scanner -d read.sh -d pages -o signed.pdf ./read.sh pages
```

Stage dependencies can be any file or directory, either untracked, or more
@@ -150,8 +150,8 @@ like `|` (pipe) or `<`, `>` (redirection), otherwise they would apply to
variables in it that should be evaluated dynamically. Examples:

```dvc
$ dvc run -n my_stage "./my_script.sh > /dev/null 2>&1"
$ dvc run -n my_stage './my_script.sh $MYENVVAR'
$ dvc run -n first_stage "./a_script.sh > /dev/null 2>&1"
$ dvc run -n second_stage './another_script.sh $MYENVVAR'
```

## Options
@@ -317,17 +317,17 @@ dataset (`20180226` is a seed value):

```dvc
$ dvc run -n train \
-d matrix-train.p -d train_model.py \
-o model.p \
python train_model.py matrix-train.p 20180226 model.p
-d train_model.py -d matrix-train.p -o model.p \
python train_model.py 20180226 model.p
```

To update a stage that is already defined, the `-f` (`--force`) option is
needed. Let's update the seed for the `train` stage:

```dvc
$ dvc run -n train -f -d matrix-train.p -d train_model.py -o model.p \
python train_model.py matrix-train.p 18494003 model.p
$ dvc run -n train --force \
-d train_model.py -d matrix-train.p -o model.p \
python train_model.py 18494003 model.p
```

## Example: Separate stages in a subdirectory
@@ -421,9 +421,9 @@ Define a stage with both regular dependencies as well as parameter dependencies:

```dvc
$ dvc run -n train \
-d matrix-train.p -d train_model.py -o model.p \
-d train_model.py -d matrix-train.p -o model.p \
-p seed,train.lr,train.epochs \
python train_model.py matrix-train.p model.p
python train_model.py 20200105 model.p
```

`train_model.py` will include some code to open and parse the parameters:
4 changes: 2 additions & 2 deletions content/docs/start/data-pipelines.md
@@ -72,7 +72,7 @@ $ dvc run -n prepare \
```

A `dvc.yaml` file is generated. It includes information about the command we ran
(`python src/prepare.py`), its <abbr>dependencies</abbr>, and
(`python src/prepare.py data/data.xml`), its <abbr>dependencies</abbr>, and
<abbr>outputs</abbr>.

<details>
@@ -129,8 +129,8 @@ stages:
prepare:
cmd: python src/prepare.py data/data.xml
deps:
- data/data.xml
- src/prepare.py
- data/data.xml
params:
- prepare.seed
- prepare.split
8 changes: 5 additions & 3 deletions content/docs/use-cases/shared-development-server.md
@@ -80,8 +80,9 @@ Let's say you are cleaning up raw data for later stages:

```dvc
$ dvc add raw
$ dvc run -n clean_data -d raw -o clean ./cleanup.py raw clean
# The data is cached in the shared location.
$ dvc run -n clean_data -d cleanup.py -d raw -o clean \
./cleanup.py raw clean
# The data is cached in the shared location.
$ git add raw.dvc dvc.yaml dvc.lock .gitignore
$ git commit -m "cleanup raw data"
$ git push
@@ -97,7 +98,8 @@ manually. After this, they could decide to continue building this
$ git pull
$ dvc checkout
A raw # Data is linked from cache to workspace.
$ dvc run -n process_clean_data -d clean -o processed ./process.py clean process
$ dvc run -n process_clean_data -d process.py -d clean -o processed \
./process.py clean processed
$ git add dvc.yaml dvc.lock
$ git commit -m "process clean data"
$ git push
8 changes: 4 additions & 4 deletions content/docs/user-guide/basic-concepts/dvc-cache.md
@@ -3,7 +3,7 @@ name: 'DVC Cache'
match: ['DVC cache', cache, caches, cached, 'cache directory']
---

The DVC cache is a hidden storage (by default located in the `.dvc/cache`
directory) for files that are tracked by DVC, and their different versions.
Learn more about it's
[structure](/doc/user-guide/dvc-internals#structure-of-the-cache-directory).
The DVC cache is a hidden storage (by default in `.dvc/cache`) for files and
directories tracked by DVC, and their different versions. Learn more about its
structure
[here](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory).