Merged
2 changes: 0 additions & 2 deletions content/docs/command-reference/commit.md
@@ -249,7 +249,6 @@ $ git status -s
M src/train.py

$ dvc status

train.dvc:
changed deps:
modified: src/train.py
@@ -275,7 +274,6 @@ dependencies ['src/train.py'] of 'train.dvc' changed.
Are you sure you commit it? [y/n] y

$ dvc status

Data and pipelines are up to date.
```

4 changes: 2 additions & 2 deletions content/docs/command-reference/fetch.md
@@ -154,8 +154,8 @@ into our local <abbr>cache</abbr>.
```dvc
$ dvc status --cloud
...
deleted: data/features/train.pkl
deleted: model.pkl
deleted: data/features/train.pkl
deleted: model.pkl

$ dvc fetch

21 changes: 10 additions & 11 deletions content/docs/command-reference/get.md
@@ -31,20 +31,19 @@ directory. (Analogous to `wget`, but for repos.)
> directories to download.

The `url` argument specifies the address of the DVC or Git repository containing
the data source. Both HTTP and SSH protocols are supported for online repos
(e.g. `[user@]server:project.git`). `url` can also be a local file system path
to an "offline" repo (if it's a DVC repo without a default remote, instead of
downloading, DVC will try to copy the target data from its <abbr>cache</abbr>).
the data source. Both HTTP and SSH protocols are supported (e.g.
`[user@]server:project.git`). `url` can also be a local file system path.

The `path` argument is used to specify the location of the target to download
within the source repository at `url`. `path` can specify any file or directory
in the source repo, either tracked by DVC (including paths inside tracked
directories) or by Git. Note that DVC-tracked targets must be found in a
`dvc.yaml` or `.dvc` file of the repo.

⚠️ The project should have a default
[DVC remote](/doc/command-reference/remote), containing the actual data for this
command to work.
tracked by either Git or DVC (including paths inside tracked directories). Note
that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the
repo.

⚠️ DVC repos should have a default [DVC remote](/doc/command-reference/remote)
containing the actual target data for this command to work. The only exception is for
local repos, where DVC will try to copy the data from its <abbr>cache</abbr>
first.

> See `dvc get-url` to download data from other supported locations such as S3,
> SSH, HTTP, etc.
8 changes: 6 additions & 2 deletions content/docs/command-reference/import-url.md
@@ -109,8 +109,12 @@ $ dvc run -n download_data \
wget https://data.dvc.org/get-started/data.xml -O data.xml
```

`dvc import-url` generates an import stage `.dvc` file and `dvc run` a regular
stage (in `dvc.yaml`).
`dvc import-url` generates an <abbr>import stage</abbr> `.dvc` file and
`dvc run` a regular stage (in `dvc.yaml`).

⚠️ DVC won't push or pull imported data to/from
[remote storage](/doc/command-reference/remote); it will rely on its original
source.

## Options

34 changes: 20 additions & 14 deletions content/docs/command-reference/import.md
@@ -34,21 +34,19 @@ updating the import later, if it has changed in its data source. (See
> directories to import.

The `url` argument specifies the address of the DVC or Git repository containing
the data source. Both HTTP and SSH protocols are supported for online repos
(e.g. `[user@]server:project.git`). `url` can also be a local file system path
to an "offline" repo (if it's a DVC repo without a default remote, instead of
downloading, DVC will try to copy the target data from its <abbr>cache</abbr>).
the data source. Both HTTP and SSH protocols are supported (e.g.
`[user@]server:project.git`). `url` can also be a local file system path.

The `path` argument is used to specify the location of the target to download
within the source repository at `url`. `path` can specify any file or directory
in the source repo, either tracked by DVC (including paths inside tracked
directories) or by Git. Note that DVC-tracked targets must be found in a
`dvc.yaml` or `.dvc` file of the repo. Chained imports (importing data that was
imported into the source repo at `url`) are not supported, however.
tracked by either Git or DVC (including paths inside tracked directories). Note
that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the
repo.

⚠️ The project should have a default
[DVC remote](/doc/command-reference/remote), containing the actual data for this
command to work.
⚠️ DVC repos should have a default [DVC remote](/doc/command-reference/remote)
containing the actual target data for this command to work. The only exception is for
local repos, where DVC will try to copy the data from its <abbr>cache</abbr>
first.

> See `dvc import-url` to download and track data from other supported locations
> such as S3, SSH, HTTP, etc.
@@ -66,6 +64,10 @@ path in the <abbr>workspace</abbr>. It records enough metadata about the
imported data to enable DVC efficiently determining whether the local copy is
out of date.

⚠️ DVC won't push or pull imported data to/from
[remote storage](/doc/command-reference/remote); it will rely on its original
source.

To actually [version the data](/doc/tutorials/get-started/data-versioning),
`git add` (and `git commit`) the import stage.

@@ -74,6 +76,9 @@ Note that import stages are considered always
they won't be updated. Use `dvc update` to update the downloaded data artifact
from the source repo.

Also note that chained imports (importing data that was imported into the source
repo at `url`) are not supported.

## Options

- `-o <path>`, `--out <path>` - specify a path to the desired location in the
@@ -112,9 +117,10 @@ Importing 'data/data.xml (git@github.com:iterative/example-get-started)'
```

In contrast with `dvc get`, this command doesn't just download the data file,
but it also creates an import stage (`.dvc` file) with a link to the data source
(as explained in the description above). (This import stage can later be used to
[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`:
but it also creates an <abbr>import stage</abbr> (`.dvc` file) with a link to
the data source (as explained in the description above). (This import stage can
later be used to [update](/doc/command-reference/update) the import.) Check
`data.xml.dvc`:

```yaml
md5: 7de90e7de7b432ad972095bc1f2ec0f8
1 change: 0 additions & 1 deletion content/docs/command-reference/install.md
@@ -247,7 +247,6 @@ M model.pkl
M data/features/

$ dvc status

Data and pipelines are up to date.
```

9 changes: 4 additions & 5 deletions content/docs/command-reference/list.md
@@ -21,7 +21,7 @@ DVC, by effectively replacing data files, models, directories with `.dvc` files
files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
GitHub), you just see the `dvc.yaml` and `.dvc` files. This makes it hard to
navigate the project to find <abbr>data artifacts</abbr> for use with `dvc get`,
`dvc import`, or `dvc.api`.
`dvc import`, or `dvc.api` functions.

`dvc list` prints a virtual view of a DVC repository, as if files and
directories tracked by DVC were found directly in the remote Git repo. Only the
@@ -36,10 +36,9 @@ $ dvc pull
$ ls <path>
```

The `url` argument specifies the address of the Git repository containing the
data source. Both HTTP and SSH protocols are supported for online repos (e.g.
`[user@]server:project.git`). `url` can also be a local file system path to an
"offline" Git repo.
The `url` argument specifies the address of the DVC or Git repository containing
the data source. Both HTTP and SSH protocols are supported (e.g.
`[user@]server:project.git`). `url` can also be a local file system path.

The optional `path` argument is used to specify a directory to list within the
source repository at `url` (including paths inside tracked directories). It's
2 changes: 1 addition & 1 deletion content/docs/command-reference/metrics/diff.md
@@ -41,7 +41,7 @@ lists all the current metrics without comparisons.

## Options

- `--targets <paths>` - limit command scope to these metric files. Using -R,
- `--targets <paths>` - limit command scope to these metric files. Using `-R`,
directories to search metric files in can also be given. When specifying
arguments for `--targets` before `revisions`, you should use `--` after this
option's arguments, e.g.:
6 changes: 3 additions & 3 deletions content/docs/command-reference/move.md
@@ -109,7 +109,7 @@ $ dvc commit -f

- `-v`, `--verbose` - displays detailed tracing information.

## Example: change the file name
## Example: Change the file name

We first use `dvc add` to track file with DVC. Then, we change its name using
`dvc move`.
@@ -130,7 +130,7 @@ $ tree
└── other.csv.dvc
```

## Example: change the location
## Example: Change a file location

We use `dvc add` to track a file with DVC, then we use `dvc move` to change its
location. If the target path is a directory and already exists, the data file is
@@ -166,7 +166,7 @@ $ tree
└── foo.dvc
```

## Example: change an imported directory name and location
## Example: Move a directory

Let's try the same with an entire directory imported from an external <abbr>DVC
repository</abbr> with `dvc import`. Note that, as in the previous cases, the
1 change: 1 addition & 0 deletions content/docs/command-reference/pull.md
@@ -192,6 +192,7 @@ such that the data in some of these stages should be updated in the

```dvc
$ dvc status -c
...
deleted: data/features/test.pkl
deleted: data/features/train.pkl
deleted: model.pkl
8 changes: 4 additions & 4 deletions content/docs/command-reference/push.md
@@ -149,9 +149,10 @@ Imagine the <abbr>project</abbr> has been modified such that the

```dvc
$ dvc status --cloud
new: data/model.p
new: data/matrix-test.p
new: data/matrix-train.p
...
new: data/model.p
new: data/matrix-test.p
new: data/matrix-train.p
```

One could do a simple `dvc push` to share all the data, but what if you only
@@ -258,7 +259,6 @@ $ tree ~/vault/recursive
10 directories, 10 files

$ dvc status --cloud

Data and pipelines are up to date.
```

19 changes: 10 additions & 9 deletions content/docs/command-reference/status.md
@@ -160,11 +160,11 @@ bar.dvc:
modified: bar
changed outs:
not in cache: foo
foo.dvc
foo.dvc:
changed outs:
deleted: foo
changed checksum
prepare.dvc
prepare.dvc:
changed outs:
new: bar
always changed
@@ -180,11 +180,11 @@ This shows that for stage `bar.dvc`, the dependency `foo` and the

```dvc
$ dvc status foo.dvc dobar
foo.dvc
foo.dvc:
changed outs:
deleted: foo
changed checksum
dobar
dobar:
changed deps:
modified: bar
changed outs:
@@ -220,7 +220,7 @@ $ dvc status model.p
Data and pipelines are up to date.

$ dvc status model.p --with-deps
matrix-train.p
matrix-train.p:
changed deps:
modified: code/featurization.py
```
@@ -243,10 +243,11 @@ remote yet:

```dvc
$ dvc status --remote storage
new: data/model.p
new: data/eval.txt
new: data/matrix-train.p
new: data/matrix-test.p
...
new: data/model.p
new: data/eval.txt
new: data/matrix-train.p
new: data/matrix-test.p
```

The output shows the location of the remote storage, as well as any
2 changes: 0 additions & 2 deletions content/docs/user-guide/dvcignore.md
@@ -149,12 +149,10 @@ adding new file:

```dvc
$ dvc status

Data and pipelines are up to date.

$ mv data/data1 data/data3
$ dvc status

data.dvc:
changed outs:
modified: data
42 changes: 27 additions & 15 deletions content/docs/user-guide/external-dependencies.md
@@ -146,27 +146,39 @@ $ dvc run -n download_file \

</details>

## Example: DVC remote aliases
## Example: Using DVC remote aliases

If instead of a URL you'd like to use an alias that can be managed
independently, or if the external dependency location requires access
credentials, you may use `dvc remote add` to define this location as a DVC
Remote, and then use a special URL with format `remote://{remote_name}/{path}`
to define an external dependency.
You may want to encapsulate external locations as configurable entities that can
be managed independently. This is useful if multiple dependencies (or stages)
reuse the same location, or if it's likely to change in the future. And if the
location requires authentication, you need a way to configure it in order to
connect.

For example, for an HTTPs remote/dependency:
[DVC remotes](/doc/command-reference/remote) can do just this. You may use
`dvc remote add` to define them, and then use a special URL with format
`remote://{remote_name}/{path}` (remote alias) to define the external
dependency.

Let's see an example using SSH. First, register and configure the remote:

```dvc
$ dvc remote add myssh ssh://myserver.com
$ dvc remote modify --local myssh user myuser
$ dvc remote modify --local myssh password mypassword
```
Comment on lines +162 to +168
Contributor Author

This makes me realize our SSH examples for `remote add`/`modify` may be misleading (in other docs), because we sometimes use `ssh://user@example.com...` when adding, and then mention modifying the credentials, or even show an example of `remote modify example user ...`, but I don't know if that will work correctly with DVC. Need to check...

Contributor Author

I tried testing this but can't because of treeverse/dvc#4712.

> Please refer to `dvc remote add` for more details like setting up access
> credentials for the different remote types.

Now, use an alias to this remote when defining the stage:

```dvc
$ dvc remote add example https://example.com
$ dvc run -n download_file \
-d remote://example/data.txt \
-d remote://myssh/path/to/data.txt \
-o data.txt \
wget https://example.com/data.txt -O data.txt
```

Comment on lines +173 to 175
Contributor Author

I also realized this doc only uses `dvc run` to define stages and doesn't show the resulting `dvc.yaml` files (except for the `import(-url)` examples). Maybe we shouldn't even use `dvc run` in most of these examples, since manually writing `dvc.yaml` is the recommended way?

But then again you can't easily write an external dependency in `dvc.yaml`... Hmmm 🤔

Please refer to `dvc remote add` for more details like setting up access
credentials for the different remotes.

## Example: `import-url` command

In the previous examples, special downloading tools were used: `scp`,
@@ -205,11 +217,11 @@ determine whether the source has changed and we need to download the file again.

</details>

## Example: Using import
## Example: Imports

`dvc import` can download a <abbr>data artifact</abbr> from any <abbr>DVC
project</abbr> or Git repository. It also creates an external dependency in its
import `.dvc` file.
project</abbr>, or any file from a Git repository. It also creates an external
dependency in its import `.dvc` file.

```dvc
$ dvc import git@github.com:iterative/example-get-started model.pkl