cmd: add to-cache docs #2246

Merged
merged 9 commits into from
Mar 10, 2021
54 changes: 51 additions & 3 deletions content/docs/command-reference/add.md
@@ -151,6 +151,11 @@ not.
> Note that external outputs typically require an external cache setup. See
> link above for more details.

- `-o <path>`, `--out <path>` - destination `path` to make a local target copy,
or to [transfer](#example-transfer-to-cache) an external target into the cache
(and link to workspace). Note that this can be combined with `--to-remote` to
avoid storing the data locally, while still adding it to the project.

- `--to-remote` - import an external target, but don't move it into the
workspace, nor cache it. [Transfer it](#example-transfer-to-remote-storage)
directly to remote storage (the default one, unless `-r` is specified)
@@ -160,9 +165,6 @@ not.
[remote storage](/doc/command-reference/remote) to transfer the external target
to (can only be used with `--to-remote`).

- `-o <path>`, `--out <path>` - destination `path` for the transferred data (can
only be used with `--to-remote`).

- `--desc <text>` - user description of the data (optional). This doesn't affect
any DVC operations.

@@ -332,6 +334,52 @@ $ tree .dvc/cache
Only the hash values of the `dir/` directory (with `.dir` file extension) and
`file2` have been cached.

## Example: Transfer to the cache

When you have a large dataset in an external location, you may want to add it to
the <abbr>project</abbr> without having to copy it into the workspace. Maybe
your local disk doesn't have enough space, but you have set up an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
that could handle it.

The `--out` option lets you add external paths in a way that they are
<abbr>cached</abbr> first, and then
[linked](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
to a given path inside the <abbr>workspace</abbr>. Let's initialize an example
DVC project to try this:

```dvc
$ mkdir example # workspace
$ cd example
$ git init
$ dvc init
```

Now we can add a `data.xml` file via HTTP, for example, placing it at a local
path in our project:

```dvc
$ dvc add https://data.dvc.org/get-started/data.xml -o data.xml

$ ls
data.xml data.xml.dvc
```

The resulting `.dvc` file will save the provided local `path` as if the data were
already in the workspace, while the `md5` hash points to the copy of the data
that has now been transferred to the <abbr>cache</abbr>. Let's check the
contents of `data.xml.dvc` in this case:

```yaml
outs:
  - md5: a304afb96060aad90176268345e10355
    nfiles: 1
    path: data.xml
```

> For a similar operation that actually keeps a connection to the data source,
> please see `dvc import-url`.
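
To make the link between the `md5` field and the cache more concrete, here is a
minimal Python sketch that maps a hash to the location where the cached copy
would be expected. It assumes the default `.dvc/cache` directory and a layout in
which the first two hex characters of the hash form a subdirectory; other DVC
versions or a custom cache location may differ. The `cache_path_for` helper is
hypothetical and not part of DVC.

```python
from pathlib import PurePosixPath


def cache_path_for(md5: str, cache_dir: str = ".dvc/cache") -> PurePosixPath:
    """Hypothetical helper (not a DVC API): expected cache path for an md5.

    Assumes the cache stores each file under <cache_dir>/<first 2 hex chars>/
    <remaining 30 hex chars>. Check your own cache to confirm the layout.
    """
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError(f"not a 32-character lowercase hex md5: {md5!r}")
    return PurePosixPath(cache_dir) / md5[:2] / md5[2:]


# Hash taken from the data.xml.dvc example above.
print(cache_path_for("a304afb96060aad90176268345e10355"))
```

Under these assumptions, listing that path in the project should show the file
that `data.xml` in the workspace is linked to.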

## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to track it