cmd: add to-cache docs #2246

Merged (9 commits), Mar 10, 2021
58 changes: 55 additions & 3 deletions content/docs/command-reference/add.md
@@ -151,6 +151,11 @@ not.
> Note that external outputs typically require an external cache setup. See
> link above for more details.

- `-o <path>`, `--out <path>` - destination `path` for a local copy of the
target, or for an external target [transferred](#example-transfer-to-the-cache)
into the cache and workspace. Note that this can be combined with `--to-remote`
to avoid storing the data locally, while still adding it to the project.

- `--to-remote` - import an external target, but neither move it into the
workspace nor cache it. Instead, [transfer it](#example-transfer-to-remote-storage)
directly to remote storage (the default one, unless `-r` is specified)
@@ -160,9 +165,6 @@ not.
[remote storage](/doc/command-reference/remote) to transfer external target to
(can only be used with `--to-remote`).

- `-o <path>`, `--out <path>` - destination `path` for the transferred data (can
only be used with `--to-remote`).

- `--desc <text>` - user description of the data (optional). This doesn't affect
any DVC operations.

@@ -332,6 +334,56 @@ $ tree .dvc/cache
Only the hash values of the `dir/` directory (with `.dir` file extension) and
`file2` have been cached.

## Example: Transfer to the cache

When you have a large dataset in an external location, you may want to add it to
the <abbr>project</abbr> without having to copy it into the workspace. Maybe
your local disk doesn't even have enough space, but you have set up an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
that can handle it.

The `--out` option lets you add external paths in a way that they are
<abbr>cached</abbr> first, and then
[linked](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
to a given path inside the <abbr>workspace</abbr>. Let's initialize a DVC
project:

```dvc
$ mkdir example # workspace
$ cd example
$ git init
$ dvc init
```

Now we can add a `data.xml` file via HTTP, for example, placing it at a local
path in our project:

```dvc
$ dvc add https://data.dvc.org/get-started/data.xml -o data.xml

To track the changes with git, run:

git add data.xml.dvc

$ ls
data.xml data.xml.dvc
```

The resulting `.dvc` file saves the provided local `path` as if the data had
always been there, while the `md5` hash points to the copy of the data that has
now been transferred to the cache. Let's check the contents of `data.xml.dvc`
in this case:

```yaml
outs:
  - md5: a304afb96060aad90176268345e10355
    nfiles: 1
    path: data.xml
```
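As a rough illustration of how that `md5` value relates to the cache, the sketch
below prints the cache location that corresponds to a file's hash. It assumes the
cache layout shown by `tree .dvc/cache` in the previous example (a subdirectory
named after the first two hex characters of the hash, holding a file named after
the rest), GNU `md5sum`, and a file without Windows line endings (DVC may
normalize those in text files before hashing, in which case the values differ).
`dvc_cache_path` is a hypothetical helper, not a DVC command.

```shell
# Hypothetical helper (not part of DVC): print the cache path for a file,
# assuming the <2 hex chars>/<remaining 30 hex chars> cache layout.
dvc_cache_path() {
    # md5sum prints "<hash>  <file>"; keep only the hash field
    hash=$(md5sum "$1" | cut -d ' ' -f 1)
    # The first two hex characters name the subdirectory...
    prefix=$(printf '%s' "$hash" | cut -c 1-2)
    # ...and the remaining characters name the file inside it
    rest=$(printf '%s' "$hash" | cut -c 3-)
    printf '.dvc/cache/%s/%s\n' "$prefix" "$rest"
}
```

For example, `dvc_cache_path data.xml` would print
`.dvc/cache/a3/04afb96060aad90176268345e10355` if the file's plain MD5 matches
the value recorded in `data.xml.dvc`.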

> For a similar operation that actually keeps a connection to the data source,
> please see `dvc import-url`.

## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to track it