diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 8ffeb47ef3..1d0ef04ad9 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -151,6 +151,11 @@ not.
   > Note that external outputs typically require an external cache setup. See
   > link above for more details.
 
+- `-o <path>`, `--out <path>` - destination `path` to make a local target copy,
+  or to [transfer](#example-transfer-to-the-cache) an external target into the cache
+  (and link to workspace). Note that this can be combined with `--to-remote` to
+  avoid storing the data locally, while still adding it to the project.
+
 - `--to-remote` - import an external target, but don't move it into the
   workspace, nor cache it. [Transfer it](#example-transfer-to-remote-storage) it
   directly to remote storage (the default one, unless `-r` is specified)
@@ -160,9 +165,6 @@ not.
   [remote storage](/doc/command-reference/remote) to transfer external target to
   (can only be used with `--to-remote`).
 
-- `-o <path>`, `--out <path>` - destination `path` for the transferred data (can
-  only be used with `--to-remote`).
-
 - `--desc ` - user description of the data (optional). This doesn't affect any
   DVC operations.
 
@@ -332,6 +334,52 @@ $ tree .dvc/cache
 Only the hash values of the `dir/` directory (with `.dir` file extension) and
 `file2` have been cached.
 
+## Example: Transfer to the cache
+
+When you have a large dataset in an external location, you may want to add it to
+the project without having to copy it into the workspace. Maybe
+your local disk doesn't have enough space, but you have set up an
+[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
+that could handle it.
+
+The `--out` option lets you add external paths in a way that they are
+cached first, and then
+[linked](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
+to a given path inside the workspace. Let's initialize an example
+DVC project to try this:
+
+```dvc
+$ mkdir example # workspace
+$ cd example
+$ git init
+$ dvc init
+```
+
+Now we can add a `data.xml` file via HTTP, for example, putting it in a local path
+in our project:
+
+```dvc
+$ dvc add https://data.dvc.org/get-started/data.xml -o data.xml
+
+$ ls
+data.xml data.xml.dvc
+```
+
+The resulting `.dvc` file will save the provided local `path` as if the data were
+already in the workspace, while the `md5` hash points to the copy of the data
+that has now been transferred to the cache. Let's check the
+contents of `data.xml.dvc` in this case:
+
+```yaml
+outs:
+  - md5: a304afb96060aad90176268345e10355
+    nfiles: 1
+    path: data.xml
+```
+
+> For a similar operation that actually keeps a connection to the data source,
+> please see `dvc import-url`.
+
 ## Example: Transfer to remote storage
 
 When you have a large dataset in an external location, you may want to track it