Example for dvc add --to-remote #2172

Merged 5 commits on Feb 28, 2021

52 changes: 48 additions & 4 deletions content/docs/command-reference/add.md
@@ -81,10 +81,8 @@
copy of the target data directly to a remote of your choice (or the default
one). A `.dvc` file will be created normally, but the data won't be found in
your local project until you `dvc pull` it.

This option is useful when the local system can't handle the target data, but
you still want to track and store it in remote storage, so that whenever you
switch to a different system that can handle it, you can simply pull the data
and start working on it.
(ℹ️) See the [Transfer to remote storage](#example-transfer-to-remote-storage)
example below.

### Adding entire directories

@@ -344,3 +342,49 @@ $ tree .dvc/cache

Only the hash values of the `dir/` directory (with `.dir` file extension) and
`file2` have been cached.

## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to add it to
your project without downloading it to the local file system (for example, to use
it later or on a different system). The `--to-remote` option lets you skip the
download, while storing the added data [remotely](/doc/command-reference/remote).
Let's initialize a DVC project and set up a remote:

```dvc
$ mkdir example # workspace
$ cd example
$ git init
$ dvc init
$ mkdir /tmp/dvc-storage
$ dvc remote add myremote /tmp/dvc-storage
```

Now let's add `data.xml` from its external location straight to our remote
storage:

```dvc
$ dvc add https://data.dvc.org/get-started/data.xml -o data.xml \
--to-remote -r myremote
...
```

The only difference is that the dataset is transferred straight to remote
storage, so DVC won't track the external location you gave it. Instead, it keeps
track of the copy in your remote storage, where the data now lives. The operation
still results in a `.dvc` file:

```dvc
$ ls
data.xml.dvc
```

Whenever anyone wants to actually download the added data (for example, from a
system that can handle it), they can use `dvc pull` as usual:

```dvc
$ dvc pull data.xml.dvc -r myremote

A data.xml
1 file added and 1 file fetched
```
4 changes: 2 additions & 2 deletions content/docs/command-reference/import-url.md
@@ -361,8 +361,8 @@ Running stage 'prepare' with command:
## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to import it
to your project without downloading it to the local file system (for example, to
use it later or on a different system). The `--to-remote` option lets you skip the
download, while storing the imported data [remotely](/doc/command-reference/remote).
Let's initialize a DVC project and set up a remote:
