Commit

Address some reviews
isidentical committed Jan 14, 2021
1 parent fce5f64 commit 5cabdfc
Showing 2 changed files with 58 additions and 41 deletions.
57 changes: 31 additions & 26 deletions content/docs/command-reference/add.md
@@ -7,7 +7,7 @@ file.

```usage
usage: dvc add [-h] [-q | -v] [-R] [--no-commit] [--external]
[--file <filename>] [--glob] [-o <filename>]
[--file <filename>] [--glob] [-o <path>]
[--to-remote] [-r <name>] [--desc <text>]
targets [targets ...]
@@ -74,18 +74,23 @@ large files. DVC also supports other link types for use on file systems without

### Transferring data directly to the remote

The `--to-remote` option changes the behavior described above. Instead
of accepting only targets from the local/remote workspace, it
supports every kind of remote location that you can import from
(listed in [import-url](/doc/command-reference/import-url)). The main difference
is that it does nothing in the workspace besides creating a DVC
file. It takes the data in batches from the given target and transfers it
through the local system to the
[remote storage](/doc/command-reference/remote). This option especially targets
cases where the running system cannot store the data as a
whole, but can later (or the system of another user who shares the
same project can). The DVC file then allows checking out that data from the
same remote storage when the system is ready to handle it.
When you have a very large dataset that you want to move from some remote location
to one of your remotes, but you don't have the time or resources to
store it on your local system, you can use `--to-remote` to add that
remote location straight to remote storage instead of your local workspace. The remote
location can be any of those listed on the
[import-url](/doc/command-reference/import-url) page. When you add a remote
location with `--to-remote`, DVC gets the dataset from the given location and
transfers it to the remote you specified (or the default one). It creates a
DVC file just as if you had added something locally, but there won't be any data that
you can access unless you [pull](/doc/command-reference/pull) it. In that case,
DVC pulls it from the remote storage to your workspace and you can start
using it.

This flag is especially useful when your current system can't handle the data as
a whole, but you still want to track it and store it in remote storage, so
that whenever you switch to a different system that can handle it (in whole or
in part) you can simply get the data and start working on it.

### Adding entire directories

@@ -165,29 +170,29 @@ not.
> Note that external outputs typically require an external cache setup. See
> link above for more details.
- `--to-remote` - transfer data straight to remote, when the system in use doesn't
have the means to store it locally. So instead of transferring it to the local
cache and linking it to the working directory, it is transferred through the
local computer in batches to the remote storage (can be configured using
`--remote <name>`) and can be checked out locally when the necessary means
have been established, since this process also results in a DVC file.
- `--to-remote` - add target data to DVC and create a `.dvc` file, but instead
of storing it in the DVC cache, transfer it straight to remote storage. See the
[Transferring data directly to the remote](#transferring-data-directly-to-the-remote)
section for details. If this option is specified, the target can be any cloud or
local URL, not necessarily a local file or directory from the workspace as is
required in the regular `dvc add` workflow.

* `-o <filename>`, `--out <filename>` - destination path for the transferred
data. (Can only be used with `--to-remote`)
- `-o <path>`, `--out <path>` - destination path for the transferred data. (Can
only be used with `--to-remote`)

* `-r <name>`, `--remote <name>` - name of the
- `-r <name>`, `--remote <name>` - name of the
[remote storage](/doc/command-reference/remote). (Can only be used with
`--to-remote`)

* `--desc <text>` - user description of the data (optional). This doesn't affect
- `--desc <text>` - user description of the data (optional). This doesn't affect
any DVC operations.

* `-h`, `--help` - prints the usage/help message, and exits.
- `-h`, `--help` - prints the usage/help message, and exits.

* `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

* `-v`, `--verbose` - displays detailed tracing information.
- `-v`, `--verbose` - displays detailed tracing information.

## Example: Single file

42 changes: 27 additions & 15 deletions content/docs/command-reference/import-url.md
@@ -135,12 +135,8 @@ original source.
already exist locally and you want to "DVCfy" this state of the project (see
also `dvc commit`).

- `--to-remote` - transfer data straight to remote, when the system in use doesn't
have the means to store it locally. So instead of importing it to the
workspace, it is transferred through the local computer in batches to the
remote storage (can be configured using `--remote <name>`) and can be checked
out locally when the necessary means have been established, since this process
also results in a DVC file.
- `--to-remote` - import data straight to remote storage and create a `.dvc` file.
See the [Example: Import straight to the remote](#example-import-straight-to-the-remote)
section for details.

- `-r <name>`, `--remote <name>` - name of the
[remote storage](/doc/command-reference/remote)
@@ -353,10 +349,18 @@ Running stage 'prepare' with command:

## Example: Import straight to the remote

If you want to move a dataset or a model from a distant location into your
remote storage, and you also want to track it in case you
later need to [pull](/doc/command-reference/pull) it locally, the `--to-remote`
option can help.
When you have a massive dataset in a distant location and are working on a computer
that can't store it locally (for lack of disk space), but
you still want to put it under DVC control just as if you had
imported it and then pushed it to the remote, you can use the `--to-remote`
flag.

It imports the data into the remote storage that you choose, and when
you or any of your colleagues want to copy the data to their systems, they can
simply [pull](/doc/command-reference/pull) it. Let's walk through a simple example.

We initialize two directories: one is the remote storage and the other is
the workspace.

```dvc
$ mkdir /tmp/dvc-import-url-straight-to-remote/
@@ -369,8 +373,16 @@ $ dvc remote add tmp_remote /tmp/remote

To transfer a source from a remote location to the given remote, you can
combine `import-url` with the `--to-remote` option, which basically does the whole
transfer operation without needing to fit the dataset as a whole
on your system.
import and [push](/doc/command-reference/push) operation under the hood,
not by downloading everything at once but by transferring it
gradually.

When you run `import-url` with `--to-remote`, you pass the remote
location and the output filename as usual. If you haven't set a default
[remote](/doc/command-reference/remote) yet, you can pass the name of the
remote with the `-r`/`--remote` flag. DVC then starts the transfer and leaves a DVC
file as the only side effect in your workspace (everything else happens in the
remote storage).

```
$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml --to-remote -r tmp_remote
@@ -379,9 +391,9 @@ To track the changes with git, run:
git add data.xml.dvc
```
This operation results in a DVC file (`data.xml.dvc`) and no local cache or
data at all. When you move to a more suitable system that can store the data
locally, `dvc pull` will simply get it for you.
Whenever anyone wants to actually get this file, for example once they have a system
that can handle it, it takes just a simple [pull](/doc/command-reference/pull)
operation.
```
$ dvc pull data.xml.dvc -r tmp_remote