From f64f60113a0cdc6dbaaeec7361bd809457d14e0b Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Tue, 12 Jan 2021 12:47:33 +0300
Subject: [PATCH 01/39] Initial pre-texts regarding straight to remote

---
 content/docs/command-reference/add.md        | 37 ++++++++++++++++++--
 content/docs/command-reference/import-url.md | 16 +++++++--
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 984be5ce75..f53ac3b9e2 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -7,7 +7,8 @@ file.

 ```usage
 usage: dvc add [-h] [-q | -v] [-R] [--no-commit] [--external]
-               [--glob] [--file ] [--desc ]
+               [--glob] [--file ] [-o ] [--to-remote]
+               [-r ] [-j ] [--desc ]
                targets [targets ...]

 positional arguments:
@@ -33,7 +34,8 @@ option to avoid this, and `dvc commit` to finish the process when needed).
 > intermediate and final results (like ML models).

 After checking that each `target` hasn't been added before (or tracked with
-other DVC commands), a few actions are taken under the hood:
+other DVC commands), a few actions are taken under the hood (if `--to-remote` is
+not provided):

 1. Calculate the file hash.
 2. Move the file contents to the cache (by default in `.dvc/cache`), using the
@@ -70,6 +72,27 @@ large files. DVC also supports other link types for use on file systems without
 `reflink` support, but they have to be specified manually. Refer to the
 `cache.type` config option in `dvc config cache` for more information.

+### Transferring data directly to the remote
+
+Giving `--to-remote` option would change the behavior described above. Instead
+of only being able to give it a local target, it would be able to support all
+kinds of remote locations (listed in
+[import-url](/doc/command-reference/import-url)). The main difference is that it
+won't actually do anything on the working system beside creating a DVC file. It
+will take the data in batches from the given target and transfer it through 'the
+local system' to the [remote storage](/doc/command-reference/remote). It is
+especially targeting cases where the running system doesn't have the means of
+storing that data as a whole but it can later have (or another user's system who
+shares the same project). So that the DVC file would allow checking out that
+data when the system can meet the needs of storage.
+
+The option is designed to transfer data straight to remote, when the used system
+doesn't have the means to store it locally. So instead of transferring it to the
+local cache and link it to the working directory, it is transferred through the
+local computer in batches to the remote storage (can be configured using
+`--remote `) and can be checked out locally when the necessary means have
+been established since this process also results with a DVC file.
+
 ### Adding entire directories

 A `dvc add` target can be either a file or a directory. In the latter case, a
@@ -148,6 +171,16 @@ not.
   > Note that external outputs typically require an external cache setup. See
   > link above for more details.

+- `--to-remote` - adds data into the remote storage, instead of the local
+  workspace.
+
+- `-o `, `--out ` - destination path for the transferred
+  data. (Can only be used with `--to-remote`)
+
+- `-r `, `--remote ` - name of the
+  [remote storage](/doc/command-reference/remote). (Can only be used with
+  `--to-remote`)
+
 - `--desc ` - user description of the data (optional). This doesn't affect
   any DVC operations.

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 27b70e2d98..d34495d1a1 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -11,7 +11,7 @@ import `.dvc` file is created).

 ```usage
 usage: dvc import-url [-h] [-q | -v] [--file ] [--no-exec]
-                      [--desc ]
+                      [--to-remote] [-r ] [--desc ]
                       url [out]

 positional arguments:
@@ -22,8 +22,8 @@ positional arguments:
 ## Description

 In some cases it's convenient to add a data file or directory from an external
-location into the workspace, such that it can be updated later, if/when the
-external data source changes. Example scenarios:
+location into the workspace (or to the [remote storage](/doc/command-reference/remote),
+such that it can be updated later, if/when the external data source changes. Example scenarios:

 - A remote system may produce occasional data files that are used in other
   projects.
@@ -37,6 +37,10 @@ external data source changes. Example scenarios:
 having to manually copy files from the supported locations (listed below), which
 may require installing a different tool for each type.

+When you don't actually want to store the whole data file / directory in your
+local workspace but rather import it directly to the remote storage, `--to-remote`
+option can be given.
+
 The `url` argument specifies the external location of the data to be imported.
 The imported data is cached, and linked (or copied) to the current working
 directory with its original file name e.g. `data.txt` (or to a location
@@ -131,6 +135,12 @@ $ dvc run -n download_data \
   finish the operation(s)); or if the target data already exist locally and you
   want to "DVCfy" this state of the project (see also `dvc commit`).

+- `--to-remote` - imports data into the remote storage, instead of the local
+  workspace.
+
+- `-r `, `--remote ` - name of the
+  [remote storage](/doc/command-reference/remote)
+
 - `--desc ` - user description of the data (optional). This doesn't affect
   any DVC operations.
From 98cf2374b31e0e72d9cf59fef126359681f6bf58 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Tue, 12 Jan 2021 13:22:25 +0300
Subject: [PATCH 02/39] Add an import-url example

---
 content/docs/command-reference/import-url.md | 39 ++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index d34495d1a1..de6b5c7ecf 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -346,3 +346,42 @@ $ dvc repro
 Running stage 'prepare' with command:
 	python src/prepare.py data/data.xml
 ```
+
+## Example: Import straight to the remote
+
+If you want to move a dataset or a model from a distant location into your
+remote storage, and while doing that you also want to track it in case you might
+later need to [checkout](/docs/command-reference/checkout) it locally,
+`--to-remote` option can come to your help on that case.
+
+```dvc
+$ mkdir /tmp/dvc-import-url-straight-to-remote/
+$ mkdir /tmp/remote
+$ cd /tmp/dvc-import-url-straight-to-remote/
+$ git init
+$ dvc init
+$ dvc remote add tmp_remote /tmp/remote
+```
+
+For transferring a source from a remote location, to the given remote you can
+combine `import-url` with `--to-remote` option which basically does the whole
+transferring operation without actually a need of fitting the dataset as a whole
+to your system.
+
+```
+$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml --to-remote -r tmp_remote
+To track the changes with git, run:
+
+	git add data.xml.dvc
+```
+
+This operation will result with a DVC file (`data.xml.dvc`) and no local cache /
+data at all. When you move to a more suitable system, which can store the data
+locally `dvc pull` will simply get it for you.
+
+```
+ $ dvc pull data.xml.dvc -r tmp_remote
+
+A data.xml
+1 file added and 1 file fetched
+```

From 114ba8d013bfc66d1e1ccdba1c870dbcc8da0e5c Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Tue, 12 Jan 2021 13:33:28 +0300
Subject: [PATCH 03/39] More mentions to --to-remote

---
 content/docs/command-reference/import-url.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index de6b5c7ecf..82fe594df4 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -1,7 +1,8 @@
 # import-url

 Download a file or directory from a supported URL (for example `s3://`,
-`ssh://`, and other protocols) into the workspace, and track it (an
+`ssh://`, and other protocols) into the workspace (or to the
+[remote storage](/doc/command-reference/remote), and track it (an
 import `.dvc` file is created).

 > See `dvc import` to download and tack data/model files or directories from
@@ -119,8 +120,17 @@ $ dvc run -n download_data \
           wget https://data.dvc.org/get-started/data.xml -O data.xml
 ```

+<<<<<<< HEAD
 `dvc import-url` generates an _import `.dvc` file_ and `dvc run` a regular stage
 (in `dvc.yaml`).
+=======
+`dvc import-url` generates an _import stage_ `.dvc` file and `dvc run` a regular
+stage (in `dvc.yaml`).
+
+⚠️ When not combined with `--to-remote`, DVC won't push or pull imported data
+to/from [remote storage](/doc/command-reference/remote), it will rely on it's
+original source.
+>>>>>>> More mentions to --to-remote

 ## Options

From cbdf546b971699477ab48bf71c8b80f938caefa0 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Tue, 12 Jan 2021 13:48:18 +0300
Subject: [PATCH 04/39] More description regarding --to-remote

---
 content/docs/command-reference/add.md        | 48 ++++++++++----------
 content/docs/command-reference/import-url.md |  8 +++-
 2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index f53ac3b9e2..21c83bdb04 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -75,23 +75,17 @@ large files. DVC also supports other link types for use on file systems without
 ### Transferring data directly to the remote

 Giving `--to-remote` option would change the behavior described above. Instead
-of only being able to give it a local target, it would be able to support all
-kinds of remote locations (listed in
-[import-url](/doc/command-reference/import-url)). The main difference is that it
-won't actually do anything on the working system beside creating a DVC file. It
-will take the data in batches from the given target and transfer it through 'the
-local system' to the [remote storage](/doc/command-reference/remote). It is
-especially targeting cases where the running system doesn't have the means of
-storing that data as a whole but it can later have (or another user's system who
-shares the same project). So that the DVC file would allow checking out that
-data when the system can meet the needs of storage.
-
-The option is designed to transfer data straight to remote, when the used system
-doesn't have the means to store it locally. So instead of transferring it to the
-local cache and link it to the working directory, it is transferred through the
-local computer in batches to the remote storage (can be configured using
-`--remote `) and can be checked out locally when the necessary means have
-been established since this process also results with a DVC file.
+of only being able to give it something from local/remote workspace, it would be
+able to support all kinds of remote locations that you can import something
+(listed in [import-url](/doc/command-reference/import-url)). The main difference
+is that it won't actually do anything on the workspace beside creating a DVC
+file. It will take the data in batches from the given target and transfer it
+through 'the local system' to the
+[remote storage](/doc/command-reference/remote). This option especially targets
+cases where the running system doesn't have the means of storage that data as a
+whole fits in but it can later have (or another user's system who shares the
+same project). So that the DVC file would allow checking out that data from the
+same remote storage when the system is ready to handle it.

 ### Adding entire directories

@@ -171,25 +165,29 @@ not.
   > Note that external outputs typically require an external cache setup. See
   > link above for more details.

-- `--to-remote` - adds data into the remote storage, instead of the local
-  workspace.
+- `--to-remote` - transfer data straight to remote, when the used system doesn't
+  have the means to store it locally. So instead of transferring it to the local
+  cache and link it to the working directory, it is transferred through the
+  local computer in batches to the remote storage (can be configured using
+  `--remote `) and can be checked out locally when the necessary means
+  have been established since this process also results with a DVC file.

-- `-o `, `--out ` - destination path for the transferred
-  data. (Can only be used with `--to-remote`)
+* `-o `, `--out ` - destination path for the transferred
+  data. (Can only be used with `--to-remote`)

-- `-r `, `--remote ` - name of the
+* `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote). (Can only be used with
   `--to-remote`)

-- `--desc ` - user description of the data (optional). This doesn't affect
+* `--desc ` - user description of the data (optional). This doesn't affect
   any DVC operations.

-- `-h`, `--help` - prints the usage/help message, and exit.
+* `-h`, `--help` - prints the usage/help message, and exit.

-- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
+* `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
   problems arise, otherwise 1.

-- `-v`, `--verbose` - displays detailed tracing information.
+* `-v`, `--verbose` - displays detailed tracing information.

 ## Example: Single file

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 82fe594df4..fed22b384d 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -145,8 +145,12 @@ original source.
   finish the operation(s)); or if the target data already exist locally and you
   want to "DVCfy" this state of the project (see also `dvc commit`).

-- `--to-remote` - imports data into the remote storage, instead of the local
-  workspace.
+- `--to-remote` - transfer data straight to remote, when the used system doesn't
+  have the means to store it locally. So instead of importing it to the
+  workspace, it is transferred through the local computer in batches to the
+  remote storage (can be configured using `--remote `) and can be checked
+  out locally when the necessary means have been established since this process
+  also results with a DVC file.

 - `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote)

From 820cbd63e2edd164d2392c41cb245e3c93861c96 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Tue, 12 Jan 2021 14:57:35 +0300
Subject: [PATCH 05/39] checkout => pull

---
 content/docs/command-reference/import-url.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index fed22b384d..5cab153633 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -365,8 +365,8 @@ Running stage 'prepare' with command:

 If you want to move a dataset or a model from a distant location into your
 remote storage, and while doing that you also want to track it in case you might
-later need to [checkout](/docs/command-reference/checkout) it locally,
-`--to-remote` option can come to your help on that case.
+later need to [pull](/docs/command-reference/pull) it locally, `--to-remote`
+option can come to your help on that case.

 ```dvc
 $ mkdir /tmp/dvc-import-url-straight-to-remote/

From 4c83bdf455d758bfd7013e7208b3127ba582a9ae Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Thu, 14 Jan 2021 11:01:20 +0300
Subject: [PATCH 06/39] Address some reviews

---
 content/docs/command-reference/add.md        | 55 +++++++++++---------
 content/docs/command-reference/import-url.md | 42 +++++++++------
 2 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 21c83bdb04..4dae2d351d 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -74,18 +74,23 @@ large files. DVC also supports other link types for use on file systems without

 ### Transferring data directly to the remote

-Giving `--to-remote` option would change the behavior described above. Instead
-of only being able to give it something from local/remote workspace, it would be
-able to support all kinds of remote locations that you can import something
-(listed in [import-url](/doc/command-reference/import-url)). The main difference
-is that it won't actually do anything on the workspace beside creating a DVC
-file. It will take the data in batches from the given target and transfer it
-through 'the local system' to the
-[remote storage](/doc/command-reference/remote). This option especially targets
-cases where the running system doesn't have the means of storage that data as a
-whole fits in but it can later have (or another user's system who shares the
-same project). So that the DVC file would allow checking out that data from the
-same remote storage when the system is ready to handle it.
+When you have a very big dataset that you want to move from some remote location
+to one of your remotes, but at the same time you don't have time or resources to
+store it locally on your local system, you can use `--to-remote` to add that
+remote location straight to remote instead of your local workspace. The remote
+location can be any of the ones that are listed under
+[import-url](/doc/command-reference/import-url) page. When you add a remote
+location with `--to-remote`, it will get the dataset from the given location and
+transfer it to the remote you specified (or the default one). It will create a
+DVC file just like you added something locally, but there won't be any data that
+you can access, unless you [pull](/doc/command-reference/pull) it. In that case,
+it will pull it from the remote storage unit to your workspace and you can start
+using it.
+
+This flag is extremely useful when your current system can't handle the data as
+whole, but you still want to track and store it in a remote storage unit, so
+that whenever you switch to a different system that can handle it as a whole (or
+partially) you can simply get the data and start working on it.

 ### Adding entire directories

@@ -165,29 +170,29 @@ not.
   > Note that external outputs typically require an external cache setup. See
   > link above for more details.

-- `--to-remote` - transfer data straight to remote, when the used system doesn't
-  have the means to store it locally. So instead of transferring it to the local
-  cache and link it to the working directory, it is transferred through the
-  local computer in batches to the remote storage (can be configured using
-  `--remote `) and can be checked out locally when the necessary means
-  have been established since this process also results with a DVC file.
+- `--to-remote` - add target data into DVC and create a .dvc file, but instead
+  of caching it into DVC cache, transfer it straight to remote storage. Check
+  [this](#transferring-data-directly-to-the-remote) section for the details. If
+  this option is specified target can be any cloud or local URL, not necessarily
+  a local file or directory from the workspace as it is required in the regular
+  dvc addworkflow.

-* `-o `, `--out ` - destination path for the transferred
-  data. (Can only be used with `--to-remote`)
+- `-o `, `--out ` - destination path for the transferred data. (Can
+  only be used with `--to-remote`)

-* `-r `, `--remote ` - name of the
+- `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote). (Can only be used with
   `--to-remote`)

-* `--desc ` - user description of the data (optional). This doesn't affect
+- `--desc ` - user description of the data (optional). This doesn't affect
   any DVC operations.

-* `-h`, `--help` - prints the usage/help message, and exit.
+- `-h`, `--help` - prints the usage/help message, and exit.

-* `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
+- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
   problems arise, otherwise 1.

-* `-v`, `--verbose` - displays detailed tracing information.
+- `-v`, `--verbose` - displays detailed tracing information.

 ## Example: Single file

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 5cab153633..4ab521ca46 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -145,12 +145,8 @@ original source.
   finish the operation(s)); or if the target data already exist locally and you
   want to "DVCfy" this state of the project (see also `dvc commit`).

-- `--to-remote` - transfer data straight to remote, when the used system doesn't
-  have the means to store it locally. So instead of importing it to the
-  workspace, it is transferred through the local computer in batches to the
-  remote storage (can be configured using `--remote `) and can be checked
-  out locally when the necessary means have been established since this process
-  also results with a DVC file.
+- `--to-remote` - import data straight to remote storage and create a .dvc file.
+  Check [this](#example-import-straight-to-the-remote) section for the details.

 - `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote)
@@ -363,10 +359,18 @@ Running stage 'prepare' with command:

 ## Example: Import straight to the remote

-If you want to move a dataset or a model from a distant location into your
-remote storage, and while doing that you also want to track it in case you might
-later need to [pull](/docs/command-reference/pull) it locally, `--to-remote`
-option can come to your help on that case.
+When you have a massive dataset in a distant location, and working on a computer
+which can't actually store it locally (due to not having enough disk space) but
+you still want to take it under control of DVC just like in the scenario of
+importing it and then pushing it to the remote, then you can use `--to-remote`
+flag.
+
+It will try to import the data into the remote storage that you choose, and when
+you or any of your colleagues want to copy the data to their systems, they could
+just simply [pull](/doc/command-reference/remote). Let's do a simple example
+
+We initalize 2 directories, one being the remote storage unit and the other one
+is the workspace.

 ```dvc
 $ mkdir /tmp/dvc-import-url-straight-to-remote/
@@ -379,8 +383,16 @@ $ dvc remote add tmp_remote /tmp/remote

 For transferring a source from a remote location, to the given remote you can
 combine `import-url` with `--to-remote` option which basically does the whole
-transferring operation without actually a need of fitting the dataset as a whole
-to your system.
+importing and [push](/doc/command-reference/push)ing operation under the hood
+but without actually downloading everything in once, but rather transferring
+gradually.
+
+When you run the `import-url` with `--to-remote`, you pass as usual the remote
+location and the output filename, afterward if you haven't set a default
+[remote](/doc/command-reference/remote) yet, you can simply pass the name of the
+remote with `-r`/`--remote` flag and it will start the transfer and leave a DVC
+file as an only side effect on your workspace (everything else happens in the
+remote storage unit)

 ```
 $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml --to-remote -r tmp_remote
@@ -389,9 +401,9 @@ To track the changes with git, run:
 	git add data.xml.dvc
 ```

-This operation will result with a DVC file (`data.xml.dvc`) and no local cache /
-data at all. When you move to a more suitable system, which can store the data
-locally `dvc pull` will simply get it for you.
+Whenever anyone wants to actually get this file, like when they have a system
+which can handle it, it is just a simple [pull](/doc/command-reference/pull)
+operation.

 ```
  $ dvc pull data.xml.dvc -r tmp_remote

From aaa0273cc4e9e93b6260bb9b495c5f991d407940 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Thu, 14 Jan 2021 11:15:30 +0300
Subject: [PATCH 07/39] Reference to the example in the docs

---
 content/docs/command-reference/import-url.md | 24 ++++++++------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 4ab521ca46..b4fb8d4bf6 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -2,8 +2,8 @@

 Download a file or directory from a supported URL (for example `s3://`,
 `ssh://`, and other protocols) into the workspace (or to the
-[remote storage](/doc/command-reference/remote), and track it (an
-import `.dvc` file is created).
+[remote storage](/doc/command-reference/remote), and track it (an import `.dvc`
+file is created).

 > See `dvc import` to download and tack data/model files or directories from
 > other DVC repositories (e.g. hosted on GitHub).
@@ -23,8 +23,9 @@ positional arguments:
 ## Description

 In some cases it's convenient to add a data file or directory from an external
-location into the workspace (or to the [remote storage](/doc/command-reference/remote),
-such that it can be updated later, if/when the external data source changes. Example scenarios:
+location into the workspace (or to the
+[remote storage](/doc/command-reference/remote), such that it can be updated
+later, if/when the external data source changes. Example scenarios:

 - A remote system may produce occasional data files that are used in other
   projects.
@@ -39,8 +40,10 @@ having to manually copy files from the supported locations (listed below), which
 may require installing a different tool for each type.

 When you don't actually want to store the whole data file / directory in your
-local workspace but rather import it directly to the remote storage, `--to-remote`
-option can be given.
+local workspace but rather import it directly to the remote storage,
+`--to-remote` option can be given. See the
+["import straight to remote"](#example-import-straight-to-the-remote) example
+for more details.

 The `url` argument specifies the external location of the data to be imported.
 The imported data is cached, and linked (or copied) to the current
@@ -120,17 +123,8 @@ $ dvc run -n download_data \
           wget https://data.dvc.org/get-started/data.xml -O data.xml
 ```

-<<<<<<< HEAD
 `dvc import-url` generates an _import `.dvc` file_ and `dvc run` a regular stage
 (in `dvc.yaml`).
-=======
-`dvc import-url` generates an _import stage_ `.dvc` file and `dvc run` a regular
-stage (in `dvc.yaml`).
-
-⚠️ When not combined with `--to-remote`, DVC won't push or pull imported data
-to/from [remote storage](/doc/command-reference/remote), it will rely on it's
-original source.
->>>>>>> More mentions to --to-remote

 ## Options

From 02f9ade2745705bdfe376f1bad5cbef830b66f6c Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Fri, 15 Jan 2021 11:32:35 +0300
Subject: [PATCH 08/39] remove brackets

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index b4fb8d4bf6..e3700174bc 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -1,7 +1,7 @@
 # import-url

 Download a file or directory from a supported URL (for example `s3://`,
-`ssh://`, and other protocols) into the workspace (or to the
+`ssh://`, and other protocols) into the workspace or to the
 [remote storage](/doc/command-reference/remote), and track it (an import `.dvc`
 file is created).

From 66c871036ff430786b71ae4d3088070284977632 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Fri, 15 Jan 2021 13:32:22 +0300
Subject: [PATCH 09/39] -j for import-url/add

---
 content/docs/command-reference/import-url.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index e3700174bc..2245513ab1 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -12,7 +12,8 @@ file is created).

 ```usage
 usage: dvc import-url [-h] [-q | -v] [--file ] [--no-exec]
-                      [--to-remote] [-r ] [--desc ]
+                      [--to-remote] [-r ] [-j ]
+                      [--desc ]
                       url [out]

 positional arguments:

From c11ef079f30516e50011fcb01105004fe9384b51 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Mon, 18 Jan 2021 10:24:39 +0300
Subject: [PATCH 10/39] apply suggestions from jorge

Co-authored-by: Jorge Orpinel

---
 content/docs/command-reference/add.md | 52 ++++++++++++---------------
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 4dae2d351d..7611609b01 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -72,25 +72,19 @@ large files. DVC also supports other link types for use on file systems without
 `reflink` support, but they have to be specified manually. Refer to the
 `cache.type` config option in `dvc config cache` for more information.

-### Transferring data directly to the remote
-
-When you have a very big dataset that you want to move from some remote location
-to one of your remotes, but at the same time you don't have time or resources to
-store it locally on your local system, you can use `--to-remote` to add that
-remote location straight to remote instead of your local workspace. The remote
-location can be any of the ones that are listed under
-[import-url](/doc/command-reference/import-url) page. When you add a remote
-location with `--to-remote`, it will get the dataset from the given location and
-transfer it to the remote you specified (or the default one). It will create a
-DVC file just like you added something locally, but there won't be any data that
-you can access, unless you [pull](/doc/command-reference/pull) it. In that case,
-it will pull it from the remote storage unit to your workspace and you can start
-using it.
-
-This flag is extremely useful when your current system can't handle the data as
-whole, but you still want to track and store it in a remote storage unit, so
-that whenever you switch to a different system that can handle it as a whole (or
-partially) you can simply get the data and start working on it.
+### Transferring data directly to remote storage
+
+When you have a very big dataset that you want to move from some external
+location to [remote storage](/doc/command-reference/remote) while avoiding
+storing it locally, you can use the `--to-remote` option. This will transfer a
+copy of the target data directly to a remote of your choice (or the default
+one). A `.dvc` file will be created normally, but the data won't be found in
+your local project until you `dvc pull` it.
+
+This option is useful when the local system can't handle the target data, but
+you still want to track and store it in remote storage, so that whenever you
+switch to a different system that can handle it, you can simply pull the data
+and start working on it.

 ### Adding entire directories

@@ -170,19 +164,17 @@ not.
   > Note that external outputs typically require an external cache setup. See
   > link above for more details.

-- `--to-remote` - add target data into DVC and create a .dvc file, but instead
-  of caching it into DVC cache, transfer it straight to remote storage. Check
-  [this](#transferring-data-directly-to-the-remote) section for the details. If
-  this option is specified target can be any cloud or local URL, not necessarily
-  a local file or directory from the workspace as it is required in the regular
-  dvc addworkflow.
-
-- `-o `, `--out ` - destination path for the transferred data. (Can
-  only be used with `--to-remote`)
+- `--to-remote` - track a single external target file or directory (with a `.dvc` file),
+  but instead of caching and linking it locally,
+  [transfer](#transferring-data-directly-to-the-remote) it straight to remote
+  storage.

 - `-r `, `--remote ` - name of the
-  [remote storage](/doc/command-reference/remote). (Can only be used with
-  `--to-remote`)
+  [remote storage](/doc/command-reference/remote) to transfer external target to
+  (can only be used with `--to-remote`).
+
+- `-o `, `--out ` - destination path for the transferred data (can
+  only be used with `--to-remote`).

 - `--desc ` - user description of the data (optional). This doesn't affect
   any DVC operations.

From 0b79d10a81eb52cee5ccffda336365816e8b5dde Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Mon, 18 Jan 2021 10:28:32 +0300
Subject: [PATCH 11/39] Reorder parameters according to the core

---
 content/docs/command-reference/add.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 7611609b01..4267ea01f0 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -164,8 +164,8 @@ not.
 > Note that external outputs typically require an external cache setup. See
 > link above for more details.

-- `--to-remote` - track a single external target file or directory (with a `.dvc` file),
-  but instead of caching and linking it locally,
+- `--to-remote` - track a single external target file or directory (with a
+  `.dvc` file), but instead of caching and linking it locally,
   [transfer](#transferring-data-directly-to-the-remote) it straight to remote
   storage.

From 2ea1f226bd709a43e39ffbb9ac8e906e9446c2c7 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Mon, 18 Jan 2021 10:31:11 +0300
Subject: [PATCH 12/39] Apply a bunch more suggestions

---
 content/docs/command-reference/add.md        | 10 +++++-----
 content/docs/command-reference/import-url.md |  7 +++----
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 4267ea01f0..0c7bb8e951 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -34,16 +34,16 @@ option to avoid this, and `dvc commit` to finish the process when needed).
 > intermediate and final results (like ML models).

 After checking that each `target` hasn't been added before (or tracked with
-other DVC commands), a few actions are taken under the hood (if `--to-remote` is
-not provided):
+other DVC commands), a few actions are taken under the hood:

 1. Calculate the file hash.
-2. Move the file contents to the cache (by default in `.dvc/cache`), using the
+2. Move the file contents to the cache (by default in `.dvc/cache`) (if
+   `--to-remote` option given, then move them to the remote storage), using the
    file hash to form the cached file path. (See
    [Structure of cache directory](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
    for more details.)
-3. Attempt to replace the file with a link to the cached data (more details on
-   file linking further down).
+3. Attempt to replace the file with a link to the cached data (unless
+   `--to-remote` option given) (more details on file linking further down).
 4. Create a corresponding `.dvc` file to track the file, using its path and hash
    to identify the cached data. The `.dvc` file lists the DVC-tracked file as an
    output (`outs` field). Unless the `--file` option is used, the
diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 2245513ab1..b271d9b269 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -1,9 +1,8 @@
 # import-url

-Download a file or directory from a supported URL (for example `s3://`,
-`ssh://`, and other protocols) into the workspace or to the
-[remote storage](/doc/command-reference/remote), and track it (an import `.dvc`
-file is created).
+Track a file or directory found in an external location (`s3://`, `/local/path`,
+etc.), and download it to the local project, or make a copy in
+[remote storage](/doc/command-reference/remote).

 > See `dvc import` to download and tack data/model files or directories from
 > other DVC repositories (e.g. hosted on GitHub).

From 3ff3d014a8ddc09beb6661278788270ec1979fbb Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 19 Jan 2021 03:17:13 -0600
Subject: [PATCH 13/39] Update content/docs/command-reference/add.md

---
 content/docs/command-reference/add.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 0c7bb8e951..637078ed7e 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -173,7 +173,7 @@ not.
   [remote storage](/doc/command-reference/remote) to transfer external target to
   (can only be used with `--to-remote`).

-- `-o `, `--out ` - destination path for the transferred data (can
+- `-o `, `--out ` - destination `path` for the transferred data (can
   only be used with `--to-remote`).

 - `--desc ` - user description of the data (optional). This doesn't affect

From a4cbe61449bd49fc4b73e25e7aecfd150291409f Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:01:03 -0600
Subject: [PATCH 14/39] Update content/docs/command-reference/add.md

---
 content/docs/command-reference/add.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 637078ed7e..1d43cbe755 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -164,10 +164,9 @@ not.
 > Note that external outputs typically require an external cache setup. See
 > link above for more details.

-- `--to-remote` - track a single external target file or directory (with a
-  `.dvc` file), but instead of caching and linking it locally,
-  [transfer](#transferring-data-directly-to-the-remote) it straight to remote
-  storage.
+- `--to-remote` - Track an external target, but don't move it into the
+  workspace, nor cache it. [Transfer](#transferring-data-directly-to-the-remote)
+  it directly to remote storage instead. Use `dvc pull` to get the data locally.

 - `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote) to transfer external target to

From 4fb63eb72192be4364bbc58ef9f72e6c3c32b844 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:02:20 -0600
Subject: [PATCH 15/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index b271d9b269..8a901a5462 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -139,8 +139,9 @@ $ dvc run -n download_data \
   finish the operation(s)); or if the target data already exist locally and you
   want to "DVCfy" this state of the project (see also `dvc commit`).

-- `--to-remote` - import data straight to remote storage and create a .dvc file.
-  Check [this](#example-import-straight-to-the-remote) section for the details.
+- `--to-remote` - Import an external target, but don't move it into the
+  workspace, nor cache it. [Transfer](#example-import-straight-to-the-remote) it
+  directly to remote storage instead. Use `dvc pull` to get the data locally.

 - `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote)

From d07166de253426d0b920e8bceb8d967326565c88 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:03:08 -0600
Subject: [PATCH 16/39] Update content/docs/command-reference/add.md

---
 content/docs/command-reference/add.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 1d43cbe755..46bc85bc29 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -37,9 +37,9 @@ After checking that each `target` hasn't been added before (or tracked with
 other DVC commands), a few actions are taken under the hood:

 1. Calculate the file hash.
-2. Move the file contents to the cache (by default in `.dvc/cache`) (if
-   `--to-remote` option given, then move them to the remote storage), using the
-   file hash to form the cached file path. (See
+2. Move the file contents to the cache (by default in `.dvc/cache`) (or to
+   remote storage if `--to-remote` is given), using the file hash to form
+   the cached file path. (See
    [Structure of cache directory](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
    for more details.)
 3. Attempt to replace the file with a link to the cached data (unless

From c249ee679181eb00805031c36aeb72f38e3cb656 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:03:23 -0600
Subject: [PATCH 17/39] Update content/docs/command-reference/add.md

---
 content/docs/command-reference/add.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 46bc85bc29..5291a0d495 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -42,8 +42,8 @@ other DVC commands), a few actions are taken under the hood:
    the cached file path. (See
    [Structure of cache directory](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
    for more details.)
-3. Attempt to replace the file with a link to the cached data (unless
-   `--to-remote` option given) (more details on file linking further down).
+3. Attempt to replace the file with a link to the cached data (more details on
+   file linking further down). Skipped if `--to-remote` is used.
 4. Create a corresponding `.dvc` file to track the file, using its path and hash
    to identify the cached data. The `.dvc` file lists the DVC-tracked file as an
    output (`outs` field). Unless the `--file` option is used, the

From b16d407cb7171da7bfbf3bc0fc206d14244dcbd9 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:04:16 -0600
Subject: [PATCH 18/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 8a901a5462..720e96b232 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -23,8 +23,8 @@ positional arguments:
 ## Description

 In some cases it's convenient to add a data file or directory from an external
-location into the workspace (or to the
-[remote storage](/doc/command-reference/remote), such that it can be updated
+location into the workspace (or to
+[remote storage](/doc/command-reference/remote)), such that it can be updated
 later, if/when the external data source changes. Example scenarios:

 - A remote system may produce occasional data files that are used in other

From 570f38cb576b90bdbbef576cbd49f13f73732fea Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:04:51 -0600
Subject: [PATCH 19/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 720e96b232..004fe63930 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -39,11 +39,11 @@ later, if/when the external data source changes. Example scenarios:
   having to manually copy files from the supported locations (listed below),
   which may require installing a different tool for each type.

-When you don't actually want to store the whole data file / directory in your
-local workspace but rather import it directly to the remote storage,
-`--to-remote` option can be given. See the
-["import straight to remote"](#example-import-straight-to-the-remote) example
-for more details.
+When you don't want to store the target data in your local system, you can still
+create an import `.dvc` file while transferring a file or directory directly to
+remote storage, by using the `--to-remote` option. See the
+[Import straight to remote](#example-import-straight-to-remote) example for
+more details.

 The `url` argument specifies the external location of the data to be imported.
 The imported data is cached, and linked (or copied) to the current

From 6c8a592f2f4cc8d2de1e064e4d0835567fc34c62 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 22 Jan 2021 23:07:35 -0600
Subject: [PATCH 20/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 004fe63930..0fc0b60cee 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -352,7 +352,7 @@ Running stage 'prepare' with command:
 	python src/prepare.py data/data.xml
 ```

-## Example: Import straight to the remote
+## Example: Import straight to remote

 When you have a massive dataset in a distant location, and working on a computer
 which can't actually store it locally (due to not having enough disk space) but

From 5737bd21c1dcebca4ffa61358a9a3fb91911d02a Mon Sep 17 00:00:00 2001
From: "Restyled.io"
Date: Sat, 23 Jan 2021 05:07:55 +0000
Subject: [PATCH 21/39] Restyled by prettier

---
 content/docs/command-reference/import-url.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 0fc0b60cee..6559e8c86d 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -42,8 +42,8 @@ may require installing a different tool for each type.
 When you don't want to store the target data in your local system, you can still
 create an import `.dvc` file while transferring a file or directory directly to
 remote storage, by using the `--to-remote` option. See the
-[Import straight to remote](#example-import-straight-to-remote) example for
-more details.
+[Import straight to remote](#example-import-straight-to-remote) example for more
+details.

 The `url` argument specifies the external location of the data to be imported.
 The imported data is cached, and linked (or copied) to the current

From 96d767fd85b32e565c37ec55621fa28193193d9e Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Fri, 5 Feb 2021 12:41:48 +0300
Subject: [PATCH 22/39] proper initalization

Co-authored-by: Jorge Orpinel
---
 content/docs/command-reference/import-url.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 6559e8c86d..e0b2181da6 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -364,16 +364,15 @@ It will try to import the data into the remote storage that you choose, and when
 you or any of your colleagues want to copy the data to their systems, they could
 just simply [pull](/doc/command-reference/remote). Let's do a simple example

-We initalize 2 directories, one being the remote storage unit and the other one
-is the workspace.
+Let's initialize a new project, and add a local [remote](/doc/command-reference/remote):

 ```dvc
-$ mkdir /tmp/dvc-import-url-straight-to-remote/
-$ mkdir /tmp/remote
-$ cd /tmp/dvc-import-url-straight-to-remote/
+$ mkdir example # workspace
+$ mkdir /tmp/dvc-storage # remote storage
+$ cd example
 $ git init
 $ dvc init
-$ dvc remote add tmp_remote /tmp/remote
+$ dvc remote add local_remote /tmp/dvc-storage
 ```

 For transferring a source from a remote location, to the given remote you can

From 133a939c967f32c6ee6db1182c9d0e4d07085269 Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Fri, 5 Feb 2021 12:49:13 +0300
Subject: [PATCH 23/39] suggestions

---
 content/docs/command-reference/import-url.md | 43 ++++++++------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index e0b2181da6..d9d89b06c3 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -352,19 +352,13 @@ Running stage 'prepare' with command:
 	python src/prepare.py data/data.xml
 ```

-## Example: Import straight to remote
+## Example: Transfer to remote storage

-When you have a massive dataset in a distant location, and working on a computer
-which can't actually store it locally (due to not having enough disk space) but
-you still want to take it under control of DVC just like in the scenario of
-importing it and then pushing it to the remote, then you can use `--to-remote`
-flag.
+When you have a massive dataset in a distant location and want to import it to
+your remote storage directly, you can use `--to-remote` option.

-It will try to import the data into the remote storage that you choose, and when
-you or any of your colleagues want to copy the data to their systems, they could
-just simply [pull](/doc/command-reference/remote). Let's do a simple example
-
-Let's initialize a new project, and add a local [remote](/doc/command-reference/remote):
+Let's initialize a new project, and add a local
+[remote](/doc/command-reference/remote):

 ```dvc
 $ mkdir example # workspace
@@ -375,11 +369,16 @@ $ dvc init
 $ dvc remote add local_remote /tmp/dvc-storage
 ```

-For transferring a source from a remote location, to the given remote you can
-combine `import-url` with `--to-remote` option which basically does the whole
-importing and [push](/doc/command-reference/push)ing operation under the hood
-but without actually downloading everything in once, but rather transferring
-gradually.
+Now let's create an import .dvc file without downloading the target data, but
+transferring directly to remote storage instead:
+
+```
+$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
+                 --to-remote -r local_remote
+To track the changes with git, run:
+
+	git add data.xml.dvc
+```

 When you run the `import-url` with `--to-remote`, you pass as usual the remote
 location and the output filename, afterward if you haven't set a default
@@ -388,16 +387,8 @@ remote with `-r`/`--remote` flag and it will start the transfer and leave a DVC
 file as an only side effect on your workspace (everything else happens in the
 remote storage unit)

-```
-$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml --to-remote -r tmp_remote
-To track the changes with git, run:
-
-	git add data.xml.dvc
-```
-
-Whenever anyone wants to actually get this file, like when they have a system
-which can handle it, it is just a simple [pull](/doc/command-reference/pull)
-operation.
+Whenever anyone wants to actually download the imported data (for example from a
+system that can handle it), they can use `dvc pull` as usual:

 ```
 $ dvc pull data.xml.dvc -r tmp_remote

From 6c7f65a32336350ccf85e03131bb882b10feeaca Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Fri, 5 Feb 2021 12:58:37 +0300
Subject: [PATCH 24/39] rebase

---
 content/docs/command-reference/add.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 5291a0d495..71ed2b7d06 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -38,8 +38,8 @@ other DVC commands), a few actions are taken under the hood:

 1. Calculate the file hash.
 2. Move the file contents to the cache (by default in `.dvc/cache`) (or to
-   remote storage if `--to-remote` is given), using the file hash to form
-   the cached file path. (See
+   remote storage if `--to-remote` is given), using the file hash to form the
+   cached file path. (See
    [Structure of cache directory](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
    for more details.)
 3. Attempt to replace the file with a link to the cached data (more details on

From 194a764cb7e77cdb9d4bf7005a8072299f9949e3 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 17:56:17 -0600
Subject: [PATCH 25/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index d9d89b06c3..e9b568c1ad 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -42,7 +42,7 @@ may require installing a different tool for each type.
 When you don't want to store the target data in your local system, you can still
 create an import `.dvc` file while transferring a file or directory directly to
 remote storage, by using the `--to-remote` option. See the
-[Import straight to remote](#example-import-straight-to-remote) example for more
+[Import straight to remote](#example-transfer-to-remote-storage) example for more
 details.

 The `url` argument specifies the external location of the data to be imported.

From 0dd63c776975e63a64f63f941018d39e978d37db Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 17:56:30 -0600
Subject: [PATCH 26/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index e9b568c1ad..cc54e1aede 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -140,7 +140,7 @@ $ dvc run -n download_data \
   want to "DVCfy" this state of the project (see also `dvc commit`).

 - `--to-remote` - Import an external target, but don't move it into the
-  workspace, nor cache it. [Transfer](#example-import-straight-to-the-remote) it
+  workspace, nor cache it. [Transfer](#example-transfer-to-remote-storage) it
   directly to remote storage instead. Use `dvc pull` to get the data locally.

 - `-r `, `--remote ` - name of the

From 8e66b2b6be8d1cae7984d57ad9d0f7b4622fe40d Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:00:18 -0600
Subject: [PATCH 27/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index cc54e1aede..4e3f5cf276 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -354,11 +354,10 @@ Running stage 'prepare' with command:

 ## Example: Transfer to remote storage

-When you have a massive dataset in a distant location and want to import it to
-your remote storage directly, you can use `--to-remote` option.
-
-Let's initialize a new project, and add a local
-[remote](/doc/command-reference/remote):
+When you have a large dataset in an external location and want to import it to
+you project without using your local disk, you can use the `--to-remote` option
+to transfer it directly to remote storage. Let's initialize a new project, and
+add a local [remote](/doc/command-reference/remote):

 ```dvc
 $ mkdir example # workspace

From a473848288730319aaf3fe231e8b0efbcc7e1547 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:08:15 -0600
Subject: [PATCH 28/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 4e3f5cf276..425f2e48e7 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -354,10 +354,11 @@ Running stage 'prepare' with command:

 ## Example: Transfer to remote storage

-When you have a large dataset in an external location and want to import it to
-you project without using your local disk, you can use the `--to-remote` option
-to transfer it directly to remote storage. Let's initialize a new project, and
-add a local [remote](/doc/command-reference/remote):
+When you have a large dataset in an external location, you may want to import it
+to you project without downloading it to the local file system. The
+`--to-remote` option lets you transfer it directly to
+[remote storage](/doc/command-reference/remote). Let's initialize a DVC project,
+and setup a remote:

 ```dvc
 $ mkdir example # workspace

From e5b9d4ef17684527e527a1bff3657363dff68953 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:11:21 -0600
Subject: [PATCH 29/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 425f2e48e7..97dd869caf 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -362,11 +362,11 @@ and setup a remote:

 ```dvc
 $ mkdir example # workspace
-$ mkdir /tmp/dvc-storage # remote storage
+$ mkdir /tmp/dvc-storage
 $ cd example
 $ git init
 $ dvc init
-$ dvc remote add local_remote /tmp/dvc-storage
+$ dvc remote add myremote /tmp/dvc-storage
 ```

 Now let's create an import .dvc file without downloading the target data, but

From d7ca231664a83dc9ce86dbabc493c25f07cdb71d Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:14:03 -0600
Subject: [PATCH 30/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 97dd869caf..b6c31f255d 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -369,8 +369,8 @@ $ dvc init
 $ dvc remote add myremote /tmp/dvc-storage
 ```

-Now let's create an import .dvc file without downloading the target data, but
-transferring directly to remote storage instead:
+Now let's create an import `.dvc` file without downloading the target data,
+transferring it directly to remote storage instead:

 ```
 $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \

From 65ce340e34b0fe115582c45849af9f8c297f52fb Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:16:10 -0600
Subject: [PATCH 31/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index b6c31f255d..14ed788990 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -355,10 +355,10 @@ Running stage 'prepare' with command:
 ## Example: Transfer to remote storage

 When you have a large dataset in an external location, you may want to import it
-to you project without downloading it to the local file system. The
-`--to-remote` option lets you transfer it directly to
-[remote storage](/doc/command-reference/remote). Let's initialize a DVC project,
-and setup a remote:
+to you project without downloading it to the local file system (for using it
+later/elsewhere). The `--to-remote` option lets you skip the download, while
+storing the imported data [remotely](/doc/command-reference/remote). Let's
+initialize a DVC project, and setup a remote:

 ```dvc
 $ mkdir example # workspace

From c6351f35ba8c484b1214b58f5a4349d576025240 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:17:10 -0600
Subject: [PATCH 32/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 14ed788990..5756911991 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -375,9 +375,7 @@ transferring it directly to remote storage instead:
 ```
 $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
                  --to-remote -r local_remote
-To track the changes with git, run:
-
-	git add data.xml.dvc
+...
 ```

 When you run the `import-url` with `--to-remote`, you pass as usual the remote

From ee249635194136791c3b2da1638b668c769ca9e0 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sat, 6 Feb 2021 18:17:48 -0600
Subject: [PATCH 33/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 5756911991..62f8260c0e 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -374,7 +374,7 @@ transferring it directly to remote storage instead:

 ```
 $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
-                 --to-remote -r local_remote
+                 --to-remote -r myremote
 ...
 ```

From 89c1bb9d4bff7d1da423297e224ccdd82f1f7ebb Mon Sep 17 00:00:00 2001
From: Batuhan Taskaya
Date: Mon, 8 Feb 2021 14:44:51 +0300
Subject: [PATCH 34/39] changes

---
 content/docs/command-reference/add.md        |  7 +++---
 content/docs/command-reference/import-url.md | 25 +++++++++++---------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 71ed2b7d06..763f4fb073 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -164,9 +164,10 @@ not.
 > Note that external outputs typically require an external cache setup. See
 > link above for more details.

-- `--to-remote` - Track an external target, but don't move it into the
-  workspace, nor cache it. [Transfer](#transferring-data-directly-to-the-remote)
-  it directly to remote storage instead. Use `dvc pull` to get the data locally.
+- `--to-remote` - Import an external target, but don't move it into the
+  workspace, nor cache it. [Transfer](#example-import-straight-to-the-remote) it
+  directly to remote storage (the default one, unless `-r` is specified)
+  instead. Use `dvc pull` to get the data locally.

 - `-r `, `--remote ` - name of the
   [remote storage](/doc/command-reference/remote) to transfer external target to
diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 62f8260c0e..178df8f9f8 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -42,8 +42,8 @@ may require installing a different tool for each type.
 When you don't want to store the target data in your local system, you can still
 create an import `.dvc` file while transferring a file or directory directly to
 remote storage, by using the `--to-remote` option. See the
-[Import straight to remote](#example-transfer-to-remote-storage) example for more
-details.
+[Import straight to remote](#example-transfer-to-remote-storage) example for
+more details.

 The `url` argument specifies the external location of the data to be imported.
 The imported data is cached, and linked (or copied) to the current
@@ -140,11 +140,13 @@ $ dvc run -n download_data \
   want to "DVCfy" this state of the project (see also `dvc commit`).

 - `--to-remote` - Import an external target, but don't move it into the
-  workspace, nor cache it. [Transfer](#example-transfer-to-remote-storage) it
-  directly to remote storage instead. Use `dvc pull` to get the data locally.
+  workspace, nor cache it. [Transfer](#example-import-straight-to-the-remote) it
+  directly to remote storage (the default one, unless `-r` is specified)
+  instead. Use `dvc pull` to get the data locally.

 - `-r `, `--remote ` - name of the
-  [remote storage](/doc/command-reference/remote)
+  [remote storage](/doc/command-reference/remote) (can only be used with
+  `--to-remote`).

 - `--desc ` - user description of the data (optional). This doesn't affect
   any DVC operations.
@@ -378,12 +380,13 @@ $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
 ...
 ```

-When you run the `import-url` with `--to-remote`, you pass as usual the remote
-location and the output filename, afterward if you haven't set a default
-[remote](/doc/command-reference/remote) yet, you can simply pass the name of the
-remote with `-r`/`--remote` flag and it will start the transfer and leave a DVC
-file as an only side effect on your workspace (everything else happens in the
-remote storage unit)
+After importing `data.xml` to our remote storage unit, the only change in our
+local workspace is the newly created dvc file for `data.xml`.
+
+```
+$ ls
+data.xml.rc
+```

 Whenever anyone wants to actually download the imported data (for example from a
 system that can handle it), they can use `dvc pull` as usual:

From 1d5ef740a37b86ad76e02c21e4f83ddb82d492ef Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Mon, 8 Feb 2021 22:24:13 -0600
Subject: [PATCH 35/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 6360949b19..2fe66a1cab 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -368,10 +368,10 @@ initialize a DVC project, and setup a remote:

 ```dvc
 $ mkdir example # workspace
-$ mkdir /tmp/dvc-storage
 $ cd example
 $ git init
 $ dvc init
+$ mkdir /tmp/dvc-storage
 $ dvc remote add myremote /tmp/dvc-storage
 ```

From 25b0cdf097dec4b4088917f21450d2e2161dc316 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Mon, 8 Feb 2021 22:30:42 -0600
Subject: [PATCH 36/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 2fe66a1cab..1a04fc4642 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -384,12 +384,12 @@ $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
 ...
 ```

-After importing `data.xml` to our remote storage unit, the only change in our
-local workspace is the newly created dvc file for `data.xml`.
+The only change in our local workspace is a newly created import
+`.dvc` file:

-```
+```dvc
 $ ls
-data.xml.rc
+data.xml.dvc
 ```

 Whenever anyone wants to actually download the imported data (for example from a

From c036a0765e97102d97e996269686d8f4ae6efbad Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 9 Feb 2021 00:31:16 -0600
Subject: [PATCH 37/39] Update content/docs/command-reference/add.md

---
 content/docs/command-reference/add.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 763f4fb073..0d0a527acd 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -164,7 +164,7 @@ not.
   > Note that external outputs typically require an external cache setup. See
   > link above for more details.

-- `--to-remote` - Import an external target, but don't move it into the
+- `--to-remote` - import an external target, but don't move it into the
   workspace, nor cache it. [Transfer](#example-import-straight-to-the-remote) it
   directly to remote storage (the default one, unless `-r` is specified)
   instead. Use `dvc pull` to get the data locally.

From 46b51645d1b51e700ef4edc9c408e134965ca9fb Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 9 Feb 2021 00:31:57 -0600
Subject: [PATCH 38/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 1a04fc4642..024dfffa4c 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -42,7 +42,7 @@ may require installing a different tool for each type.
 When you don't want to store the target data in your local system, you can still
 create an import `.dvc` file while transferring a file or directory directly to
 remote storage, by using the `--to-remote` option. See the
-[Import straight to remote](#example-transfer-to-remote-storage) example for
+[Transfer to remote storage](#example-transfer-to-remote-storage) example for
 more details.

 The `url` argument specifies the external location of the data to be imported.

From d58af5bfdf4a992156eaa84f85acdb1cf955c4fc Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 9 Feb 2021 00:32:22 -0600
Subject: [PATCH 39/39] Update content/docs/command-reference/import-url.md

---
 content/docs/command-reference/import-url.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 024dfffa4c..c00998fabb 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -139,7 +139,7 @@ $ dvc run -n download_data \
   finish the operation(s)); or if the target data already exist locally and you
   want to "DVCfy" this state of the project (see also `dvc commit`).

-- `--to-remote` - Import an external target, but don't move it into the
+- `--to-remote` - import an external target, but don't move it into the
   workspace, nor cache it. [Transfer](#example-import-straight-to-the-remote) it
   directly to remote storage (the default one, unless `-r` is specified)
   instead. Use `dvc pull` to get the data locally.