From abd8af58fc5fff1c507057ef7363f955deaa32f4 Mon Sep 17 00:00:00 2001
From: Casper da Costa-Luis
Date: Fri, 9 Apr 2021 13:53:22 +0100
Subject: [PATCH] respond to misc review comments

---
 content/docs/start/data-and-model-access.md   |  2 +-
 .../docs/start/data-and-model-versioning.md   | 24 ++++++++++---------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/content/docs/start/data-and-model-access.md b/content/docs/start/data-and-model-access.md
index 6a7525bb8f..d99512d978 100644
--- a/content/docs/start/data-and-model-access.md
+++ b/content/docs/start/data-and-model-access.md
@@ -114,5 +114,5 @@ with dvc.api.open(
     'get-started/data.xml',
     repo='https://github.com/iterative/dataset-registry'
 ) as fd:
-    # fd is a file descriptor which can be used here
+    # fd is a file descriptor which can be processed normally
 ```
diff --git a/content/docs/start/data-and-model-versioning.md b/content/docs/start/data-and-model-versioning.md
index 6096e14fe5..faddf580cd 100644
--- a/content/docs/start/data-and-model-versioning.md
+++ b/content/docs/start/data-and-model-versioning.md
@@ -25,8 +25,8 @@ To start tracking a file or directory, use `dvc add`:
 ### ⚙️ Expand to get an example dataset.

-Having initialized a project in the previous section, get the data file which we
-will be using later like this:
+Having initialized a project in the previous section, we can get the data file
+(which we'll be using later) like this:

 ```dvc
 $ dvc get https://github.com/iterative/dataset-registry \
@@ -48,16 +48,18 @@ $ dvc add data/data.xml
 ```

 DVC stores information about the added file (or a directory) in a special `.dvc`
-file named `data/data.xml.dvc` - a small text file with a human-readable
-[format](/doc/user-guide/project-structure/dvc-files). This metadata file can be
-easily versioned like source code with Git. The original data, meanwhile, is
-listed in `.gitignore`:
+file named `data/data.xml.dvc` — a small text file with a human-readable
+[format](/doc/user-guide/project-structure/dvc-files). This metadata file is a
+placeholder for the original data, and can be easily versioned like source code
+with Git:

 ```dvc
 $ git add data/data.xml.dvc data/.gitignore
 $ git commit -m "Add raw data"
 ```

+The original data, meanwhile, is listed in `.gitignore`.
+
 ### 💡 Expand to see what happens under the hood.
@@ -89,7 +91,7 @@ outs:
 You can upload DVC-tracked data or model files with `dvc push`, so they're
 safely stored [remotely](/doc/command-reference/remote). This also means they
 can be retrieved on other environments later with `dvc pull`. First, we need to
-setup a storage provider:
+setup a storage location:

 ```dvc
 $ dvc remote add -d storage s3://mybucket/dvcstore
@@ -103,7 +105,7 @@ $ git commit -m "Configure remote storage"
-### ⚙️ Expand to set up a remote storage provider ☁
+### ⚙️ Expand to set up a remote storage location.

 DVC remotes let you store a copy of the data tracked by DVC outside of the local
 cache (usually a cloud storage service). For simplicity, let's set up a _local
@@ -156,7 +158,7 @@ run it after `git clone` and `git pull`.
-### ⚙️ Expand to refresh the project ⟳
+### ⚙️ Expand to delete locally cached data.

 If you've run `dvc push`, you can delete the cache (`.dvc/cache`) and
 `data/data.xml` to experiment with `dvc pull`:
@@ -237,8 +239,8 @@ $ git commit data/data.xml.dvc -m "Revert dataset updates"
-Yes, DVC is technically not even a version control system! `.dvc` files' content
-defines data file versions. Git itself provides the version control. DVC in turn
+Yes, DVC is technically not even a version control system! `.dvc` file contents
+define data file versions. Git itself provides the version control. DVC in turn
 creates these `.dvc` files, updates them, and synchronizes DVC-tracked data in
 the workspace efficiently to match them.