|
1 | 1 | # DVC Get Started
|
2 | 2 |
|
3 | 3 | This is an auto-generated repository for use in https://dvc.org/doc/get-started.
|
4 |
| -Please report any issues in |
| 4 | +Please report any issues in its source project, |
5 | 5 | [example-repos-dev](https://github.com/iterative/example-repos-dev).
|
6 | 6 |
|
7 | 7 | 
|
@@ -34,8 +34,10 @@ $ source .env/bin/activate
|
34 | 34 | $ pip install -r src/requirements.txt
|
35 | 35 | ```
|
36 | 36 |
|
37 |
| -This DVC project comes with a preconfigured remote DVC storage that has raw data |
38 |
| -(input), intermediate, and final results that are produced. |
| 37 | +This DVC project comes with a preconfigured DVC |
| 38 | +[remote storage](https://dvc.org/doc/commands-reference/remote) that holds raw |
| 39 | +data (input), intermediate, and final results that are produced. This is a |
| 40 | +read-only HTTP remote. |
39 | 41 |
|
40 | 42 | ```console
|
41 | 43 | $ dvc remote list
|
@@ -87,38 +89,41 @@ are run in the DVC [get started](https://dvc.org/doc/get-started) guide. Feel
|
87 | 89 | free to checkout one of them and play with the DVC commands having the
|
88 | 90 | playground ready.
|
89 | 91 |
|
90 |
| -- `0-empty` - empty Git repository. |
91 |
| -- `1-initialize` - DVC has been initialized. The `.dvc` with the cache directory |
| 92 | +- `0-empty`: Empty Git repository initialized. |
| 93 | +- `1-initialize`: DVC has been initialized. `.dvc/` with the cache directory |
92 | 94 | created.
|
93 |
| -- `2-remote` - remote HTTP storage initialized. It is a shared read only storage |
| 95 | +- `2-remote`: Remote HTTP storage initialized. It's a shared read-only storage |
94 | 96 | that contains all data artifacts produced during next steps.
|
95 |
| -- `3-add-file` - raw data file `data.xml` downloaded and put under DVC |
96 |
| - control with [`dvc add`](https://man.dvc.org/add). First `.dvc` meta-file |
97 |
| - created. |
98 |
| -- `4-source` - source code downloaded and put under Git control. |
99 |
| -- `5-preparation` - first DVC stage created using |
| 97 | +- `3-add-file`: Raw data file `data.xml` downloaded and put under DVC control |
| 98 | + with [`dvc add`](https://man.dvc.org/add). First DVC-file (`.dvc` file |
| 99 | + extension) created. |
| 100 | +- `4-source`: Source code downloaded and put under Git control. |
| 101 | +- `5-preparation`: First stage file (DVC-file) created using |
100 | 102 | [`dvc run`](https://man.dvc.org/run). It transforms XML data into TSV.
|
101 |
| -- `6-featurization` - feature extraction step added. It also includes the split |
102 |
| - step for simplicity. It takes data in TSV format and produces two `.pkl` files |
103 |
| - that contain serialized feature matrices. |
104 |
| -- `7-train` - the model training stage added. It produces `model.pkl` file - the |
105 |
| - actual result that can be then deployed somewhere and classify questions. |
106 |
| -- `8-evaluate` - evaluate stage, we run it on a test dataset to see the AUC |
107 |
| - value for the model. The result is dumped into a DVC metric file so that we |
108 |
| - can compare it with other experiments later. |
109 |
| -- `9-bigrams` - bigrams experiment, code has been modified to extract more |
| 103 | +- `6-featurization`: Feature extraction stage created. It takes data in TSV |
| 104 | + format and produces two `.pkl` files that contain serialized feature matrices. |
| 105 | +- `7-train`: Model training stage created. It produces `model.pkl` file – the |
| 106 | + actual result that can then get deployed to an app that implements NLP |
| 107 | + classification. |
| 108 | +- `8-evaluate`: Evaluation stage. Runs the model on a test dataset to produce |
| 109 | + its performance AUC value. The result is dumped into a DVC metric file so that |
| 110 | + we can compare it with other experiments later. |
| 111 | +- `9-bigrams-model`: Bigrams experiment, code has been modified to extract more |
110 | 112 | features. We run [`dvc repro`](https://man.dvc.org/repro) for the first time
|
111 | 113 | to illustrate how DVC can reuse cached files and detect changes along the
|
112 |
| - computational graph. |
| 114 | + computational graph, regenerating the model with the updated data. |
| 115 | +- `10-bigrams-experiment`: Reproduce the evaluation stage with the bigrams-based |
| 116 | + model. |
113 | 117 |
|
114 | 118 | There are two additional tags:
|
115 | 119 |
|
116 |
| -- `baseline-experiment` - the first end-to-end result that we performance metric |
| 120 | +- `baseline-experiment`: First end-to-end result that we have a performance metric |
117 | 121 | for.
|
118 |
| -- `bigrams-experiment` - second version of the experiment. |
| 122 | +- `bigrams-experiment`: Second experiment (model trained using bigrams |
| 123 | + features). |
119 | 124 |
|
120 |
| -Both these tags could be used to illustrate `-a` or `-T` options across |
121 |
| -different [DVC commands](https://man.dvc.org/). |
| 125 | +These tags can be used to illustrate `-a` or `-T` options across different |
| 126 | +[DVC commands](https://man.dvc.org/). |
122 | 127 |
|
123 | 128 | ## Project Structure
|
124 | 129 |
|
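The numbered stage tags in the README sort poorly as plain strings (`10-bigrams-experiment` lands before `2-remote`), so stepping through them in order takes a numeric sort. The following is a minimal sketch, assuming a POSIX shell with `grep` and `sort`; `order_stage_tags` is a hypothetical helper name and is not part of DVC or this repository. The commented commands assume they are run inside a clone of the repository with DVC installed.

```shell
# Hypothetical helper (not part of DVC or this repository): keep only the
# numbered stage tags read from stdin and print them in numeric order.
order_stage_tags() {
  grep -E '^[0-9]+-' | sort -t- -k1,1n
}

# Example use inside a clone of the repository (commented out so that
# sourcing this snippet has no side effects):
#   git tag --list | order_stage_tags
#   git checkout 7-train     # jump to the model training stage
#   dvc pull                 # fetch the matching data from the read-only remote
#   dvc metrics show -T      # show metrics across all tags
```

The filter drops the unnumbered `baseline-experiment` and `bigrams-experiment` tags, which mark experiments rather than steps in the sequence.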