|
1 | 1 | # DVC Get Started |
2 | 2 |
|
3 | 3 | This is an auto-generated repository for use in https://dvc.org/doc/get-started. |
4 | | -Please report any issues in |
| 4 | +Please report any issues in its source project, |
5 | 5 | [example-repos-dev](https://github.com/iterative/example-repos-dev). |
6 | 6 |
|
7 | 7 |  |
@@ -34,8 +34,10 @@ $ source .env/bin/activate |
34 | 34 | $ pip install -r src/requirements.txt |
35 | 35 | ``` |
36 | 36 |
|
37 | | -This DVC project comes with a preconfigured remote DVC storage that has raw data |
38 | | -(input), intermediate, and final results that are produced. |
| 37 | +This DVC project comes with a preconfigured DVC |
| 38 | +[remote storage](https://dvc.org/doc/commands-reference/remote) that holds raw |
| 39 | +data (input) as well as the intermediate and final results. This is a |
| 40 | +read-only HTTP remote. |
39 | 41 |
|
40 | 42 | ```console |
41 | 43 | $ dvc remote list |
@@ -87,38 +89,41 @@ are run in the DVC [get started](https://dvc.org/doc/get-started) guide. Feel |
87 | 89 | free to check out one of them and play with the DVC commands having the |
88 | 90 | playground ready. |
89 | 91 |
|
90 | | -- `0-empty` - empty Git repository. |
91 | | -- `1-initialize` - DVC has been initialized. The `.dvc` with the cache directory |
| 92 | +- `0-empty`: Empty Git repository initialized. |
| 93 | +- `1-initialize`: DVC has been initialized. `.dvc/` with the cache directory |
92 | 94 | created. |
93 | | -- `2-remote` - remote HTTP storage initialized. It is a shared read only storage |
| 95 | +- `2-remote`: Remote HTTP storage initialized. It's a shared read-only storage |
94 | 96 | that contains all data artifacts produced during the next steps. |
95 | | -- `3-add-file` - raw data file `data.xml` downloaded and put under DVC |
96 | | - control with [`dvc add`](https://man.dvc.org/add). First `.dvc` meta-file |
97 | | - created. |
98 | | -- `4-source` - source code downloaded and put under Git control. |
99 | | -- `5-preparation` - first DVC stage created using |
| 97 | +- `3-add-file`: Raw data file `data.xml` downloaded and put under DVC control |
| 98 | + with [`dvc add`](https://man.dvc.org/add). First DVC-file (`.dvc` file |
| 99 | + extension) created. |
| 100 | +- `4-source`: Source code downloaded and put under Git control. |
| 101 | +- `5-preparation`: First stage file (DVC-file) created using |
100 | 102 | [`dvc run`](https://man.dvc.org/run). It transforms XML data into TSV. |
101 | | -- `6-featurization` - feature extraction step added. It also includes the split |
102 | | - step for simplicity. It takes data in TSV format and produces two `.pkl` files |
103 | | - that contain serialized feature matrices. |
104 | | -- `7-train` - the model training stage added. It produces `model.pkl` file - the |
105 | | - actual result that can be then deployed somewhere and classify questions. |
106 | | -- `8-evaluate` - evaluate stage, we run it on a test dataset to see the AUC |
107 | | - value for the model. The result is dumped into a DVC metric file so that we |
108 | | - can compare it with other experiments later. |
109 | | -- `9-bigrams` - bigrams experiment, code has been modified to extract more |
| 103 | +- `6-featurization`: Feature extraction stage created. It takes data in TSV |
| 104 | + format and produces two `.pkl` files that contain serialized feature matrices. |
| 105 | +- `7-train`: Model training stage created. It produces the `model.pkl` file, |
| 106 | + the actual result that can then be deployed to an app that implements NLP |
| 107 | + classification. |
| 108 | +- `8-evaluate`: Evaluation stage created. It runs the model on a test dataset |
| 109 | + to compute its AUC. The result is dumped into a DVC metric file so that |
| 110 | + we can compare it with other experiments later. |
| 111 | +- `9-bigrams-model`: Bigrams experiment; the code has been modified to extract more |
110 | 112 | features. We run [`dvc repro`](https://man.dvc.org/repro) for the first time |
111 | 113 | to illustrate how DVC can reuse cached files and detect changes along the |
112 | | - computational graph. |
| 114 | + computational graph, regenerating the model with the updated data. |
| 115 | +- `10-bigrams-experiment`: Evaluation stage reproduced with the bigrams-based |
| 116 | + model. |
113 | 117 |
|
114 | 118 | There are two additional tags: |
115 | 119 |
|
116 | | -- `baseline-experiment` - the first end-to-end result that we performance metric |
| 120 | +- `baseline-experiment`: First end-to-end result that we have a performance metric |
117 | 121 | for. |
118 | | -- `bigrams-experiment` - second version of the experiment. |
| 122 | +- `bigrams-experiment`: Second experiment (model trained using bigram |
| 123 | + features). |
119 | 124 |
|
120 | | -Both these tags could be used to illustrate `-a` or `-T` options across |
121 | | -different [DVC commands](https://man.dvc.org/). |
| 125 | +These tags can be used to illustrate `-a` or `-T` options across different |
| 126 | +[DVC commands](https://man.dvc.org/). |
122 | 127 |
|
123 | 128 | ## Project Structure |
124 | 129 |
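A minimal sketch of how the step tags and experiment tags described in this README might be used. It assumes DVC is installed, you are inside a clone of this repository, and your DVC version accepts `-T` (short for `--all-tags`) on `dvc metrics show`. The choice of the `8-evaluate` tag is illustrative, and output is omitted because it depends on the DVC version.

```console
$ git tag                   # list the step tags and the two experiment tags
$ git checkout 8-evaluate   # move the workspace to the evaluation step
$ dvc pull                  # fetch that step's data from the preconfigured HTTP remote
$ dvc metrics show -T       # show metrics for every tag, e.g. both experiments
```

Because the remote is read-only, `dvc pull` works here but `dvc push` would not.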