Commit

update tag names and descriptions in example-get-started/code/README.md
jorgeorpinel committed Aug 23, 2019
1 parent a205fdf commit 2bd28e9
Showing 1 changed file with 30 additions and 25 deletions.
55 changes: 30 additions & 25 deletions example-get-started/code/README.md
@@ -1,7 +1,7 @@
 # DVC Get Started
 
 This is an auto-generated repository for use in https://dvc.org/doc/get-started.
-Please report any issues in
+Please report any issues in its source project,
 [example-repos-dev](https://github.com/iterative/example-repos-dev).
 
 ![](https://dvc.org/static/img/example-flow-2x.png)
@@ -34,8 +34,10 @@ $ source .env/bin/activate
 $ pip install -r src/requirements.txt
 ```
 
-This DVC project comes with a preconfigured remote DVC storage that has raw data
-(input), intermediate, and final results that are produced.
+This DVC project comes with a preconfigured DVC
+[remote storage](https://dvc.org/doc/commands-reference/remote) that holds raw
+data (input), intermediate, and final results that are produced. This is a
+read-only HTTP remote.
 
 ```console
 $ dvc remote list
@@ -87,38 +89,41 @@ are run in the DVC [get started](https://dvc.org/doc/get-started) guide. Feel
 free to checkout one of them and play with the DVC commands having the
 playground ready.
 
-- `0-empty` - empty Git repository.
-- `1-initialize` - DVC has been initialized. The `.dvc` with the cache directory
+- `0-empty`: Empty Git repository initialized.
+- `1-initialize`: DVC has been initialized. `.dvc/` with the cache directory
 created.
-- `2-remote` - remote HTTP storage initialized. It is a shared read only storage
+- `2-remote`: Remote HTTP storage initialized. It's a shared read only storage
 that contains all data artifacts produced during next steps.
-- `3-add-file` - raw data file `data.xml` downloaded and put under DVC
-control with [`dvc add`](https://man.dvc.org/add). First `.dvc` meta-file
-created.
-- `4-source` - source code downloaded and put under Git control.
-- `5-preparation` - first DVC stage created using
+- `3-add-file`: Raw data file `data.xml` downloaded and put under DVC control
+with [`dvc add`](https://man.dvc.org/add). First DVC-file (`.dvc` file
+extension) created.
+- `4-source`: Source code downloaded and put under Git control.
+- `5-preparation`: First stage file (DVC-file) created using
 [`dvc run`](https://man.dvc.org/run). It transforms XML data into TSV.
-- `6-featurization` - feature extraction step added. It also includes the split
-step for simplicity. It takes data in TSV format and produces two `.pkl` files
-that contain serialized feature matrices.
-- `7-train` - the model training stage added. It produces `model.pkl` file - the
-actual result that can be then deployed somewhere and classify questions.
-- `8-evaluate` - evaluate stage, we run it on a test dataset to see the AUC
-value for the model. The result is dumped into a DVC metric file so that we
-can compare it with other experiments later.
-- `9-bigrams` - bigrams experiment, code has been modified to extract more
+- `6-featurization`: Feature extraction stage created. It takes data in TSV
+format and produces two `.pkl` files that contain serialized feature matrices.
+- `7-train`: Model training stage created. It produces `model.pkl` file – the
+actual result that can then get deployed to an app that implements NLP
+classification.
+- `8-evaluate`: Evaluation stage. Runs the model on a test dataset to produce
+its performance AUC value. The result is dumped into a DVC metric file so that
+we can compare it with other experiments later.
+- `9-bigrams-model`: Bigrams experiment, code has been modified to extract more
 features. We run [`dvc repro`](https://man.dvc.org/repro) for the first time
 to illustrate how DVC can reuse cached files and detect changes along the
-computational graph.
+computational graph, regenerating the model with the updated data.
+- `10-bigrams-experiment`: Reproduce the evaluation stage with the bigrams based
+model.
 
 There are two additional tags:
 
-- `baseline-experiment` - the first end-to-end result that we performance metric
+- `baseline-experiment`: First end-to-end result that we have performance metric
 for.
-- `bigrams-experiment` - second version of the experiment.
+- `bigrams-experiment`: Second experiment (model trained using bigrams
+features).
 
-Both these tags could be used to illustrate `-a` or `-T` options across
-different [DVC commands](https://man.dvc.org/).
+These tags can be used to illustrate `-a` or `-T` options across different
+[DVC commands](https://man.dvc.org/).
 
 ## Project Structure
 