dvc-cml container working with Gitlab and Github #12

Merged: 4 commits, Mar 20, 2020
25 changes: 25 additions & 0 deletions .github/workflows/deploy.yaml
@@ -0,0 +1,25 @@
name: deploy

on:
push:
branches:
- master

jobs:
publish_dockehub:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2

- name: Publish to dockerhub
uses: elgohr/Publish-Docker-Github-Action@master
with:
name: dvcorg/dvc-cml
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
dockerfile: ./docker/Dockerfile
context: ./
cache: true
tag_names: true

2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -1,6 +1,6 @@
name: test

on: [pull_request]
on: [push, pull_request]

jobs:
check:
149 changes: 96 additions & 53 deletions README.md
@@ -8,48 +8,58 @@
## Introduction

DVC is a great tool as a data versioning system, but also is great as a build
tool for ML experimentation. This action offers the possibility of using DVC to
establish your ML pipeline to be run by Github Actions CI/CD were you could use
[your own runners](https://help.github.com/en/actions/hosting-your-own-runners)
with special capabilities like GPUs.
tool for ML experimentation. This repo offers the possibility of using DVC to
establish your ML pipeline to be run by Github Actions runners or Gitlab
Review comment (Member): Github Action runners
Reply (Contributor Author): The Github product is named Github Action, maybe it has to be double quoted?
Reply (Member): It was about the typo Actions -> Action. In English you don't put two plurals one after another.

runners.

Major beneficts of using DVC-action in your ML projects includes:
You can also deploy
[your own Github runners](https://help.github.com/en/actions/hosting-your-own-runners)
or [your own Gitlab runners](https://docs.gitlab.com/runner/) with special
capabilities like GPUs...

Major benefits of using DVC-CML in your ML projects includes:
Review comment (Member): I still don't understand this list of benefits. Can you summarize them without this official language, like A, B, C, the way you understand them?
Reply (Contributor Author): OK


- Reproducibility: DVC is always in charge of maintain your experiment tracking
all the dependencies, so you don't have to. Additionally your experiment is
always running under the same software constrains so you dont have to worry
about replicating the same enviroment again.
always running under the same constrains so you don't have to worry about
replicating the same environment again.
- Observability: DVC offers you metrics to be tracked. In DVC-action we make
those metrics more human friendly and we also offer direct access to other
experiments run.
- Releases: DVC-action tags every experiment that runs with repro. Aside of that
DVC-action is just a job inside your workflow that could generate your model
releases or deployment according to your bussiness requirements.
experiments run through the DVC Report offered as checks in Github or Releases
in Gitlab.
- Releases: DVC-action tags every experiment that runs with repro generating the
report. Aside of that DVC-CML is just a step in your
[Github Workflow](https://help.github.com/en/actions/getting-started-with-github-actions/core-concepts-for-github-actions#workflow)
or [Gitlab Pipeline](https://docs.gitlab.com/ee/ci/quick_start/) that could
generate your model releases or deployment according to your business
requirements.
- Teaming: Give visibility to your experiments or releases to your teammates
working toguether.
working together.

The action performs in your push or pull requests:
DVC-cml performs in your push or pull requests:

1. DVC [repro](https://dvc.org/doc/command-reference/repro)
2. Push changes into DVC remote and Git remote
3. Generates a DVC Report as a Github check displaying all the experiment
metrics
3.

- In Github generates a Github check displaying the DVC Report
- In Gitlab generates a Tag/Release displaying the DVC Report

![image](https://user-images.githubusercontent.com/414967/75673142-854ad800-5c82-11ea-97f4-256beca83754.png)
![image](https://user-images.githubusercontent.com/414967/75673087-677d7300-5c82-11ea-8ccb-be6a4f81eb5d.png)

## Usage

<details>
<summary>DVC-CML for Github</summary>

> :eyes: Knowledge of [Github Actions](https://help.github.com/en/actions) and
> [DVC pipeline](https://dvc.org/doc/get-started/pipeline) is very useful for a
> fully comprenhension.
> fully comprehension.

This action depends on:
Example of a simple DVC-CML workflow:

- actions/checkout V2
- actions/setup-python

Example of a simple DVC-action workflow:
> :eyes: Note the use of the container
Review comment (Member): Explain why I need to note this.
Reply (Contributor Author): So that it is not forgotten in the job definition. It might be a pitfall: people adding DVC-CML inside an existing job may not add the container section.
Reply (Member): So, put that explanation in the note itself?

```yaml
name: your-workflow-name
@@ -59,66 +69,98 @@ on: [push, pull_request]
jobs:
run:
runs-on: ubuntu-latest
container: docker://dvcorg/dvc-cml:latest

steps:
- uses: actions/checkout@v2

- name: setup python
uses: actions/setup-python@v1
with:
python-version: 3.6

- uses: iterative/dvc-action
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
repro_targets: your-file.dvc
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
- name: dvc_action_run
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
repro_targets: your_dvc_target.dvc
run: |
# install your project dependencies
dvc_action_run
```

</details>

<details>
<summary>DVC-CML for Gitlab</summary>

> :eyes: Knowledge of
> [Gitlab CI/CD Pipeline](https://docs.gitlab.com/ee/ci/quick_start/) and
> [DVC pipeline](https://dvc.org/doc/get-started/pipeline) is very useful for a
> fully comprehension.

Example of a simple DVC-CML workflow in Gitlab:

> :eyes: Some required environment variables like remote credentials and
> GITLAB_TOKEN are set as CI/CD environment variables in Gitlab's UI

```yaml
stages:
- dvc_action_run

dvc:
stage: dvc_action_run
image: dvcorg/dvc-cml:latest
variables:
repro_targets: 'eval.dvc'
script:
- pip install tensorflow wget
- dvc_action_run
```

This workflow will run everytime that you push code or do a Pull Request. When
triggered DVC-action will setup the runner and DVC will run the pipelines
specified by repro_targets. Two scenarios may happen:
</details>

This workflow will run every time that you push code or do a Pull/Merge Request.
When triggered DVC-CML will setup the runner and DVC will run the pipelines
specified by `repro_targets`. Two scenarios may happen:

1. DVC repro is up to date and there is nothing to do. This means that the
commit that you have done in your code is not related to your DVC pipelines
and there is nothing to do.
2. DVC pipeline has changed and DVC will run repro, updating the output that may
generate (models, data...) in your DVC remote storage and then commiting,
generate (models, data...) in your DVC remote storage and then committing,
tagging and pushing the changes in git remote.
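
The two scenarios above can be sketched roughly as follows. This is only an illustration of the decision described in the README text, not code from the dvc-cml source: the function name `planSteps` and the step labels are hypothetical, and the labels are descriptions rather than exact commands.

```javascript
// Hypothetical sketch: step names are descriptive labels, not commands
// taken from the dvc-cml implementation.
function planSteps(pipelineChanged) {
  if (!pipelineChanged) {
    // Scenario 1: repro is up to date, so there is nothing to do.
    return [];
  }
  // Scenario 2: run repro, push outputs to the DVC remote,
  // then commit, tag and push the changes to the git remote.
  return [
    'dvc repro',
    'push outputs to DVC remote',
    'git commit',
    'git tag',
    'git push',
  ];
}

console.log(planSteps(false));
console.log(planSteps(true));
```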

Additionally, you may extend your workflow to generate your releases or even
deploy automatically your models.
Additionally, you may extend your CI/CD Pipeline/Workflow to generate your
releases or even deploy automatically your models.

## Input variables
### Support for [ci skip] comment in Github

| Variable | Type | Required | Default | Info |
| -------------------- | ------ | -------- | ------------- | -------------------------------------------------------------------------------------- |
| github_token | string | yes | | Is the github_token, this is setted automatically by Github as a secret. |
| repro_targets | string | no | Dvcfile | Comma delimited array of DVC files. If None is given will skip the process. |
| metrics_diff_targets | string | no | | Comma delimited array of metrics files. If not specified will use all the metric files |
| rev | string | no | origin/master | Revision to be compared with current experiment. I.E. HEAD~1. |

### Support for [ci skip] comment

Many CI/CD verdors supports a special comment [ci skip] in the commit avoid run
Many CI/CD vendors supports a special comment [ci skip] in the commit avoid run
the CI. We support this, ff your commit comment includes the tag the DVC action
will skip the CI returning an exit code of 0. We know that ideally the code
should be 78 however, at the time of this writing, Github is only accepting 0 or
1 as status codes.
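
As a rough illustration of the `[ci skip]` behaviour described above, a check of this kind could look like the Node.js sketch below. The helper name `shouldSkipCi` and the sample commit messages are hypothetical; this is an assumption about the shape of the logic, not the project's actual implementation.

```javascript
// Hypothetical helper, not taken from the dvc-cml source.
// Returns true when the commit message contains the [ci skip] tag.
function shouldSkipCi(commitMessage) {
  return commitMessage.includes('[ci skip]');
}

// Example usage with made-up commit messages.
const messages = ['Update README [ci skip]', 'Retrain model with new data'];
for (const message of messages) {
  if (shouldSkipCi(message)) {
    // A real run would stop here with exit code 0 (not 78),
    // because only 0 or 1 were accepted as status codes.
    console.log(`skip: ${message}`);
  } else {
    console.log(`run: ${message}`);
  }
}
```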

### env variables

| Variable | Type | Required | Default | Info |
| -------------------- | ------ | -------- | ------------- | ------------------------------------------------------------------------------------------- |
| github_token | string | yes | | Is the github_token, this is set automatically by Github as a secret. Only Needed on Github |
| repro_targets | string | no | Dvcfile | Comma delimited array of DVC files. If None is given will skip the process. |
| metrics_diff_targets | string | no | | Comma delimited array of metrics files. If not specified will use all the metric files |
| rev | string | no | origin/master | Revision to be compared with current experiment. I.E. HEAD~1. |
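
To make the table concrete, here is a minimal sketch of how these variables might be read from the environment. The defaults mirror the table; the function name `readSettings`, the returned field names, and the trimming of whitespace are assumptions for illustration and are not confirmed by the dvc-cml source.

```javascript
// Hypothetical sketch of reading the variables listed in the table.
// Defaults mirror the table; parsing details are assumptions.
function readSettings(env) {
  // Split a comma-delimited string into trimmed, non-empty entries.
  const toList = (value) =>
    value
      .split(',')
      .map((item) => item.trim())
      .filter((item) => item.length > 0);

  return {
    // Falls back to 'Dvcfile' only when the variable is not set at all.
    reproTargets: toList(
      env.repro_targets !== undefined ? env.repro_targets : 'Dvcfile'
    ),
    // An empty list here stands for "use all the metric files".
    metricsDiffTargets: toList(env.metrics_diff_targets || ''),
    rev: env.rev || 'origin/master',
  };
}

// In a real job, process.env would be passed instead of a literal object.
console.log(readSettings({ repro_targets: 'train.dvc, eval.dvc' }));
```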

> :warning: In Gitlab is required that you generate the GITLAB_TOKEN that is
> analogous to GITHUB_TOKEN. See
> [Tensorflow Mnist in Gitlab](#tensorflow-mnist-in-gitlab) example for a
> complete walkthrough.

DVC remote is setup using env variables see
[Working with DVC remotes](#working-with-dvc-remotes).

## Working with DVC remotes

DVC support different kinds of remote
[storage](https://dvc.org/doc/command-reference/remote/add). To setup them
properly you have to setup credentials (if needed) as enviroment variables. We
choose env variables and not inputs to be compatible with other github actions
properly you have to setup credentials (if needed) as environment variables. We
choose env variables and not inputs to be compatible with other Github Actions
that set credentials like
https://github.com/aws-actions/configure-aws-credentials.
We recommend you to set those variables as
@@ -193,4 +235,5 @@ env:

## Examples

- [Tensorflow Mnist](https://github.com/DavidGOrtega/dvc-action/wiki/Tensorflow-Mnist)
- [Tensorflow Mnist for Github Actions](https://github.com/iterative/dvc-cml/wiki/Tensorflow-Mnist-for-Github-Actions)
- [Tensorflow Mnist for Gitlab CI](https://github.com/iterative/dvc-cml/wiki/Tensorflow-Mnist-for-Gitlab-CI)
27 changes: 2 additions & 25 deletions action.yml
@@ -1,28 +1,5 @@
name: 'DVC-action'
description: 'DVC - continuous delivery for machine learning'

inputs:
github_token:
required: true

dvc_pull:
required: false
default: true

repro_targets:
required: false
default: 'Dvcfile'

metrics_diff_targets:
required: false

metrics_format:
required: false
default: '0[.][0000000]'

rev:
required: false
default: 'origin/master'
name: 'dvc-cml'
description: 'DVC - continuous machine learning'

runs:
using: 'node12'
3 changes: 3 additions & 0 deletions bin/dvc-action-run.js
@@ -0,0 +1,3 @@
#!/usr/bin/env node

require('./../index');
2 changes: 1 addition & 1 deletion dist/index.js

Large diffs are not rendered by default.

27 changes: 27 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,27 @@
FROM ubuntu:18.04

LABEL Iterative, Inc

RUN apt-get update --fix-missing
RUN apt-get install -y \
curl \
apt-transport-https \
ca-certificates \
software-properties-common

RUN add-apt-repository ppa:git-core/ppa -y
RUN apt-get update
RUN apt-get install -y git

RUN apt-get install -y python-pip
RUN apt-get install -y python3-pip
RUN yes | pip3 install dvc[all]

RUN curl -sL https://deb.nodesource.com/setup_12.x | bash
RUN apt install -y nodejs

ADD "./" "/dvc-cml"
RUN npm install -g /dvc-cml

COPY "./docker/entrypoint.sh" "/entrypoint.sh"
CMD ["/entrypoint.sh"]
3 changes: 3 additions & 0 deletions docker/entrypoint.sh
@@ -0,0 +1,3 @@
set -e
eval $run
dvc_action_run