who needs custom docker anyway
casperdcl committed May 18, 2022
1 parent 230682d commit f86e796
Showing 1 changed file with 18 additions and 112 deletions.
content/blog/2022-05-24-local-experiments-to-cloud-with-tpi-docker.md: 130 changes (18 additions & 112 deletions)
@@ -2,15 +2,15 @@
title:
Moving Local Experiments to the Cloud with Terraform Provider Iterative (TPI)
and Docker
date: 2022-02-28
date: 2022-05-24
description: >
Tutorial for easily running experiments in the cloud with the help of
Terraform Provider Iterative (TPI) and Docker.
descriptionLong: >
In this tutorial, learn how to build a Docker image and use it to run
experiments in the cloud with Terraform Provider Iterative (TPI).
In this tutorial, learn how to use Docker images to run experiments in the
cloud with Terraform Provider Iterative (TPI).
picture: 2022-02-28/unsplash-containers.jpg
author: maria_khalusova
author: casper_dcl
# todo: commentsUrl:
tags:
- MLOps
Expand Down Expand Up @@ -54,9 +54,9 @@ Platform, and Kubernetes. Your Docker image is cloud provider-agnostic. There
are thousands of [pre-defined Docker images online](https://hub.docker.com/)
too.

In the first part of this tutorial, we'll use an existing Docker image. The
second part of this tutorial will then cover some basics for building and using
your own Docker images.
In this tutorial, we'll use an existing Docker image that comes with most of our
requirements already installed. We'll then add a few more dependencies on
top and run our training pipeline in the cloud as before!
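
As a rough sketch of that idea (the image name here is illustrative, not the
exact one used below), extending a ready-made image at runtime can be as simple
as:

```bash
# Illustrative only: run training inside an existing image, installing
# a few extra dependencies on top before the script starts
docker run --rm -v "$PWD":/tpi -w /tpi pytorch/pytorch \
  bash -c "pip install -r requirements.txt && python3 src/train.py"
```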

## Run GPU-enabled Docker containers

@@ -113,114 +113,20 @@
infrastructure, sync data and code to it, set up the environment, and run the
training process. If you'd like to tinker with this example you can
[find it on GitHub](https://github.com/iterative/blog-tpi-bees/tree/docker).

## Building your own Docker image
<admon type="tip">

Suppose we wanted our Docker container to include all the Python libraries
required to run our training pipeline. Well, these would be quite specific to
our project, so we'd have to build our own Docker image. Luckily, we don't have
to build one completely from scratch; instead, we can use one of the existing
publicly available images as a base to build on top of.
Don't forget to `terraform refresh && terraform show` to check the status, and
`terraform destroy` to download results & shut everything down.
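
Putting the full lifecycle together, a typical session looks like this
(`terraform init` is only needed once per project):

```bash
terraform init                        # install the required providers (first run only)
terraform apply                       # provision resources & launch the task
terraform refresh && terraform show   # check the status
terraform destroy                     # download results & shut everything down
```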

For this example, let's use `nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu20.04` from
NVIDIA. Earlier, we didn't need to install Docker on the local machine because
it was only used on the cloud machine, but if you're going to build your own
image, you'll need to start by
[installing Docker locally](https://docs.docker.com/get-docker/).
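
Once Docker is installed, you can sanity-check the setup by running, for
instance:

```bash
# Pulls a tiny test image and prints a hello message if everything works
docker run --rm hello-world
```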

To build an image, create a file called `Dockerfile` in your project's root
directory. This is a text-based script of instructions that will be used to
create a container image. In our case it'll be quite simple, as most of the
complex setup is taken care of by the NVIDIA image we're using as a base:

```dockerfile
FROM --platform=linux/amd64 nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu20.04
# Install wget and Python
RUN apt-get update && apt-get install -y wget python3-pip
# Add the required packages to the environment
COPY requirements.txt .
RUN pip3 install -r requirements.txt && rm requirements.txt
# Run everything from /tpi inside the container
WORKDIR /tpi
```

Unlike Terraform, where you had a declarative description of the desired
infrastructure in `main.tf`, a `Dockerfile` is a script, so the order of the
steps matters. The `FROM` instruction sets the base image for subsequent
instructions, and it is a required first instruction. To execute commands in a
new layer on top of the current image and commit the results, we use the `RUN`
instruction. The resulting committed image will be used for the next step in the
`Dockerfile`.

To install the project-specific Python packages, we'll need to copy the
`requirements.txt` to the filesystem of the container, and we can do this with
the `COPY` instruction. Then we'll use `RUN` again to install them.
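
For illustration, here's one way to create a hypothetical `requirements.txt`
(the package names are made up for this sketch; use your project's real
dependencies):

```bash
# Hypothetical contents for illustration only
cat > requirements.txt <<'EOF'
torch
torchvision
pandas
EOF
```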

Finally, with `WORKDIR` we specify the directory inside the container from
which we'll run things.

To build your own image locally, use the following command:
`docker build -t bees:latest .`, where `bees` is the name of our Docker image,
but you can name yours however you like.
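
For example, to build and then smoke-test the image locally (the
`python3 --version` check is just an illustrative stand-in for your own checks):

```bash
docker build -t bees:latest .
# Confirm the image runs and Python is available inside it
docker run --rm bees:latest python3 --version
```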

Hooray! We've got our own Docker image, but we're not done just yet. To use this
image from the Terraform config file, we need to publish it on
[Docker Hub](https://hub.docker.com/). Your company may already have a business
account for Docker Hub, but if you're learning Docker and want to try it as an
individual, you can create your own free
[personal account here](https://hub.docker.com/).

Once you have your account,

1. Log in to the Docker public registry on your local machine from the command
   line with `docker login`.
2. Tag your image (aka give it a name):
   `docker tag bees [YOUR_DOCKERHUB_ID]/bees:bees`.
3. Publish the image: `docker push [YOUR_DOCKERHUB_ID]/bees:bees` (the full
   sequence is shown below).
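
Put together, the whole publishing flow looks like this (replace
`[YOUR_DOCKERHUB_ID]` with your Docker Hub username):

```bash
docker login
docker tag bees [YOUR_DOCKERHUB_ID]/bees:bees
docker push [YOUR_DOCKERHUB_ID]/bees:bees
```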
</admon>

Once complete, this `bees` image becomes publicly available, which means we can
now use it in our Terraform config. The config is going to be nearly identical
to the previous example:
Now you know the basics of using convenient Docker images together with
[TPI][tpi] for provisioning your MLOps infrastructure!

```hcl
terraform {
  required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}
resource "iterative_task" "tpi-docker-examples" {
  name    = "tpi-docker-examples"
  cloud   = "aws"
  region  = "us-east-2"
  machine = "m+k80"
  workdir { input = "." }
  script  = <<-END
    #!/bin/bash
    sudo apt update -qq && sudo apt install -yqq software-properties-common build-essential ubuntu-drivers-common
    sudo ubuntu-drivers autoinstall
    sudo curl -fsSL https://get.docker.com | sudo sh -
    sudo usermod -aG docker ubuntu
    sudo setfacl --modify user:ubuntu:rw /var/run/docker.sock
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt update -qq && sudo apt install -yqq nvidia-docker2
    sudo systemctl restart docker
    nvidia-smi
    docker run --rm --gpus all -v "$PWD":/tpi [YOUR_DOCKERHUB_ID]/bees:bees python3 src/train.py
  END
}
```
<admon type="tip">

Note how we are mounting the volume at the working directory (`/tpi`) that we
specified when creating the image. This is not strictly required, but it keeps
things neat and tidy. Find this example together with the Dockerfile in this
[GitHub repo](https://github.com/iterative/blog-tpi-bees/tree/own-docker-image).
If you have a lot of custom dependencies that rarely change (e.g. a large
`requirements.txt`), it's a good idea to build them into your own custom Docker
image. Let us know if you'd like a tutorial on this!

Now you know the basics of using Docker images together with Terraform Provider
Iterative for provisioning your MLOps infrastructure, and you can start building
your own Docker images too! We hope you found this tutorial useful; now it's
time to build great things!
</admon>
