docs & readme: what/why/how (#433)
casperdcl authored Mar 15, 2022
1 parent 96b20b5 commit fcc25f3
Showing 6 changed files with 226 additions and 155 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2020-2021 Iterative, Inc.
Copyright Iterative, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
144 changes: 95 additions & 49 deletions README.md
@@ -1,81 +1,127 @@
![Terraform Provider Iterative](https://static.iterative.ai/img/cml/banner-terraform.png)
![TPI](https://static.iterative.ai/img/cml/banner-tpi.svg)

# Iterative Provider [![](https://img.shields.io/badge/-documentation-5c4ee5?logo=terraform)](https://registry.terraform.io/providers/iterative/iterative/latest/docs)
# Terraform Provider Iterative (TPI)

The Iterative Provider is a Terraform plugin that enables full lifecycle
management of computing resources for machine learning pipelines, including GPUs, from your favorite cloud vendors.
[![docs](https://img.shields.io/badge/-docs-5c4ee5?logo=terraform)](https://registry.terraform.io/providers/iterative/iterative/latest/docs)
[![tests](https://img.shields.io/github/workflow/status/iterative/terraform-provider-iterative/Test?label=tests&logo=GitHub)](https://github.com/iterative/terraform-provider-iterative/actions/workflows/test.yml)
[![Apache-2.0][licence-badge]][licence-file]

The Iterative Provider makes it easy to:
TPI is a [Terraform](https://terraform.io) plugin built with machine learning in mind. It provides full lifecycle management of computing resources (including GPUs and respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s), without needing to be a cloud expert.

- Rapidly move local machine learning experiments to a cloud infrastructure
- Take advantage of training models on spot instances without losing any progress
- Unify configuration of various cloud compute providers
- Automatically destroy unused cloud resources (compute instances are terminated on job completion/failure, and storage is removed when results are downloaded)
- **Provision Resources**: create cloud compute (CPU, GPU, RAM) & storage resources without reading pages of documentation
- **Sync & Execute**: easily sync & run local data & code in the cloud
- **Low cost**: transparent auto-recovery from interrupted low-cost spot/preemptible instances
- **No waste**: auto-cleanup unused resources (terminate compute instances upon job completion/failure & remove storage upon download of results)
- **No lock-in**: switch between several cloud vendors with ease due to concise unified configuration

The Iterative Provider can provision resources with the following cloud providers and orchestrators:
Supported cloud vendors include:

- Amazon Web Services
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform
- Kubernetes
- Google Cloud Platform (GCP)
- Kubernetes (K8s)

## Documentation
## Usage

See the [Getting Started](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) guide to learn how to use the Iterative Provider. More details on configuring and using the Iterative Provider are in the [documentation](https://registry.terraform.io/providers/iterative/iterative/latest/docs).
### Requirements

## Support
- [Install Terraform 1.0+](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform), e.g.:
- Brew (Homebrew/macOS): `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`
- Choco (Chocolatey/Windows): `choco install terraform`
- Conda (Anaconda): `conda install -c conda-forge terraform`
- Debian (Ubuntu/Linux):
```
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform
```
- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication)
Have a feature request or found a bug? Let us know via [GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues). Have questions? Join our [community on Discord](https://discord.gg/bzA6uY7); we'll be happy to help you get started!
### Define a Task
## License
In a project root directory, create a file named `main.tf` with the following contents:
Iterative Provider is released under the [Apache 2.0 License](https://github.com/iterative/terraform-provider-iterative/blob/master/LICENSE).
```hcl
terraform {
required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}
resource "iterative_task" "example" {
cloud = "aws" # or any of: gcp, az, k8s
machine = "m" # medium. Or any of: l, xl, m+k80, xl+v100, ...
spot = 0 # auto-price. Or -1 to disable, or >0 to set an hourly USD limit
disk_size = 30 # GB
storage {
workdir = "."
output = "results"
}
script = <<-END
#!/bin/bash
mkdir results
echo "Hello World!" > results/greeting.txt
END
}
```

## Development
See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#argument-reference) for the full list of options for `main.tf` -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type) with and without GPUs.

### Install Go 1.17+
Run this once (in the directory containing `main.tf`) to download the `required_providers`:

Refer to the [official documentation](https://golang.org/doc/install) for specific instructions.
```
terraform init
```

### Clone the repository
### Run Task

```console
git clone https://github.com/iterative/terraform-provider-iterative
cd terraform-provider-iterative
```
terraform apply
```

### Install the provider
This launches a `machine` in the `cloud`, uploads `workdir`, and runs the `script`. Upon completion (or error), the `machine` is terminated.

Build the provider and install the resulting binary to the [local mirror directory](https://www.terraform.io/docs/cli/config/config-file.html#implied-local-mirror-directories):
With spot/preemptible instances (`spot >= 0`), auto-recovery logic and persistent storage will be used to relaunch interrupted tasks.

```console
make install
```
### Query Status

Results and logs are periodically synced to persistent cloud storage. To query this status and view logs:

### Create a test file
```
terraform refresh
terraform show
```

Create a file named `main.tf` in an empty directory with the following contents:
### Stop Tasks

```hcl
terraform {
required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}
# ... other resource blocks ...
```
terraform destroy
```

**Note:** to use your local build, specify `source = "github.com/iterative/iterative"` (`source = "iterative/iterative"` will download the latest stable release instead).
This terminates the `machine` (if still running), downloads `output`, and removes the persistent `disk_size` storage.

### Initialize the provider
## Help

Run this command after every `make install` to use the new build:
The [getting started guide](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) has some more information.

```console
terraform init --upgrade
```
Feature requests and bugs can be [reported via GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues), while general questions and feedback are very welcome on our active [Discord server](https://discord.gg/bzA6uY7).

### Test the provider
## Contributing

```console
terraform apply
```
To work on the provider itself, build and use a local copy of the repository instead of the latest stable release.

1. [Install Go 1.17+](https://golang.org/doc/install)
2. Clone the repository & build the provider
```
git clone https://github.com/iterative/terraform-provider-iterative
cd terraform-provider-iterative
make install
```
3. Use `source = "github.com/iterative/iterative"` in your `main.tf` to use the local repository (`source = "iterative/iterative"` will download the latest release instead), and run `terraform init --upgrade`

## Copyright

This project and all contributions to it are distributed under [![Apache-2.0][licence-badge]][licence-file]

[licence-badge]: https://img.shields.io/badge/licence-Apache%202.0-blue
[licence-file]: https://github.com/iterative/terraform-provider-iterative/blob/master/LICENSE
42 changes: 42 additions & 0 deletions docs/guides/authentication.md
@@ -0,0 +1,42 @@
---
page_title: Authentication
---

# Authentication

Environment variables are the only supported authentication method. They should be present when running any of the `terraform` commands. For example:

```bash
$ export GOOGLE_APPLICATION_CREDENTIALS_DATA="$(cat service_account.json)"
$ terraform apply
```

## Amazon Web Services

- `AWS_ACCESS_KEY_ID` - Access key identifier.
- `AWS_SECRET_ACCESS_KEY` - Secret access key.
- `AWS_SESSION_TOKEN` - (Optional) Session token.

See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) for more information.

## Microsoft Azure

- `AZURE_CLIENT_ID` - Client identifier.
- `AZURE_CLIENT_SECRET` - Client secret.
- `AZURE_SUBSCRIPTION_ID` - Subscription identifier.
- `AZURE_TENANT_ID` - Tenant identifier.

See the [Azure documentation](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential) for more information.

## Google Cloud Platform

- `GOOGLE_APPLICATION_CREDENTIALS` - Path to (or contents of) a service account JSON key file.

See the [GCP documentation](https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account) for more information.

## Kubernetes

Either one of:

- `KUBECONFIG` - Path to a [`kubeconfig` file](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/#the-kubeconfig-environment-variable).
- `KUBECONFIG_DATA` - Alternatively, the **contents** of a `kubeconfig` file.
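
Because a missing variable only surfaces once a `terraform` command fails, it can help to check the environment first. The following is a minimal sketch, not part of TPI: `missing_credentials` is a hypothetical helper name, and it assumes the required variables are exactly the non-optional ones listed in this guide.

```bash
# Hypothetical helper (not part of TPI): print the names of any credential
# variables that are still unset for the given cloud (aws, az, gcp or k8s).
missing_credentials() {
  mc_missing=""
  case "$1" in
    aws) mc_required="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY" ;;
    az)  mc_required="AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID" ;;
    gcp)
      # The GCP list names GOOGLE_APPLICATION_CREDENTIALS, while the example at
      # the top of this guide sets GOOGLE_APPLICATION_CREDENTIALS_DATA.
      # Assumption: either one is sufficient.
      if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] &&
         [ -z "${GOOGLE_APPLICATION_CREDENTIALS_DATA:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS or GOOGLE_APPLICATION_CREDENTIALS_DATA"
      fi
      return 0 ;;
    k8s)
      # Either variable is sufficient for Kubernetes.
      if [ -z "${KUBECONFIG:-}" ] && [ -z "${KUBECONFIG_DATA:-}" ]; then
        echo "KUBECONFIG or KUBECONFIG_DATA"
      fi
      return 0 ;;
    *) echo "unknown cloud: $1" >&2; return 2 ;;
  esac
  for mc_name in $mc_required; do
    # Look up the value of the variable whose name is stored in mc_name.
    eval "mc_value=\${$mc_name:-}"
    if [ -z "$mc_value" ]; then
      mc_missing="$mc_missing $mc_name"
    fi
  done
  # Strip the leading space; prints an empty line when nothing is missing.
  echo "${mc_missing# }"
}
```

For example, `missing_credentials aws` prints nothing when both AWS keys are set, so it can be run just before `terraform apply`.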
77 changes: 43 additions & 34 deletions docs/guides/getting-started.md
@@ -4,39 +4,51 @@ page_title: Getting Started

# Getting Started

To use the Iterative Provider you will need to:

- [Install Terraform 1.0](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform) or greater
- Create an account with your preferred cloud compute provider and expose its [authentication credentials via environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs#authentication)
## Requirements

- [Install Terraform 1.0+](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform), e.g.:
- Brew (Homebrew/macOS): `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`
- Choco (Chocolatey/Windows): `choco install terraform`
- Conda (Anaconda): `conda install -c conda-forge terraform`
- Debian (Ubuntu/Linux):
```
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform
```
- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables][authentication]
[authentication]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication
## Defining a Task
In the project root directory:

1. Create a directory named `shared` to store input data and output artefacts.
2. Create a file named `main.tf` with the following contents:
In a project root directory, create a file named `main.tf` with the following contents:
```hcl
terraform {
required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}
resource "iterative_task" "task" {
cloud = "aws" # or any of: gcp, az, k8s
machine = "m"
workdir {
input = "${path.root}/shared"
output = "${path.root}/shared"
resource "iterative_task" "example" {
cloud = "aws" # or any of: gcp, az, k8s
machine = "m" # medium. Or any of: l, xl, m+k80, xl+v100, ...
spot = 0 # auto-price. Or -1 to disable, or >0 to set an hourly USD limit
disk_size = 30 # GB
storage {
workdir = "."
output = "results"
}
script = <<-END
#!/bin/bash
echo "Hello World!" > greeting.txt
mkdir results
echo "Hello World!" > results/greeting.txt
END
}
```

See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task) for a full list of options -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type).
See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#argument-reference) for the full list of options for `main.tf` -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type) with and without GPUs.

-> **Note:** The `script` argument must begin with a valid [shebang](<https://en.wikipedia.org/wiki/Shebang_(Unix)>), and can take the form of a [heredoc string](https://www.terraform.io/docs/language/expressions/strings.html#heredoc-strings) or a call to [the `file()` function](https://www.terraform.io/docs/language/functions/file.html) (e.g. `file("task_run.sh")`).

@@ -45,24 +57,21 @@ The project layout should look similar to this:
```
project/
├── main.tf
└── shared/
└── ...
└── results/
└── greeting.txt (created in the cloud and downloaded locally)
```

## Initializing Terraform
## Initialise Terraform

```console
$ terraform init
```

This command will:

1. Download and install the Iterative Provider.
2. Initialize Terraform in the current directory.
This command will check `main.tf` and download the required TPI plugin.

~> **Note:** None of the subsequent commands will work without first setting some [authentication environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs#authentication).
~> **Warning:** None of the subsequent commands will work without first setting some [authentication environment variables][authentication].

## Launching Tasks
## Run Task

```console
$ terraform apply
@@ -71,31 +80,31 @@ $ terraform apply
This command will:

1. Create all the required cloud resources.
2. Upload the specified shared `input` working directory to the cloud.
2. Upload the working directory (`workdir`) to the cloud.
3. Launch the task `script`.

## Viewing Task Statuses
With spot/preemptible instances (`spot >= 0`), auto-recovery logic and persistent storage will be used to relaunch interrupted tasks.

## Query Status

```console
$ terraform refresh && terraform show
```

This command will:
These commands will:

1. Query the task status from the cloud.
2. Display the task status.

## Deleting Tasks
## Stop Task

```console
$ terraform destroy
```

This command will:

1. Download the specified shared working directory from the cloud.
1. Download the `output` directory from the cloud.
2. Delete all the cloud resources created by `terraform apply`.

## Viewing Task Results

After running `terraform destroy`, the `shared` directory should contain a file named `greeting.txt` with the text `Hello World!`
In this example, after running `terraform destroy`, the `results` directory should contain a file named `greeting.txt` with the text `Hello World!`
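
A quick way to confirm that the download worked is to compare the file against what the example `script` wrote. This is a small sketch under that assumption: `verify_greeting` is a hypothetical helper, and the expected text is the string echoed by the script in `main.tf` above.

```bash
# Hypothetical check (not part of TPI): succeed only if the downloaded file
# exists and contains exactly the text written by the example script.
verify_greeting() {
  greeting_file="${1:-results/greeting.txt}"
  [ -f "$greeting_file" ] && [ "$(cat "$greeting_file")" = "Hello World!" ]
}
```

Run from the project root, `verify_greeting && echo "task output looks good"` prints the message only when the file is present and matches.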