Commit
Showing 6 changed files with 226 additions and 155 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,81 +1,127 @@ | ||
 | ||
 | ||
|
||
# Iterative Provider [](https://registry.terraform.io/providers/iterative/iterative/latest/docs) | ||
# Terraform Provider Iterative (TPI) | ||
|
||
The Iterative Provider is a Terraform plugin that enables full lifecycle | ||
management of computing resources for machine learning pipelines, including GPUs, from your favorite cloud vendors. | ||
[](https://registry.terraform.io/providers/iterative/iterative/latest/docs) | ||
[](https://github.com/iterative/terraform-provider-iterative/actions/workflows/test.yml) | ||
[![Apache-2.0][licence-badge]][licence-file] | ||
|
||
The Iterative Provider makes it easy to: | ||
TPI is a [Terraform](https://terraform.io) plugin built with machine learning in mind. It provides full lifecycle management of computing resources (including GPUs and respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s), without requiring you to be a cloud expert. | ||
|
||
- Rapidly move local machine learning experiments to a cloud infrastructure | ||
- Take advantage of training models on spot instances without losing any progress | ||
- Unify configuration of various cloud compute providers | ||
- Automatically destroy unused cloud resources (compute instances are terminated on job completion/failure, and storage is removed when results are downloaded) | ||
- **Provision Resources**: create cloud compute (CPU, GPU, RAM) & storage resources without reading pages of documentation | ||
- **Sync & Execute**: easily sync & run local data & code in the cloud | ||
- **Low cost**: transparent auto-recovery from interrupted low-cost spot/preemptible instances | ||
- **No waste**: auto-cleanup unused resources (terminate compute instances upon job completion/failure & remove storage upon download of results) | ||
- **No lock-in**: switch between several cloud vendors with ease due to concise unified configuration | ||
|
||
The Iterative Provider can provision resources with the following cloud providers and orchestrators: | ||
Supported cloud vendors include: | ||
|
||
- Amazon Web Services | ||
- Amazon Web Services (AWS) | ||
- Microsoft Azure | ||
- Google Cloud Platform | ||
- Kubernetes | ||
- Google Cloud Platform (GCP) | ||
- Kubernetes (K8s) | ||
|
||
## Documentation | ||
## Usage | ||
|
||
See the [Getting Started](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) guide to learn how to use the Iterative Provider. More details on configuring and using the Iterative Provider are in the [documentation](https://registry.terraform.io/providers/iterative/iterative/latest/docs). | ||
### Requirements | ||
|
||
## Support | ||
- [Install Terraform 1.0+](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform), e.g.: | ||
- Brew (Homebrew/Mac OS): `brew tap hashicorp/tap && brew install hashicorp/tap/terraform` | ||
- Choco (Chocolatey/Windows): `choco install terraform` | ||
- Conda (Anaconda): `conda install -c conda-forge terraform` | ||
- Debian (Ubuntu/Linux): | ||
``` | ||
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl | ||
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add - | ||
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | ||
sudo apt-get update && sudo apt-get install terraform | ||
``` | ||
- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication) | ||
Have a feature request or found a bug? Let us know via [GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues). Have questions? Join our [community on Discord](https://discord.gg/bzA6uY7); we'll be happy to help you get started! | ||
### Define a Task | ||
## License | ||
In a project root directory, create a file named `main.tf` with the following contents: | ||
Iterative Provider is released under the [Apache 2.0 License](https://github.com/iterative/terraform-provider-iterative/blob/master/LICENSE). | ||
```hcl | ||
terraform { | ||
required_providers { iterative = { source = "iterative/iterative" } } | ||
} | ||
provider "iterative" {} | ||
resource "iterative_task" "example" { | ||
cloud = "aws" # or any of: gcp, az, k8s | ||
machine = "m" # medium. Or any of: l, xl, m+k80, xl+v100, ... | ||
spot = 0 # auto-price. Or -1 to disable, or >0 to set an hourly USD limit | ||
disk_size = 30 # GB | ||
storage { | ||
workdir = "." | ||
output = "results" | ||
} | ||
script = <<-END | ||
#!/bin/bash | ||
mkdir results | ||
echo "Hello World!" > results/greeting.txt | ||
END | ||
} | ||
``` | ||
|
||
## Development | ||
See [the reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#argument-reference) for the full list of options for `main.tf` -- including more information on [`machine` types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type) with and without GPUs. | ||
|
||
### Install Go 1.17+ | ||
Run this once (in the directory containing `main.tf`) to download the `required_providers`: | ||
|
||
Refer to the [official documentation](https://golang.org/doc/install) for specific instructions. | ||
``` | ||
terraform init | ||
``` | ||
|
||
### Clone the repository | ||
### Run Task | ||
|
||
```console | ||
git clone https://github.com/iterative/terraform-provider-iterative | ||
cd terraform-provider-iterative | ||
``` | ||
terraform apply | ||
``` | ||
|
||
### Install the provider | ||
This launches a `machine` in the `cloud`, uploads `workdir`, and runs the `script`. Upon completion (or error), the `machine` is terminated. | ||
|
||
Build the provider and install the resulting binary to the [local mirror directory](https://www.terraform.io/docs/cli/config/config-file.html#implied-local-mirror-directories): | ||
With spot/preemptible instances (`spot >= 0`), auto-recovery logic and persistent storage will be used to relaunch interrupted tasks. | ||
|
||
```console | ||
make install | ||
``` | ||
### Query Status | ||
|
||
Results and logs are periodically synced to persistent cloud storage. To query this status and view logs: | ||
|
||
### Create a test file | ||
``` | ||
terraform refresh | ||
terraform show | ||
``` | ||
|
||
Create a file named `main.tf` in an empty directory with the following contents: | ||
### Stop Tasks | ||
|
||
```hcl | ||
terraform { | ||
required_providers { iterative = { source = "iterative/iterative" } } | ||
} | ||
provider "iterative" {} | ||
# ... other resource blocks ... | ||
``` | ||
terraform destroy | ||
``` | ||
|
||
**Note:** to use your local build, specify `source = "github.com/iterative/iterative"` (`source = "iterative/iterative"` will download the latest stable release instead). | ||
This terminates the `machine` (if still running), downloads `output`, and removes the persistent `disk_size` storage. | ||
|
||
### Initialize the provider | ||
## Help | ||
|
||
Run this command after every `make install` to use the new build: | ||
The [getting started guide](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) has some more information. | ||
|
||
```console | ||
terraform init --upgrade | ||
``` | ||
Feature requests and bugs can be [reported via GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues), while general questions and feedback are very welcome on our active [Discord server](https://discord.gg/bzA6uY7). | ||
|
||
### Test the provider | ||
## Contributing | ||
|
||
```console | ||
terraform apply | ||
``` | ||
To contribute, use a local copy of the repository instead of the latest stable release. | ||
|
||
1. [Install Go 1.17+](https://golang.org/doc/install) | ||
2. Clone the repository & build the provider | ||
``` | ||
git clone https://github.com/iterative/terraform-provider-iterative | ||
cd terraform-provider-iterative | ||
make install | ||
``` | ||
3. Use `source = "github.com/iterative/iterative"` in your `main.tf` to use the local repository (`source = "iterative/iterative"` will download the latest release instead), and run `terraform init --upgrade` | ||
|
||
## Copyright | ||
|
||
This project and all contributions to it are distributed under [![Apache-2.0][licence-badge]][licence-file] | ||
|
||
[licence-badge]: https://img.shields.io/badge/licence-Apache%202.0-blue | ||
[licence-file]: https://github.com/iterative/terraform-provider-iterative/blob/master/LICENSE |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
--- | ||
page_title: Authentication | ||
--- | ||
|
||
# Authentication | ||
|
||
Environment variables are the only supported authentication method. They should be present when running any of the `terraform` commands. For example: | ||
|
||
```bash | ||
$ export GOOGLE_APPLICATION_CREDENTIALS_DATA="$(cat service_account.json)" | ||
$ terraform apply | ||
``` | ||
|
||
## Amazon Web Services | ||
|
||
- `AWS_ACCESS_KEY_ID` - Access key identifier. | ||
- `AWS_SECRET_ACCESS_KEY` - Secret access key. | ||
- `AWS_SESSION_TOKEN` - (Optional) Session token. | ||
|
||
See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) for more information. | ||
|
||
## Microsoft Azure | ||
|
||
- `AZURE_CLIENT_ID` - Client identifier. | ||
- `AZURE_CLIENT_SECRET` - Client secret. | ||
- `AZURE_SUBSCRIPTION_ID` - Subscription identifier. | ||
- `AZURE_TENANT_ID` - Tenant identifier. | ||
|
||
See the [Azure documentation](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential) for more information. | ||
|
||
## Google Cloud Platform | ||
|
||
- `GOOGLE_APPLICATION_CREDENTIALS` - Path to (or contents of) a service account JSON key file. | ||
|
||
See the [GCP documentation](https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account) for more information. | ||
|
||
## Kubernetes | ||
|
||
Either one of: | ||
|
||
- `KUBECONFIG` - Path to a [`kubeconfig` file](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/#the-kubeconfig-environment-variable). | ||
- `KUBECONFIG_DATA` - Alternatively, the **contents** of a `kubeconfig` file. |
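
Since every `terraform` command expects these variables to be present, a small pre-flight check can catch a missing credential before anything is provisioned. The following is only a sketch: `missing_credentials` is a hypothetical helper, not part of TPI, and the cloud names (`aws`, `az`, `gcp`, `k8s`) are assumed to match the `cloud` values used in `main.tf`. For GCP it accepts either `GOOGLE_APPLICATION_CREDENTIALS` (as listed above) or `GOOGLE_APPLICATION_CREDENTIALS_DATA` (as used in the example at the top of this page).

```bash
# Hypothetical helper (not part of TPI): print the name of each credential
# variable that is unset or empty for the given cloud. Prints nothing when
# all expected variables are present.
missing_credentials() {
  case "$1" in
    aws) set -- AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ;;
    az)  set -- AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID ;;
    gcp)
      # Either variable is accepted here (an assumption, see the note above).
      [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}${GOOGLE_APPLICATION_CREDENTIALS_DATA:-}" ] ||
        echo "GOOGLE_APPLICATION_CREDENTIALS"
      return 0 ;;
    k8s)
      # Kubernetes needs only one of the two variables.
      [ -n "${KUBECONFIG:-}${KUBECONFIG_DATA:-}" ] ||
        echo "KUBECONFIG or KUBECONFIG_DATA"
      return 0 ;;
    *)
      echo "unknown cloud: $1" >&2
      return 2 ;;
  esac
  # The positional parameters now hold the required variable names.
  for name in "$@"; do
    # Names are hardcoded above, so eval is only used for indirect lookup.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}
```

One way to use it is `[ -z "$(missing_credentials aws)" ] && terraform apply`, which only runs `terraform apply` when nothing is reported missing.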