docs & readme: what/why/how #433

casperdcl · 2022-03-08T17:29:33Z

update README
update registry docs
maybe restructure registry docs a bit (credentials, getting started, and full reference are in 3 different counter-intuitive places?)
depends on task: make workdir.output more intuitive #414 (or revert be6aa67)

fixes #415

0x2b3bfa0

Left some accessory comments. Looks great!

docs/resources/task.md

docs/guides/getting-started.md

README.md

DavidGOrtega · 2022-03-09T16:58:47Z

@casperdcl Looking good!

- fixes #415

- depends on #414

Co-authored-by: Helio Machado <0x2b3bfa0+git@googlemail.com>

dmpetrov

Great change. Please take a look at my feedback inline.

README.md

dmpetrov

A couple more comments/ideas.

README.md

dmpetrov · 2022-03-11T23:12:58Z

README.md

-```console
-make install
-```
+- `terraform apply`: launch cloud instance(s), upload `workdir`, and run `script`


It might be a bit confusing. Should the user upload workdir after terraform apply? This is how I read it the first time.

I'd suggest keeping one line per action/command as we have above. Like:

Launch a training job

terraform apply

This command creates a cloud instance based on the main.tf specification. Once the instance is running, it uploads the current directory (workdir) and data directory (we don't have this one, right? :) ) to the instance and executes your training script (script).

In case of spot instances, the cloud will run recovery logic using the auto-scaling group.

Once the script finishes (or fails), the instance is terminated automatically. Results and logs are periodically synced to cloud storage.

Like this....
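For context, here is a sketch of the kind of `main.tf` this description refers to. The `cloud`, `machine`, `script`, and `storage.workdir` arguments are named in the task reference touched by this PR; the `spot`, `disk_size`, and `storage.output` arguments, all concrete values, and the `train.sh` file name are assumptions for illustration and should be checked against the provider docs.

```hcl
terraform {
  required_providers {
    iterative = { source = "iterative/iterative" }
  }
}

provider "iterative" {}

resource "iterative_task" "train" {
  cloud     = "aws" # valid values per the task reference: aws, gcp, az, k8s
  machine   = "m"   # assumed generic machine type
  spot      = 0     # assumption: 0 requests spot instances at an automatic price
  disk_size = 30    # assumption: size in GB

  storage {
    workdir = "."       # directory uploaded to the instance
    output  = "results" # assumption: directory synced back to cloud storage
  }

  # train.sh is a hypothetical file; it must begin with a valid shebang.
  script = file("${path.module}/train.sh")
}
```

With a file like this in place, `terraform apply` is the single command the paragraph above describes.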

well I was trying to avoid a tqdm-style Readme but sure can do 😅

It should work e2e and show the use case. Otherwise, we are increasing time-to-value quite a lot. One of the most important metrics.

better now?

dmpetrov · 2022-03-11T23:26:03Z

README.md

-
-## Development
-
-### Install Go 1.17+


I was thinking... what if a user does not have any ml training script on their hands? Or a user has a script but the environment settings will prevent the user from executing it? Does it make sense to provide a script in addition to the TF file?

It should reduce the entrance bar quite significantly.

An ideal tutorial should have all the commands, scripts and data I need to run to get a result. Example - dvc tutorial.

user does not have any ml training script

I don't follow, do you mean #305?

user has a script but the environment settings will prevent executing

Do you mean the script won't execute locally or won't execute on the cloud? And why won't it execute? Because of missing env vars such as NUM_EPOCHS or something? Or missing dependencies like numpy?

I mean, user might not have any ml training script at all (or it won't work).
In DVC tutorial, user downloads code as the first step. Do we need something similar here?

ah so an example repo? No we don't have one (yet). Technically we'd probably need 3 - one each for AWS/Azure/GCP.

The current example simply uses AWS and an echo Hello World script, so the user just copy-pastes the main.tf and has no other dependencies.
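A minimal sketch of such a copy-paste `main.tf`, using only the two required arguments (`cloud` and `script`) from the task reference. The resource label `example` and the exact script body are illustrative assumptions, not necessarily the README's wording.

```hcl
terraform {
  required_providers {
    iterative = { source = "iterative/iterative" }
  }
}

provider "iterative" {}

resource "iterative_task" "example" {
  cloud = "aws"

  # Inline heredoc script; it must begin with a valid shebang.
  script = <<-END
    #!/bin/bash
    echo "Hello World!"
  END
}
```

Because the script is inline, nothing besides this file and cloud credentials is needed.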

we'd probably need 3 - one each for AWS/Azure/GCP

😬 Unless we use iterative/example-repos-dev or similar, it doesn't look like the best idea from a maintainability standpoint.

we'd probably need 3 - one each for AWS/Azure/GCP.

I was thinking that TPI is supposed to abstract it out 😄 One code file with one "small" data file that the user can get with wget should be enough.

It would be great to have some "realistic" repo with checkpoints etc... like MNIST but slower :) Fashion-MNIST?

Yes, the only required change is cloud = "whatever" 😅

I would think at a minimum in example repos:

main.tf: cloud = "X"

README.md: "How to authenticate/export X_CREDENTIALS, or use [cloud Y](link to example repo for Y) or [cloud Z](link to example repo for Z)"

Plus probably:

requirements.txt

run.py

.github/workflows/cml.yaml
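As a sketch of the "only change `cloud`" idea, a single `main.tf` could take the provider name as an input variable instead of needing one repo per cloud. The variable name `cloud`, its default, and the script body are hypothetical; the list of valid values comes from the task reference.

```hcl
terraform {
  required_providers {
    iterative = { source = "iterative/iterative" }
  }
}

provider "iterative" {}

variable "cloud" {
  description = "Cloud provider to run the task on."
  type        = string
  default     = "aws"

  validation {
    condition     = contains(["aws", "gcp", "az", "k8s"], var.cloud)
    error_message = "The cloud value must be one of: aws, gcp, az, k8s."
  }
}

resource "iterative_task" "example" {
  cloud = var.cloud

  # Terraform interpolates var.cloud before the script is sent to the instance.
  script = <<-END
    #!/bin/bash
    echo "Hello from ${var.cloud}"
  END
}
```

Switching vendors would then be `terraform apply -var 'cloud=gcp'`, with credentials for the chosen cloud exported beforehand.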

don't think this complexity is needed in the readme here. Right now it's a self-contained¹ example that supports users both with and without a script file.

¹ apart from the how-to-setup-credentials external link

dmpetrov

a couple of comments :)

dmpetrov · 2022-03-13T23:58:50Z

README.md

-
-## Development
-
-### Install Go 1.17+


we'd probably need 3 - one each for AWS/Azure/GCP.

I was thinking that TPI is supposed to abstract it out 😄 One code file with one "small" data file that the user can get with wget should be enough.

It would be great to have some "realistic" repo with checkpoints etc... like MNIST but slower :) Fashion-MNIST?

dmpetrov · 2022-03-14T00:00:26Z

README.md

-```console
-make install
-```
+- `terraform apply`: launch cloud instance(s), upload `workdir`, and run `script`


It should work e2e and show the use case. Otherwise, we are increasing time-to-value quite a lot. One of the most important metrics.

arcticbear · 2022-03-15T11:54:06Z

Update from the discussion in Slack here. Agreed to use the universal TPI banner with the dark background.

casperdcl · 2022-03-15T14:48:41Z

I'd suggest merging for now @dmpetrov @0x2b3bfa0 since it's probably best to iterate on docs in follow-up issues/PRs.

0x2b3bfa0 · 2022-03-15T14:51:08Z

Sounds good, but please let's keep track of the unresolved discussions. There are several good points and ideas we'll probably want to consider in the short/mid term.

casperdcl · 2022-03-15T14:56:06Z

naturally :)

I think it's just the "end-to-end" point in #363, but let me know if anything else is unresolved.

jorgeorpinel · 2022-03-24T08:29:01Z

docs/index.md


-Use the Iterative Provider to launch resource-intensive tasks in popular cloud providers with a single Terraform file.
+![TPI](https://static.iterative.ai/img/cml/banner-terraform.png)


Broken, it seems

https://registry.terraform.io/providers/iterative/iterative/latest/docs

Yes, should be https://static.iterative.ai/img/cml/banner-tpi.svg

Instead of https://static.iterative.ai/img/cml/banner-terraform.png

See iterative/static#8

🔔 @casperdcl

jorgeorpinel · 2022-03-24T08:36:41Z

README.md


-# Iterative Provider [![](https://img.shields.io/badge/-documentation-5c4ee5?logo=terraform)](https://registry.terraform.io/providers/iterative/iterative/latest/docs)
+# Terraform Provider Iterative (TPI)


I know it's beyond scope but is a preposition missing from the tool's expanded name? E.g. Terraform Provider for Iterative (Tools)?

"Iterative's Terraform Provider" would also be more readable IMO but at this point changing the TLA is prob. out of the question.

Also on the tool's name: The confusion is reinforced by the official docs site calling it just "iterative provider" (top of nav).

Yes, TPI was named after the repository name, but (cough) doesn't make sense to humans.

jorgeorpinel · 2022-03-24T08:39:39Z

README.md


-The Iterative Provider makes it easy to:
+TPI is a [Terraform](https://terraform.io) plugin built with machine learning in mind. Full lifecycle management of computing resources (including GPUs and respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s)... without needing to be a cloud expert.


Could we be more emphatic on why the machine learning focus is important? I guess it's not a common integration or offering out there in the world of TF providers.

Also should we mention it's designed specifically (though not exclusively I think) for certain Iterative tools? Connecting this with CML early could make sense for example. Rn it feels from this description like it's mainly a stand-alone tool for DIY ML?Ops.

jorgeorpinel · 2022-03-24T08:42:02Z

docs/index.md

+- [Getting Started](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started)
+  - [Authentication](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication)
+- [Full reference](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task)


These links are in a special arrangement different from the left-side content structure. Causes confusion as to what's more important info. "Full reference" is not even on the left side (at least with that title).

vs

p.s. also these links open in a separate window unexpectedly.

On the Links: an idea is to mention/link the Contributing info. in the repo instead of in here.

jorgeorpinel · 2022-03-24T08:48:34Z

README.md

+### Requirements

-## Support
+- [Install Terraform 1.0+](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform), e.g.:
+  - Brew (Homebrew/Mac OS): `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`
+  - Choco (Chocolatey/Windows): `choco install terraform`


OK I see from the README that the Usage section (Get Started page in docs) comes first. So Authentication should be under GS in the nav, I think.

Usage section (Get Started page in docs

Call both Usage or both Get Started, BTW? I'd lean toward the latter based on how the content is presented.

jorgeorpinel · 2022-03-24T08:51:30Z

README.md


-Have a feature request or found a bug? Let us know via [GitHub issues](https://github.com/iterative/terraform-provider-iterative/issues). Have questions? Join our [community on Discord](https://discord.gg/bzA6uY7); we'll be happy to help you get started!
+### Define a Task


💅🏼 Docs should also read "Define" I think. Currently the GS reads "Defining".

jorgeorpinel · 2022-03-24T08:53:09Z

docs/guides/getting-started.md

+    sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
+    sudo apt-get update && sudo apt-get install terraform
+    ```
+- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables][authentication]


Hmmm looks like all internal links (except nav) open in a separate window. Idk, feels strange to me.

jorgeorpinel · 2022-03-24T08:54:13Z

docs/guides/getting-started.md

+    ```
+- Create an account with any supported cloud vendor and expose its [authentication credentials via environment variables][authentication]
+
+[authentication]: https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/authentication


Would relative links work? E.g. /providers/iterative/iterative/latest/docs/guides/authentication or even just (../)authentication above.

jorgeorpinel · 2022-03-24T08:57:54Z

README.md

+## Help

-Run this command after every `make install` to use the new build:
+The [getting started guide](https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/getting-started) has some more information.


[The GS] has some more information.

Does it? They look very similar at first glance. Why refer to almost the same doc but on another site? It's hard to find meaningful differences.

Could instead link from relevant places in the README to the other docs, for now Azure Kubernetes Service and Generic Machine Types.

jorgeorpinel · 2022-03-24T09:05:44Z

docs/resources/task.md

@@ -30,22 +39,24 @@ resource "iterative_task" "task" {
 ### Required

 - `cloud` - (Required) Cloud provider to run the task on; valid values are `aws`, `gcp`, `az` and `k8s`.
- `script` - (Required) Script to run (relative to `workdir.input`); must begin with a valid [shebang](<https://en.wikipedia.org/wiki/Shebang_(Unix)>). Can use a string, including a [heredoc](https://www.terraform.io/docs/language/expressions/strings.html#heredoc-strings), or the contents of a file returned by the [`file`](https://www.terraform.io/docs/language/functions/file.html) function.
+- `script` - (Required) Script to run (relative to `storage.workdir`); must begin with a valid [shebang](<https://en.wikipedia.org/wiki/Shebang_(Unix)>). Can use a string, including a [heredoc](https://www.terraform.io/docs/language/expressions/strings.html#heredoc-strings), or the contents of a file returned by the [`file`](https://www.terraform.io/docs/language/functions/file.html) function.

 ### Optional

 - `name` - (Optional) Deterministic task name.
 - `region` - (Optional) [Cloud region/zone](#cloud-regions) to run the task on.
 - `machine` - (Optional) See [Machine Types](#machine-types) below.


Why is Generic Machine Types under "Development" in the nav? (What does Development even refer to here?) Feels like machine types should be under the same section as the iterative_task res ref. somehow.

Nav structure idea:

TPI

GS

Guides

Auth

Azure K8s

Ref

Task Res

Machine types

As long as abbreviations aren't part of the proposal, sounds good. The only thing I find rather dissonant is moving the Machine Types page under the Reference section.

Abbreviations aren't part of the proposal; I was just being lazy.

I find rather dissonant is moving the Machine Types page under the Ref

What does "Development" refer to as-is now? Maybe I'm not getting the current intended struct.

An XY solution would be disguising “Machine types“ as a guide (e.g. how to choose machine types) 🤔

Yeah either put it in the guide or in the ref IMO 😬

The issue is that the “Resources” section isn't meant to be a “Reference” in the general sense of the word; it's just a list of resources (hundreds in some providers).

We can move it under “Resources”, but it's rather unorthodox. 🙃

Yeah that's fine, call the section Resources (just also use that word in links if possible, instead of "reference"). And put machine types in the guide.

jorgeorpinel · 2022-03-24T09:07:00Z

docs/resources/task.md

 This resource will:

 1. Create cloud resources (machines and storage) for the task.


💅🏼 The title of this page is Task Resource but the nav entry is iterative_task. Seems inconsistent

Yes, it should probably be Resource: iterative_task like in other providers.

0x2b3bfa0 · 2022-03-24T14:19:14Z

@jorgeorpinel, even if your review is a bit late (post–merge), you make lots of good points that should be considered and addressed. 😅

0x2b3bfa0 · 2022-03-24T14:19:43Z

@casperdcl, should this be discussed here and then applied with a separate pull request?

casperdcl added the documentation Markdown files label Mar 8, 2022

casperdcl self-assigned this Mar 8, 2022

restyled-io bot mentioned this pull request Mar 8, 2022

Restyle docs & readme: what/why/how #434

Merged

casperdcl marked this pull request as ready for review March 8, 2022 17:47

casperdcl mentioned this pull request Mar 8, 2022

EPIC task internal iter 2 #391

Closed

8 tasks

casperdcl requested review from a team and dmpetrov March 8, 2022 19:56

0x2b3bfa0 reviewed Mar 8, 2022

casperdcl commented Mar 8, 2022

casperdcl and others added 10 commits March 10, 2022 17:09

first pass README

1bcd9fc

- fixes #415

Restyled by prettier-markdown

5368f7e

readme: more restructuring

7cfe25b

readme: have a basic script

6992d0e

docs: unify examples

76e11f6

intuitive output

f0b6814

- depends on #414

docs/task: input realtive to script

e932990

Co-authored-by: Helio Machado <0x2b3bfa0+git@googlemail.com>

minor tweak

a4e14d7

some rewording

b1fa61b

docs: rename workdir.input => storage.workdir

2a11af5

casperdcl force-pushed the highlights branch from 5e7aeb3 to 2a11af5 Compare March 10, 2022 17:21

casperdcl added 4 commits March 10, 2022 17:29

describe machine type better

66a5a42

docs: update site landing

794304c

update badges

c17f707

note on workdir/output relation

ddf4910

restyled-io bot mentioned this pull request Mar 10, 2022

Restyle docs & readme: what/why/how #439

Merged

Restyled by prettier-markdown

fb2ac4f

dmpetrov requested changes Mar 10, 2022

casperdcl mentioned this pull request Mar 10, 2022

TPI: narrower banner iterative/static#8

Merged

casperdcl commented Mar 10, 2022

explicit CPU/GPU/RAM

474ace9

dmpetrov requested changes Mar 11, 2022

dmpetrov requested changes Mar 14, 2022

casperdcl added 3 commits March 14, 2022 22:57

spotify

6b91aa3

re-separate commands, add disk_size

9f2a105

update banner

aac92bc

restyled-io bot mentioned this pull request Mar 14, 2022

Restyle docs & readme: what/why/how #446

Merged

Restyled by prettier-markdown

fd848e2

casperdcl merged commit fcc25f3 into master Mar 15, 2022

casperdcl deleted the highlights branch March 15, 2022 14:49

casperdcl mentioned this pull request Mar 15, 2022

docs: task requirements 2 #363

Open

21 tasks

casperdcl mentioned this pull request Mar 15, 2022

docs: permissions & reference roles #443

Merged

3 tasks

jorgeorpinel reviewed Mar 24, 2022

casperdcl mentioned this pull request Mar 24, 2022

task feedback 2 #457

Closed

17 tasks

