Merge pull request #318 from gridai/dev
0.8.72 Docs Release
alexandercort authored Jul 14, 2022
2 parents 45df1ec + d473cd0 commit bacf6e0
Showing 7 changed files with 62 additions and 6 deletions.
@@ -292,4 +292,5 @@ grid run --strategy none \
--beta "[1, 2, 3, 4]"
```

This will schedule exactly one experiment and pass each script argument as-is without evaluation.
This will schedule exactly one experiment and pass each script argument as-is without evaluation. Another example is when you want to
pass extra arguments via the CLI with [Hydra](https://github.com/facebookresearch/hydra).
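On the script side, an argument passed as-is arrives as an unevaluated string. Below is a minimal sketch of a hypothetical training script (plain `argparse`, not Hydra; `parse_list` and `build_parser` are illustrative names). It assumes the `--beta "[1, 2, 3, 4]"` value from the example above reaches the script as a single argument, and it parses that value with `ast.literal_eval`, which accepts only Python literals and never executes arbitrary code.

```python
import argparse
import ast
from typing import List


def parse_list(value: str) -> List:
    """Parse a list literal such as "[1, 2, 3, 4]" without executing arbitrary code."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError) as err:
        raise argparse.ArgumentTypeError(f"not a Python literal: {value!r}") from err
    if not isinstance(parsed, list):
        raise argparse.ArgumentTypeError(f"expected a list literal, got {value!r}")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical training script")
    # The value arrives unevaluated, e.g. the string "[1, 2, 3, 4]",
    # so the script converts it into a list itself.
    parser.add_argument("--beta", type=parse_list, default=[])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"beta={args.beta}")
```

Because the whole list lives in one argument, this run schedules a single experiment. The script decides what to do with all four values.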
3 changes: 2 additions & 1 deletion docs/features/runs/1_Creating Runs/1_README.md
@@ -1,7 +1,8 @@
# Grid Runs Overview

:::note
Grid Runs support the use of Private PyPi repositories. Please see the following [documentation](https://docs.readthedocs.io/en/stable/guides/private-python-packages.html) for various methods of pip installing from private sources.
Grid Runs support the use of private PyPI repositories. Please see the following [documentation](https://docs.readthedocs.io/en/stable/guides/private-python-packages.html) for various methods of pip installing from private sources. Additionally, by default, Runs check for the latest commit hash on your repo branch.
If you'd like to test code that hasn't been pushed to your repo, use the [--localdir option](https://docs.grid.ai/features/runs/private-repos#the---localdir-option).
:::

There are many ways to use Runs in Grid. Below we provide some examples to get you started with working with this core feature. The examples cover the following:
@@ -36,6 +36,9 @@ FROM python:3.9.6-slim
WORKDIR /gridai/project
COPY . .
# Update package list
RUN apt-get update
# any RUN commands you'd like to run
# use this to install dependencies
RUN pip install pytorch-lightning && \
1 change: 0 additions & 1 deletion docs/getting-started/typical-workflow-web-user.md
@@ -122,7 +122,6 @@ This is exactly what _Sessions_ were created for.

Start a Session named _resnet-debugging_ with 2 M60 GPUs on it and attach our **CIFAR-5** dataset.

**Note: A credit card needs to be added to use GPU machines**

![](/images/examples/cifar-session.gif)

@@ -1,6 +1,6 @@
---
title: BYOC Prereqs
sidebar_label: BYOC Prereqs
title: BYOC Prerequisites
sidebar_label: BYOC Prerequisites
---

# Overview
@@ -30,6 +30,15 @@ Grid creates clusters designed for large AI workloads. In order to do so, your A
| EC2 Spot \(instance family you are interested in\) | 1000+ |
| EC2 On-demand \(instance family you are interested in\) | 1000+ |

To provision your BYOC cluster, Grid creates the AWS resources listed in the table below. If creating these resources would exceed your quota, BYOC cluster creation will fail. To address this, either delete existing unused resources or increase your AWS quotas.

| Resource | Required Quota |
| :--- | :--- |
| AWS IAM roles | 15 |
| AWS IAM policies | 15 |
| VPC | 5 |
| S3 Buckets | 5 |
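
Before creating the cluster, you can sanity-check your headroom with a few lines of arithmetic. This is a minimal sketch. It assumes the counts in the table are the number of new resources Grid needs room for. The `Quota` values are ones you would look up yourself, for example in the AWS Service Quotas console. `quota_shortfalls` is a hypothetical helper, and the example numbers are made up.

```python
from dataclasses import dataclass
from typing import Dict

# Resource counts from the table above, treated here (assumption) as the
# number of new resources Grid needs room for when creating a BYOC cluster.
REQUIRED: Dict[str, int] = {
    "AWS IAM roles": 15,
    "AWS IAM policies": 15,
    "VPC": 5,
    "S3 Buckets": 5,
}


@dataclass
class Quota:
    used: int   # resources of this kind already in use
    limit: int  # current AWS quota for this resource


def quota_shortfalls(quotas: Dict[str, Quota],
                     required: Dict[str, int] = REQUIRED) -> Dict[str, int]:
    """Return, per resource, how many more units of quota are needed (empty if everything fits)."""
    shortfalls: Dict[str, int] = {}
    for resource, needed in required.items():
        quota = quotas[resource]
        missing = quota.used + needed - quota.limit
        if missing > 0:
            shortfalls[resource] = missing
    return shortfalls


if __name__ == "__main__":
    # Hypothetical account: only the VPC quota is too tight.
    example = {
        "AWS IAM roles": Quota(used=100, limit=1000),
        "AWS IAM policies": Quota(used=50, limit=1500),
        "VPC": Quota(used=3, limit=5),
        "S3 Buckets": Quota(used=20, limit=100),
    }
    print(quota_shortfalls(example))  # {'VPC': 3}
```

A non-empty result lists the resources that need attention. For each one, either free up existing resources or request a quota increase.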

AWS STS regional endpoints must be enabled in the target region. Go to [AWS account settings](https://console.aws.amazon.com/iam/home#/account_settings) and verify that the regional endpoint is activated. In most cases, your region already has the AWS STS regional endpoint enabled; see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html).
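
One way to check from code that the regional endpoint answers is to call STS `GetCallerIdentity` through it. This sketch makes three assumptions: `boto3` is installed, AWS credentials are configured, and the region is in the standard `aws` partition, where the endpoint has the form `https://sts.<region>.amazonaws.com`. Both helper names are hypothetical.

```python
def sts_regional_endpoint(region: str) -> str:
    """Regional STS endpoint URL for a region in the standard commercial AWS partition."""
    return f"https://sts.{region}.amazonaws.com"


def check_sts_regional_endpoint(region: str) -> str:
    """Call GetCallerIdentity via the regional endpoint and return the account ID.

    Any failure (for example, the endpoint not being active) surfaces as an
    exception raised by boto3/botocore.
    """
    import boto3  # imported here so the URL helper above works without boto3 installed

    client = boto3.client(
        "sts",
        region_name=region,
        endpoint_url=sts_regional_endpoint(region),
    )
    return client.get_caller_identity()["Account"]
```

For example, `check_sts_regional_endpoint("us-east-2")` returns your account ID if the call succeeds. If it raises an error, the endpoint may be deactivated or unreachable, so check the account settings page linked above.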

:::note
@@ -298,14 +307,52 @@ Or if you're using config file set the `.compute.provider.cluster` field to the

Your cluster will be available for use on Grid, so use it \(or any other cluster\) as you wish.

## Editing and Deleting Clusters
## Editing Clusters

Use `grid edit` to see the available instance types and update them as necessary.

```bash
grid edit cluster <cluster name>
```

An editor will open in your terminal showing the JSON configuration of the cluster, similar to the one below (some attributes are omitted with an ellipsis `...` to keep this section easy to follow):
```
{
  "cluster_type": "CLUSTER_TYPE_BYOC",
  "cost_factor": "",
  "desired_state": "CLUSTER_STATE_RUNNING",
  "driver": {
    "external": null,
    "kubernetes": {
      "aws": {
        ...
        "instance_types": [
          {
            "name": "g4dn.xlarge",
            "overprovisioned_ondemand_count": 0
          },
          {
            "name": "m5ad.xlarge",
            "overprovisioned_ondemand_count": 0
          }
        ],
        ...
      },
      ...
    }
  },
  ...
  "performance_profile": "CLUSTER_PERFORMANCE_PROFILE_DEFAULT"
}
```
Some important attributes you can change:
- __instance_types__: Add or remove instance types using their AWS names. Currently, only amd64-compatible instance types are supported. You can also set `overprovisioned_ondemand_count` for an instance type to pre-allocate instances for faster startup, but this incurs extra costs.
- __performance_profile__: Change the performance profile of the cluster. It can be either:
  - `CLUSTER_PERFORMANCE_PROFILE_DEFAULT`: adds extra nodes for larger clusters and includes metrics and monitoring capabilities.
  - `CLUSTER_PERFORMANCE_PROFILE_COST_SAVING`: suited to smaller clusters; it omits metrics and monitoring capabilities but is less expensive to run.
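
The edits described above amount to small changes in the JSON. The sketch below is illustrative only. It does not talk to Grid, and `update_cluster_config` is a hypothetical helper. `SAMPLE_CONFIG` is the example configuration above with the elided `...` fields left out. The helper might be useful if you keep a copy of the configuration to review changes before making them in the `grid edit` editor.

```python
import copy
import json
from typing import Optional

# The two performance profiles described above.
PERFORMANCE_PROFILES = {
    "CLUSTER_PERFORMANCE_PROFILE_DEFAULT",
    "CLUSTER_PERFORMANCE_PROFILE_COST_SAVING",
}

# The example configuration above, without the elided "..." fields.
SAMPLE_CONFIG = {
    "cluster_type": "CLUSTER_TYPE_BYOC",
    "cost_factor": "",
    "desired_state": "CLUSTER_STATE_RUNNING",
    "driver": {
        "external": None,
        "kubernetes": {
            "aws": {
                "instance_types": [
                    {"name": "g4dn.xlarge", "overprovisioned_ondemand_count": 0},
                    {"name": "m5ad.xlarge", "overprovisioned_ondemand_count": 0},
                ],
            },
        },
    },
    "performance_profile": "CLUSTER_PERFORMANCE_PROFILE_DEFAULT",
}


def update_cluster_config(config: dict,
                          add_instance: Optional[str] = None,
                          overprovisioned: int = 0,
                          profile: Optional[str] = None) -> dict:
    """Return a modified copy of a cluster config; the input dict is left untouched."""
    updated = copy.deepcopy(config)
    instance_types = updated["driver"]["kubernetes"]["aws"]["instance_types"]
    # Add the instance type only if it is not already listed.
    if add_instance and all(it["name"] != add_instance for it in instance_types):
        instance_types.append(
            {"name": add_instance, "overprovisioned_ondemand_count": overprovisioned}
        )
    if profile is not None:
        if profile not in PERFORMANCE_PROFILES:
            raise ValueError(f"unknown performance profile: {profile}")
        updated["performance_profile"] = profile
    return updated


if __name__ == "__main__":
    new_config = update_cluster_config(
        SAMPLE_CONFIG,
        add_instance="g4dn.2xlarge",  # must be amd64-compatible, per the note above
        profile="CLUSTER_PERFORMANCE_PROFILE_COST_SAVING",
    )
    print(json.dumps(new_config, indent=2))
```

The helper works on a copy, so you can compare the original and the updated JSON before deciding what to change.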

## Deleting Clusters

Use `grid delete` to delete a cluster. Deleting a cluster deletes all of its resources, including running resources, and takes roughly 20-30 minutes. Use with care! The `--wait` flag is also available here; when it is set, the Grid CLI waits until the cluster is deleted.

:::note
5 changes: 5 additions & 0 deletions docs/platform/3_credentials.md
@@ -66,7 +66,12 @@ example:

</div>

### 0. Change from the default cluster context
By default, the [cluster context](./2_Custom%20Cloud%20Credentials/5_grid-cluster-context.md) is set to `Grid Cloud`. Change it to the BYOC cluster you created.

```bash
grid user set-cluster-context <byoc cluster name>
```

### 1. Generate Trust and Permission Policies
