Merge pull request #304 from gridai/dev
0.8.67 docs
Esther Quansah authored Jun 29, 2022
2 parents 6744a40 + e1f91cb commit c6853bd
Showing 29 changed files with 597 additions and 22 deletions.
145 changes: 145 additions & 0 deletions .gitignore
@@ -22,3 +22,148 @@ yarn-error.log*

grid
docs/platform/changelog.md



# Created by https://www.toptal.com/developers/gitignore/api/visualstudiocode,intellij
# Edit at https://www.toptal.com/developers/gitignore?templates=visualstudiocode,intellij

### Intellij ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# AWS User-specific
.idea/**/aws.xml

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# SonarLint plugin
.idea/sonarlint/

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser

### Intellij Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721

# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr

# Sonarlint plugin
# https://plugins.jetbrains.com/plugin/7973-sonarlint
.idea/**/sonarlint/

# SonarQube Plugin
# https://plugins.jetbrains.com/plugin/7238-sonarqube-community-plugin
.idea/**/sonarIssues.xml

# Markdown Navigator plugin
# https://plugins.jetbrains.com/plugin/7896-markdown-navigator-enhanced
.idea/**/markdown-navigator.xml
.idea/**/markdown-navigator-enh.xml
.idea/**/markdown-navigator/

# Cache file creation bug
# See https://youtrack.jetbrains.com/issue/JBR-2257
.idea/$CACHE_FILE$

# CodeStream plugin
# https://plugins.jetbrains.com/plugin/12206-codestream
.idea/codestream.xml

# Azure Toolkit for IntelliJ plugin
# https://plugins.jetbrains.com/plugin/8053-azure-toolkit-for-intellij
.idea/**/azureSettings.xml

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/

# Built Visual Studio Code Extensions
*.vsix

### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide

# Support for Project snippet scope
.vscode/*.code-snippets

# Ignore code-workspaces
*.code-workspace

# End of https://www.toptal.com/developers/gitignore/api/visualstudiocode,intellij
2 changes: 2 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,2 @@
{
}
42 changes: 42 additions & 0 deletions changelog/2022-06-28.md
@@ -0,0 +1,42 @@
## :zap: June 28, 2022

**CLI version: 0.8.67**

In addition to several stability improvements, this release introduces two exciting new Datastores features for our BYOC users! If you're not a BYOC user but would like to learn more or try out these features, don't hesitate to reach out to us at support@grid.ai.

## :file_cabinet: What's New with Datastores

### Private S3 Mounting (BYOC Users Only)

Grid now supports the ability to create Datastores from private AWS S3 buckets by using
the `--no-copy` mode via the CLI. This is particularly valuable for incrementally adding data to the source bucket and for speeding up datastore creation when working with large buckets.

In order to allow Grid to access your private buckets,
you'll need to create an authorized AWS Role using the `grid credential create --type s3`
command (explained in detail in the link below). After creating a role, you can run the
`grid datastore create S3://<private-bucket-name-here> --no-copy` command as usual - no
modifications needed.
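
As a rough illustration of the two-step flow above, the sketch below only assembles the two command lines quoted in this section; it does not call Grid. The helper name `private_s3_datastore_commands` and the bucket name `my-private-bucket` are hypothetical, and the credential command may need additional options that are covered in the linked credential guide.

```python
import shlex
from typing import List


def private_s3_datastore_commands(bucket_name: str) -> List[List[str]]:
    """Return the two CLI invocations described above, in order.

    Hypothetical helper for illustration only: it builds argument lists
    from the commands quoted in this changelog and runs nothing.
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError("expected a bare bucket name, e.g. 'my-private-bucket'")
    return [
        # Step 1: create an authorized AWS role (may need further options).
        ["grid", "credential", "create", "--type", "s3"],
        # Step 2: create the datastore without copying the bucket contents.
        ["grid", "datastore", "create", f"S3://{bucket_name}", "--no-copy"],
    ]


if __name__ == "__main__":
    for argv in private_s3_datastore_commands("my-private-bucket"):
        print(shlex.join(argv))
```

Keeping the commands as argument lists, rather than one shell string, avoids quoting problems if they are later passed to `subprocess.run`.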

[Create a credential](../docs/platform/3_credentials.md)

[Create a Datastore from a private S3 bucket](../docs/features/datastores/2_Using%20Datastores/2_creating-datastores.md#creating-datastore-from-private-aws-s3-buckets-byoc-users-only)

### High-Performance Datastores (BYOC Users Only)

High-Performance Datastores (HPDs) allow Bring Your Own Cloud customers who are looking to scale large datasets to optimize latency and significantly speed up data access. Currently, HPDs are backed by the FSx for Lustre service and offer more scalability and higher throughput than conventional Grid datastores backed by AWS S3.

HPDs are most useful for very large datasets (>1TB) or when a dataset is going to be used by a large number of concurrent experiments or sessions.

[Create a High-Performance Datastore](../docs/features/datastores/2_Using%20Datastores/7_high-performance-datastores.md)

:::note
If you are interested in learning more about either of these features, or in enabling them, contact support@grid.ai.
:::

## Session Memory Improvements

- Disabled virtual memory limiting for GPU machines in Sessions, preventing out-of-memory failures
- Grid Runs now default to 0 CPUs. We recently discovered an issue with runs where setting `--cpus` to 1 would also reduce the memory, causing lots of OOM issues. In previous versions of Grid, this was the default behavior. We've updated this behavior to set `--cpus` to 0 by default. By setting `--cpus` to 0, Grid will allocate all available CPU and memory to the experiment.

---

77 changes: 69 additions & 8 deletions docs/changelog.md
@@ -12,25 +12,82 @@ Upgrade your CLI with `pip install lightning-grid --upgrade`
:heart: Find us in our [Slack Community](http://gridai-community.slack.com) to say hi and/or to express your thoughts/questions.

---
## :zap: June 28, 2022

**CLI version: 0.8.67**

In addition to several stability improvements, this release introduces two exciting new Datastores features for our BYOC users! If you're not a BYOC user but would like to learn more or try out these features, don't hesitate to reach out to us at support@grid.ai.

## :file_cabinet: What's New with Datastores

### Private S3 Mounting (BYOC Users Only)

Grid now supports the ability to create Datastores from private AWS S3 buckets by using
the `--no-copy` mode via the CLI. This is particularly valuable for incrementally adding data to the source bucket and for speeding up datastore creation when working with large buckets.

In order to allow Grid to access your private buckets,
you'll need to create an authorized AWS Role using the `grid credential create --type s3`
command (explained in detail below). After creating a role, you can run the
`grid datastore create S3://<private-bucket-name-here> --no-copy` command as usual - no
modifications needed.

[Create a credential](../docs/platform/3_credentials.md)

[Create a Datastore from a private S3 bucket](../docs/features/datastores/2_Using%20Datastores/2_creating-datastores.md#creating-datastore-from-private-aws-s3-buckets-byoc-users-only)

### High-Performance Datastores (BYOC Users Only)

High-Performance Datastores (HPDs) allow Bring Your Own Cloud customers who are looking to scale large datasets to optimize latency and significantly speed up data access. Currently, HPDs are backed by the FSx for Lustre service and offer more scalability and higher throughput than conventional Grid datastores backed by AWS S3.

HPDs are most useful for very large datasets (>1TB) or when a dataset is going to be used by a large number of concurrent experiments or sessions.

[Create a High-Performance Datastore](../docs/features/datastores/2_Using%20Datastores/7_high-performance-datastores.md)

:::note
If you are interested in learning more about either of these features, or in enabling them, contact support@grid.ai.
:::

## Session Memory Improvements

- Disabled virtual memory limiting for GPU machines in Sessions, preventing out-of-memory failures
- Grid Runs now default to 0 CPUs. We recently discovered an issue with runs where setting `--cpus` to 1 would also reduce the memory, causing lots of OOM issues. In previous versions of Grid, this was the default behavior. We've updated this behavior to set `--cpus` to 0 by default. By setting `--cpus` to 0, Grid will allocate all available CPU and memory to the experiment.


---


## :warning: June 24, 2022

**CLI version: 0.8.65**

This release includes an important update to how CPU and memory are allocated to experiments.

Prior to this release, Grid set the default number of CPUs to 1 when a run was created without explicitly specifying `--cpus`.

We recently discovered an issue with runs where setting `--cpus` to 1 would also reduce the memory, causing lots of OOM issues.

So we've updated this behavior to set `--cpus` to 0 by default. This applies when creating runs with GPUs as well. By setting `--cpus` to 0, the backend will allocate all available CPU and memory to the experiment.
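
To make the effect concrete, here is a minimal sketch of the allocation rule, assuming memory is granted in the same proportion as the requested CPUs (as the earlier `--cpus` help text described) and that `0` means "everything on the instance". The function name `memory_per_experiment_gb` is hypothetical and this is not Grid's actual backend logic.

```python
def memory_per_experiment_gb(instance_cpus: int,
                             instance_memory_gb: float,
                             cpus: int = 0) -> float:
    """Estimate the memory an experiment receives for a given --cpus value.

    Assumption: memory scales with the share of CPUs requested, and
    cpus == 0 means the experiment gets all memory on the instance.
    """
    if instance_cpus <= 0:
        raise ValueError("instance_cpus must be positive")
    if cpus < 0 or cpus > instance_cpus:
        raise ValueError("cpus must be between 0 and instance_cpus")
    if cpus == 0:
        # New default: no proportional limit is applied.
        return float(instance_memory_gb)
    return instance_memory_gb * cpus / instance_cpus


if __name__ == "__main__":
    # Old default on a 16-CPU, 64 GB machine: 1/16 of the memory.
    print(memory_per_experiment_gb(16, 64, cpus=1))  # 4.0
    # New default: all available memory.
    print(memory_per_experiment_gb(16, 64))          # 64.0
```

Under the old default, an experiment on a 16-CPU, 64 GB machine was limited to about 4 GB, which is why the out-of-memory failures appeared.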


---

## :zap: June 7, 2022

**CLI version: 0.8.58**

Today's release includes several bug fixes that improve the Grid experience.

## Grid Cloud Instance Types

We've made some changes to the platform that will impact start times for Sessions and Runs.

As a result of these changes, you'll experience longer start times for Sessions and Runs that use the `p3.2xlarge` instance type. If you're looking for a faster start time, we suggest using the `g4dn.xlarge instance type instead`.
As a result of these changes, you'll experience longer start times for Sessions and Runs that use the `p3.2xlarge` instance type. If you're looking for a faster start time, we suggest using the `g4dn.xlarge` instance type instead.

**In future Grid releases, the following instance types will be supported:**

| Name | CPU | GPU | Memory | Accelerator | numberOfAccelerators acceleratorType availableMemory |
| :--- | :--- | :--- | :--- | :--- | :--- |
| m5a.large (recommended for fast startup times) | 2 | 0 | 8 | CPU | 2_CPU_8GB |
| **m5a.large (recommended for fast startup times)** | 2 | 0 | 8 | CPU | 2_CPU_8GB |
| m5a.2xlarge | 8 | 0 | 32 | CPU | 8_CPU_32GB |
| g4dn.xlarge (recommended for fast startup times) | 4 | 1 | 16 | T4 | 1_T4_16GB |
| **g4dn.xlarge (recommended for fast startup times)** | 4 | 1 | 16 | T4 | 1_T4_16GB |
| p3.2xlarge | 8 | 1 | 61 | V100 | 1_V100_61GB |
| p3.8xlarge | 32 | 4 | 244 | V100 | 4_V100_244GB |

@@ -43,20 +100,24 @@ In changing how we manage certain instance types, we're able to offer faster sta

### BYOC Instance Types

If you are currently using the BYOC feature, you will continue to have access to the full list of [supported AWS instance types](../docs/platform/3_machines.md#machines). If you are not currently using BYOC and want access to or information about additional instance types, reach out to us at support@grid.ai.
If you are currently using the BYOC feature, you will continue to have access to the full list of [supported AWS instance types](../docs/platform/4_machines.md#machines). If you are not currently using BYOC and want access to or information about additional instance types, reach out to us at support@grid.ai.


If you've got questions about these changes, reach out to us at support@grid.ai.

## Fixes and Enhancements

- Adds UI support for [skipping parameter evaluation](../docs/features/runs/1_Creating%20Runs/1_Basic%20Runs/3_sweep-syntax.md#skipping-parameter-evaluation) when running hyperparameter sweeps

- Improvements to the process of integrating Grid with public and private GitHub organizations

- BYOC users: Fixes issue with starting runs with unavailable instance types. If the default instance type is not available, the first instance in the specified list of instances will be used instead.
- BYOC users: Fixes issue with starting runs with unavailable instance types. If the default instance type is not available, the first instance in the specified list of instances will be used instead

- Stability improvements in the UI to make analyzing experiment results a better experience

- Better error messaging in the CLI!
- Better error messaging in the CLI

- Fixes CLI issue where users could only retrieve the 50 most recent runs. To request details for a specific run in your run history, use `grid status RUN_NAME`.
- Fixes CLI issue where users could only retrieve the 50 most recent runs. To request details for a specific run in your run history, use `grid status RUN_NAME`

## :warning: Known Issues

16 changes: 8 additions & 8 deletions docs/cli.md
@@ -155,7 +155,7 @@ grid datastore [OPTIONS] COMMAND [ARGS]...
| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `--global` | boolean | Fetch sessions from everyone in the team when flag is passed | `False` |
| `--cluster` | text | The cluster id to list datastores for. | `prod-2` |
| `--cluster` | text | The cluster id to list datastores for. | `test-7` |
| `--show-incomplete` | boolean | Show any datastore uploads which were started, but killed or errored before they finished uploading all data and became "viewable" on the grid datastore user interface. | `False` |
| `--help` | boolean | Show this message and exit. | `False` |

@@ -197,7 +197,7 @@ grid datastore create [OPTIONS] [SOURCE]
| ---- | ---- | ----------- | ------- |
| `--source` | text | N/A | None |
| `--name` | text | Name of the datastore | None |
| `--cluster` | text | cluster id to create the datastore on. (Bring Your Own Cloud Customers Only). | `prod-2` |
| `--cluster` | text | cluster id to create the datastore on. (Bring Your Own Cloud Customers Only). | `test-7` |
| `--help` | boolean | Show this message and exit. | `False` |

### delete
@@ -219,7 +219,7 @@ grid datastore delete [OPTIONS]
| ---- | ---- | ----------- | ------- |
| `--name` | text | Name of the datastore | |
| `--version` | integer | Version of the datastore | |
| `--cluster` | text | cluster id to delete the datastore from. (Bring Your Own Cloud Customers Only). | `prod-2` |
| `--cluster` | text | cluster id to delete the datastore from. (Bring Your Own Cloud Customers Only). | `test-7` |
| `--help` | boolean | Show this message and exit. | `False` |

### resume
@@ -404,7 +404,7 @@ grid instance-types [OPTIONS]

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `--cluster` | text | Cluster ID whence the instance types needs to be fetched. (Bring Your Own Cloud Customers Only). | `prod-2` |
| `--cluster` | text | Cluster ID whence the instance types needs to be fetched. (Bring Your Own Cloud Customers Only). | `test-7` |
| `--help` | boolean | Show this message and exit. | `False` |

## grid login
@@ -475,14 +475,14 @@ grid run [OPTIONS] [RUN_COMMAND]...
| ---- | ---- | ----------- | ------- |
| `--config` | Path | Path to Grid config YML. | None |
| `--name` | text | Name for this run | None |
| `--cluster` | text | N/A | `prod-2` |
| `--cluster` | text | N/A | `test-7` |
| `--strategy` | choice (`grid_search` &#x7C; `random_search`) | Hyper-parameter search strategy | None |
| `--num_trials` | text | Number of samples from full search space that are used by the random_search strategy | None |
| `--seed` | text | Seed value for the `random_search` strategy | None |
| `--instance_type` | text | Instance type to start training session in | `t2.medium` |
| `--gpus` | integer | Number of GPUs to allocate per experiment | `0` |
| `--cpus` | integer | Number of CPUs to allocate per experiment. This parameter also affects memory (RAM) allocating for your experiment using the following rule: the amount of memory for the experiments will be allocated in the same proportion as the CPU allocated for the instance type chosen for the experiments. For example, if you plan to choose a machine with 16 CPUs and 64 Gb RAM and use a default value of CPUs (1 CPU) for your experiments, 1/16 * 64 Gb = 4 Gb of RAM will be allocated per each experiment. | `1` |
| `--memory` | text | How much disk memory (storage) an experiment needs, Gb | `100` |
| `--cpus` | integer | Number of CPUs to allocate per experiment | `1` |
| `--memory` | text | How much memory an experiment needs | `100` |
| `--datastore_name` | text | Datastore name to be mounted in training | None |
| `--datastore_version` | integer | Datastore version to be mounted in training | None |
| `--datastore_mount_dir` | text | Directory to mount Datastore in training job. The default datastore mount location is /datastores | None |
@@ -572,7 +572,7 @@ grid session create [OPTIONS]

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `--cluster` | text | Cluster to run on | `prod-2` |
| `--cluster` | text | Cluster to run on | `test-7` |
| `--instance_type` | text | Instance type to start session in. | `t2.medium` |
| `--use_spot` | boolean | Use spot instance. The spot instances, or preemptive instance can be shut down at will | `False` |
| `--disk_size` | integer | The disk size in GB to allocate to the session. | `200` |
7 changes: 6 additions & 1 deletion docs/features/datastores/1_datastore-overview.md
@@ -63,6 +63,12 @@ Datastores today have 3 main capabilities:
[Sessions](../../features/sessions/README.md)
3. Create-able from your local machine, Sessions, or Cluster!

### High-Performance Datastores (BYOC users only)

High-Performance Datastores (HPDs) allow Bring Your Own Cloud customers who are looking to scale large datasets to optimize latency and significantly speed up data access. Currently, HPDs are backed by the FSx for Lustre service and offer more scalability and higher throughput than conventional Grid datastores backed by AWS S3.

HPDs are most useful for very large datasets (>1TB) or when a dataset is going to be used by a large number of concurrent experiments or sessions.

### How do I access the data in a datastore?

By default, datastores are mounted at `/datastores/<datastore-name>/` on both
@@ -73,7 +79,6 @@ manually specify the Datastore mount path when using the CLI. Please refer to

## Next Steps


For more information on using Datastores, start with the first section of the
[Using Datastores](./2_Using%20Datastores/1_How-to-use-datastores.md) tutorial.
