JAX Hello World Multi-Node GKE H100 with GPUDirectTCPx tutorial #1236 #1237

Draft. Wants to merge 8 commits into base: main

Conversation

parambole

Description

Adds JAX Hello World Multi-Node GKE H100 with GPUDirectTCPx tutorial

Tasks

  • The contributing guide has been read and followed.
  • The samples added / modified have been fully tested.
  • Workflow files have been added / modified, if applicable.
  • Region tags have been properly added, if new samples.
  • All dependencies are set to up-to-date versions, as applicable.

bourgeoisor (Member) left a comment

Adding a few minor comments. Feel free to ping me when the PR is stable and ready for review, @parambole!

@@ -0,0 +1,53 @@
# JAX Mult-Node 'Hello World' on GKE + H100-80GB with GPUDirectTCPx

Typically we keep the instructions in the Google Cloud docs tutorial and point to it from here. This reduces duplication, so we only have to update the instructions in one source of truth (the tutorial).

@@ -0,0 +1,14 @@
FROM python:3.10-slim

Could you add a workflow that, at minimum, runs docker build as a dry run to confirm that the image builds? You can find some examples in the .github/workflows directory.

https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/main/.github/CONTRIBUTING.md#samples-requirements

@bourgeoisor
Member

@parambole hi! Friendly ping for these samples

@NimJay NimJay marked this pull request as draft August 20, 2024 17:21
3 participants