A4 Llama 3.1 70B recipe on NeMo 2.0 with GCSFuse storage #37

akansha1812 · 2025-11-03T23:30:41Z

Add a complete Helm chart with a README, and test the scripts.

TODO: update src/helm-charts/storage/gcs-fuse/templates/pv.yaml with a comment on when to add machine-type:a3-highgpu-8g, based on b/450059657#comment27

training/a4/llama3-1-70b/nemo2-pretraining-gke/README.md

training/a4/llama3-1-70b/nemo-pretraining-gke/16node-bf16-seq8192-gbs512-gcs/README.md

src/helm-charts/storage/gcs-fuse/templates/pv.yaml

training/a4/llama3-1-70b/nemo-pretraining-gke/16node-bf16-seq8192-gbs512-gcs/values.yaml

mkmg

Thanks!

…with GCSFuse storage (#55)

* [A4X TensorRT Inference Benchmark] A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage

  This change added deployment configs and instructions for A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage. I followed [previous training storage recipe PR](#37) and modified based on existing [CMCS recipe with HuggingFace](#50)

  TESTED=unit tests
* Fix readme
* Fix README
* Resolve comments
* Format the content table
* Format content tables
* Correct grammar issue in README
* Correct format

akansha1812 added 6 commits November 3, 2025 23:27

A4 Llama 3.1 70B recipe on NeMo 2.0 with GCSFuse storage

2614505

update project-id

bb9977f

remove default values

ada2196

files will be set as part of helm command

00c8b7a

test all the commands and fix

668a204

fix typo

4a85a19

mkmg reviewed Nov 13, 2025

akansha1812 added 4 commits November 13, 2025 21:42

resolve comments

0fd8ed4

gcs suffix

0e7c510

resolve comments

327947d

add gcs suffix

0d94e7b

mkmg reviewed Nov 13, 2025

training/a4/llama3-1-70b/nemo-pretraining-gke/16node-bf16-seq8192-gbs512-gcs/README.md (outdated, resolved)

akansha1812 added 3 commits November 14, 2025 07:41

change dataset path

555c87d

update readme

658c35b

update readme

e8bda69

akansha1812 requested a review from mkmg November 17, 2025 19:16

akansha1812 commented Nov 17, 2025

src/helm-charts/storage/gcs-fuse/templates/pv.yaml (resolved)

mkmg reviewed Nov 17, 2025

training/a4/llama3-1-70b/nemo-pretraining-gke/16node-bf16-seq8192-gbs512-gcs/README.md (outdated, resolved)

update readme

e88b14c

akansha1812 requested a review from mkmg November 19, 2025 01:04

machine-type comment

2cf2379

mkmg reviewed Nov 19, 2025

resolve comments and other typos

ffd6a80

akansha1812 requested a review from mkmg November 24, 2025 18:03

mkmg reviewed Nov 24, 2025

resolve comments

4661caa

mkmg approved these changes Nov 24, 2025

akansha1812 requested a review from mkmg November 24, 2025 19:11

mkmg merged commit ce84f75 into AI-Hypercomputer:main Nov 24, 2025
1 check passed

lepan-google mentioned this pull request Dec 3, 2025

[A4X TensorRT Inference Benchmark] A4X DeepSeek R1 NVFP4 on TensorRT with GCSFuse storage #55

Merged


2 participants
