From f3c4ccb002517a0ec4619dec84f129abde419e6d Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Fri, 5 Apr 2024 10:28:22 -0700 Subject: [PATCH 1/8] Update Latest News Adds links to articles on * NeMo framework on GKE * Responsible Gen AI using NeMo and Picasso * NeMo powering Amazon Titan foundation models Signed-off-by: Shashank Verma --- README.rst | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 1beef67832f0..b4353b046bcf 100644 --- a/README.rst +++ b/README.rst @@ -41,7 +41,18 @@ Latest News ----------- -- 2023/12/06 `New NVIDIA NeMo Framework Features and NVIDIA H200 `_ +- `Accelerate your generative AI journey with NVIDIA NeMo framework on GKE `_ (2024/03/16) + +NVIDIA NeMo now includes instructions on how to train generative AI models on the Google Kubernetes Engine (GKE) using NVIDIA accelerated computing and the NVIDIA NeMo Framework. An end-to-end walkthrough is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and use the NVIDIA NeMo Megatron Generative Pre-trained Transformer (GPT) with the NeMo Framework. + + +- `Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso `_ (2024/03/06) + +NVIDIA NeMo now supplies Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, with the NVIDIA NeMo Framework and NVIDIA Picasso. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. 
+ +- `New NVIDIA NeMo Framework Features and NVIDIA H200 `_ (2023/12/06) + +NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. .. image:: https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png :target: https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility @@ -52,7 +63,9 @@ NeMo Framework has been updated with state-of-the-art features, such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups up to 4.2x for Llama-2 pre-training on H200. **All of these features will be available in an upcoming release.** - +- `NVIDIA now powers training for Amazon Titan Foundation models `_ (2023/11/28) + +NVIDIA NeMo now empowers the Amazon Titan Foundation models (FM) with efficient training and high-quality generative AI. The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running large language models (LLMs). Amazon Web Services (AWS) leverages the NeMo Framework to create and fine-tune Titan models, which benefit from its extensibility, scalability, parallelism techniques, and high GPU utilization.
Introduction ------------ From d7d251421d6ebd24a7066e853a688b8b4156c727 Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Fri, 5 Apr 2024 11:26:25 -0700 Subject: [PATCH 2/8] Minor updates to latest news in README * Remove bullets * Editing text for clarity Signed-off-by: Shashank Verma --- README.rst | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/README.rst b/README.rst index b4353b046bcf..b762a5b7975a 100644 --- a/README.rst +++ b/README.rst @@ -41,16 +41,17 @@ Latest News ----------- -- `Accelerate your generative AI journey with NVIDIA NeMo framework on GKE `_ (2024/03/16) +`Accelerate your generative AI journey with NVIDIA NeMo framework on GKE `_ (2024/03/16) -NVIDIA NeMo now includes instructions on how to train generative AI models on the Google Kubernetes Engine (GKE) using NVIDIA accelerated computing and the NVIDIA NeMo Framework. An end-to-end walkthrough is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and use the NVIDIA NeMo Megatron Generative Pre-trained Transformer (GPT) with the NeMo Framework. +An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework. -- `Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso `_ (2024/03/06) - -NVIDIA NeMo now supplies Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, with the NVIDIA NeMo Framework and NVIDIA Picasso. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. 
Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. +`Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso `_ (2024/03/06) + +Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. + -- `New NVIDIA NeMo Framework Features and NVIDIA H200 `_ (2023/12/06) +`New NVIDIA NeMo Framework Features and NVIDIA H200 `_ (2023/12/06) NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. @@ -59,13 +60,11 @@ NVIDIA NeMo Framework now includes several optimizations and enhancements, inclu :alt: H200-NeMo-performance :width: 600 -NeMo Framework has been updated with state-of-the-art features, -such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups up to 4.2x for Llama-2 pre-training on H200. -**All of these features will be available in an upcoming release.** -- `NVIDIA now powers training for Amazon Titan Foundation models `_ (2023/11/28) - -NVIDIA NeMo now empowers the Amazon Titan Foundation models (FM) with efficient training and high-quality generative AI. The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock.
The NeMo Framework provides a versatile framework for building, customizing, and running large language models (LLMs). Amazon Web Services (AWS) leverage the NeMo Framework to create and fine-tune Titan models which benefit from its extensibility, scalability, parallelism techniques, and high GPU utilization. +`NVIDIA now powers training for Amazon Titan Foundation models `_ (2023/11/28) + +NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs. + Introduction ------------ From 9e31ba46ca9653a0f70cc15afe1c75b2f294163b Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Mon, 8 Apr 2024 14:36:25 -0700 Subject: [PATCH 3/8] Format latest news as a dropdown list * Uses embedded html to format news to dropdown, hiding lengthy details * Fixes formatting of the title Signed-off-by: Shashank Verma --- README.rst | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/README.rst b/README.rst index b762a5b7975a..e4eb6bd66bc5 100644 --- a/README.rst +++ b/README.rst @@ -36,34 +36,44 @@ .. _main-readme: **NVIDIA NeMo Framework** -=============== +========================= + Latest News ----------- -`Accelerate your generative AI journey with NVIDIA NeMo framework on GKE `_ (2024/03/16) - -An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework. +.. raw:: html +
+ Accelerate your generative AI journey with NVIDIA NeMo framework on GKE (2024/03/16) -`Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso `_ (2024/03/06) + An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework. +

-Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. +
+
+ Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso (2024/03/06) -`New NVIDIA NeMo Framework Features and NVIDIA H200 `_ (2023/12/06) + Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. +

+
-NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mix of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. +
+ New NVIDIA NeMo Framework Features and NVIDIA H200 (2023/12/06) .. image:: https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png - :target: https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility - :alt: H200-NeMo-performance - :width: 600 + NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. + H200-NeMo-performance +

+
-`NVIDIA now powers training for Amazon Titan Foundation models `_ (2023/11/28) +
+ NVIDIA now powers training for Amazon Titan Foundation models (2023/11/28) -NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs. + NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs. +

+
Introduction From dca4f65c9768846aa42dabec3eb61172e7387c01 Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Mon, 8 Apr 2024 14:40:16 -0700 Subject: [PATCH 4/8] Add break to improve readability of latest news image Signed-off-by: Shashank Verma --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index e4eb6bd66bc5..621b66071e17 100644 --- a/README.rst +++ b/README.rst @@ -63,7 +63,7 @@ Latest News New NVIDIA NeMo Framework Features and NVIDIA H200 (2023/12/06) NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. - +

H200-NeMo-performance

From 2b8cefcdddc6c277a8da4ce60f100f2a6ae0a419 Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Tue, 16 Apr 2024 20:55:42 -0700 Subject: [PATCH 5/8] Add LLM and MM section in latest news Signed-off-by: Shashank Verma --- README.rst | 49 +++++++++++++++++++++++++++---------------------- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/README.rst b/README.rst index 2dc98fa8e3cd..37a86499ece7 100644 --- a/README.rst +++ b/README.rst @@ -43,36 +43,41 @@ Latest News .. raw:: html -
- Accelerate your generative AI journey with NVIDIA NeMo framework on GKE (2024/03/16) +
+ Large Language Models and Multimodal +
+ Accelerate your generative AI journey with NVIDIA NeMo framework on GKE (2024/03/16) - An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework. -

+ An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework. +

+
-
+
+ Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso (2024/03/06) -
- Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso (2024/03/06) + Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. +

+
- Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference. -

-
+
+ New NVIDIA NeMo Framework Features and NVIDIA H200 (2023/12/06) -
- New NVIDIA NeMo Framework Features and NVIDIA H200 (2023/12/06) + NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mix of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. +

+ H200-NeMo-performance +

+
- NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mix of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs. -

- H200-NeMo-performance -

-
+
+ NVIDIA now powers training for Amazon Titan Foundation models (2023/11/28) -
- NVIDIA now powers training for Amazon Titan Foundation models (2023/11/28) + NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs. +

+
- NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs. -

-
+
+ + Introduction From 8b2f440db8ac6b81eb10863116373fd994a4b548 Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Tue, 16 Apr 2024 21:24:11 -0700 Subject: [PATCH 6/8] Add margin in latest news expandable lists Signed-off-by: Shashank Verma --- README.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.rst b/README.rst index 37a86499ece7..7e7e4b17119f 100644 --- a/README.rst +++ b/README.rst @@ -43,6 +43,12 @@ Latest News .. raw:: html + +
Large Language Models and Multimodal
From e5feb40e428226ed2035466edd8e7e386e37d269 Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Tue, 16 Apr 2024 21:31:08 -0700 Subject: [PATCH 7/8] Remove styling of expandable list * Github appears to not render styled elements when embedded as raw html in rst Signed-off-by: Shashank Verma --- README.rst | 6 ------ 1 file changed, 6 deletions(-) diff --git a/README.rst b/README.rst index 7e7e4b17119f..37a86499ece7 100644 --- a/README.rst +++ b/README.rst @@ -43,12 +43,6 @@ Latest News .. raw:: html - -
Large Language Models and Multimodal
From 6a14069b9e1f804fe309fc70f6155b1a04825215 Mon Sep 17 00:00:00 2001 From: Shashank Verma Date: Wed, 17 Apr 2024 16:26:15 -0700 Subject: [PATCH 8/8] Fold the first news item by default Signed-off-by: Shashank Verma --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 37a86499ece7..f4c2a541960f 100644 --- a/README.rst +++ b/README.rst @@ -45,7 +45,7 @@ Latest News
Large Language Models and Multimodal -
+
Accelerate your generative AI journey with NVIDIA NeMo framework on GKE (2024/03/16) An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.