From 49b54c30557e0a401b2f56a31cd149951e338662 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Thu, 15 Feb 2024 23:54:42 -0700
Subject: [PATCH] mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li
Co-authored-by: Huiying
---
 docs/source/nlp/nemo_megatron/mcore_customization.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/nlp/nemo_megatron/mcore_customization.rst b/docs/source/nlp/nemo_megatron/mcore_customization.rst
index 1f6648d9c899..618ece3f2e48 100644
--- a/docs/source/nlp/nemo_megatron/mcore_customization.rst
+++ b/docs/source/nlp/nemo_megatron/mcore_customization.rst
@@ -1,7 +1,7 @@
 Megatron Core Customization
 ---------------------------
 
-Megatron Core (Mcore) offers a range of functionalities, one of the most notable being the ability for users to train GPT models on an epic scale. Users can use ``megatron.core.models.gpt.GPTModel`` (Mcore GPTModel) to initialize the model, and then pretrain/load weights into the model. Mcore GPTModel adopts the typical GPT structure, beginning with embedding layer, positional encoding, followed by a series of transformer layers and finally output layer.
+Megatron Core (Mcore) offers a range of functionalities, one of the most notable being the ability for users to train Transformer models on an epic scale. Users can enable decoder/GPT variants by using ``megatron.core.models.gpt.GPTModel`` (Mcore GPTModel) to initialize the model, and then pretrain/load weights into the model. Mcore GPTModel adopts the typical GPT structure, beginning with embedding layer, positional encoding, followed by a series of transformer layers and finally output layer.
 
 In the rapidly advancing world of LLM, it is increasingly important to experiment with various configurations of the transformer block within each transformer layer. Some of these configurations involve the use of different module classes. While it is possible to achieve this with “if else” statements in Mcore, doing so makes Mcore less readable and less maintainable in the long term. Mcore spec intends to solve this challenge by allowing users to specify a customization of the transformer block in each layer, without modifying code in mcore. We will dive more into the details of mcore spec in the first section of this blog. Then, we will demonstrate the usefulness of mcore spec using Falcon as an example.
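
Editor's note (not part of the patch): the paragraph revised above refers to initializing ``megatron.core.models.gpt.GPTModel`` with a layer spec. For context, a minimal single-process sketch of that initialization is shown below. All configuration values (``num_layers=2``, ``hidden_size=128``, ``vocab_size=100``, ``max_sequence_length=64``) are illustrative assumptions, and exact constructor arguments may differ across Mcore versions.

.. code-block:: python

    import os
    import torch
    from megatron.core import parallel_state
    from megatron.core.transformer.transformer_config import TransformerConfig
    from megatron.core.models.gpt.gpt_model import GPTModel
    from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

    # Mcore's tensor-parallel layers require an initialized model-parallel
    # state, even for a single-process CPU run.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.distributed.init_process_group(backend="gloo", rank=0, world_size=1)
    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=1, pipeline_model_parallel_size=1
    )

    # Transformer hyperparameters; the values here are illustrative only.
    config = TransformerConfig(
        num_layers=2,
        hidden_size=128,
        num_attention_heads=4,
        use_cpu_initialization=True,
    )

    # transformer_layer_spec is the "spec" customization point discussed in
    # the doc being patched; get_gpt_layer_local_spec() returns the default
    # Mcore-native layer spec.
    model = GPTModel(
        config=config,
        transformer_layer_spec=get_gpt_layer_local_spec(),
        vocab_size=100,
        max_sequence_length=64,
    )

Swapping ``get_gpt_layer_local_spec()`` for a custom spec is exactly the mechanism the patched document goes on to demonstrate with Falcon.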