Update PEFT Doc (#8501)
* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
cuichenx authored and akoumpa committed Feb 26, 2024
1 parent 3f83fa2 commit 5d3ab6a
Showing 5 changed files with 18 additions and 1,192 deletions.
12 changes: 6 additions & 6 deletions README.rst
@@ -57,19 +57,19 @@ such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups
Introduction
------------

-NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers
+NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers
working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR),
and text-to-speech synthesis (TTS).
-The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia
+The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia
to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.

For technical documentation, please see the `NeMo Framework User Guide <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_.

All NeMo models are trained with `Lightning <https://github.com/Lightning-AI/lightning>`_ and
training is automatically scalable to 1000s of GPUs.

-When applicable, NeMo models take advantage of the latest possible distributed training techniques,
-including parallelism strategies such as
+When applicable, NeMo models take advantage of the latest possible distributed training techniques,
+including parallelism strategies such as

* data parallelism
* tensor parallelism
@@ -84,7 +84,7 @@ and mixed precision training recipes with bfloat16 and FP8 training.
NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine <https://github.com/NVIDIA/TransformerEngine>`_ for FP8 training on NVIDIA Hopper GPUs
and leverages `NVIDIA Megatron Core <https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core>`_ for scaling transformer model training.

-NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
+NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
see `NVIDIA NeMo Aligner <https://github.com/NVIDIA/NeMo-Aligner>`_ for more details.

NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) <https://developer.nvidia.com/nemo-microservices-early-access>`_.
@@ -93,7 +93,7 @@ NeMo ASR and TTS models can be optimized for inference and deployed for producti

For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the `NVIDIA Framework Launcher <https://github.com/NVIDIA/NeMo-Megatron-Launcher>`_.
The NeMo Framework launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models and also has an `Autoconfigurator <https://github.com/NVIDIA/NeMo-Megatron-Launcher#53-using-autoconfigurator-to-find-the-optimal-configuration>`_
-which can be used to find the optimal model parallel configuration for training on a specific cluster.
+which can be used to find the optimal model parallel configuration for training on a specific cluster.
To get started quickly with the NeMo Framework Launcher, please see the `NeMo Framework Playbooks <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_
The NeMo Framework Launcher does not currently support ASR and TTS training but will soon.

16 changes: 8 additions & 8 deletions docs/source/nlp/nemo_megatron/peft/landing_page.rst
@@ -12,14 +12,14 @@ fraction of the computational and storage costs.
NeMo supports four PEFT methods which can be used with various
transformer-based models.

-==================== ===== ===== ========= ==
-\                    GPT 3 NvGPT LLaMa 1/2 T5
-==================== ===== ===== ========= ==
-Adapters (Canonical) ✅    ✅    ✅        ✅
-LoRA                 ✅    ✅    ✅        ✅
-IA3                  ✅    ✅    ✅        ✅
-P-Tuning             ✅    ✅    ✅        ✅
-==================== ===== ===== ========= ==
+==================== ===== ======== ========= ====== ==
+\                    GPT 3 Nemotron LLaMa 1/2 Falcon T5
+==================== ===== ======== ========= ====== ==
+LoRA                 ✅    ✅       ✅        ✅     ✅
+P-Tuning             ✅    ✅       ✅        ✅     ✅
+Adapters (Canonical) ✅    ✅       ✅               ✅
+IA3                  ✅    ✅       ✅               ✅
+==================== ===== ======== ========= ====== ==
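
For readers who want to try one of the methods in the updated table, a minimal sketch of how the corresponding PEFT config object might be constructed is shown below. Only ``LoraPEFTConfig`` appears elsewhere in this commit; the other class names, the import path, and the scheme strings are assumptions about NeMo's ``peft_config`` module and may differ between releases.

from nemo.collections.nlp.parts.peft_config import (  # import path is an assumption
    CanonicalAdaptersPEFTConfig,  # "Adapters (Canonical)" row
    IA3PEFTConfig,                # "IA3" row
    LoraPEFTConfig,               # "LoRA" row (also used in this commit's quick_start.rst diff)
    PtuningPEFTConfig,            # "P-Tuning" row
)

# Map an (assumed) scheme name to the config class for the corresponding table row.
PEFT_METHODS = {
    "adapter": CanonicalAdaptersPEFTConfig,
    "ia3": IA3PEFTConfig,
    "lora": LoraPEFTConfig,
    "ptuning": PtuningPEFTConfig,
}

def build_peft_cfg(scheme: str, model_cfg):
    """Return the PEFT config object for one of the methods in the table."""
    return PEFT_METHODS[scheme](model_cfg)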

Learn more about PEFT in NeMo with the :ref:`peftquickstart` which provides an overview on how PEFT works
in NeMo. Read about the supported PEFT methods
6 changes: 4 additions & 2 deletions docs/source/nlp/nemo_megatron/peft/quick_start.rst
@@ -62,7 +62,7 @@ Base model classes
PEFT in NeMo is built with a mix-in class that does not belong to any
model in particular. This means that the same interface is available to
different NeMo models. Currently, NeMo supports PEFT for GPT-style
-models such as GPT 3, NvGPT, LLaMa 1/2 (``MegatronGPTSFTModel``), as
+models such as GPT 3, Nemotron, LLaMa 1/2 (``MegatronGPTSFTModel``), as
well as T5 (``MegatronT5SFTModel``).

Full finetuning vs PEFT
@@ -78,11 +78,13 @@ PEFT.
trainer = MegatronTrainerBuilder(config).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)
### Training API ###
model = MegatronGPTSFTModel.restore_from(restore_path, model_cfg, trainer) # restore from pretrained ckpt
-+ peft_cfg = LoRAPEFTConfig(model_cfg)
++ peft_cfg = LoraPEFTConfig(model_cfg)
+ model.add_adapter(peft_cfg)
trainer.fit(model) # saves adapter weights only
### Inference API ###
# Restore from base then load adapter API
model = MegatronGPTSFTModel.restore_from(restore_path, trainer, model_cfg)
+ model.load_adapters(adapter_save_path, peft_cfg)
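
Assembled from the calls shown in this excerpt, a minimal end-to-end sketch might look as follows. The import paths, the ``override_config_path``/``trainer`` keyword names, and ``trainer.predict`` are assumptions about the NeMo and Lightning APIs rather than part of this diff, and ``config``, ``restore_path``, and ``adapter_save_path`` are placeholders supplied by the caller.

from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig


def train_lora(config, restore_path):
    """Training: restore the base checkpoint, attach LoRA adapters, train adapter weights only."""
    trainer = MegatronTrainerBuilder(config).create_trainer()
    model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)
    model = MegatronGPTSFTModel.restore_from(restore_path, override_config_path=model_cfg, trainer=trainer)
    peft_cfg = LoraPEFTConfig(model_cfg)
    model.add_adapter(peft_cfg)
    trainer.fit(model)  # saves adapter weights only
    return peft_cfg


def infer_with_adapter(config, restore_path, adapter_save_path, peft_cfg):
    """Inference: restore the base model again, then load the trained adapter weights."""
    trainer = MegatronTrainerBuilder(config).create_trainer()
    model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)
    model = MegatronGPTSFTModel.restore_from(restore_path, override_config_path=model_cfg, trainer=trainer)
    model.load_adapters(adapter_save_path, peft_cfg)
    trainer.predict(model)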
