Update pruning and distillation tutorial notebooks #11091
Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com>
@@ -17,6 +17,6 @@ This repository contains jupyter notebook tutorials using NeMo Framework for Lla | |||
* - `Llama 3.1 Law-Domain LoRA Fine-Tuning and Deployment with NeMo Framework and NVIDIA NIM <./sdg-law-title-generation>`_ | |||
- `Law StackExchange <https://huggingface.co/datasets/ymoslem/Law-StackExchange>`_ | |||
- Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a pre-requisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg>`__. | |||
* - `Llama 3.1 WikiText Pruning and Distillation with NeMo Framework <./pruning-distillation>`_ |
Line 5/5 fix capitalization
This repository contains Jupyter Notebook tutorials using the NeMo Framework for Llama-3 and Llama-3.1 models by Meta.
@@ -17,6 +17,6 @@ This repository contains jupyter notebook tutorials using NeMo Framework for Lla | |||
* - `Llama 3.1 Law-Domain LoRA Fine-Tuning and Deployment with NeMo Framework and NVIDIA NIM <./sdg-law-title-generation>`_ | |||
- `Law StackExchange <https://huggingface.co/datasets/ymoslem/Law-StackExchange>`_ | |||
- Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a pre-requisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg>`__. |
Line 19/19 fix punctuation.
Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a prerequisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg>`__.
"\n", | ||
"The dataset has to be preprocessed using the [preprocess_data_for_megatron.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/preprocess_data_for_megatron.py) script included in the NeMo Framework. This step will also tokenize data using the `meta-llama/Meta-Llama-3.1-8B` tokenizer model to convert the data into a memory map format.\n", | ||
"\n", | ||
"> `NOTE:` In the block of code below, pass the paths to your train, test and validation data files." |
fix punctuation
In the block of code below, pass the paths to your train, test, and validation data files.
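For readers following this thread, here is a minimal sketch of what passing the three split paths to the preprocessing script could look like. The file paths, the output prefixes, and the container location of the script are hypothetical, and the flag names are assumptions to verify with the script's `--help` output. The sketch only builds and prints the commands; it does not run them.

```python
from typing import Dict, List

# Assumed location of the script inside the NeMo Framework container (hypothetical).
SCRIPT = "/opt/NeMo/scripts/nlp_language_modeling/preprocess_data_for_megatron.py"
# Tokenizer named in the notebook text.
TOKENIZER = "meta-llama/Meta-Llama-3.1-8B"


def build_preprocess_cmd(input_path: str, output_prefix: str) -> List[str]:
    """Build (but do not run) a preprocessing command for one data split.

    Flag names are assumptions; check them against the script's --help.
    """
    return [
        "python",
        SCRIPT,
        f"--input={input_path}",
        "--json-keys=text",
        "--tokenizer-library=huggingface",
        f"--tokenizer-type={TOKENIZER}",
        "--dataset-impl=mmap",
        f"--output-prefix={output_prefix}",
        "--append-eod",
    ]


# Hypothetical paths to the train, test, and validation files.
splits: Dict[str, str] = {
    "train": "data/wikitext-train.jsonl",
    "test": "data/wikitext-test.jsonl",
    "val": "data/wikitext-val.jsonl",
}

commands = {
    name: build_preprocess_cmd(path, f"data/wikitext_tokenized_{name}")
    for name, path in splits.items()
}

for name, cmd in commands.items():
    print(name, "->", " ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)` once the paths and flags have been confirmed.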
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"### Step 2: Finetune the teacher on the dataset\n", |
fix punctuation
Step 2: Fine-tune the teacher on the dataset
"\n", | ||
"### Step 2: Finetune the teacher on the dataset\n", | ||
"\n", | ||
"NeMo framework includes a standard python script [megatron_gpt_pretraining.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_pretraining.py) for training a model. Once you have your model downloaded and the dataset ready, fine-tuning the teacher model with NeMo is essentially just running this script!\n", |
fix punctuation and capitalization
"NeMo Framework includes a standard Python script, megatron_gpt_pretraining.py, for training a model. Once you have your model downloaded and the dataset ready, fine-tuning the teacher model with NeMo is essentially just running this script!\n",
"\n", | ||
"NeMo framework includes a standard python script [megatron_gpt_pretraining.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_pretraining.py) for training a model. Once you have your model downloaded and the dataset ready, fine-tuning the teacher model with NeMo is essentially just running this script!\n", | ||
"\n", | ||
"We finetune the unpruned model on our dataset to correct the distribution shift across the original dataset the model was trained on. Per the [blog](https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/) and [tech report](https://arxiv.org/pdf/2408.11796), experiments showed that, without correcting for the distribution shift, the teacher provides suboptimal guidance on the dataset when being distilled.\n", |
fix punctuation
We fine-tune the unpruned model on our dataset to correct the distribution shift from the original dataset the model was trained on. According to the blog and tech report, experiments showed that without correcting for this distribution shift, the teacher provides suboptimal guidance on the dataset during distillation.
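To make "guidance" concrete: in logit-based distillation, the student is trained to match the teacher's output distribution, so a teacher that is poorly matched to the dataset yields a poor target. The sketch below shows one common formulation, a forward KL divergence between teacher and student token distributions, in plain Python. It is illustrative only; the loss actually used by the distillation script may differ, and the logits here are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Made-up logits over a 4-token vocabulary.
teacher = [2.0, 0.5, -1.0, 0.0]
student = [1.0, 1.0, -0.5, 0.2]

loss_mismatch = forward_kl(teacher, student)
loss_match = forward_kl(teacher, teacher)
print(f"KL(teacher || student) = {loss_mismatch:.4f}")
print(f"KL(teacher || teacher) = {loss_match:.4f}")
```

The loss is zero only when the student reproduces the teacher's distribution, which is why the quality of the teacher's distribution on the target dataset matters.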
"metadata": {}, | ||
"source": [ | ||
"#### Validation Loss using depth-pruned model as student in distillation script\n", | ||
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script when we distill the knowledge from the finetuned teacher model to the depth-pruned student." |
fix punctuation, revise sentence
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script, where we distill the knowledge from the fine-tuned teacher model to the depth-pruned student."
{ | ||
"data": { | ||
"text/html": [ | ||
"<h5>Validation Loss over 30 Training Steps with Depth-Pruned model as Student</h5>" |
fix capitalization
Validation Loss over 30 Training Steps with Depth-Pruned Model as Student
], | ||
"source": [ | ||
"from IPython.display import Image, display, HTML\n", | ||
"title = \"Validation Loss over 30 Training Steps with Depth-Pruned model as Student\"\n", |
fix capitalization
title = "Validation Loss over 30 Training Steps with Depth-Pruned Model as Student"\n",
"id": "f10041ae-6533-47de-9f76-f97d4469c27a", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Validation Loss using width-pruned model as student in distillation script\n", |
fix capitalization
"#### Validation Loss Using Width-Pruned Model as Student in Distillation Script\n",
"metadata": {}, | ||
"source": [ | ||
"#### Validation Loss using width-pruned model as student in distillation script\n", | ||
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script when we distill the knowledge from the finetuned teacher model to the width-pruned student." |
fix capitalization, revise sentence
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script, where we distill the knowledge from the fine-tuned teacher model to the width-pruned student."
{ | ||
"data": { | ||
"text/html": [ | ||
"<h5>Validation Loss over 30 Training Steps with Width-Pruned model as Student</h5>" |
fix capitalization
"<h5>Validation Loss over 30 Training Steps with Width-Pruned Model as Student</h5>"
@@ -1,18 +1,26 @@ | |||
Llama 3.1 WikiText Pruning and Distillation with NeMo Framework | |||
Llama 3.1 Pruning and Distillation with NeMo Framework | |||
======================================================================================= | |||
|
|||
`Llama 3.1 <https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/>`_ are open-source large language models by Meta that deliver state-of-the-art performance on popular industry benchmarks. They have been pretrained on over 15 trillion tokens, and support a 128K token context length. They are available in three sizes, 8B, 70B, and 405B, and each size has two variants—base pretrained and instruction tuned. |
revise paragraph
Llama 3.1 models, developed by Meta, are open-source large language models that deliver state-of-the-art performance on popular industry benchmarks. Pretrained on over 15 trillion tokens, they support a 128K token context length. These models are available in three sizes: 8B, 70B, and 405B. Each size offers two variants: base pretrained and instruction tuned.
@@ -1,18 +1,26 @@ | |||
Llama 3.1 WikiText Pruning and Distillation with NeMo Framework | |||
Llama 3.1 Pruning and Distillation with NeMo Framework | |||
======================================================================================= | |||
|
|||
`Llama 3.1 <https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/>`_ are open-source large language models by Meta that deliver state-of-the-art performance on popular industry benchmarks. They have been pretrained on over 15 trillion tokens, and support a 128K token context length. They are available in three sizes, 8B, 70B, and 405B, and each size has two variants—base pretrained and instruction tuned. | |||
|
|||
`NVIDIA NeMo Framework <https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html>`_ provides tools to perform teacher finetuning, pruning and distillation on Llama 3.1 to fit your use case. |
fix punctuation
`NVIDIA NeMo Framework <https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html>`_ provides tools to perform teacher fine-tuning, pruning, and distillation on Llama 3.1 to fit your use case.
======================================================================================= | ||
|
||
`Llama 3.1 <https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/>`_ are open-source large language models by Meta that deliver state-of-the-art performance on popular industry benchmarks. They have been pretrained on over 15 trillion tokens, and support a 128K token context length. They are available in three sizes, 8B, 70B, and 405B, and each size has two variants—base pretrained and instruction tuned. | ||
|
||
`NVIDIA NeMo Framework <https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html>`_ provides tools to perform teacher finetuning, pruning and distillation on Llama 3.1 to fit your use case. | ||
|
||
`NVIDIA TensorRT Model Optimizer <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_ is a library (referred to as **Model Optimizer**, or **ModelOpt**) comprising state-of-the-art model optimization techniques including `quantization <https://github.com/NVIDIA/TensorRT-Model-Optimizer#quantization>`_, `sparsity <https://github.com/NVIDIA/TensorRT-Model-Optimizer#sparsity>`_, `distillation <https://github.com/NVIDIA/TensorRT-Model-Optimizer#distillation>`_, and `pruning <https://github.com/NVIDIA/TensorRT-Model-Optimizer#pruning>`_ to compress models. | ||
|
||
`LLM Pruning and Distillation in Practice: The Minitron Approach <https://arxiv.org/abs/2408.11796>`_ provides tools to perform teacher finetuning, pruning and distillation on Llama 3.1 as described in the `tech report <https://arxiv.org/abs/2408.11796>`_. |
fix punctuation
`LLM Pruning and Distillation in Practice: The Minitron Approach <https://arxiv.org/abs/2408.11796>`_ provides tools to perform teacher fine-tuning, pruning, and distillation on Llama 3.1 as described in the `tech report <https://arxiv.org/abs/2408.11796>`_.
This tutorial shows how to perform depth-pruning, teacher finetuning and distillation on **Llama 3.1 8B** using the `WikiText-103-v1 <https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1>`_ dataset with NeMo Framework. The `WikiText-103-v1 <https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1>`_ language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. For this demonstration, we will perform teacher correction by running a light finetuning procedure on the ``Meta Llama 3.1 8B`` teacher model to generate a finetuned teacher model ``megatron_llama_ft.nemo`` needed for optimal distillation. This finetuned teacher model is then trimmed. There are two methods to prune a model: depth-pruning and width-pruning. We will be exploring both pruning techniques which will yield ``4b_depth_pruned_model.nemo`` and ``4b_width_pruned_model.nemo`` respectively. These models will serve as a starting point for distillation to create the final distilled 4B models. | ||
We are using models utilizing the ``meta-llama/Meta-Llama-3.1-8B`` tokenizer for this demonstration. |
fix punctuation and revise paragraph
This tutorial demonstrates how to perform depth-pruning, teacher fine-tuning, and distillation on **Llama 3.1 8B** using the `WikiText-103-v1 <https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1>`_ dataset with the NeMo Framework. The `WikiText-103-v1 <https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1>`_ language modeling dataset comprises over 100 million tokens extracted from verified Good and Featured articles on Wikipedia.
For this demonstration, we will perform teacher correction by running a light fine-tuning procedure on the ``Meta Llama 3.1 8B`` teacher model to generate a fine-tuned teacher model, ``megatron_llama_ft.nemo``, needed for optimal distillation. This fine-tuned teacher model is then trimmed. There are two methods to prune a model: depth-pruning and width-pruning. We will explore both techniques, yielding ``4b_depth_pruned_model.nemo`` and ``4b_width_pruned_model.nemo``, respectively. These models will serve as starting points for distillation to create the final distilled 4B models.
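As a rough illustration of the difference between the two pruning methods named above, the toy sketch below treats depth-pruning as dropping whole layers and width-pruning as keeping only the most important rows of a weight matrix. The layer names, weights, and importance scores are invented, and this is not the procedure the notebooks or Model Optimizer implement; real pruning estimates importance from activations and adjusts many tensors consistently.

```python
from typing import List, Sequence


def depth_prune(layers: Sequence[str], drop_start: int, drop_count: int) -> List[str]:
    """Drop a contiguous block of layers and keep the rest in order."""
    return list(layers[:drop_start]) + list(layers[drop_start + drop_count:])


def width_prune(
    weight: List[List[float]], importance: Sequence[float], keep: int
) -> List[List[float]]:
    """Keep the `keep` most important rows (neurons), preserving their order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    kept_rows = sorted(ranked[:keep])
    return [weight[i] for i in kept_rows]


# Toy model: 8 layers; depth-pruning removes layers 4, 5, and 6.
layers = [f"layer_{i}" for i in range(8)]
pruned_layers = depth_prune(layers, drop_start=4, drop_count=3)

# Toy weight matrix with 4 neurons (rows) and made-up importance scores.
weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
importance = [0.1, 0.9, 0.4, 0.7]
pruned_weight = width_prune(weight, importance, keep=2)

print(pruned_layers)
print(pruned_weight)
```

Depth-pruning leaves every remaining layer at its original size, while width-pruning keeps the layer count and shrinks each layer.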
We are using models utilizing the ``meta-llama/Meta-Llama-3.1-8B`` tokenizer for this demonstration. | ||
|
||
``NOTE:`` A subset of functions is being demonstrated in the notebooks. Some features like Neural Architecture Search (NAS) are unavailable but will be supported in future releases. |
fix punctuation
``NOTE:`` A subset of functions is being demonstrated in the notebooks. Some features like Neural Architecture Search (NAS) are unavailable, but will be supported in future releases.
We are using models utilizing the ``meta-llama/Meta-Llama-3.1-8B`` tokenizer for this demonstration. | ||
|
||
``NOTE:`` A subset of functions is being demonstrated in the notebooks. Some features like Neural Architecture Search (NAS) are unavailable but will be supported in future releases. | ||
|
Line 20/28 revise bullet text
Access to at least 8 NVIDIA GPUs, each with a memory of at least 80GB (e.g., 8 x H100-80GB or 8 x A100-80GB).
Line 23/31 fix punctuation
`Authenticate with NVIDIA NGC <https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#ngc-authentication>`_ and download `NGC CLI Tool <https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#ngc-cli-tool>`_. You will use this tool to download the model and customize it with NeMo Framework.
Line 27/35 revise note text
``NOTE:`` The default configuration in the notebook runs on 8 x 80GB NVIDIA GPUs. However, you can potentially reduce the Tensor Parallel size ``(TENSOR_PARALLEL_SIZE)`` along with the Micro-Batchsize ``(MICRO_BATCH_SIZE)`` in the teacher fine-tuning and distillation scripts to accommodate lower resource availability.
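A short sketch of the arithmetic behind this note may help when choosing values. In Megatron-style training, the data-parallel size is the GPU count divided by the product of the tensor- and pipeline-parallel sizes, and the global batch size equals the micro-batch size times the data-parallel size times the number of gradient-accumulation steps. The GPU counts and batch sizes below are hypothetical, not the notebook defaults.

```python
def data_parallel_size(num_gpus: int, tensor_parallel: int, pipeline_parallel: int = 1) -> int:
    """Number of model replicas given the model-parallel layout."""
    model_parallel = tensor_parallel * pipeline_parallel
    if num_gpus % model_parallel != 0:
        raise ValueError("num_gpus must be divisible by TP * PP")
    return num_gpus // model_parallel


def grad_accum_steps(global_batch: int, micro_batch: int, data_parallel: int) -> int:
    """Micro-batches each replica processes per optimizer step."""
    per_pass = micro_batch * data_parallel
    if global_batch % per_pass != 0:
        raise ValueError("global_batch must be divisible by micro_batch * data_parallel")
    return global_batch // per_pass


# Hypothetical settings: 8 GPUs with TENSOR_PARALLEL_SIZE=8, MICRO_BATCH_SIZE=4.
dp_full = data_parallel_size(num_gpus=8, tensor_parallel=8)
steps_full = grad_accum_steps(global_batch=128, micro_batch=4, data_parallel=dp_full)

# Hypothetical reduced setup: 4 GPUs with TENSOR_PARALLEL_SIZE=4, MICRO_BATCH_SIZE=2.
dp_small = data_parallel_size(num_gpus=4, tensor_parallel=4)
steps_small = grad_accum_steps(global_batch=128, micro_batch=2, data_parallel=dp_small)

print(dp_full, steps_full)
print(dp_small, steps_small)
```

Lowering the micro-batch size keeps the global batch size unchanged at the cost of more accumulation steps per optimizer update. Whether a given tensor-parallel size fits in memory still depends on the model and the GPU, which this arithmetic does not capture.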
@@ -31,14 +39,16 @@ Create a pruned and distilled model with NeMo Framework | |||
|
|||
For pruning and distilling the model, you will use the NeMo Framework which is available as a `docker container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo>`_. | |||
|
|||
``NOTE:`` These notebooks use `NVIDIA TensorRT Model Optimizer <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_ under the hood for pruning and distillation. |
revise note
``NOTE:`` These notebooks use the `NVIDIA TensorRT Model Optimizer <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_ under the hood for pruning and distillation.
|
||
This directory contains a list of notebooks which will go over all the steps to create a distilled 4B model. |
revise text
This directory contains a list of notebooks that cover all the steps to create a distilled 4B model.
Results | ||
------------------------------------------------------------------------------ | ||
``NOTE:`` This notebook demonstrates the use of the teacher finetuning, pruning and the distillation script. These scripts should ideally be run on a multi-node cluster with a larger ``GLOBAL_BATCH_SIZE`` and ``STEPS`` to see improvement in the validation loss. | ||
``NOTE:`` This notebook demonstrates the use of the teacher finetuning, pruning and the distillation scripts. These scripts should ideally be run on a multi-node cluster with a larger ``GLOBAL_BATCH_SIZE`` and ``STEPS`` to see improvement in the validation loss. |
fix punctuation
``NOTE:`` This notebook demonstrates the use of the teacher fine-tuning, pruning, and the distillation scripts. These scripts should ideally be run on a multi-node cluster with a larger ``GLOBAL_BATCH_SIZE`` and ``STEPS`` to see improvement in the validation loss.
|
||
.. figure:: https://github.com/NVIDIA/NeMo/releases/download/r2.0.0rc1/val_loss_distillation.png | ||
Figure 1: Validation Loss Plot when using the depth-pruned model as the student |
fix capitalization
Figure 1: Validation Loss Plot When Using the Depth-Pruned Model as the Student
Figure 2: Validation Loss Plot when using the width-pruned model as the student |
fix capitalization
Figure 2: Validation Loss Plot When Using the Width-Pruned Model as the Student
"This demonstration showcases performing pruning and distillation on **Llama 3.1-8B** with the [WikiText-103-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1) dataset using NeMo Framework. The [WikiText-103-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1) language modeling dataset is a collection of over 100 million tokens extracted from the set of verified 'Good' and 'Featured' articles on Wikipedia. \n", | ||
"\n", | ||
"For this demonstration, we will perform a light finetuning procedure on the `Meta Llama 3.1 8B` teacher model to generate a finetuned teacher model. This finetuned teacher model will then be trimmed. There are two methods to prune a model: depth-pruning and width-pruning. This workflow will showcase both methods which will yield `4b_depth_pruned_model.nemo` and `4b_width_pruned_model.nemo` respectively, that will serve as a starting point for distillation to the final 4B models. \n", |
fix punctuation and revise paragraph
This tutorial demonstrates how to perform depth-pruning, teacher fine-tuning, and distillation on **Llama 3.1 8B** using the [WikiText-103-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1) dataset with NeMo Framework. The [WikiText-103-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1) language modeling dataset comprises over 100 million tokens extracted from verified Good and Featured articles on Wikipedia.
For this demonstration, we will perform teacher correction by running a light fine-tuning procedure on the `Meta Llama 3.1 8B` teacher model to generate a fine-tuned teacher model, `megatron_llama_ft.nemo`, needed for optimal distillation. This fine-tuned teacher model is then trimmed. There are two methods to prune a model: depth-pruning and width-pruning. We will explore both techniques, yielding `4b_depth_pruned_model.nemo` and `4b_width_pruned_model.nemo`, respectively. These models will serve as starting points for distillation to create the final distilled 4B models.
"\n", | ||
"> `NOTE:` Ensure that you run this notebook inside the [NeMo Framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) which has all the required dependencies. \n", | ||
"\n", | ||
"**Instructions are available in the associated tutorial README to download the model and the container.**" |
revise note text and add a link to the README file
"Instructions for downloading the model and the container are available in the README."
"source": [ | ||
"---\n", | ||
"## Prerequisites\n", | ||
"Ensure you have the following -\n", |
revise text
Ensure you meet the prerequisites listed in this section.
"---\n", | ||
"## Prerequisites\n", | ||
"Ensure you have the following -\n", | ||
"1. **Get the teacher model**: Download the `Meta Llama 3.1 8B .nemo` model. You must follow the instructions in the associated README to download and mount the folder to the NeMo FW container." |
Use full NeMo Framework name
"1. **Get the teacher model**: Download the `Meta Llama 3.1 8B .nemo` model. You must follow the instructions in the associated README to download and mount the folder to the NeMo Framework container."
}, | ||
"source": [ | ||
"---\n", | ||
"## Step-by-step instructions\n", |
fix capitalization in heading
"## Step-by-Step Instructions\n",
"This workflow is structured into seven notebooks:\n", | ||
"1. [Prepare the dataset](./01_data_preparation.ipynb)\n", | ||
"2. [Finetune the teacher on the dataset](./02_teacher_finetuning.ipynb)\n", | ||
"3. Prune the finetuned-teacher model to create a student \n", |
fix punctuation
"3. Prune the fine-tuned teacher model to create a student\n",
"\n", | ||
"This workflow is structured into seven notebooks:\n", | ||
"1. [Prepare the dataset](./01_data_preparation.ipynb)\n", | ||
"2. [Finetune the teacher on the dataset](./02_teacher_finetuning.ipynb)\n", |
fix punctuation
" - 4.b. [Using width-pruned student](./04_b_distilling_width_pruned_student.ipynb)\n", | ||
"5. [Display the validation loss](./05_display_results.ipynb)\n", | ||
"\n", | ||
"> `NOTE:` We are exploring two methods to prune the finetuned teacher model: [depth-pruning](./03_a_depth_pruning.ipynb) and [width-pruning](./03_b_width_pruning.ipynb). Per the [tech report](https://arxiv.org/pdf/2408.11796), we can observe that width-pruning generally outperforms depth-pruning so users can choose to perform either [depth-pruning](./03_a_depth_pruning.ipynb) or [width-pruning](./03_b_width_pruning.ipynb) or both methods." |
fix punctuation
"> `NOTE:` We are exploring two methods to prune the fine-tuned teacher model: [depth-pruning](./03_a_depth_pruning.ipynb) and [width-pruning](./03_b_width_pruning.ipynb). Per the [tech report](https://arxiv.org/pdf/2408.11796), we can observe that width-pruning generally outperforms depth-pruning so users can choose to perform either [depth-pruning](./03_a_depth_pruning.ipynb) or [width-pruning](./03_b_width_pruning.ipynb) or both methods."
add: docs Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix: minor fix Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix: rm beam_size=1 exception, rm duplicates check, fix error handling Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix: error handling Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix: removed evaluations file Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rn: blank scoring Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * clean up Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rm: blank scoring and duration beam size Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix: removed durations_beam_size from default beam search Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * add: logaddexp Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rm: prefix search Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rn: nested loop over extensions Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix: bug with caching Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rm: topk on durations Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * add: restored prefix search Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * clean up Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix: fixed comments Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * refactored duplicate merging Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * changes batch scoring Signed-off-by: lilithgrigoryan 
<lgrigoryan@nvidia.com> * refactored rnnt batch scoring Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * alsd first working Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * refactored Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * clean up Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * remove stacking operations Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fixes im base class Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * clean up Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * remove potentially uninitialized local variable Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * default beam search minor fixes Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * add test, fix maes timesteps Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rm file Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rm file Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * clean up Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * clean up Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix comments Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * add ngram lm test Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix maes_num_steps=1 Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix kenlm model path Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix kenlm model full path Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * made requested changes Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * 
merge after isort Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * add prints to test Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * add Kenlm to asr requirements Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * remove prints in tests Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add kenlm to test requirements Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm kenlm from link, add package-name Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm second kenlm installation Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * rm kenlm from dependencies make test optional Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix in test Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix in test Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix comments Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * add comments Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * add comments Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * splitted docstrings Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * add comments 
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * splitted docstrings Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * add comments Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fixes to python3 type annotations Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * merging Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * merging Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix in return type Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * fix test Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> * rm time_idx Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> * fix comments to python3 style Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> --------- Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> Co-authored-by: lilithgrigoryan <lgrigoryan@nvidia.com> Co-authored-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update nemo1->2 conversion according to changes in main (#11253) * update nemo1->2 conversion according to changes in main Signed-off-by: Huiying Li <willwin.lee@gmail.com> * Apply isort and black reformatting Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com> * format fix Signed-off-by: Huiying Li <willwin.lee@gmail.com> * add docstrings 
Signed-off-by: Huiying Li <willwin.lee@gmail.com> --------- Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com> Co-authored-by: HuiyingLi <HuiyingLi@users.noreply.github.com> * Add llama 3.1 recipes (#11273) * add llama 3.1 recipes Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fix pylint Signed-off-by: Chen Cui <chcui@nvidia.com> * Fix llama3.1 wrong config in io.json --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Ao Tang <aot@nvidia.com> * Fix Finetune Recipe (#11267) * Fix Starcoder_15 SFT recipe * Fix PP type SFT recipe * Fix PP type SFT recipe * Fix Gemma2b SFT TP=1 * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * Fix more sft recipe * remove pp dtype * remove pp dtype * Configure no restart validation loop in nl.Trainer (#11029) * Configure no restart validation loop in nl.Trainer Signed-off-by: Hemil Desai <hemild@nvidia.com> * fix Signed-off-by: Hemil Desai <hemild@nvidia.com> * Skip validation whenever restarting=True Signed-off-by: Hemil Desai <hemild@nvidia.com> * PR feedback Signed-off-by: Hemil Desai <hemild@nvidia.com> * Apply isort and black reformatting Signed-off-by: hemildesai <hemildesai@users.noreply.github.com> --------- Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: hemildesai <hemildesai@users.noreply.github.com> Co-authored-by: hemildesai <hemildesai@users.noreply.github.com> * Handle _io_unflatten_object when _thread_local.output_dir is not available (#11199) Signed-off-by: Hemil Desai <hemild@nvidia.com> * change default ckpt name (#11277) Signed-off-by: Maanu Grover <maanug@nvidia.com> * Use 
MegatronDataSampler in HfDatasetDataModule (#11274) * Use MegatronDataSampler in HfDataset Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> * Remove opencc upperbound (#10909) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <nithinraok@users.noreply.github.com> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Zeeshan Patel <zeeshanp@nvidia.com> Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com> Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com> Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com> Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: hemildesai <hemildesai@users.noreply.github.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: nithinraok <nithinraok@users.noreply.github.com> Co-authored-by: oliver könig 
<okoenig@nvidia.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Zeeshan Patel <zeeshanp@nvidia.com> Co-authored-by: gvenkatakris <gvenkatakris@nvidia.com> Co-authored-by: lilithgrigoryan <38436437+lilithgrigoryan@users.noreply.github.com> Co-authored-by: lilithgrigoryan <lgrigoryan@nvidia.com> Co-authored-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: HuiyingLi <HuiyingLi@users.noreply.github.com> Co-authored-by: Ao Tang <aot@nvidia.com> Co-authored-by: Hemil Desai <hemild@nvidia.com> Co-authored-by: hemildesai <hemildesai@users.noreply.github.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
* Update pruning and distillation tutorial notebooks Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com> * Update README Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com> * Update batch size in width pruning script Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com> * Update README Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com> --------- Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com>
What does this PR do?
Updates the pruning and distillation tutorial notebooks.
Adds a width-pruning notebook.