Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASR Confidence update and tutorial #6810

Merged
merged 18 commits into from
Jul 15, 2023
Merged

Conversation

GNroy
Copy link
Collaborator

@GNroy GNroy commented Jun 4, 2023

What does this PR do ?

Minor fixes, renames, tests and tutorial.

Collection: ASR

Changelog

  • Remames: renui -> renyi, method -> measure.
  • Some classes and methods are moved to separate files.
  • Tests added.
  • Tutorial added.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

@Kipok @titu1994

@GNroy GNroy requested a review from Kipok June 4, 2023 16:42
@github-actions github-actions bot added the ASR label Jun 4, 2023
Copy link
Collaborator

@Kipok Kipok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there is a lot of renaming on the level of config parameters and api variables. These are breaking changes that will likely result in old models not being loadable by default anymore. So I'm not sure if we can afford to do that renaming. If you want to do that, you'd probably have to add some logic to accept both old and new names and raise a deprecation warning/automatically rename old->new. Maybe @titu1994 can add more information on this topic

@GNroy
Copy link
Collaborator Author

GNroy commented Jun 5, 2023

It seems like there is a lot of renaming on the level of config parameters and api variables. These are breaking changes that will likely result in old models not being loadable by default anymore. So I'm not sure if we can afford to do that renaming. If you want to do that, you'd probably have to add some logic to accept both old and new names and raise a deprecation warning/automatically rename old->new. Maybe @titu1994 can add more information on this topic

@Kipok Could you point me to such models so I can check if these changes are really breaking?

@Kipok
Copy link
Collaborator

Kipok commented Jun 5, 2023

@Kipok Could you point me to such models so I can check if these changes are really breaking?

Can you try with any model that specified non default confidence config parameters? E.g., it seems that the code should fail if it sees the following in the config:

confidence_cfg:
  method_cfg:
    ...

because you construct ConfidenceConfig that does not take method_cfg parameter anymore

@Kipok
Copy link
Collaborator

Kipok commented Jun 5, 2023

I guess the code might work fine for models that didn't change confidence, because it's an optional parameter. If we expect that users didn't yet use this feature, maybe it's ok to make these changes, @titu1994?

@GNroy
Copy link
Collaborator Author

GNroy commented Jun 5, 2023

@Kipok Could you point me to such models so I can check if these changes are really breaking?

Can you try with any model that specified non default confidence config parameters? E.g., it seems that the code should fail if it sees the following in the config:

confidence_cfg:
  method_cfg:
    ...

because you construct ConfidenceConfig that does not take method_cfg parameter anymore

Sorry, but I don't know of any such model. If you have a model with a non default confidence config, I'd like you to give it to me please.

@Kipok
Copy link
Collaborator

Kipok commented Jun 5, 2023

Sorry, but I don't know of any such model. If you have a model with a non default confidence config, I'd like you to give it to me please.

You can simply take any existing model and change the config to make it have non-default confidence for the test purpose. If it breaks, this means that any users who might have used non-default confidence will have errors after your changes. Since we supported confidence for >6 months, I imagine someone might have tried to use it. If you think that's not the case, I don't mind merging this. It's still technically a breaking change, though, so I also want to hear what Som thinks (asked in a comment above).

Copy link
Collaborator

@titu1994 titu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not gone through the tutorial, @Kipok please review that in detail.

No backward compatibility breaking changes are allowed.

Every single config change done in this pr needs to do cfg.get("new_key", cfg.get("old_key", default)) pattern for all access from old config to new config.

Dataclass cannot remove any argument, only add. So if you have to deprecate arguments - say temperature.
Add alpha to config with default value.
Set temperature value to a sentinel default (not None) - str "UNUSED" or something like that

Check if temperature is not sentinal value, warn that arg is deprecated and that will be ignored.

Use only alpha value c

@@ -141,6 +145,68 @@ def word_error_rate_detail(
return wer, words, ins_rate, del_rate, sub_rate


def word_error_rate_per_utt(hypotheses: List[str], references: List[str], use_cer=False) -> Tuple[List[float], float]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests for this in wer tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

name: str = "entropy"
entropy_type: str = "tsallis"
temperature: float = 0.33
alpha: float = 0.33
entropy_norm: str = "exp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert. You can add alpha, you cannot remove temperature. Dataclass can only add config items not remove them otherwise they break model loading

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done



def auc_yc(y_true, y_score, return_std_maximum=False, return_curve=False, n_bins=100):
def auc_yc(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests for all metrics

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GNroy GNroy requested a review from Kipok June 6, 2023 17:52
Copy link
Collaborator

@Kipok Kipok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the tutorial and added detailed comments. About backward compatibility, I think the current code still is not fully backward compatible, e.g., should ctc/rnnt decoding configs also have deprecated parameters? In general, I'd suggest to remove renaming of "method"->"measure", since it does not change the meaning, but keep renaming of "alpha" -> "temperature". That will allow to remove most of the renaming refactoring. I didn't thoroughly review the new changes about deprecated parameters, though, so I'll let @titu1994 comment on that part.

@@ -0,0 +1 @@
{"cells":[{"cell_type":"code","execution_count":null,"id":"1a0f93c6","metadata":{"id":"1a0f93c6"},"outputs":[],"source":["BRANCH = 'main'\n","\n","\"\"\"\n","You can run either this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n","\n","Instructions for setting up Colab are as follows:\n","1. Open a new Python 3 notebook.\n","2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n","3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n","4. Run this cell to set up dependencies.\n","\"\"\""]},{"cell_type":"code","execution_count":null,"id":"ffdfe626","metadata":{"id":"ffdfe626"},"outputs":[],"source":["import os\n","# either provide a path to local NeMo repository with NeMo already installed or git clone\n","\n","# option #1: local path to NeMo repo with NeMo already installed\n","NEMO_DIR_PATH = os.path.dirname(os.path.dirname(os.path.abspath('')))\n","\n","# option #2: download NeMo repo\n","if 'google.colab' in str(get_ipython()) or not os.path.exists(os.path.join(NEMO_DIR_PATH, \"nemo\")):\n"," ## Install dependencies\n"," !apt-get install sox libsndfile1 ffmpeg\n","\n"," !git clone -b $BRANCH https://github.com/NVIDIA/NeMo\n"," %cd NeMo\n"," !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\n"," NEMO_DIR_PATH = os.path.abspath('')"]},{"cell_type":"markdown","id":"bcc3e593","metadata":{"id":"bcc3e593"},"source":["# 1. Introduction to ASR confidence estimation\n","Confidence estimation is a crucial yet sometimes overlooked aspect of automatic speech recognition (ASR) systems. Confidence estimation for ASR is the process of estimating the rate of reliability of the output generated by an ASR system. For an output transcription, confidence estimation answers the question \"how accurate this transcription is\", or \"how likely this transcription is correct\".\n","\n","Confidence score is the result of confidence estimation. It lies in range from 0 to 1, where zero signals that the confidence estimator is completely unsure, and one indicates that the estimator is confident in the output. Confidence scores are often used to guide downstream processing in ASR applications. For example, in a voice dictation application, a low confidence score could trigger the system to ask the user to repeat the input or to suggest alternative transcriptions.\n","\n","There are several approaches to confidence estimation in ASR, including:\n","\n","1. Acoustic modeling-based methods: These methods use the acoustic model scores to estimate the confidence score. The acoustic model represents the relationship between the acoustic signal and the corresponding linguistic units, and the score reflects the similarity between the observed signal and the predicted model output. Here, the acoustic model can be the ASR model itself (non-trainable methods), or a trainable external estimator, accepting acoustic features or output probabilities and predicting confidence scores.\n","\n","2. Language modeling-based methods: These methods use the language model scores to estimate the confidence score. The language model represents the probability distribution of the sequence of words, and the score reflects the likelihood of the transcription given the language model. \n","\n","3. Combination methods: These methods combine the scores from both the acoustic and language models to estimate the confidence score. This approach can leverage the strengths of both models to achieve more accurate confidence scores.\n","\n","In this introductory tutorial we will cover only the non-trainable acoustic-based methods."]},{"cell_type":"markdown","id":"59100fb9","metadata":{"id":"59100fb9"},"source":["## 1.1. Optional resources\n","This tutorial is self-contained, but if you want to dive deeper into the topic, you can check out these resources:\n","* Paper behind this tutorial: https://arxiv.org/abs/2212.08703\n","* Supplementary blog on how and why confidence estimation methods of this tutorial were developed: https://developer.nvidia.com/blog/entropy-based-methods-for-word-level-asr-confidence-estimation/"]},{"cell_type":"markdown","id":"cd7226c5","metadata":{"id":"cd7226c5"},"source":["# 2. Data Download\n","First, let's download audio and text data. Here we will use LibriSpeech *dev-other* and *test-other*."]},{"cell_type":"code","execution_count":null,"id":"fd542e62","metadata":{"id":"fd542e62"},"outputs":[],"source":["## create data directory and download an audio file\n","WORK_DIR = 'WORK_DIR'\n","DATA_DIR = WORK_DIR + '/DATA'\n","os.makedirs(DATA_DIR, exist_ok=True)\n","\n","print('downloading audio data...')\n","!python ./scripts/dataset_processing/get_librispeech_data.py --data_root=$DATA_DIR --data_set=dev_other,test_other\n","!rm $DATA_DIR/dev_other.tar.gz $DATA_DIR/test_other.tar.gz"]},{"cell_type":"markdown","id":"383eee71","metadata":{"id":"383eee71"},"source":["# 3. Confidence estimation example\n","Let's see how confidence scores can be obtained with NeMo models."]},{"cell_type":"markdown","id":"7c7c0170","metadata":{"id":"7c7c0170"},"source":["## 3.1. Helper functions\n","The following functions are to pretty-print confidence scores for word-level ASR hypotheses."]},{"cell_type":"code","execution_count":null,"id":"20cf0b38","metadata":{"id":"20cf0b38"},"outputs":[],"source":["import json\n","import os\n","from termcolor import colored\n","from typing import List, Optional, Tuple, Union\n","\n","from IPython.display import Audio, HTML, Image, display\n","import numpy as np\n","import texterrors\n","\n","def get_detailed_wer_labels(ref: List[str], hyp: List[str], return_eps_padded_hyp: bool = False):\n"," \"\"\"Get detailed WER labels, aligning reference with hypothesis.\n"," \n"," Possible WER labels:\n"," - 'C' for Correct,\n"," - 'I' for Insertion,\n"," - 'D' for Deletion,\n"," - 'S' for Substitution.\n","\n"," Returns:\n"," WER labels list.\n"," [Optional] Epsilin-padded hypothesis if return_eps_padded_hyp set to True.\n"," \"\"\"\n","\n"," # Align reference and hypothesis using \"<eps>\"\n"," aligned_ref, aligned_hyp = texterrors.align_texts(ref, hyp, False)[:-1]\n","\n"," # Determine labels\n"," labels = []\n"," for r, h in zip(aligned_ref, aligned_hyp):\n"," if r == h:\n"," labels.append(\"C\")\n"," elif r == \"<eps>\":\n"," labels.append(\"I\")\n"," elif h == \"<eps>\":\n"," labels.append(\"D\")\n"," else:\n"," labels.append(\"S\")\n","\n"," return labels if not return_eps_padded_hyp else labels, aligned_hyp\n","\n","\n","def fill_confidence_deletions(confidence_scores: List[float], labels: List[str], fill_value: float = 0.0):\n"," \"\"\"Fill confidence scores list with the provided value for deletions.\n"," Assumes that we have no natural confidence scores for deletions.\n"," \n"," Returns:\n"," Confidence scores list with deletion scores.\n"," \"\"\"\n","\n"," assert len(confidence_scores) <= len(labels)\n","\n"," # If the lengths of confidence_scores and labels are equal, then we assume that there are no deletions\n"," if len(confidence_scores) == len(labels):\n"," return confidence_scores\n","\n"," # Insert fill_value into confidence_scores where label == \"D\"\n"," new_confidence_scores = []\n"," score_index = 0\n"," for label in labels:\n"," if label == \"D\":\n"," new_confidence_scores.append(fill_value)\n"," else:\n"," new_confidence_scores.append(confidence_scores[score_index])\n"," score_index += 1\n"," return new_confidence_scores\n","\n","\n","def pretty_pad_word_labels(labels: List[str], words: List[str]):\n"," \"\"\"Pad word labels with dash for pretty printing.\n"," Expects labels and words to have the same length.\n"," \n"," Returns:\n"," Padded labels list.\n"," \"\"\"\n"," \n"," # Check that words and labels without 'D' have the same length\n"," assert len(words) == len(labels)\n","\n"," # Pad the labels with dashes to align them with the words\n"," padded_labels = []\n"," for word, label in zip(words, labels):\n"," label_len = len(word)\n"," left_padding = (label_len - 1) // 2\n"," right_padding = label_len - left_padding - 1\n"," padded_label = \"-\" * left_padding + label + \"-\" * right_padding\n"," padded_labels.append(padded_label)\n","\n"," return padded_labels\n","\n","\n","def pretty_print_transcript_with_confidence(\n"," transcript: str,\n"," confidence_scores: List[float],\n"," threshold: float,\n"," reference: Optional[str] = None,\n"," terminal_width: int = 120,\n","):\n"," with_labels = reference is not None\n"," shade_if_low_confidence = lambda x, y: colored(x, 'light_grey') if y < threshold else x\n"," transcript_list = transcript.split()\n"," output_lines = []\n"," if with_labels:\n"," reference_list = reference.split()\n"," labels, eps_padded_hyp = get_detailed_wer_labels(reference_list, transcript_list, True)\n"," padded_labels = pretty_pad_word_labels(labels, eps_padded_hyp)\n"," current_line_len = 0\n"," current_word_line = \"\"\n"," current_label_line = \"\"\n"," for word, label, padded_label, score in zip(\n"," eps_padded_hyp, labels, padded_labels, fill_confidence_deletions(confidence_scores, labels)\n"," ):\n"," word_len = len(word)\n"," if current_line_len + word_len + 1 <= terminal_width:\n"," if current_line_len > 0:\n"," current_line_len += 1\n"," current_word_line += \" \"\n"," current_label_line += \"-\"\n"," current_line_len += word_len\n"," current_word_line += shade_if_low_confidence(word, score)\n"," current_label_line += padded_label\n"," else:\n"," output_lines.append(current_word_line + \"\\n\" + current_label_line)\n"," current_line_len = word_len\n"," current_word_line = shade_if_low_confidence(word, score)\n"," current_label_line = padded_label\n"," if current_word_line:\n"," output_lines.append(current_word_line + \"\\n\" + current_label_line)\n"," else:\n"," current_line_len = 0\n"," current_word_line = \"\"\n"," for word, score in zip(transcript_list, confidence_scores):\n"," word_len = len(word)\n"," if current_line_len + word_len + 1 <= terminal_width:\n"," if current_line_len > 0:\n"," current_line_len += 1\n"," current_word_line += \" \"\n"," current_line_len += word_len\n"," current_word_line += shade_if_low_confidence(word, score)\n"," else:\n"," output_lines.append(current_word_line)\n"," current_line_len = word_len\n"," current_word_line = shade_if_low_confidence(word, score)\n"," if current_word_line:\n"," output_lines.append(current_word_line)\n","\n"," print(\"\\n\".join(output_lines))\n","\n","\n","def _html_paint_word_grey(word: str, shade: str):\n"," if shade == \"black\":\n"," color = \"0,0,0\"\n"," elif shade == \"grey\":\n"," color = \"150,150,150\"\n"," elif shade == \"light_grey\":\n"," color = \"200,200,200\"\n"," else:\n"," raise ValueError(\n"," f\"`shade` has to be one of the following: `black`, `grey`, `light_grey`. Provided: `{shade}`\"\n"," )\n"," return f'<mark style=\"color:rgb({color});background-color:rgb(255,255,255);\"><font size=\"3\">{word}</font></mark>'\n","\n","\n","def html_paint_transcript_with_confidence(\n"," transcript: str,\n"," confidence_scores: List[float],\n"," score_mean_global: float,\n"," score_std_global: float,\n","):\n"," s_list = []\n"," for word, score in zip(transcript.split(), confidence_scores):\n"," if score >= score_mean_global + score_std_global:\n"," shade = \"black\"\n"," elif score < score_mean_global - score_std_global:\n"," shade = \"light_grey\"\n"," else:\n"," shade = \"grey\"\n"," s_list.append(_html_paint_word_grey(word, shade))\n"," display(HTML('<p>'+ \" \".join(s_list) + '</p>'))"]},{"cell_type":"markdown","id":"dec57a27","metadata":{"id":"dec57a27"},"source":["## 3.2. Data and model loading\n","This tutorial uses CTC and RNN-T Conformer models trained on LibriSpeech.\n","\n","You can try to use other pre-trained models as well."]},{"cell_type":"code","execution_count":null,"id":"b66c60a3","metadata":{"id":"b66c60a3"},"outputs":[],"source":["from dataclasses import dataclass\n","from omegaconf import DictConfig, OmegaConf\n","\n","from nemo.collections.asr.models import ASRModel\n","\n","def load_model(name: str):\n"," \"\"\"Load a pre-trained model.\n","\n"," Args:\n"," name: Pre-trained model name.\n"," Reserved names:\n"," - 'ctc' for 'stt_en_conformer_ctc_large_ls'\n"," - 'rnnt' for 'stt_en_conformer_transducer_large_ls'\n","\n"," Returns:\n"," A model loaded into GPU with .eval() mode set.\n"," \"\"\"\n"," if name == \"ctc\":\n"," name = \"stt_en_conformer_ctc_large_ls\"\n"," elif name == \"rnnt\":\n"," name = \"stt_en_conformer_transducer_large_ls\"\n","\n"," model = ASRModel.from_pretrained(model_name=name, map_location=\"cuda:0\")\n"," model.eval()\n","\n"," return model\n","\n","@dataclass\n","class TestSet:\n"," filepaths: List[str]\n"," reference_texts: List[str]\n"," durations: List[float]\n","\n","def load_data(manifest_path: str):\n"," filepaths = []\n"," reference_texts = []\n"," durations = []\n"," with open(manifest_path, \"r\") as f:\n"," for line in f:\n"," item = json.loads(line)\n"," audio_file = item[\"audio_filepath\"]\n"," filepaths.append(str(audio_file))\n"," text = item[\"text\"]\n"," reference_texts.append(text)\n"," durations.append(float(item[\"duration\"]))\n"," return TestSet(filepaths, reference_texts, durations)\n","\n","TEST_MANIFESTS = {\n"," \"dev_other\": DATA_DIR + \"/dev_other.json\",\n"," \"test_other\": DATA_DIR + \"/test_other.json\",\n","}\n","\n","\n","# Load data\n","test_sets = {manifest: load_data(path) for manifest, path in TEST_MANIFESTS.items()}\n","\n","# Load model\n","is_rnnt = False\n","# is_rnnt = True\n","\n","model = load_model(\"rnnt\" if is_rnnt else \"ctc\")"]},{"cell_type":"markdown","id":"88c3d7ee","metadata":{"id":"88c3d7ee"},"source":["## 3.3. Setting up confidence estimation\n","To set up confidence estimation for NeMo ASR models, you need to:\n","1. Initialize _ConfidenceConfig_\n","2. Put the created _ConfidenceConfig_ into the model decoding config.\n","\n","The folloving cell contains an example of _ConfidenceConfig_ initialization and updating the the model's decoding config.\n","\n","For the _ConfidenceConfig_ there are also listed possible values for its parameters.\n","\n","Note that only `strategy=\"greedy\"` (or `greedy_batch` for RNN-T) supports computing confidence scores."]},{"cell_type":"code","execution_count":null,"id":"078005f1","metadata":{"id":"078005f1"},"outputs":[],"source":["from nemo.collections.asr.metrics.rnnt_wer import RNNTDecodingConfig\n","from nemo.collections.asr.metrics.wer import CTCDecodingConfig\n","from nemo.collections.asr.parts.utils.asr_confidence_utils import (\n"," ConfidenceConfig,\n"," ConfidenceConstants,\n"," ConfidenceMeasureConfig,\n"," ConfidenceMeasureConstants,\n",")\n","from nemo.collections.asr.parts.utils.asr_confidence_benchmarking_utils import (\n"," apply_confidence_parameters,\n"," get_correct_marks,\n"," get_token_targets_with_confidence,\n"," get_word_targets_with_confidence,\n",")\n","\n","\n","# List allowed options for ConfidenceMeasureConfig and ConfidenceConfig\n","print(f\"Allowed options for ConfidenceMeasureConfig: {ConfidenceMeasureConstants.print()}\\n\")\n","print(f\"Allowed options for ConfidenceConfig: {ConfidenceConstants.print()}\\n\")\n","\n","# Initialize ConfidenceConfig and ConfidenceMeasureConfig\n","confidence_cfg = ConfidenceConfig(\n"," preserve_frame_confidence=True, # Internally set to true if preserve_token_confidence == True\n"," # or preserve_word_confidence == True\n"," preserve_token_confidence=True, # Internally set to true if preserve_word_confidence == True\n"," preserve_word_confidence=True,\n"," aggregation=\"prod\", # How to aggregate frame scores to token scores and token scores to word scores\n"," exclude_blank=False, # If true, only non-blank emissions contribute to confidence scores\n"," measure_cfg=ConfidenceMeasureConfig( # Config for per-frame scores calculation (before aggregation)\n"," name=\"max_prob\", # Or \"entropy\"\n"," entropy_type=\"gibbs\", # Used only for name == \"entropy\"\n"," alpha=0.5, # Low values (<1) increase sensitivity, high values decrease sensitivity\n"," entropy_norm=\"lin\" # How to normalize (map to [0,1]) entropy\n"," )\n",")\n","\n","# Alternalively, look at ConfidenceConfig's docstring\n","print(f\"More info on ConfidenceConfig here:\\n{ConfidenceConfig().__doc__}\\n\")\n","\n","# Put the created ConfidenceConfig into the model decoding config via .change_decoding_strategy()\n","model.change_decoding_strategy(\n"," RNNTDecodingConfig(fused_batch_size=-1, strategy=\"greedy_batch\", confidence_cfg=confidence_cfg)\n"," if is_rnnt\n"," else CTCDecodingConfig(confidence_cfg=confidence_cfg)\n",")"]},{"cell_type":"markdown","id":"efe0baea","metadata":{"id":"efe0baea"},"source":["## 3.4. Decode test set and get transcriptions with confidence scores\n","Let's transcribe Librispeech _test-other_ and see what confidence scores are inside."]},{"cell_type":"code","execution_count":null,"id":"ccd8d0de","metadata":{"id":"ccd8d0de"},"outputs":[],"source":["# current_test_set = test_sets[\"dev_other\"]\n","current_test_set = test_sets[\"test_other\"]\n","transcriptions = model.transcribe(paths2audio_files=current_test_set.filepaths, batch_size=16, return_hypotheses=True, num_workers=4)\n","if is_rnnt:\n"," transcriptions = transcriptions[0]"]},{"cell_type":"markdown","id":"0500514e","metadata":{"id":"0500514e"},"source":["For a transcribed hypothesis, there can be `frame_confidence` and aggregated from them `token_confidence` and `word_confidence`."]},{"cell_type":"code","execution_count":null,"id":"98035fd2","metadata":{"id":"98035fd2"},"outputs":[],"source":["tran = transcriptions[0]\n","print(\n"," f\"\"\" Recognized text: `{tran.text}`\\n\n"," Word confidence: {[round(c, 3) for c in tran.word_confidence]}\\n\n"," Token confidence: {[round(c, 3) for c in tran.token_confidence]}\\n\n"," Frame confidence: {[round(c, 3) for c in tran.frame_confidence]}\"\"\"\n",")"]},{"cell_type":"markdown","id":"9613bfc1","metadata":{"id":"9613bfc1"},"source":["Now let's draw the recognition results highlighted according to their confidence scores.\n","\n","There are two options: plain text with WER labels, and HTML without WER labels but with three colors."]},{"cell_type":"code","execution_count":null,"id":"a83295ff","metadata":{"id":"a83295ff"},"outputs":[],"source":["from nemo.collections.asr.metrics.wer import word_error_rate, word_error_rate_detail, word_error_rate_per_utt\n","\n","def show_dataset_with_confidence(\n"," sorted_indices, transcriptions, test_set, threshold, html_show=False, min_dur_to_show=0.0, utt_to_show=10\n","):\n"," confidence_scores = np.array([c for t in transcriptions for c in t.word_confidence])\n"," score_std = confidence_scores.std()\n"," utt_shown = 0\n"," for i, value in sorted_indices:\n"," if utt_shown >= utt_to_show:\n"," break\n"," if test_set.durations[i] >= min_dur_to_show:\n"," print(\"=\"*120)\n"," hyp = transcriptions[i].text\n"," scores = transcriptions[i].word_confidence\n"," if html_show:\n"," html_paint_transcript_with_confidence(hyp, scores, threshold, score_std)\n"," else:\n"," ref = test_set.reference_texts[i]\n"," pretty_print_transcript_with_confidence(hyp, scores, threshold, ref)\n"," utt_shown += 1\n","\n","\n","# you can play with these parameters\n","threshold = 0.52\n","# in colab, you may want to use `html_show = True` as non-html colorion does't work in colab\n","html_show = False\n","min_dur_to_show = 4.0\n","utt_to_show = 10\n","\n","wer_per_utt, avg_wer = word_error_rate_per_utt([h.text for h in transcriptions], current_test_set.reference_texts)\n","sorted_wer_indices = sorted(enumerate(wer_per_utt), key=lambda x: x[1])[::-1]\n","\n","show_dataset_with_confidence(\n"," sorted_indices=sorted_wer_indices,\n"," transcriptions=transcriptions,\n"," test_set=current_test_set,\n"," threshold=threshold,\n"," html_show=html_show,\n"," min_dur_to_show=min_dur_to_show,\n"," utt_to_show=utt_to_show\n",")"]},{"cell_type":"markdown","id":"dbfcb2da","metadata":{"id":"dbfcb2da"},"source":["## 3.5. Confidence metrics\n","\n","There are several metrics to evaluate the effectiveness of a confidence estimation method. Some of them confidence estimation as a binary classification task. Other measure how close the correct word confidence scores are to $1.0$ and the incorrect word scores are to $0.0$.\n","\n","Some of them are:\n","1. Area Under the Receiver Operating Characteristics Curve ($\\mathrm{AUC}_\\mathrm{ROC}$): class separability metric.\n","2. Area Under the Precision-Recall Curve ($\\mathrm{AUC}_\\mathrm{PR}$): how well the correct words are detected.\n","3. Area Under the Negative Predictive Value vs. True Negative Rate Curve ($\\mathrm{AUC}_\\mathrm{NT}$): how well the incorrect words are detected ($\\mathrm{AUC}_\\mathrm{PR}$ in which errors are treated as positives).\n","4. Normalized Cross Entropy ($\\mathrm{NCE}$): how close of confidence for correct predictions to $1.0$ and of incorrect predictions to $0.0$. It ranges from $-\\infty$ to $1.0$, with negative scores indicating that the confidence method performs worse than the setting confidence score to $1-\\mathrm{WER}$. This metric is also known as Normalized Mutual Information.\n","5. Expected Calibration Error ($\\mathrm{ECE}$): a weighted average over the absolute accuracy/confidence difference. It ranges from $0.0$ to $1.0$ with the best value $0.0$.\n","\n","Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be condsidered. They are:\n","1. Area Under the Youden's curve ($\\mathrm{AUC}_\\mathrm{YC}$): the rate of the effective threshold range (i.e. the adjustability or responsiveness). It ranges from $0.0$ to $1.0$ with the best value $0.5$.\n","2. Maximum of the Youden's curve $\\mathrm{MAX}_\\mathrm{YC}$: the optimal $\\mathrm{TNR}$ vs. $\\mathrm{FNR}$ tradeoff. It's unnormalized version can be used as a criterion for selecting the optimal $\\tau$. It ranges from $0.0$ to $1.0$ with the best value $1.0$.\n","3. The standard deviation of the Youden's curve values ($\\mathrm{STD}_\\mathrm{YC}$): indicates that $\\mathrm{TNR}$ and $\\mathrm{FNR}$ increase at different rates (viz. $\\mathrm{TNR}$ grows faster) as the $\\tau$ increases. It ranges from $0.0$ to $0.5$ with the best value around $0.25$.\n","\n","Let's see how well our confidence performs according to the metrcis above."]},{"cell_type":"code","execution_count":null,"id":"5d152775","metadata":{"id":"5d152775"},"outputs":[],"source":["from nemo.collections.asr.parts.utils.confidence_metrics import (\n"," auc_nt,\n"," auc_pr,\n"," auc_roc,\n"," auc_yc,\n"," ece,\n"," nce,\n"," save_confidence_hist,\n"," save_custom_confidence_curve,\n"," save_nt_curve,\n"," save_pr_curve,\n"," save_roc_curve,\n",")\n","\n","\n","targets_with_confidence = [get_word_targets_with_confidence(tran) for tran in transcriptions]\n","correct_marks = [get_correct_marks(r.split(), h.words) for r, h in zip(current_test_set.reference_texts, transcriptions)]\n","\n","y_true, y_score = np.array(\n"," [[f, p[1]] for cm, twc in zip(correct_marks, targets_with_confidence) for f, p in zip(cm, twc)]\n",").T\n","\n","\n","# output scheme: yc.mean(), yc.max(), yc.std() or yc.mean(), yc.max(), yc.std(), (thresholds, yc)\n","result_yc = auc_yc(y_true, y_score, return_std_maximum=True, return_curve=True)\n","# output scheme: ece or ece, (thresholds, ece_curve)\n","results_ece = ece(y_true, y_score, return_curve=True)\n","results = [\n"," auc_roc(y_true, y_score),\n"," auc_pr(y_true, y_score),\n"," auc_nt(y_true, y_score),\n"," nce(y_true, y_score),\n"," results_ece[0],\n","] + list(result_yc[:3])\n","\n","print(\n"," f\"\"\" AUC_ROC:\\t{results[0]:.5f}\n"," AUC_PR:\\t{results[1]:.5f}\n"," AUC_NT:\\t{results[2]:.5f}\n"," NCE:\\t{results[3]:.5f}\n"," ECE:\\t{results[4]:.5f}\n"," AUC_YC:\\t{results[5]:.5f}\n"," MAX_YC:\\t{results[7]:.5f}\n"," STD_YC:\\t{results[6]:.5f}\n"," \"\"\"\n",")"]},{"cell_type":"markdown","id":"4159034d","metadata":{"id":"4159034d"},"source":["Confidence metrics for the maximum probability confidence are not that great.\n","\n","Let's re-run and benchmark confidence estimation with the default confidence estimator."]},{"cell_type":"code","execution_count":null,"id":"d2e16f5f","metadata":{"id":"d2e16f5f"},"outputs":[],"source":["confidence_cfg = ConfidenceConfig(\n"," preserve_word_confidence=True,\n"," preserve_token_confidence=True,\n",")\n","\n","model.change_decoding_strategy(\n"," RNNTDecodingConfig(fused_batch_size=-1, strategy=\"greedy_batch\", confidence_cfg=confidence_cfg)\n"," if is_rnnt\n"," else CTCDecodingConfig(confidence_cfg=confidence_cfg)\n",")\n","\n","transcriptions = model.transcribe(paths2audio_files=current_test_set.filepaths, batch_size=16, return_hypotheses=True, num_workers=4)\n","if is_rnnt:\n"," transcriptions = transcriptions[0]"]},{"cell_type":"code","execution_count":null,"id":"6201ea4d","metadata":{"id":"6201ea4d"},"outputs":[],"source":["targets_with_confidence = [get_word_targets_with_confidence(tran) for tran in transcriptions]\n","correct_marks = [get_correct_marks(r.split(), h.words) for r, h in zip(current_test_set.reference_texts, transcriptions)]\n","\n","y_true, y_score = np.array(\n"," [[f, p[1]] for cm, twc in zip(correct_marks, targets_with_confidence) for f, p in zip(cm, twc)]\n",").T\n","\n","result_yc = auc_yc(y_true, y_score, return_std_maximum=True, return_curve=True)\n","results_ece = ece(y_true, y_score, return_curve=True)\n","results = [\n"," auc_roc(y_true, y_score),\n"," auc_pr(y_true, y_score),\n"," auc_nt(y_true, y_score),\n"," nce(y_true, y_score),\n"," results_ece[0],\n","] + list(result_yc[:3])\n","\n","print(\n"," f\"\"\" AUC_ROC:\\t{results[0]:.5f}\n"," AUC_PR:\\t{results[1]:.5f}\n"," AUC_NT:\\t{results[2]:.5f}\n"," NCE:\\t{results[3]:.5f}\n"," ECE:\\t{results[4]:.5f}\n"," AUC_YC:\\t{results[5]:.5f}\n"," MAX_YC:\\t{results[7]:.5f}\n"," STD_YC:\\t{results[6]:.5f}\n"," \"\"\"\n",")"]},{"cell_type":"markdown","id":"498e03d0","metadata":{"id":"498e03d0"},"source":["Note that despite the overall improvement, $NCE$ and $ECE$ have gotten worse. This is due to class imbalance caused by low WER."]},{"cell_type":"markdown","id":"45856cba","metadata":{"id":"45856cba"},"source":["Finally, let's draw curves for our metrics as well as histograms of correctly and incorrectly recognized words."]},{"cell_type":"code","execution_count":null,"id":"ff049043","metadata":{"id":"ff049043"},"outputs":[],"source":["from tempfile import TemporaryDirectory\n","\n","\n","plot_dir = TemporaryDirectory()\n","os.makedirs(plot_dir.name, exist_ok=True)\n","\n","mask_correct = y_true == 1\n","y_score_correct = y_score[mask_correct]\n","y_score_incorrect = y_score[~mask_correct]\n","\n","# histogram of the correct distribution\n","save_confidence_hist(y_score_correct, plot_dir.name, \"hist_correct\")\n","# histogram of the incorrect distribution\n","save_confidence_hist(y_score_incorrect, plot_dir.name, \"hist_incorrect\")\n","# AUC-ROC curve\n","save_roc_curve(y_true, y_score, plot_dir.name, \"roc\")\n","# AUC-PR curve\n","save_pr_curve(y_true, y_score, plot_dir.name, \"pr\")\n","# AUC-NT curve\n","save_nt_curve(y_true, y_score, plot_dir.name, \"nt\")\n","# ECE curve\n","ece_thresholds, ece_values = results_ece[-1]\n","ece_values /= max(ece_values)\n","save_custom_confidence_curve(ece_thresholds, ece_values, plot_dir.name, \"ece\")\n","# AUC-YC curve\n","yc_thresholds, yc_values = result_yc[-1]\n","save_custom_confidence_curve(yc_thresholds, yc_values, plot_dir.name, \"yc\")\n","\n","\n","display(\n"," Image(filename=os.path.join(plot_dir.name, \"hist_correct.png\"), retina=True),\n"," Image(filename=os.path.join(plot_dir.name, \"hist_incorrect.png\"), retina=True),\n"," Image(filename=os.path.join(plot_dir.name, \"roc.png\"), retina=True),\n"," Image(filename=os.path.join(plot_dir.name, \"pr.png\"), retina=True),\n"," Image(filename=os.path.join(plot_dir.name, \"nt.png\"), retina=True),\n"," Image(filename=os.path.join(plot_dir.name, \"ece.png\"), retina=True),\n"," Image(filename=os.path.join(plot_dir.name, \"yc.png\"), retina=True)\n",")"]},{"cell_type":"markdown","id":"ad78630a","metadata":{"id":"ad78630a"},"source":["You can use `scripts/speech_recognition/confidence/benchmark_asr_confidence.py` to find optimal confidence hyperparameters."]},{"cell_type":"markdown","id":"15e25521","metadata":{"id":"15e25521"},"source":["# 4. Confidence applications"]},{"cell_type":"markdown","id":"dbb82877","metadata":{"id":"dbb82877"},"source":["## 4.1. Small WER improvenent\n","\n","Good confidence scores can slightly reduce WER by removing low confidence words from recognition results.\n","\n","Consider the following example."]},{"cell_type":"markdown","id":"02eb4e1f","metadata":{"id":"02eb4e1f"},"source":["Let's look at the detailed WER of the transcribed test set before and after removing words with low confidence score."]},{"cell_type":"code","execution_count":null,"id":"fdf790b5","metadata":{"id":"fdf790b5"},"outputs":[],"source":["drop_low_confidence_words = lambda x, y, z: \" \".join([xx for xx, yy in zip(x.split(), y) if yy >= z])\n","\n","\n","threshold = 0.3\n","\n","wer_initial = word_error_rate_detail([h.text for h in transcriptions], current_test_set.reference_texts)\n","print(\n"," f\"\"\"WER detail before removing low confidence words:\n"," WER:\\t{wer_initial[0]:.5f}\n"," INS_rate:\\t{wer_initial[2]:.5f}\n"," DEL_rate:\\t{wer_initial[3]:.5f}\n"," SUB_rate:\\t{wer_initial[4]:.5f}\"\"\"\n",")\n","\n","wer_conf_dropped = word_error_rate_detail(\n"," [drop_low_confidence_words(hyp.text, hyp.word_confidence, threshold) for hyp in transcriptions],\n"," current_test_set.reference_texts,\n",")\n","print(\n"," f\"\"\"WER detail after removing low confidence words:\n"," WER:\\t{wer_conf_dropped[0]:.5f}\n"," INS_rate:\\t{wer_conf_dropped[2]:.5f}\n"," DEL_rate:\\t{wer_conf_dropped[3]:.5f}\n"," SUB_rate:\\t{wer_conf_dropped[4]:.5f}\"\"\"\n",")"]},{"cell_type":"markdown","id":"28ac85b1","metadata":{"id":"28ac85b1"},"source":["You can see that with the right `threshold` can reduce WER by a tiny bit, reducing insertions and substitutions yet increasing deletions.\n","\n","Now let's see how to find the optimal threshold.\n","\n","The most commonly used method for automatically determining the optimal cutoff threshold is the maximum of the unnormalized Youden's curve.\n","\n","For the purpose of the WER reduction, let's compare it to explicitly minimizing WER with respect to a threshold."]},{"cell_type":"code","execution_count":null,"id":"9b81e449","metadata":{"id":"9b81e449"},"outputs":[],"source":["from joblib import Parallel, delayed\n","from multiprocessing import cpu_count\n","from tqdm.notebook import tqdm\n","\n","def max_unnnormalized_yc(\n"," y_true: Union[List[int], np.ndarray],\n"," y_score: Union[List[float], np.ndarray],\n"," n_bins: int = 100,\n"," start: float = 0.0,\n"," stop: float = 1.0,\n","):\n"," \"\"\"Calculate the maximum of the unnormalized Youden's curve.\n"," \"\"\"\n"," y_true = np.array(y_true)\n"," y_score = np.array(y_score)\n"," thresholds = np.linspace(start, stop, n_bins + 1)\n"," assert len(y_true) == len(y_score)\n"," assert np.all(y_true >= 0) and np.all(y_true <= 1)\n"," if np.all(y_true == 0) or np.all(y_true == 1):\n"," return 0.0, 0.0\n"," mask_correct = y_true == 1\n"," y_score_correct = y_score[mask_correct]\n"," y_score_incorrect = y_score[~mask_correct]\n"," unnnormalized_yc = []\n"," for threshold in thresholds:\n"," tn = len((y_score_incorrect < threshold).nonzero()[0])\n"," fn = len((y_score_correct < threshold).nonzero()[0])\n"," unnnormalized_yc.append((threshold, tn - fn))\n"," return max(unnnormalized_yc, key=lambda x: x[1])[0]\n","\n","\n","def min_wer(ref: List[str], transcriptions, n_bins: int = 100, start: float = 0.0, stop: float = 1.0):\n"," \"\"\"Find the threshold value that delivers the minimum WER.\n"," \"\"\"\n"," thresholds = np.linspace(start, stop, n_bins + 1)\n"," hyp = [(hyp.text, hyp.word_confidence) for hyp in transcriptions]\n"," _get_wer = lambda x, y, z: (x, word_error_rate_detail([drop_low_confidence_words(yy[0], yy[1], x) for yy in y], z)[0])\n"," wers = Parallel(n_jobs=cpu_count())(delayed(_get_wer)(threshold, hyp, ref) for threshold in tqdm(thresholds))\n"," return min(wers, key=lambda x: x[1])\n","\n","\n","targets_with_confidence = [get_word_targets_with_confidence(tran) for tran in transcriptions]\n","correct_marks = [\n"," get_correct_marks(r.split(), h.words) for r, h in zip(current_test_set.reference_texts, transcriptions)\n","]\n","y_true, y_score = np.array(\n"," [[f, p[1]] for cm, twc in zip(correct_marks, targets_with_confidence) for f, p in zip(cm, twc)]\n",").T\n","\n","threshold_yc = max_unnnormalized_yc(y_true, y_score)\n","yc_wer_value = word_error_rate(\n"," [drop_low_confidence_words(hyp.text, hyp.word_confidence, threshold_yc) for hyp in transcriptions],\n"," current_test_set.reference_texts,\n",")\n","threshold_min_wer, min_wer_value = min_wer(current_test_set.reference_texts, transcriptions)\n","\n","print(\n"," f\"\"\" Initial WER: {wer_initial[0]:.5f}\n"," Optimal threshold and WER based on the Youden's curve: {threshold_yc}, {yc_wer_value:.5f}\n"," Optimal threshold for the minimum WER: {threshold_min_wer}, {min_wer_value:.5f}\n"," \"\"\"\n",")"]},{"cell_type":"markdown","id":"3b278d2d","metadata":{"id":"3b278d2d"},"source":["As you can see, the optimal cutoff threshold as the maximum of the Youden's curve does not improve WER, and the optimal threshold for the minimum WER is zero.\n","\n","Let's use a different confidence estimation setup to see if we can improve WER at least a bit."]},{"cell_type":"code","execution_count":null,"id":"39f72c78","metadata":{"id":"39f72c78"},"outputs":[],"source":["confidence_cfg = ConfidenceConfig(\n"," preserve_word_confidence=True,\n"," preserve_token_confidence=True,\n"," aggregation=\"min\",\n"," measure_cfg=DictConfig({\"entropy_type\": \"tsallis\", \"alpha\": 1.5, \"entropy_norm\": \"lin\"}),\n",")\n","\n","model.change_decoding_strategy(\n"," RNNTDecodingConfig(fused_batch_size=-1, strategy=\"greedy_batch\", confidence_cfg=confidence_cfg)\n"," if is_rnnt\n"," else CTCDecodingConfig(confidence_cfg=confidence_cfg)\n",")\n","\n","transcriptions = model.transcribe(paths2audio_files=current_test_set.filepaths, batch_size=16, return_hypotheses=True, num_workers=4)\n","if is_rnnt:\n"," transcriptions = transcriptions[0]\n","\n","threshold_min_wer, min_wer_value = min_wer(current_test_set.reference_texts, transcriptions)\n","\n","print(\n"," f\"\"\" Initial WER: {wer_initial[0]:.5f}\n"," Optimal threshold for the minimum WER: {threshold_min_wer}, {min_wer_value:.5f}\n"," \"\"\"\n",")"]},{"cell_type":"markdown","id":"e00581b1","metadata":{"id":"e00581b1"},"source":["Overall, such an improvement in WER is too small to be considered. However, this opens up the possibility of improving WER through the use of more accurate confidence estimation methods."]},{"cell_type":"markdown","id":"f9f89665","metadata":{"id":"f9f89665"},"source":["## 4.2. Reducing hallucinations with confidence scores\n","\n","One common application of confidence scores is the removal of recognition hallucinations.\n","\n","Let's see how this can be done."]},{"cell_type":"markdown","id":"c1c28379","metadata":{"id":"c1c28379"},"source":["Firstly, let's obtain a dataset on which the ASR model will hallucinate.\n","\n","Here we make it from the librosa examples."]},{"cell_type":"code","execution_count":null,"id":"3b0a0b4c","metadata":{"id":"3b0a0b4c"},"outputs":[],"source":["from itertools import combinations\n","import json\n","import librosa\n","import soundfile as sf\n","\n","def cyclic_sum(x, y):\n"," if x.shape[0] < y.shape[0]:\n"," x, y = y, x\n"," if x.shape[0] > y.shape[0]:\n"," y = np.take(y, range(0, x.shape[0]), mode='wrap')\n"," return x + y\n","\n","def generate_noise_examples(example_list: List[str], save_dir: str, samplerate: int = 16000):\n"," \"\"\"Generate noise examples with librosa.\n"," It loads the selected example, inverts and perturbs them with each other.\n","\n"," Returns:\n"," A manifest with the noise wavs.\n"," \"\"\"\n"," samples = {ex: librosa.core.load(librosa.util.example(key=ex, hq=True), sr=samplerate)[0] \n"," for ex in example_list}\n"," noise_samples = {\"_\".join([left, right]): cyclic_sum(samples[left][::-1], samples[right][::-1]) \n"," for left, right in combinations(samples.keys(), 2)}\n","\n"," os.makedirs(save_dir, exist_ok=True)\n"," manifest = os.path.join(save_dir, \"manifest.json\")\n"," with open(manifest, \"tw\", encoding=\"utf-8\") as fout:\n"," for k, v in noise_samples.items():\n"," audio_path = os.path.join(save_dir, f\"{k}.wav\")\n"," sf.write(audio_path, v, samplerate=samplerate)\n"," metadata = {\n"," \"audio_filepath\": audio_path,\n"," \"duration\": librosa.core.get_duration(y=v, sr=samplerate),\n"," \"label\": \"noise\",\n"," \"text\": \"_\"\n"," }\n"," json.dump(metadata, fout)\n"," fout.write('\\n')\n","\n"," return manifest\n","\n","librosa_list_examples = ['brahms',\n"," 'choice',\n"," 'fishin',\n"," 'humpback',\n"," 'libri1',\n"," 'libri2',\n"," 'libri3',\n"," 'nutcracker',\n"," 'pistachio',\n"," 'robin',\n"," 'sweetwaltz',\n"," 'trumpet',\n"," 'vibeace']\n","sr = 16000\n","\n","noise_dir = os.path.join(DATA_DIR, \"noise\")\n","noise_manifest = generate_noise_examples(librosa_list_examples, noise_dir, sr)"]},{"cell_type":"markdown","id":"f7f9ddca","metadata":{"id":"f7f9ddca"},"source":["Now let's transcribe it, setting the default confidence estimator."]},{"cell_type":"code","execution_count":null,"id":"60f39094","metadata":{"id":"60f39094"},"outputs":[],"source":["noise_data = load_data(noise_manifest)\n","\n","confidence_cfg = ConfidenceConfig(\n"," preserve_word_confidence=True,\n"," preserve_token_confidence=True,\n",")\n","\n","model.change_decoding_strategy(\n"," RNNTDecodingConfig(fused_batch_size=-1, strategy=\"greedy_batch\", confidence_cfg=confidence_cfg)\n"," if is_rnnt\n"," else CTCDecodingConfig(confidence_cfg=confidence_cfg)\n",")\n","\n","noise_transcriptions = model.transcribe(\n"," paths2audio_files=noise_data.filepaths, batch_size=4, return_hypotheses=True, num_workers=4\n",")\n","if is_rnnt:\n"," noise_transcriptions = noise_transcriptions[0]"]},{"cell_type":"markdown","id":"2f192186","metadata":{"id":"2f192186"},"source":["On a fully non-speech dataset, hallucinations can be measured as the Word Insertions per Second (WIS) value."]},{"cell_type":"code","execution_count":null,"id":"3589da00","metadata":{"id":"3589da00"},"outputs":[],"source":["def word_insertions_per_second(texts: List[str], durations: List[float]):\n"," \"\"\"Calculate the Word Insertions per Second (WIS) value for the given recognition results \n"," and their corresponding audio duration.\n"," \"\"\"\n"," assert len(texts) == len(durations)\n","\n"," wis_per_utt = [len(text.split(\" \")) / duration for text, duration in zip(texts, durations)]\n"," return sum(wis_per_utt) / len(wis_per_utt), wis_per_utt\n","\n","wis, wis_per_utt = word_insertions_per_second([t.text for t in noise_transcriptions], noise_data.durations)\n","print(f\"Original Word Insertions per Second: {wis:.5f}\")"]},{"cell_type":"markdown","id":"a0d8135d","metadata":{"id":"a0d8135d"},"source":["Now, the ability of a confidence estimator to detect hallucinations is computed as the Hallucination Detection Rate (HDR).\n","\n","It shows how many of all hallucinations can be removed, provided that no more than some fixed percentage of correct words are erroneously removed (under normal recognition conditions).\n","\n","HDR is another name of the metric $\\mathrm{TNR}_{FNR=e}$ which is calculated as $\\mathrm{TNR}(Y,\\tau): \\mathrm{FNR}(X,\\tau) \\approx e$, where $X$ is the dataset with supervision (to tune $\\tau$) and $Y$ is the noise-only dataset. Typical $e$ value is 0.05.\n","\n","Let's compute HDR and the new WIS.\n","\n","The generated dataset is clearly distinct from speech, so $e=0.01$ is sufficient."]},{"cell_type":"code","execution_count":null,"id":"0612ccf6","metadata":{"id":"0612ccf6"},"outputs":[],"source":["def hdr(\n"," y_true_speech: Union[List[int], np.ndarray],\n"," y_score_speech: Union[List[float], np.ndarray],\n"," y_score_noise: Union[List[float], np.ndarray],\n"," max_fnr: float = 0.05,\n"," n_bins: int = 100,\n",") -> Tuple[float, float]:\n"," \"\"\"Compute Hallucination Detection Rate (HDR) from prediction scores.\n","\n"," Returns:\n"," tnr: True-Negateve Rate for HDR\n"," threshold_hdr: Optomal threshold \n"," \"\"\"\n"," y_true_speech = np.array(y_true_speech)\n"," y_score_speech = np.array(y_score_speech)\n"," y_score_noise = np.array(y_score_noise)\n"," thresholds = np.linspace(0, 1, n_bins + 1)\n"," assert y_true_speech.shape[0] == y_score_speech.shape[0]\n"," assert np.all(y_true_speech >= 0) and np.all(y_true_speech <= 1)\n"," if np.all(y_true_speech == 0) or np.all(y_true_speech == 1):\n"," return 0.0, 0.0\n"," mask_correct = y_true_speech == 1\n"," count_correct = max(mask_correct.nonzero()[0].shape[0], 1)\n"," y_score_correct = y_score_speech[mask_correct]\n"," threshold_hdr = 0.0\n"," for threshold in thresholds:\n"," fnr = (y_score_correct < threshold).nonzero()[0].shape[0] / count_correct\n"," if fnr <= max_fnr:\n"," threshold_hdr = threshold\n"," else:\n"," break\n"," tnr = (y_score_noise < threshold_hdr).nonzero()[0].shape[0] / y_score_noise.shape[0]\n"," return tnr, threshold_hdr\n","\n","\n","# e\n","max_fnr = 0.01\n","\n","correct_marks = [\n"," mark for r, h in zip(current_test_set.reference_texts, transcriptions) for mark in get_correct_marks(r.split(), h.words)\n","]\n","y_score_speech = [w for h in transcriptions for w in h.word_confidence]\n","y_score_noise = [w for h in noise_transcriptions for w in h.word_confidence]\n","hdr_score, threshold_hdr = hdr(correct_marks, y_score_speech, y_score_noise, max_fnr=max_fnr)\n","wis_new = wis - wis * hdr_score\n","\n","hdr_score, wis_new\n","print(\n"," f\"\"\" Hallucination Detection Rate for max_fnr={max_fnr} : {hdr_score:.5f}\n"," New Word Insertions Per Second: {wis_new:.5f}\"\"\"\n",")"]},{"cell_type":"markdown","id":"418297d6","metadata":{"id":"418297d6"},"source":["Finally, let's print the noisy utterances to see if any more hallucinations persist."]},{"cell_type":"code","execution_count":null,"id":"3815e8e3","metadata":{"id":"3815e8e3"},"outputs":[],"source":["sorted_wis_indices = sorted(enumerate(wis_per_utt), key=lambda x: x[1])[::-1]\n","\n","show_dataset_with_confidence(\n"," sorted_indices=sorted_wis_indices,\n"," transcriptions=noise_transcriptions,\n"," test_set=noise_data,\n"," threshold=threshold_hdr,\n"," html_show=True,\n"," min_dur_to_show=0.0,\n"," utt_to_show=10,\n",")"]},{"cell_type":"code","execution_count":null,"id":"0ac58ef2","metadata":{"id":"0ac58ef2"},"outputs":[],"source":[]}],"metadata":{"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.10"},"colab":{"provenance":[],"gpuType":"T4"},"accelerator":"GPU"},"nbformat":4,"nbformat_minor":5}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments about tutorial:

  1. Make sure that all commands work when running from existing nemo folder. E.g., librispeech download fails and needs to be changed to !python "{NEMO_DIR_PATH}/scripts/dataset_processing/get_librispeech_data.py" ...

  2. Typo Some of them confidence estimation as a binary classification task.

  3. Section that describes confidence metrics has too much information and it's not clear which metric should be used. I'd just add default/recommended there and mention that there are others that one can consider.

  4. The plots can be improved by adding labels to axes. Also same comment - maybe don't show every possible metric, just whatever is most useful

  5. In the section about WER improvement I get this output:

    WER detail before removing low confidence words:
        WER:	0.06180
        INS_rate:	0.00651
        DEL_rate:	0.00481
        SUB_rate:	0.05047
    WER detail after removing low confidence words:
        WER:	0.10731
        INS_rate:	0.00224
        DEL_rate:	0.08190
        SUB_rate:	0.02317
    

    So WER is increased, while the comment says it should be slightly improved.

  6. Should we remove cells that show that using "optimal threshold" with Yuden curve is worse than no modification in WER?

  7. I don't understand the output of the last cell (and in general what color visualization shows). Maybe you can write a few small notes about that. And in general when introducing new dataset for hallucinations, would be great to describe what's in it. Is it just noise and no speech at all? Maybe play a few samples, otherwise hard to follow.

  8. When building the very first model with max_prob confidence - maybe mention that you're using max_prob to highlight how it's worse than entropy. Otherwise users might copy that code thinking it's an example of how to create good confidence model.

  9. Maybe use a subset of librispeech to make everything faster?

  10. The following cell raises an error when is_rnnt=True:

tran = transcriptions[0]
print(
    f"""    Recognized text: `{tran.text}`\n
    Word confidence: {[round(c, 3) for c in tran.word_confidence]}\n
    Token confidence: {[round(c, 3) for c in tran.token_confidence]}\n
    Frame confidence: {[round(c, 3) for c in tran.frame_confidence]}"""
)
>>>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[28], line 6
      1 tran = transcriptions[0]
      2 print(
      3     f"""    Recognized text: `{tran.text}`\n
      4     Word confidence: {[round(c, 3) for c in tran.word_confidence]}\n
      5     Token confidence: {[round(c, 3) for c in tran.token_confidence]}\n
----> 6     Frame confidence: {[round(c, 3) for c in tran.frame_confidence]}"""
      7 )

Cell In[28], line 6, in <listcomp>(.0)
      1 tran = transcriptions[0]
      2 print(
      3     f"""    Recognized text: `{tran.text}`\n
      4     Word confidence: {[round(c, 3) for c in tran.word_confidence]}\n
      5     Token confidence: {[round(c, 3) for c in tran.token_confidence]}\n
----> 6     Frame confidence: {[round(c, 3) for c in tran.frame_confidence]}"""
      7 )

TypeError: type list doesn't define __round__ method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kipok thank you very much for your suggestions!

I moved backward compatibility related code into config classes.
As for the comments about tutorial:

  1. Done
  2. Fixed
  3. Clarification added
  4. Added
  5. Fixed
  6. Not removed, but added some clarification
  7. Description added, audio display added
  8. Mention added
  9. Was not done
  10. Fixed

@titu1994
Copy link
Collaborator

titu1994 commented Jun 7, 2023

I agree about the renaming of method arg. Seems very low value but brings large changes to support backward compat

@github-actions
Copy link
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

Copy link
Collaborator

@Kipok Kipok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tutorial looks good now, thanks!

For the renaming of parameters, unfortunately a few things still need to be fixed.

@@ -18,6 +18,6 @@ confidence:
aggregation: mean
method_cfg:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
method_cfg:
measure_cfg:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# TODO (alaptev): delete the following two lines sometime in the future
logging.warning("Re-writing `measure_cfg` with the value of `method_cfg`.")
self.measure_cfg = self.method_cfg
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So unfortunately, this line throws an error:

omegaconf.errors.ValidationError: Invalid type assigned: ConfidenceMeasureConfig is not a subclass of str. value: ConfidenceMeasureConfig(name='entropy', entropy_type='renyi', alpha=0.25, entropy_norm='lin', temperature='DEPRECATED')

Not sure how to fix this - I briefly tried adding a Union to the type or adding with dict_open but didn't seem to fix it on my end. Maybe I was doing it incorrectly and you'll have more luck. But this needs to be fixed in some way as right now any model that specifies old value will not work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, would be nice to have a simple test for this, just to load create that config class with an old name and check that the new is set up correctly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Igor! I think I fixed the issue, but you better check for your case.

GNroy and others added 11 commits July 14, 2023 04:33
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
GNroy and others added 6 commits July 14, 2023 04:35
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
@GNroy GNroy requested a review from Kipok July 14, 2023 12:35
Copy link
Collaborator

@Kipok Kipok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, thanks @GNroy!

@GNroy GNroy requested a review from titu1994 July 15, 2023 15:30
@titu1994 titu1994 merged commit 33100e0 into NVIDIA:main Jul 15, 2023
jubick1337 pushed a commit that referenced this pull request Aug 8, 2023
* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
ekmb added a commit that referenced this pull request Aug 8, 2023
* Fix race condition when executing with multi-node where some ranks does not wait for setup (#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (#7022) (#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwagrs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (#7019) (#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (#6994) (#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (#7052) (#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (#7065) (#7077)

* Fix documentation for Numba



* Update force float32 flag dynamically



* Update force float32 flag dynamically



* Fix nemo version



---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (#7053) (#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (#7081)

* small Bugfix (#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit a46e325.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (#7141) (#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* add paths to labeler. (#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Jan Beckmann <king-jan1999@hotmail.de>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
dorotat-nv pushed a commit to dorotat-nv/NeMo that referenced this pull request Aug 24, 2023
* Fix race condition when executing with multi-node where some ranks does not wait for setup (NVIDIA#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (NVIDIA#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (NVIDIA#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (NVIDIA#7022) (NVIDIA#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwagrs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (NVIDIA#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (NVIDIA#7019) (NVIDIA#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (NVIDIA#6994) (NVIDIA#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (NVIDIA#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (NVIDIA#7052) (NVIDIA#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (NVIDIA#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (NVIDIA#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (NVIDIA#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (NVIDIA#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (NVIDIA#7065) (NVIDIA#7077)

* Fix documentation for Numba

* Update force float32 flag dynamically

* Update force float32 flag dynamically

* Fix nemo version

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (NVIDIA#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (NVIDIA#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (NVIDIA#7034) (NVIDIA#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (NVIDIA#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (NVIDIA#7053) (NVIDIA#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (NVIDIA#7081)

* small Bugfix (NVIDIA#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (NVIDIA#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA#7067) (NVIDIA#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (NVIDIA#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (NVIDIA#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (NVIDIA#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (NVIDIA#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (NVIDIA#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (NVIDIA#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (NVIDIA#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

This reverts commit a46e325.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (NVIDIA#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (NVIDIA#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (NVIDIA#7141) (NVIDIA#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (NVIDIA#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (NVIDIA#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (NVIDIA#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (NVIDIA#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (NVIDIA#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (NVIDIA#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (NVIDIA#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (NVIDIA#7027) (NVIDIA#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (NVIDIA#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (NVIDIA#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (NVIDIA#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (NVIDIA#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (NVIDIA#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (NVIDIA#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (NVIDIA#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* add paths to labeler. (NVIDIA#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Jan Beckmann <king-jan1999@hotmail.de>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Davood-M added a commit that referenced this pull request Aug 24, 2023
* migrated class

Signed-off-by: dorotat <dorotat@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: dorotat <dorotat@nvidia.com>

* added unit test

Signed-off-by: dorotat <dorotat@nvidia.com>

* memmap worker arg (#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fast Conformer global token fix (#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Refined export_config (#7053) (#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* small Bugfix (#7081)

* small Bugfix (#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)

Signed-off-by: dorotat <dorotat@nvidia.com>

* update TTS readme (#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix absolute path in path join call (#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Disable distopt contiguous param buffer by default (#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* microphone demo (#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [Fix] load_state_dict in nlp_model.py (#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix plot function in vad_utils.py (#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a.

Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: dorotat <dorotat@nvidia.com>

* fix default attention size (#7141) (#7143)

Signed-off-by: dorotat <dorotat@nvidia.com>

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* add paths to labeler. (#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* T5 metrics fix (#7037)

* Fix race condition when executing with multi-node where some ranks does not wait for setup (#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (#7022) (#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwagrs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (#7019) (#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (#6994) (#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (#7052) (#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (#7065) (#7077)

* Fix documentation for Numba

* Update force float32 flag dynamically

* Update force float32 flag dynamically

* Fix nemo version

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (#7053) (#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (#7081)

* small Bugfix (#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (#7141) (#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmai…
zhehuaichen added a commit to zhehuaichen/NeMo that referenced this pull request Sep 22, 2023
* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a.

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>

* fix default attention size (#7141) (#7143)

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add paths to labeler. (#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* T5 metrics fix (#7037)

* Fix race condition when executing with multi-node where some ranks does not wait for setup (#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (#7022) (#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwagrs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (#7019) (#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (#6994) (#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (#7052) (#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (#7065) (#7077)

* Fix documentation for Numba



* Update force float32 flag dynamically



* Update force float32 flag dynamically



* Fix nemo version



---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (#7053) (#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (#7081)

* small Bugfix (#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (#7141) (#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* add paths to labeler. (#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyar…
zhehuaichen added a commit to zhehuaichen/NeMo that referenced this pull request Sep 22, 2023
* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)



* Fix import guard checks (NVIDIA#7124)



* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

This reverts commit a46e325.

* Fix import guard checks (NVIDIA#7126)

* Fix import guard checks



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------




* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)

* [TTS] Create EnCodec training recipe (NVIDIA#6852)

* [TTS] Create EnCodec training recipe



* [TTS] Update encodec recipe



* [TTS] Rename EnCodec to AudioCodec



* [TTS] Add EnCodec unit tests



* [TTS] Add copyright header to distributed.py



---------



* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)




* fix default attention size (NVIDIA#7141) (NVIDIA#7143)

* fix evaluator.py for various exceptions by ast (NVIDIA#7150)



* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (NVIDIA#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.



* unify configs into a single one and add detailed comments providing supported candidates.



* choose 36-final IPA as default phoneme dict



---------



* [TTS] Add output audio format to preprocessing (NVIDIA#6889)

* [TTS] Add output audio format to preprocessing



* [TTS] Add format validation



* [TTS] Fix data tutorial



---------



* freeze (NVIDIA#7152)



* make sure any empty segments are removed (NVIDIA#7155)



* Update RIR generation scripts (NVIDIA#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio



* A quickstart speech enhancement tutorial (NVIDIA#6492)

A simple example of training a model for speech enhancement task



* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)

* allow specifying colors of text in ASS subtitle file



* specify vertical_alignment instead of marginv in ass_file_config



* add documentation of CTMFileConfig and ASSFileConfig to NFA README



---------



* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)




* TE bug fix (NVIDIA#7027) (NVIDIA#7036)




* [TTS] Remove nested TTS configs (NVIDIA#7154)

* [TTS] Remove nested TTS configs



* [TTS] Modify tutorial to support multiple sampling rates



* [TTS] Clarify min_duration unit



* [TTS] Default 22.05kHz highfreq to null



---------



* Merge release r1.20.0 to main (NVIDIA#7167)

* update package info



* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage



* install_bs (NVIDIA#7019)



* Fix typo and branch in tutorial (NVIDIA#7048)



* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079



* fixes for pr review



---------



* fix links for TN (NVIDIA#7117)



* update branch (NVIDIA#7135)



* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20



* Update vad_utils.py



---------





* update branch



* fix version



* resolve conflict the other way



* keep both



* revert keep both



---------















* Upgrade to pytorch lightning 2.0 (NVIDIA#6433)

* Upgrade pytorch lightning version in requirements



* Initial fixes for PTL2.0



* Add further fixes to support lightning 2.0



* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end



* Replace all occurances of validation_epoch_end to on_validation_epoch_end



* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively



* Change logger=None to logger=False in Trainer object



* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass



* Modify trainer.precision check and other small edits



* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer



* Add default values for args to fix Attribute Error



* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU



* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end



* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings



* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel



* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py



* Revert an extra space that was mistakenly added



* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity



* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity



* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing



* Remove outputs arg from on_train_epoch_end



* Remove outputs from on_validation_epoch_end in multi_binary_acc.py



* Remove output args from on_validation_epoch_end in the docstrings of some ASR files



* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs



* Add on_validation_epoch_end and remove outputs args for nlp models



* Append output of validation_step to validation_step_outputs in EncDecClassificationModel



* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0



* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py



* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError



* Add if condition check for multiple dataloaders when appending to validation outputs



* Separate validation pass to be used with both validation_step and test_step



* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py



* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len



* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0



* Modify precision checks to account for 16-mixed and bf16-mixed



* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel



* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py



* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel



* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml



* Add split arg self.test_step_outputs to TextClassificationModel



* Add test_step_outputs to dialogue and text classification models



* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py



* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step



* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg



* Add val/test_step_outputs to S2SQAModel and GPTQAModel



* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error



* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py



* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed



* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed



* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py



* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN



* Precision fix and skip few failing tests



* Add missing comment lines in JenkinsFile



* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py



* Minor edit JenkinsFile



* Minor edit in jenkins file



* Edit in Jenkins file



* Comment missed lines in Jenkins file



* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file



* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py



* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files



* Fix all CI TTS tests and comment few Jenkins tests



* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py



* Add a missing comment in JenkinsFile



* Add try except StopIteration in validation_step for models with dataloader_iter



* Remove pyyaml from requirements



* Add try except for inference_step in megatron_finetune_model.py



* Remove limit_val_batches for mockGPTDataset test



* Add new self.validation_step_outputs for MegatronGPTSFTModel



* Minor edit Jenkinsfile



* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.



* Remove resume_from_checkpoint if trainer arg in conf yaml files



* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs



* Remove resume_from_checkpoint in duplex_tn_config.yaml



* Fix typos, unused imports and refactor code to remove redundant funcs



* Remove commented code in megatron_nmt_model.py



* Fix overriden functions to match parent class functions



* Prefetch dataloader_iter to prevent hang for PP>1



* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1



* Uncomment tests in JenkinsFile



* Add '16' to precision checks and other minor fixes



* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders



* Minor edits



* Modify precision checks to avoid indexing



* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs



* Reference checkpoint with trainer.ckpt_path



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT



---------




* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)

* scripts for sft



* fix style



* adde special token only for huggingface model



* change default name



* print out error datapoint content



* show error id



* annotation script working



* try to be compatible with huggingface tokenizer



* added examples



* added lang



* added lang



* text to value special case



* configure the slider



* annoatation handles lang



* added the unit test for chat sft dataset



* used the file in the test dir



* fix json error



* load local tokenizer



* remove mask count check



* added HF dataset backend



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------




* add paths to labeler. (NVIDIA#7087)



* T5 metrics fix (NVIDIA#7037)

* Fix race condition when executing with multi-node where some ranks does not wait for setup (NVIDIA#7016)




* Added bool types to neural_types export (NVIDIA#7032)




* rnnt and char utils (NVIDIA#6971)

* rnnt_ngram_merge



* char level bug



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------






* fix tab text gen (NVIDIA#7022) (NVIDIA#7031)





* Fixed kwargs for metric instance init



* Fixed kwargs for metric instance init



* removed kwagrs



* Updated config desc



* ASR Confidence update and tutorial (NVIDIA#6810)

* small fixes and tests



* various fixes for the tutorial



* tutorial added



* for for a little oops after rebasement



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests



* unused import removed



* fix review comments



* deprecated parameters for greedy configs



* move re-assigning to configs



* fix comments 2



* fix config tests



* fix ece test (my env was bugged apparently)



* renamings for confidence ensemble



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3



* return dropped tutorial



* CI flips back and forth, increasing tolerance



---------





* install_bs (NVIDIA#7019) (NVIDIA#7028)





* fixes for spellmapper (NVIDIA#6994) (NVIDIA#7000)






* added back the retro documents (NVIDIA#7033)




* Remove pyyaml (NVIDIA#7052) (NVIDIA#7054)





* st standalone model (NVIDIA#6969)

* st standalone model



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix



* sacrebleu import fix, unused imports removed



* import guard for nlp inside asr transformer bpe model



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered



* import ordering fix



* yttm for asr removed



* logging added



* added inference and translate method



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------





* remove pos emb from state dict for old models (NVIDIA#7068)

* remove pos emb from state dict



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment



* fix nmt test



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test



---------





* Fix typo in ASR-TTS tutorial (NVIDIA#7049)




* Fixed tutorial's name (NVIDIA#7047)





* Fix documentation for Numba (NVIDIA#7065) (NVIDIA#7077)

* Fix documentation for Numba



* Update force float32 flag dynamically



* Update force float32 flag dynamically



* Fix nemo version



---------






* Update Frame-VAD doc and fix onnx export (NVIDIA#7076)

* update fvad doc



* fix typo



* update fvad example



* update



* fix onnx export



* update test



* refactor



* update doc



* update



---------





* memmap worker arg (NVIDIA#7062)

* memmap worker arg



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update



* update



---------





* Fix caching bug in causal convolutions for cache-aware ASR models (NVIDIA#7034) (NVIDIA#7082)




* Fast Conformer global token fix (NVIDIA#7085)

* old way



* fix



* fix



* fix



* remove extra



* clean



* clean



* clean



* fix



* fix



* fix



* fix



* fix



* fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------





* Refined export_config (NVIDIA#7053) (NVIDIA#7066)

* Refined export_config
* Rolling back hierarchy change
---------





* small Bugfix (NVIDIA#7081)

* small Bugfix (NVIDIA#7079)

* fix branch



* fix typo



* fix link



---------



* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb



* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb



---------







* Added script to extract ASR CTC and RNNT models from ASR hybrid models (NVIDIA#7092)

* Added script to extract ctc and rnnt models from hybrid models



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag



---------






* Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA#7067) (NVIDIA#7094)



* update TTS readme (NVIDIA#7088)

* update TTS readme



---------




* Fix absolute path in path join call (NVIDIA#7099)




* Disable distopt contiguous param buffer by default (NVIDIA#7095)




* microphone demo (NVIDIA#7110)





* [Fix] load_state_dict in nlp_model.py (NVIDIA#7086)

* Fix load_state_dict in nlp_model.py



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------





* Fix plot function in vad_utils.py (NVIDIA#7113)

Fix plot function in vad_utils.py




* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)




* Fix import guard checks (NVIDIA#7124)




* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

This reverts commit a46e325.



* Fix import guard checks (NVIDIA#7126)

* Fix import guard checks



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------





* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)



* [TTS] Create EnCodec training recipe (NVIDIA#6852)

* [TTS] Create EnCodec training recipe



* [TTS] Update encodec recipe



* [TTS] Rename EnCodec to AudioCodec



* [TTS] Add EnCodec unit tests



* [TTS] Add copyright header to distributed.py



---------




* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)





* fix default attention size (NVIDIA#7141) (NVIDIA#7143)



* fix evaluator.py for various exceptions by ast (NVIDIA#7150)




* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (NVIDIA#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.



* unify configs into a single one and add detailed comments providing supported candidates.



* choose 36-final IPA as default phoneme dict



---------




* [TTS] Add output audio format to preprocessing (NVIDIA#6889)

* [TTS] Add output audio format to preprocessing



* [TTS] Add format validation



* [TTS] Fix data tutorial



---------




* freeze (NVIDIA#7152)




* make sure any empty segments are removed (NVIDIA#7155)




* Update RIR generation scripts (NVIDIA#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio




* A quickstart speech enhancement tutorial (NVIDIA#6492)

A simple example of training a model for speech enhancement task




* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)

* allow specifying colors of text in ASS subtitle file



* specify vertical_alignment instead of marginv in ass_file_config



* add documentation of CTMFileConfig and ASSFileConfig to NFA README



---------




* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)





* TE bug fix (NVIDIA#7027) (NVIDIA#7036)





* [TTS] Remove nested TTS configs (NVIDIA#7154)

* [TTS] Remove nested TTS configs



* [TTS] Modify tutorial to support multiple sampling rates



* [TTS] Clarify min_duration unit



* [TTS] Default 22.05kHz highfreq to null



---------




* Merge release r1.20.0 to main (NVIDIA#7167)

* update package info



* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage



* install_bs (NVIDIA#7019)



* Fix typo and branch in tutorial (NVIDIA#7048)



* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079



* fixes for pr review



---------



* fix links for TN (NVIDIA#7117)



* update branch (NVIDIA#7135)



* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20



* Update vad_utils.py



---------





* update branch



* fix version



* resolve conflict the other way



* keep both



* revert keep both



---------
















* Upgrade to pytorch lightning 2.0 (NVIDIA#6433)

* Upgrade pytorch lightning version in requirements



* Initial fixes for PTL2.0



* Add further fixes to support lightning 2.0



* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end



* Replace all occurances of validation_epoch_end to on_validation_epoch_end



* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively



* Change logger=None to logger=False in Trainer object



* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass



* Modify trainer.precision check and other small edits



* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer



* Add default values for args to fix Attribute Error



* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU



* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end



* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings



* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel



* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py



* Revert an extra space that was mistakenly added



* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity



* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity



* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing



* Remove outputs arg from on_train_epoch_end



* Remove outputs from on_validation_epoch_end in multi_binary_acc.py



* Remove output args from on_validation_epoch_end in the docstrings of some ASR files



* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs



* Add on_validation_epoch_end and remove outputs args for nlp models



* Append output of validation_step to validation_step_outputs in EncDecClassificationModel



* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0



* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py



* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError



* Add if condition check for multiple dataloaders when appending to validation outputs



* Separate validation pass to be used with both validation_step and test_step



* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py



* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len



* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0



* Modify precision checks to account for 16-mixed and bf16-mixed



* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel



* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py



* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel



* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml



* Add split arg self.test_step_outputs to TextClassificationModel



* Add test_step_outputs to dialogue and text classification models



* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py



* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step



* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg



* Add val/test_step_outputs to S2SQAModel and GPTQAModel



* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error



* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py



* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed



* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed



* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py



* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN



* Precision fix and skip few failing tests



* Add missing comment lines in JenkinsFile



* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py



* Minor edit JenkinsFile



* Minor edit in jenkins file



* Edit in Jenkins file



* Comment missed lines in Jenkins file



* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file



* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py



* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files



* Fix all CI TTS tests and comment few Jenkins tests



* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py



* Add a missing comment in JenkinsFile



* Add try except StopIteration in validation_step for models with dataloader_iter



* Remove pyyaml from requirements



* Add try except for inference_step in megatron_finetune_model.py



* Remove limit_val_batches for mockGPTDataset test



* Add new self.validation_step_outputs for MegatronGPTSFTModel



* Minor edit Jenkinsfile



* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.



* Remove resume_from_checkpoint if trainer arg in conf yaml files



* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs



* Remove resume_from_checkpoint in duplex_tn_config.yaml



* Fix typos, unused imports and refactor code to remove redundant funcs



* Remove commented code in megatron_nmt_model.py



* Fix overriden functions to match parent class functions



* Prefetch dataloader_iter to prevent hang for PP>1



* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1



* Uncomment tests in JenkinsFile



* Add '16' to precision checks and other minor fixes



* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders



* Minor edits



* Modify precision checks to avoid indexing



* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs



* Reference checkpoint with trainer.ckpt_path



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT



---------





* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)

* scripts for sft



* fix style



* adde special token only for huggingface model



* change default name



* print out error datapoint content



* show error id



* annotation script working



* try to be compatible with huggingface tokenizer



* added examples



* added lang



* added lang



* text to value special case



* configure the slider



* annoatation handles lang



* added the unit test for chat sft dataset



* used the file in the test dir



* fix json error



* load local tokenizer



* remove mask count check



* added HF dataset backend



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------





* add paths to labeler. (NVIDIA#7087)




* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------
















































Co-authored-by: Adi Renduchintala <adithyar…

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: Xin Yao <yaox12@outlook.com>
Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: smajumdar <smajumdar@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Vahid <vnoroozi@nvidia.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Signed-off-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com>
Signed-off-by: Greg Clark <grclark@nvidia.com>
Signed-off-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: jasonwan <jasonwan@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Siddharth Tyagi <styagi130@gmail.com>
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Signed-off-by: Jason Wang <jasonwan@nvidia.com>
Signed-off-by: arendu <adithyare@nvidia.com>
Signed-off-by: Alireza Morsali <alireza.mors@gmail.com>
Signed-off-by: Siddharth Tyagi <siddhartht@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: mburchi <maxime.burchi@gmail.com>
Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Alexander Jipa <azzhipa@amazon.com>
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
Signed-off-by: lhb8125 <lhb8125@gmail.com>
Signed-off-by: Robin Dong <robin.k.dong@gmail.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Anton Peganov <apeganov@nvidia.com>
Signed-off-by: Samuele Cornell <cornellsamuele@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Tamerlan Tabolov <tktabolov@gmail.com>
Signed-off-by: zhehuaichen <139396994+zhehuaichen@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Matvei Novikov <mattyson.so@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jan Beckmann <king-jan1999@hotmail.de>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: bene-ges <61418381+bene-ges@users.noreply.github.com>
Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Ante Jukić <ajukic@nvidia.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Co-authored-by: Neha Tadimeti <ntadimeti@nvidia.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: Dima Rekesh <bmwshop@gmail.com>
Co-authored-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com>
Co-authored-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
Co-authored-by: Greg Clark <grclark@nvidia.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Jason Wang <jasonwan@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: guyueh1 <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: styagi130 <siddharth.tyagi2015@vit.ac.in>
Co-authored-by: Siddharth Tyagi <siddhartht@nvidia.com>
Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Co-authored-by: Alireza Morsali <32244795+AlirezaMorsali@users.noreply.github.com>
Co-authored-by: styagi130 <styagi130@gmail.com>
Co-authored-by: dorotat-nv <115542912+dorotat-nv@users.noreply.github.com>
Co-authored-by: Maxime Burchi <60737204+burchim@users.noreply.github.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: eharper <eharper@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: Kelvin Liu <lhb8125@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Alexander Jipa <alexander.jipa@gmail.com>
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: omahs <73983677+omahs@users.noreply.github.com>
Co-authored-by: Robin Dong <robin.k.dong@gmail.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: PeganovAnton <apeganov@nvidia.com>
Co-authored-by: Samuele Cornell <cornellsamuele@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: Tamerlan Tabolov <nektonikto999@gmail.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Fix race condition when executing with multi-node where some ranks does not wait for setup (NVIDIA#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (NVIDIA#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (NVIDIA#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (NVIDIA#7022) (NVIDIA#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwagrs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (NVIDIA#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (NVIDIA#7019) (NVIDIA#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (NVIDIA#6994) (NVIDIA#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (NVIDIA#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (NVIDIA#7052) (NVIDIA#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (NVIDIA#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (NVIDIA#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (NVIDIA#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (NVIDIA#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (NVIDIA#7065) (NVIDIA#7077)

* Fix documentation for Numba



* Update force float32 flag dynamically



* Update force float32 flag dynamically



* Fix nemo version



---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (NVIDIA#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (NVIDIA#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (NVIDIA#7034) (NVIDIA#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (NVIDIA#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (NVIDIA#7053) (NVIDIA#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (NVIDIA#7081)

* small Bugfix (NVIDIA#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (NVIDIA#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA#7067) (NVIDIA#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (NVIDIA#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (NVIDIA#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (NVIDIA#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (NVIDIA#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (NVIDIA#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (NVIDIA#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (NVIDIA#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

This reverts commit ae7624d.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (NVIDIA#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (NVIDIA#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (NVIDIA#7141) (NVIDIA#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (NVIDIA#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (NVIDIA#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (NVIDIA#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (NVIDIA#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (NVIDIA#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (NVIDIA#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (NVIDIA#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (NVIDIA#7027) (NVIDIA#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (NVIDIA#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (NVIDIA#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (NVIDIA#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (NVIDIA#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (NVIDIA#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (NVIDIA#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (NVIDIA#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* add paths to labeler. (NVIDIA#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Jan Beckmann <king-jan1999@hotmail.de>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* migrated class

Signed-off-by: dorotat <dorotat@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: dorotat <dorotat@nvidia.com>

* added unit test

Signed-off-by: dorotat <dorotat@nvidia.com>

* memmap worker arg (#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fast Conformer global token fix (#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Refined export_config (#7053) (#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* small Bugfix (#7081)

* small Bugfix (#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)

Signed-off-by: dorotat <dorotat@nvidia.com>

* update TTS readme (#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix absolute path in path join call (#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Disable distopt contiguous param buffer by default (#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* microphone demo (#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [Fix] load_state_dict in nlp_model.py (#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix plot function in vad_utils.py (#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit ae7624da7d773a6b9436ff61903dc4b99c7c27cb.

Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: dorotat <dorotat@nvidia.com>

* fix default attention size (#7141) (#7143)

Signed-off-by: dorotat <dorotat@nvidia.com>

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overriden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* adde special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annoatation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* add paths to labeler. (#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>

* T5 metrics fix (#7037)

* Fix race condition when executing with multi-node where some ranks does not wait for setup (#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (#7022) (#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwagrs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* for for a little oops after rebasement

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fox comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (#7019) (#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (#6994) (#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (#7052) (#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (#7065) (#7077)

* Fix documentation for Numba

* Update force float32 flag dynamically

* Update force float32 flag dynamically

* Fix nemo version

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (#7053) (#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (#7081)

* small Bugfix (#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (#7124)" (#7125)

This reverts commit ae7624da7d773a6b9436ff61903dc4b99c7c27cb.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (#7141) (#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (#7027) (#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurances of validation_epoch_end to on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmai…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants