Commit
Add support for Numba FP16 RNNT Loss (NVIDIA#6991) (NVIDIA#7038)
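The first items in the commit body keep the RNN-T loss working-space memory and the cost calculation in fp32 even when the inputs are fp16. The pure-Python sketch below illustrates why a low-precision accumulator is a problem. It simulates half- and single-precision rounding with the standard `struct` module (`"e"` and `"f"` format characters) and shows that a running sum kept in fp16 stops growing once the gap between representable values exceeds the increment. It is not the Numba kernel from this commit. `round_to_fp16`, `round_to_fp32`, and `accumulate` are hypothetical helpers written only for illustration.

```python
import struct
from typing import Callable, Iterable


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values: Iterable[float], cast: Callable[[float], float]) -> float:
    """Sum `values`, rounding the running total to the precision given by
    `cast` after every step, as a fixed-precision accumulator would."""
    total = cast(0.0)
    for v in values:
        total = cast(total + cast(v))
    return total


if __name__ == "__main__":
    ones = [1.0] * 3000
    # fp16 has an 11-bit significand: above 2048 the spacing between
    # representable values is 2.0, so 2048 + 1 rounds back to 2048 (ties-to-even).
    print(accumulate(ones, round_to_fp16))  # 2048.0
    # An fp32 accumulator represents every integer up to 2**24 exactly.
    print(accumulate(ones, round_to_fp32))  # 3000.0
```

This only models rounding behavior. How the Numba kernels actually allocate their fp32 workspace and upcast the cost is defined in the NeMo sources touched by this commit, which are not reproduced here.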
* Force working space memory to always be in fp32
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Add support for fp16 testing in Numba
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Add support for fp16 testing in Numba
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Add support for fp16 testing in Numba
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Fix cost calculation by upcasting to fp32
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Fix cost calculation by upcasting to fp32
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Add support to check if numba fp16 is available
  Signed-off-by: smajumdar <titu1994@gmail.com>
* add RNN-T loss implemented by PyTorch and test code (#5312)
* Fix the bugs in cache-aware streaming Conformer (#5032)
  Signed-off-by: Vahid <vnoroozi@nvidia.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* IA3 support for GPT and T5 (#4909)
* init commit for ia3 adater training in GPT
  Signed-off-by: arendu <adithya.r@gmail.com>
* ia3 adater training in GPT, models and adapter classes
  Signed-off-by: arendu <adithya.r@gmail.com>
* reshape to operate even on non-contiguous tensors
  Signed-off-by: arendu <adithya.r@gmail.com>
* configs
  Signed-off-by: arendu <adithya.r@gmail.com>
* fixed none init
  Signed-off-by: arendu <adithya.r@gmail.com>
* adding adapter and ia3 support for T5 based models
  Signed-off-by: arendu <adithya.r@gmail.com>
* style fix
  Signed-off-by: arendu <adithya.r@gmail.com>
* config update and t5 model adapter and ia3
  Signed-off-by: arendu <adithya.r@gmail.com>
* removed unused imports
  Signed-off-by: arendu <adithya.r@gmail.com>
* predict step for inference
  Signed-off-by: arendu <adithya.r@gmail.com>
* style fix
  Signed-off-by: arendu <adithya.r@gmail.com>
* style fix
  Signed-off-by: arendu <adithya.r@gmail.com>
* adapter inference for t5
  Signed-off-by: arendu <adithya.r@gmail.com>
* style fix
  Signed-off-by: arendu <adithya.r@gmail.com>
* fixed bug micro and global batch size in eval
  Signed-off-by: arendu <adithya.r@gmail.com>
* minor edit
  Signed-off-by: arendu <adithya.r@gmail.com>
* agressive truncation if in test examples if no truncation field is given
  Signed-off-by: arendu <adithya.r@gmail.com>
* corrected for language_model_path name changes in main
  Signed-off-by: arendu <adithya.r@gmail.com>
* removed unused import
  Signed-off-by: arendu <adithya.r@gmail.com>
* name change for language_model_path
  Signed-off-by: arendu <adithya.r@gmail.com>
* include inter_attention to IA3
  Signed-off-by: arendu <adithya.r@gmail.com>
* minor fix in confg
  Signed-off-by: arendu <adithya.r@gmail.com>
* minor fixes
  Signed-off-by: arendu <adithya.r@gmail.com>
* removed unused flag
  Signed-off-by: arendu <adithya.r@gmail.com>
* addressing PR comments
  Signed-off-by: arendu <adithya.r@gmail.com>
* address PR comments
  Signed-off-by: arendu <adithya.r@gmail.com>
* minor fix
  Signed-off-by: arendu <adithya.r@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* style fix
  Signed-off-by: arendu <adithya.r@gmail.com>
* CI test
  Signed-off-by: arendu <adithya.r@gmail.com>
* minor fix in jenkinsfile
  Signed-off-by: arendu <adithya.r@gmail.com>
  Signed-off-by: arendu <adithya.r@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Bug fix - Limit val batches set to 1.0 (#5023)
* Bug fix
  Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Adressed sandeep's comments
* Fixing limit val batches support in bert
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fixing limit val batches support in bert
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [bug_fix] kv_channels is used when available (#5066)
* fix bug s.t kv_channels is used when available
  Signed-off-by: arendu <adithya.r@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: arendu <adithya.r@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* P&C Docs (#5068) (#5069)
  Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
  Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
  Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
  Co-authored-by: Matvei Novikov <mattyson.so@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Add spe_split_by_unicode_script arg (#5072)
* Add spe_split_by_unicode_script arg
  Signed-off-by: Anas <aabouallaban@pm.me>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Anas <aabouallaban@pm.me>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* probabilites -> probabilities (#5078) (#5079)
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
  Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* increase PR and Issue sweep quantity and active close PRs. (#5073)
* increase PR and Issue sweep quantity and active close PRs.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* update with stricter rules, 30 days to be stale and 7 days to be closed for both Issues and PRs.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [TTS] added missing German phoneme tokenizer. (#5070) (#5074)
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* rename to match prompt leanring (#5076)
  Signed-off-by: arendu <adithya.r@gmail.com>
  Signed-off-by: arendu <adithya.r@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Missing fixes from r1.11.0 to T5 finetuning eval (#5054) (#5061)
* Fixes to seq2seq eval
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Notebook bug fixes (#5084) (#5085)
* Notebook bug fixes
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Turned nemo install back on
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* reverted notebook
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Updated one line in entity linking nb
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Co-authored-by: Eric Harper <complex451@gmail.com>
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
  Co-authored-by: Eric Harper <complex451@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* update strategy in notebook from ddp_fork to dp (#5088) (#5089)
  Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Fix bug in Squeezeformer Conv block (#5011) (#5024)
* Fix bug in Squeezeformer Conv block
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Fix kernel context
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
* Fix access mixin
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* fixed megatron lm conversion bug (PTL related) (#5038) (#5063)
  Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
  Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
  Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
  Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
  Co-authored-by: David <amosalla@asu.edu>
  Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
  Co-authored-by: Eric Harper <complex451@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Fix Unhashable type list for Numba Cuda spec augment kernel (#5093) (#5094)
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Signed-off-by: smajumdar <smajumdar@nvidia.com>
  Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Fix numba (#5098)
  Signed-off-by: smajumdar <titu1994@gmail.com>
  Signed-off-by: smajumdar <titu1994@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Make it possible to specify output_filename in normalize_with_audio.py (#5092)
  Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
  Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Greedy decoding confidence for CTC and RNNT (#4931)
* rnnt confidence draft
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* word confidence
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* advanced entropies added
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* refactoring
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* oops forgot a file
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* metrics and benchmarking script added
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* style fix
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* texterrors installation added
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* lgtm and bug fix
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* fix comments
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* fix typos
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* add missing import after rebase
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
  Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
  Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [Add] SLURP models and examples (#4668)
* add model, util and loss
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* refactor
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* refactor annd update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update and refactor
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update and refactor
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update and refactor
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update docs
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update available models
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* refactor data processing
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* fix typo
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update docs
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* refactor and update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update doc
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* move transformer to asr.modules
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* move transformer to asr.modules
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* get rid of jsonlines
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* refactor
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* revert changes to nlp
  Signed-off-by: stevehuang52 <heh@nvidia.com>
  Signed-off-by: stevehuang52 <heh@nvidia.com>
  Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
  Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* only optimize params that are part of the adapter modules (#5086)
  Signed-off-by: arendu <adithya.r@gmail.com>
  Signed-off-by: arendu <adithya.r@gmail.com>
  Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Pipeline Parallel T5 Prompt Learning (#4956)
* Added pre process flag checks and pipeline parallel in fwd
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Added rank check for pipeline parallel
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* T5 prompt learning works!
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* IA3 passing CI
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* Fixed typo
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
* removed optimizer setup so Adi's change will not conflict
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Signed-off-by: Virginia Adams <vadams@nvidia.com>
  Signed-off-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com>
  Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [TTS] remove phonemizer.py (#5090)
  remove phonemizer.py and convert code block to markdown in the tutorial.
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* T5 Decoding with PP > 2 fix (#5091) (#5103)
* set sequence lenghts in the pipeline properly
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names (#5087) (#5102)
* fixed hifigan configs as well
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Fix and refactor consumed samples save/restore for Megatron models. (#5077)
* Fixes and refactor
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove unused imports
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Empty
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* RIR corpus generator tool (#4927)
  Signed-off-by: Ante Jukić <ajukic@nvidia.com>
  Signed-off-by: Ante Jukić <ajukic@nvidia.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Multiprocessing fix (#5106) (#5107)
  Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
  Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
  Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
  Co-authored-by: Matvei Novikov <mattyson.so@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [Bug fix] PC lexical + audio (#5109) (#5110)
* training running
  Signed-off-by: ekmb <ebakhturina@nvidia.com>
* revert
  Signed-off-by: ekmb <ebakhturina@nvidia.com>
* revert
  Signed-off-by: ekmb <ebakhturina@nvidia.com>
  Signed-off-by: ekmb <ebakhturina@nvidia.com>
  Signed-off-by: ekmb <ebakhturina@nvidia.com>
  Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [Fix] schedulers with no max_steps param (#4564)
* fix schedulers
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update to use python inspect module
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* update
  Signed-off-by: stevehuang52 <heh@nvidia.com>
  Signed-off-by: stevehuang52 <heh@nvidia.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* T5 prompt learning fixes missing from r.11.0 merge (#5075) (#5101)
* Fix special tokens
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Empty
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: David <amosalla@asu.edu>
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: David <amosalla@asu.edu>
  Co-authored-by: Eric Harper <complex451@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* [TTS] Add NeMo TTS Primer Tutorial (#4933)
* [TTS] Add NeMo TTS Primer Tutorial
  Signed-off-by: Ryan <rlangman@nvidia.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Add Squeezeformer CTC model checkpoints on Librispeech (#5121)
  Signed-off-by: smajumdar <titu1994@gmail.com>
  Signed-off-by: smajumdar <titu1994@gmail.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* adding loss normalization options to rnnt joint (#4829)
* adding normalization options to rnnt joint loss
* moving the param to joint
* moving loss normalization to rnnt loss config
* style
* cleaning up
* fixing sum reduction in joint
  Signed-off-by: Dima Rekesh <drekesh@nvidia.com>
* moving reduction into RNNT loss class
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* refactoring
* typos
  Signed-off-by: Dima Rekesh <drekesh@nvidia.com>
  Signed-off-by: Dima Rekesh <drekesh@nvidia.com>
  Co-authored-by: Dima Rekesh <drekesh@nvidia.com>
  Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Asr concat dataloader (#5108)
* forced precision
* typo
* initial commit
  Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
* typos and bugs
  Signed-off-by: Dima Rekesh <drekesh@nvidia.com>
* reverting conformer encoder
  Signed-off-by: Dima Rekesh <drekesh@nvidia.com>
* additional checks
  Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
* adding support to CTC models as well
* reverting conformer_encoder Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * typo Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactoring Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactoring Signed-off-by: Dima Rekesh <drekesh@nvidia.com> * merging Signed-off-by: Dima Rekesh <drekesh@nvidia.com> Signed-off-by: Dima Rekesh <bmwshop@gmail.com> Signed-off-by: Dima Rekesh <drekesh@nvidia.com> Co-authored-by: Dima Rekesh <drekesh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * fix blossom ci unittests Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * bugfix: pybtex.database.InvalidNameString: Too many commas in author field. 
(#5112) (#5115) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Uppdate container version to 22.09 (#5105) * update container version Signed-off-by: ericharper <complex451@gmail.com> * pin click Signed-off-by: ericharper <complex451@gmail.com> * pin click 8.0.2 Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Remove unsupported arguments from MegatronNMT (#5065) * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * More fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * pp2 support for T5 IA3 learning and T5 Adapters learning (#5116) * enabling pp2 Signed-off-by: arendu <adithya.r@gmail.com> * optimizer update Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * T5 pp>1 support for adapters and ia3 Signed-off-by: arendu <adithya.r@gmail.com> * fix bug with missing adapter_tuning Signed-off-by: arendu <adithya.r@gmail.com> * inference error fixed, pp=2 Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: arendu <adithya.r@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * T5 Prompt Learning Fixes for Pipeline Parallel (#5120) * Initial fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Added back validation acc Signed-off-by: 
Virginia Adams <vadams@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Put num workers back Signed-off-by: Virginia Adams <vadams@nvidia.com> * added relative encoding if statament Signed-off-by: Virginia Adams <vadams@selene-login-01.nvidia.com> * Added back val loss only validation Signed-off-by: Virginia Adams <vadams@nvidia.com> * Revert "Added back val loss only validation" This reverts commit 86d8f4806fe30335c40c3716ce18259939df500f. * Removed val acc for PP > 1 Signed-off-by: Virginia Adams <vadams@nvidia.com> * Removed enc_seq_len if statement Signed-off-by: Virginia Adams <vadams@nvidia.com> * Added back validation acc calc Signed-off-by: Virginia Adams <vadams@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Virginia Adams <vadams@nvidia.com> Signed-off-by: Virginia Adams <vadams@selene-login-01.nvidia.com> Co-authored-by: Virginia Adams <vadams@nvidia.com> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Virginia Adams <vadams@selene-login-01.nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * add doc info (#4721) Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * [TTS] Add SpanishCharsTokenizer (#5135) * [TTS] Add SpanishCharsTokenizer Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Update megatron interface to dialogue (#4936) * fix style formatting Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update template to include description of intent Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * 
changes based on requests in review Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add compatibility with assistant dataset Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove dialogue_state_tracking Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update huggingface utils for dialogue Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix style Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix typo Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add docstrings for assistant data processsor Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkins for SGDGEN local checkpoint Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update style Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * use local vocab file for Jenkinsfile Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * patch for Jenkins CI using local file Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add slot filling prediction and metrics Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unused code Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * refactor metrics code out of Dialogue GPT Model Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate 
backward compatible support for IntentSlotClassificationModel (bert model) Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * save prediction file for IntentSlotClassification Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update dialogue gpt model training for megatron gpt Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove batch generate for HF GPT2, which causes lower performance Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add few shot capability to dialogue gpt model Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile and remove unused import Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update code description and clarity Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address PR comments Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate compatibility with ZeroShotIntentModel Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * rename folder to dialogue due to increased scope and further refactor for clarity Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * added dialogue GPT for sequence generation task (e.g. answer extender) Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add CI test for DialogueGPTGenerationModel Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate DialogueS2SGenerationModel for generation task (e.g. 
answer extender) Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * modify huggingface utils to support HF t5/BART models Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unused imports Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update bleu metric Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix bleu metric style Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * debug bleu metric Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * debug bleu metric Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update based on PR #3893 Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update 2 based on PR #3893 Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update 3 based on PR #3893 Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate sgd generation based on user user utterance and system slot-values to generate system utterance Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add validation model saving capabilities Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * cleaned up code for SGD Based Answer extender Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Dialogue Generation CI Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkinsfile Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix Jenkins CI issue" Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add support for design dataset Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unnecessary imports Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update 
jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * support megatron for dialogue_s2s_generation_model Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * reduce loaded samples in MSMarcoDataProcessor to 64 when cfg.model.dataset.debug_mode=True Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update CI Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update checkpoint and predictions filename to include epoch number Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate HF BART MNLI into zero shot intent model Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate Dialogue Nearest Neighbour Model Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * refactor Dialogue SGD Data Processor to make interface for models cleaner Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update Dialogue S2S Generation model for DialogueSGDDataProcessor interface Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update jenkins Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * support sgd and drive thru datasets by zero shot model and nearest neighbour model Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add prediction saving code to nearest neighbour and zero shot intent models Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix typo in sgd data processor Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * integrate Dialogue Mellon QA Data Processor Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update mellon qa Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update dialogue.py to remove outdated info Signed-off-by: Zhilin Wang 
<zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update dialogue_config.yaml Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update dialogue_config.yaml Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add dialogue docs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address review comments Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix for cfg Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * make dependency on apex optional Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * change NLPDDPluggin calling logic to make it possible to run without apex Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add first draft of tutorial Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * reduce ms marco size by removing lines without wellFormedAnswers Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address pr comments Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update colab tutorial link in dialogue docs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * include unit test and some refactor to facilitate unit test Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address pr issues Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove typos in dialogue tutorial Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * support larger files for question answering Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unnecessary artifacts 
to reduce memory use Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * put 0 tensor to device Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update link within dialogue tutorial Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * restore previously deleted files Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error handling when loss = nan Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update nan handling Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update spanning loss func Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update spanning loss Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix type error raised in qa_dataset.py Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add error checking message Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * revert back to float32 Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * revert back to float32 Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update exp logging Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msgs Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update loading of large file from pickle to json Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update loading of large file from pickle to json Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * limit number of negative samples Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * revert post processing Signed-off-by: Zhilin 
Wang <zhilinw@nvidia.com> * revert post processing Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unused methods and style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add more documentation Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unused imports Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * changes based on PR review Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * set wandb logger False by default Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update interface with megatron gpt prompt learning Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update inline documentation Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update prompt_ids Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update error msg Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update config Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update config Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * set inference = False for dialogue prompt learning during training Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * set inference = False for dialogue prompt learning during training Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unused code Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update config yaml Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix bug for megatron gpt prompt learning Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove unused import Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address comments in PR Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address comments in PR Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address typo Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * add megatron t5 inference Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix bug due to bert tokenizer not being space-aware Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: 
Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update style Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update IntentSlotModel onnx export test Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update style Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update exportable Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address PR comments Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * replace functools.cached_property with functools.lru_cache to maintain python 3.7 compatibility Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * improve speed of rank_candidates and support for p tuning Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update dialogue.py Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * fix megatron prompt learning saving bug Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update generate_candidate method Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * remove repeated init text ids and invert attention masks Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update typo Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * custom collate fn to remove excess padding in batch Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * style fix Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update complete method to mitigate issue when max seq len is low Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * address pr comments Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> * update generation interface Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> Signed-off-by: Zhilin Wang <zhilinw@nvidia.com> Co-authored-by: Zhilin Wang <zhilinw@nvidia.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * 
Added save inference ready .nemo file with every checkpoint (#5055) * Added save inference ready .nemo file with every checkpoint Signed-off-by: Virginia Adams <vadams@nvidia.com> * Python style fix Signed-off-by: Virginia Adams <vadams@nvidia.com> * addressed Adi's comment Signed-off-by: Virginia Adams <vadams@nvidia.com> * Added ptuning check in model checkpoint saving Signed-off-by: Virginia Adams <vadams@nvidia.com> * Changed save_nemo_on_valdaition default to False Signed-off-by: Virginia Adams <vadams@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changes global batch size of adapter CI Signed-off-by: Virginia Adams <vadams@nvidia.com> * Changed num workers to 0 Signed-off-by: Virginia Adams <vadams@nvidia.com> * added first stage of pipeline check Signed-off-by: Virginia Adams <vadams@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams <vadams@nvidia.com> Signed-off-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training (#5118) * Remove ; from jupyter notebook cells Signed-off-by: Igor Gitman <igitman@nvidia.com> * Fix typos in documentation/code Signed-off-by: Igor Gitman <igitman@nvidia.com> * Fix output message to have 'or equal' Signed-off-by: Igor Gitman <igitman@nvidia.com> * Link formatting fixes Signed-off-by: Igor Gitman <igitman@nvidia.com> * Add error if max_utts is used in tarred datasets Signed-off-by: Igor Gitman <igitman@nvidia.com> * Remove max_utts parameter from tarred datasets Signed-off-by: Igor Gitman <igitman@nvidia.com> * Fix max_utts removal in tests Signed-off-by: Igor Gitman <igitman@nvidia.com> * Fix typo if -> is Signed-off-by: 
Igor Gitman <igitman@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Merge r1.12.0 main (#5139) * update branch Signed-off-by: ericharper <complex451@gmail.com> * Add cherry-pick action (#4958) * add cherry-pick action Signed-off-by: ericharper <complex451@gmail.com> * Pin Transformers version to fix CI (#4955) * Pin transformers version in CI to prevent offline tokenizer loading error Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Drop version Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Disable offline temporarily Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Disable offline temporarily Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Enable offline Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> * upper bound transformers Signed-off-by: ericharper <complex451@gmail.com> * remove duplicate transformers requirement Signed-off-by: ericharper <complex451@gmail.com> * Release SOTA Lang ID model (#5080) * add pretrained lang id model ambernet Signed-off-by: fayejf <fayejf07@gmail.com> * update doc and style fix Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <fayejf07@gmail.com> * update branch and package info Signed-off-by: ericharper <complex451@gmail.com> * remove upper bounds on lightning and transformers Signed-off-by: ericharper <complex451@gmail.com> * remove transformers offline from ci Signed-off-by: ericharper <complex451@gmail.com> * upper bound transformers Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: fayejf <fayejf07@gmail.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: fayejf 
<36722593+fayejf@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Added ASR model comparison to SDE (#5043) SDE: Added ASR model comparison tool to SDE transcribe speech: Added support for many predictions in one file, as well as custom field names Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * fix nmt eval sampler (#5154) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Fix Global init steps (#5143) * move global step to base Signed-off-by: Yi Dong <yidong@nvidia.com> * fix fused softmax Signed-off-by: Yi Dong <yidong@nvidia.com> * add the missing file Signed-off-by: Yi Dong <yidong@nvidia.com> * update the fused kernel Signed-off-by: Yi Dong <doyend@gmail.com> * fix import error Signed-off-by: Yi Dong <doyend@gmail.com> * fix import again Signed-off-by: Yi Dong <yidong@nvidia.com> Signed-off-by: Yi Dong <yidong@nvidia.com> Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * [TTS] bug fix - sample rate was being ignored in vocoder dataset (#4518) * bug fix - sample rate was being ignored in vocoder dataset when not loading mel * handled n segments for a different sampling rate than original sampling rate * Added case for n_segments 0, warning for n_segments greater than file length Signed-off-by: Paarth Neekhara <paarth.n@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Add EMA support to NeMo (#4764) * Added Base files Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Some refactors, swap to using MNIST Lnet Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add a few more tests, allow the 
callback to be set via the exp manager Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Actually run validation for testing Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Run isort Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add test for saving state/fix saving state Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Use dummy model Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix test Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add copyright Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Support saving separate EMA weight module Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add standalone functionality/logging Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Expose more parameters Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Modify to allow option to replace validation Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add jenkins test, formatting Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Pin Transformers version to fix CI (#4955) * Pin transformers version in CI to prevent offline tokenizer loading error Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Drop version Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Disable offline temporarily Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Disable offline temporarily Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Enable offline Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add cherry-pick action (#4958) (#4961) * add cherry-pick action Signed-off-by: ericharper <complex451@gmail.com> * Pin Transformers version to fix CI (#4955) * Pin transformers version in CI to prevent offline tokenizer loading error Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Drop version Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Disable offline temporarily Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Disable offline temporarily Signed-off-by: SeanNaren 
<snarenthiran@nvidia.com> * Enable offline Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix changelog builder (#4962) (#4963) Signed-off-by: smajumdar <smajumdar@nvidia.com> Signed-off-by: smajumdar <smajumdar@nvidia.com> Signed-off-by: smajumdar <smajumdar@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * fix cherry pick workflow (#4964) (#4965) Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * reorder model check (#4959) (#4967) Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * check for active conda environment (#4970) (#4971) Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * [TTS] fix broken tutorial for MixerTTS. (#4949) (#4976) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Checkpoint averaging class fix (#4946) * 1. 
Added args.class_path to provide it externally. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add ability to give separate datasets for test, train and validation (#4798) * Add ability to give separate datasets for test, train and validation * Addressed Sandeep's comments * Addressed Sandeep's comments * Add ability to give separate datasets for test, train and validation * Add ability to give separate datasets for test, train and validation * Addressed review comments * Bug fix for common dataset utils * Add CI tests Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com> * Reformat code Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com> * Bug fix Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com> * Bug fix * Bug Fix * Bug Fix * Update Jenkinsfile * Addressed comments * Addressed Erik's comments. * Addressed Sandeep * Update Jenkinsfile * Update Jenkinsfile * Update dataset_utils.py * Update Jenkinsfile * Update Jenkinsfile * Use GPT CI config Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Co-authored-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * fix label models restoring issue from weighted cross entropy (#4968) (#4975) Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add simple pre-commit file (#4983) * Add simple pre-commit file Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Exclude docs folder Signed-off-by: SeanNaren 
<snarenthiran@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit 053bd5ba579537a5f311b431871c21f3381b43eb. Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execution environment (#4951) Signed-off-by: Jin Li <liji@nvidia.com> Signed-off-by: Jin Li <liji@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Adding speaker embedding conditioning in fastpitch (#4986) Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com> Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix ASR issues (#4984) (#4991) * Fix ASR issues Signed-off-by: smajumdar <smajumdar@nvidia.com> * Revert fix Signed-off-by: smajumdar <smajumdar@nvidia.com> Signed-off-by: smajumdar <smajumdar@nvidia.com> Signed-off-by: smajumdar <smajumdar@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix current tests Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * More test coverage Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Address reviews Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Drop bf16 test Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Address review 
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * remove print Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add bf16 Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: smajumdar <smajumdar@nvidia.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Jin Li <liji@nvidia.com> Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Co-authored-by: shanmugamr1992 <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: liji-nv <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Subhankar Ghosh <subhankar2321@gmail.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Fix BF16 test (#5162) Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Fix errors in speaker diarization nemo docs (#5153) * fix docs and docstrings for MSDD Signed-off-by: Taejin Park <tango4j@gmail.com> * fix nemo docs errors Signed-off-by: Taejin Park <tango4j@gmail.com> * reflected review comments Signed-off-by: Taejin Park <tango4j@gmail.com> 
Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * Add interleaved pipeline schedule to GPT (#5025) * add virtual pipeline size to config Signed-off-by: ericharper <complex451@gmail.com> * convert model to list of modules Signed-off-by: ericharper <complex451@gmail.com> * convert model to list of modules Signed-off-by: ericharper <complex451@gmail.com> * convert model to list of modules Signed-off-by: ericharper <complex451@gmail.com> * update for list of modules Signed-off-by: ericharper <complex451@gmail.com> * add virtual to init Signed-off-by: ericharper <complex451@gmail.com> * update first last stage embedding all reduce Signed-off-by: ericharper <complex451@gmail.com> * update sequence parallel all reduce for virtual models Signed-off-by: ericharper <complex451@gmail.com> * runs but we get an error Signed-off-by: ericharper <complex451@gmail.com> * set virtual rank 0 after looping Signed-off-by: ericharper <complex451@gmail.com> * account for virtual when determining first and last pipeline stages Signed-off-by: ericharper <complex451@gmail.com> * checkpointing for virtual models in progress Signed-off-by: ericharper <complex451@gmail.com> * add checkpoint hooks Signed-off-by: ericharper <complex451@gmail.com> * working on validation when resuming Signed-off-by: ericharper <complex451@gmail.com> * skip sanity val steps by default in config Signed-off-by: ericharper <complex451@gmail.com> * remove comment Signed-off-by: ericharper <complex451@gmail.com> * log number of params Signed-off-by: ericharper <complex451@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style Signed-off-by: ericharper <complex451@gmail.com> * check if self.model is a list Signed-off-by: ericharper <complex451@gmail.com> * make virtual pipeline default size None on init Signed-off-by: ericharper <complex451@gmail.com> * make virtual pipeline default to None in config 
Signed-off-by: ericharper <complex451@gmail.com> * remove ensure_divisibility call Signed-off-by: ericharper <complex451@gmail.com> * fix lgtm alerts Signed-off-by: ericharper <complex451@gmail.com> * remove num_sanity_val_steps from config Signed-off-by: ericharper <complex451@gmail.com> * default virtual pipeline size to none Signed-off-by: ericharper <complex451@gmail.com> * check for list Signed-off-by: ericharper <complex451@gmail.com> * update assert to make sure we are only doing virtual for gpt Signed-off-by: ericharper <complex451@gmail.com> * revert change to get_params_for_weight_decay Signed-off-by: ericharper <complex451@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init var Signed-off-by: ericharper <complex451@gmail.com> * add import guard for set virtual model parallel world size Signed-off-by: ericharper <complex451@gmail.com> * use import guard Signed-off-by: ericharper <complex451@gmail.com> * update calls to fake init in eval scripts Signed-off-by: ericharper <complex451@gmail.com> * add _get_fwd_bwd_function Signed-off-by: ericharper <complex451@gmail.com> * log all total model parameters Signed-off-by: ericharper <complex451@gmail.com> * remove unused import Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * reduced to 14 inactive days to be stale for PRs. (#5165) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com> * refactor TTS documentation organization and add new contents. (#5137) * refactor TTS documentation organization and add new contents. * fix asr api bug. * fix broken links. * fix unexpected indentation errors. 
* fixed unexpected indentation. * fixed broken paper reference. * fixed cross-reference and typos. * fixed toctree errors. * revert to 'Augmentors' * reordered TTS tutorial list in starthere. * ordered api classes alphabetically for each Section. * fixed underscore typo for fastpitch checkpoint. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * upcase 'Tuning' Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fixed typo for RAD-TTS Aligner Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * reorder aligner section after mel-gen and vocoders in models.rst. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * clarify Mixer-TTS-X and reorder model descriptions alphabetically. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fixed some typos and formats. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * removed old megatron.rst. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fixed block quote ends without a blank line warnings. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * remove duplicate reference; fixed missing key nlp-megatron-shoeybi2019megatron Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Revert "removed old megatron.rst." This reverts commit c5ea1dc3f23272eecfe8040e3abfa54fa122cf73. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * removed Russian, a hyphen, and add a note about G2P in tts/…