* Megatron positional encoding alibi fix (#5808) (#5863) * 1. Debugging. * 1. Debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. * 1. Debugging. * 1. Fixed initialization. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. * 1. Removed scale from ALiBi. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated yaml and added support to control number of alibi heads. Signed-off-by: Micha Livne <mlivne@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Removed num_attention_heads_alibi from configs. Signed-off-by: Micha Livne <mlivne@nvidia.com> Signed-off-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Signed-off-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix segmenting for pcla inference (#5849) * Fix segmenting for pcla inference Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> * Fix segmenting for pcla inference Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * indentation fix (#5861) (#5862) Signed-off-by: nithinraok 
<nithinrao.koluguri@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * add ambernet to readme (#5872) (#5873) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix wrong label mapping in batch_inference for label_model (#5767) (#5870) * fix batch inference * add test for batch * fix device Signed-off-by: fayejf <fayejf07@gmail.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * WAR for https://github.com/pytorch/pytorch/pull/91526 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864) * fix data simulator Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> * Adding noise_manifest handling for faster speed Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added multi-gpu feature Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added a parameter for noise source file number Signed-off-by: Taejin Park <tango4j@gmail.com> * Fixed noise_manifest error bug Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: Taejin 
Park <tango4j@gmail.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * RETRO model finetuning (#5800) * add save and load dynamic index Signed-off-by: Yi Dong <yidong@nvidia.com> * add chunk stride feature Signed-off-by: Yi Dong <yidong@nvidia.com> * add chunk stride feature Signed-off-by: Yi Dong <yidong@nvidia.com> * add no pq index Signed-off-by: Yi Dong <yidong@nvidia.com> * added megatron lm compatible mode Signed-off-by: Yi Dong <yidong@nvidia.com> * add config Signed-off-by: Yi Dong <yidong@nvidia.com> * fix position embedding Signed-off-by: Yi Dong <yidong@nvidia.com> * added index factory Signed-off-by: Yi Dong <yidong@nvidia.com> * share neighbors and weights among strategies Signed-off-by: Yi Dong <yidong@nvidia.com> * fix bug Signed-off-by: Yi Dong <yidong@nvidia.com> * added metric to faiss index Signed-off-by: Yi Dong <yidong@nvidia.com> * set default to inner product Signed-off-by: Yi Dong <yidong@nvidia.com> * added qa fine tune dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * added fine tuning code Signed-off-by: Yi Dong <yidong@nvidia.com> * trim it Signed-off-by: Yi Dong <yidong@nvidia.com> * fix data issue Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * added version Signed-off-by: Yi Dong <yidong@nvidia.com> * fix key error Signed-off-by: Yi Dong <yidong@nvidia.com> * make sure to overwrite the cfg Signed-off-by: Yi Dong <yidong@nvidia.com> * make multiple sentence bert available Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the document Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the table Signed-off-by: Yi Dong <yidong@nvidia.com> * fix transformer Signed-off-by: Yi Dong <yidong@nvidia.com> * make sure to turn off the rope in chunked cross attention layer Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the security issue Signed-off-by: Yi Dong 
<yidong@nvidia.com> * style fix Signed-off-by: Yi Dong <yidong@nvidia.com> * fix codeql issues Signed-off-by: Yi Dong <yidong@nvidia.com> * fix Signed-off-by: Yi Dong <yidong@nvidia.com> * use -1 Signed-off-by: Yi Dong <yidong@nvidia.com> * fix empty index Signed-off-by: Yi Dong <yidong@nvidia.com> * clean up Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the lower bound for repetition penalty Signed-off-by: Yi Dong <yidong@nvidia.com> * add retro qa inference strategy Signed-off-by: Yi Dong <yidong@nvidia.com> * added new inference logic Signed-off-by: Yi Dong <yidong@nvidia.com> * working inference Signed-off-by: Yi Dong <yidong@nvidia.com> * fix TP inference Signed-off-by: Yi Dong <yidong@nvidia.com> * revert requirement Signed-off-by: Yi Dong <yidong@nvidia.com> * added file inference Signed-off-by: Yi Dong <yidong@nvidia.com> * use string to prevent collision Signed-off-by: Yi Dong <yidong@nvidia.com> * use NQ test Signed-off-by: Yi Dong <yidong@nvidia.com> * fix prompt Signed-off-by: Yi Dong <yidong@nvidia.com> * fix inference Signed-off-by: Yi Dong <yidong@nvidia.com> * set good defaults for demo Signed-off-by: Yi Dong <yidong@nvidia.com> * replicate adlr Signed-off-by: Yi Dong <yidong@nvidia.com> * make sure to turn off attention reset for megatron lm compatible model Signed-off-by: Yi Dong <yidong@nvidia.com> * style fix Signed-off-by: Yi Dong <yidong@nvidia.com> * fix typo Signed-off-by: Yi Dong <yidong@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix inference error Signed-off-by: Yi Dong <yidong@nvidia.com> * fix logging Signed-off-by: Yi Dong <yidong@nvidia.com> * address comments Signed-off-by: Yi Dong <yidong@nvidia.com> --------- Signed-off-by: Yi Dong <yidong@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * [TTS] GAN-based spectrogram enhancer (#5565) * [TTS] add SpectrogramEnhancer 
based on StyleGAN 2 Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] some tests for spectrogram enhancer Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: a tiny clean up Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: log images during training Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * exp_manager: pass save_on_train_epoch_end to checkpointing callback Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: add training script and config examples Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix comments Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: don't assume FastPitch Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: better input shapes handling Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix porting error Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix logging and .nemo saving Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: clean up scaling Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: update examples Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: shape handling Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: remove LoggerCollection handling Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: copyright notice for tests Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: use process_batch helper Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: return empty list of available models Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] 
SpectrogramEnhancer: some docs Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: style --fix Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: chan_last -> channel_last Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: remove unused imports Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: remove unused return value Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: losses are nn.Modules now Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: init optimizers from config Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: unused imports Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: typechecking Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: more tests Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix logging images Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: unclutter prepare_batch Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: init generator and discriminator from the config for consistency with other NeMo models Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: update spectrogram range in the example config Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: comment on loss weights in the example config Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: rename Conv2DMod to Conv2DModulated Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: remove unused imports Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix CodeQL import 
warnings Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: type_as_recursive -> to_device_recursive Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: move to_device_recursive to helpers Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: move losses to a separate module, add comments Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: add optimizers' entries to config Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix test configs Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: support length masking for 3-dim tensors Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: add masking to spectrogram normalization Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix tests Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: add spectrogram normalization tests Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix imports and formatting in tests Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix docstring typo Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: rename G and D to generator and discriminator Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: better argument naming in interfaces (condition -> input_spectograms, target -> target_spectrograms) Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [TTS] SpectrogramEnhancer: fix import warnings in modules Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] add resynthesize_dataset.py script Signed-off-by: 
Roman Korostik <rkorostik@nvidia.com> * [TTS] add PairedRealFakeSpectrogramsDataset Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: update example config to reflect new data setup Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] resynthesize_dataset.py: remove unused imports Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] resynthesize_dataset.py: use nemo manifest handling Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] resynthesize_dataset.py: remove unused import Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] resynthesize_dataset.py: underscores for .npy names Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: remove return value from a test Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] add length masking helper Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: use common tts length mask function Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] unused imports in tts helpers Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: fix an import Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: introduce computed upsample_factor to generator Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: clean up and clarify validation data setup Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: remove a hardcoded path in the example config Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] SpectrogramEnhancer: configurize max_spectrogram_length in generator Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [TTS] resynthesize_dataset.py: consistent dashes and underscores in CLI args Signed-off-by: Roman Korostik <rkorostik@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Roman 
Korostik <rkorostik@nvidia.com> Signed-off-by: Roman Korostik <racoiaws@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Optimizing distributed Adam when running with one work queue (#5560) * Dist Adam constructs a single param bucket for each GPT layer Signed-off-by: Tim Moon <tmoon@nvidia.com> * Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces Signed-off-by: Tim Moon <tmoon@nvidia.com> * Configure per-layer dist Adam buckets for BERT and T5 Signed-off-by: Tim Moon <tmoon@nvidia.com> * Remove unused variables Signed-off-by: Tim Moon <tmoon@nvidia.com> * Configure GPT with one dist Adam bucket per virtual pipeline stage Signed-off-by: Tim Moon <tmoon@nvidia.com> * Configure BERT with one dist Adam bucket per virtual pipeline stage Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit in Dockerfile Need recent updates to Apex distributed Adam optimizer. 
Signed-off-by: Tim Moon <tmoon@nvidia.com> * Remove logic for per-virtual-pipeline distopt buckets from T5 Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * fix(readme): fix typo (#5883) Signed-off-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org> Signed-off-by: Jason <jasoli@nvidia.com> * TTS inference with Heteronym classification model, hc model inference refactoring (#5768) * refactor inference, fix span detection Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix merge conflicts Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix merge conflicts Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused var Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, test update Signed-off-by: ekmb <ebakhturina@nvidia.com> * arg name update Signed-off-by: ekmb <ebakhturina@nvidia.com> * merge wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert changes Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docs, move heteronym to baseg2p Signed-off-by: ekmb <ebakhturina@nvidia.com> * change wordid file defaults to none Signed-off-by: ekmb <ebakhturina@nvidia.com> * add manifest check Signed-off-by: ekmb <ebakhturina@nvidia.com> * replace homograph with heteronym, upper case wordid for riva, review feedback Signed-off-by: ekmb <ebakhturina@nvidia.com> * add log message, update comment Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename test manifest field Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * take out retro doc (#5885) (#5886) Signed-off-by: Yi Dong <yidong@nvidia.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Add option to disable distributed parameters in distributed Adam optimizer (#5685) * Add option to run dist Adam without distributed params Similar to DDP, but leverages dist Adam's 
support for overlapping communication with backward compute Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix bug in grad clipping when dist Adam has redundant params Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * [ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774) * Separate full BPE dataset construction Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Fix the case when the dataset is None Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Fix comment Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Fix typos Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Separate char dataset construction. Fix DALI dataset usage. Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> --------- Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * transformer duration added and IPA config files added Signed-off-by: Jason <jasoli@nvidia.com> * inference issue for pace resolved Signed-off-by: Jason <jasoli@nvidia.com> * Latest ONNX developments Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Remove MCD_DTW tarball (#5889) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Block large files from being merged into NeMo main (#5898) * Attempt to use large-file pre-commit ci hook Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Set defaults and enforce Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Set to 1000 Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Remove enforcement Signed-off-by: SeanNaren <snarenthiran@nvidia.com> --------- Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876) * Updated offline_clustering.py, the 
getMultiScaleCosAffinityMatrix function, reduced memory usage Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com> * torch.empty.cache() outside forward_infer() Signed-off-by: Taejin Park <tango4j@gmail.com> * Removed unnecessary lines Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Speed up for non torch.jit.script Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * parallelism is default off Signed-off-by: Taejin Park <tango4j@gmail.com> * nme_mat_size is unified as 512, removing redundant docstring Signed-off-by: Taejin Park <tango4j@gmail.com> --------- Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * set max_steps for lr decay through config (#5780) * set max_steps for lr decay through config * added warning for optim sched max_steps config option * reverted changes to modelPT and updated megatron_base_model * added the experimental cosine annealing scheduler class * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update decay_steps for cosine annealing exp class * added copyright --------- Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix transducer and question answering tutorial bugs (#5809) (#5810) Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * update apex install 
instructions (#5901) (#5902) Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Hybrid ASR-TTS models (#5659) Add hybrid ASR-TTS models and text-to-text dataset Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Set providers for ORT inference session (#5903) Signed-off-by: athitten <abhishreetm@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * [ASR] Configurable metrics for audio-to-audio + removed experimental decorators (#5827) * Added an option to configure metrics for audio-to-audio models Removed experimental decorators Signed-off-by: Ante Jukić <ajukic@nvidia.com> * Addressed review comments Signed-off-by: Ante Jukić <ajukic@nvidia.com> --------- Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Correct doc for RNNT transcribe() function (#5904) Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Add segmentation export to Audacity label file (#5857) * Save the segmentation as label file for Audacity Audacity is a free open source audio editor that can import label file to quickly assess the segmentation quality. This commit add the export to [Audacity label format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html) so that directly after running the segmentation tool the segmentation quality can be assessed or the segmentation can be shared easily. 
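The label file described above is plain text: one segment per line, with start time, end time (both in seconds), and label text separated by tabs. The following is a minimal sketch of such an export, not the code from this commit; the function name `format_audacity_labels` and the `(start, end, text)` tuple layout are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def format_audacity_labels(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) segments as Audacity label-track text.

    Hypothetical helper: each output line is "start<TAB>end<TAB>label".
    """
    lines = []
    for start, end, text in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start} > {end}")
        # Tabs and newlines would break the row layout, so collapse all
        # whitespace in the label to single spaces.
        label = " ".join(text.split())
        lines.append(f"{start:.6f}\t{end:.6f}\t{label}")
    # End with a newline when there is at least one row.
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    print(format_audacity_labels([(0.0, 1.5, "hello world"), (1.5, 2.25, "second segment")]), end="")
```

Writing the returned string to a `.txt` file gives something Audacity can load through its label import. As the commit notes, no score column is written.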
Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com> * Fix styling Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused score in audacity export score is not written in audacity label file so we can safely not load it from segment. Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com> --------- Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Cross-Lingual objectives (XLM) and multilingual (many-many) support for Megatron-NMT (#5026) * Update blendable dataset, and refactor seq2seq data Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Blendable dataset with binarized mmap working Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Pass seed from cfg to dataset Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix multilingual setup Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Add on epoch start reconfiguration Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update tokenizer creation for multilingual Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Tmp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update NMT script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Remove unused import Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update training script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Log consumed samples Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Logging on val epoch end 
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Remove redundant print Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Ckpt averaging for non model parallel megatron models Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Empty Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update error message Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Remove check Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Restore fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Remove ipdb Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Move to classmethods Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Initial Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * Refactor masking to add skip_masking_id and working xlm bert and t5 datasets Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Testing a simple solution Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed. Seems to work. Need to validate. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Added support in CSV and text memmap to Megatron encoder-decoder Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Added support in CSV. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed style. 2. Fixed bugs. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Debugging. 
Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed bugs. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Updated yaml. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * Minor Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * 1. Fixed warnings. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * 1. Fixed a bug. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> * Tmp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Updates Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix minor data things Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Lang ids for validation datasets Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * More fixes for lang id code at inference Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Remove pdb Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix prepend ID and bleu logging Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Refactor Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes for many-many NMT Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Reset o2 default Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore dataset utils 
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Allreduce bleu scores Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * 1. Loading index file into memmap object. Signed-off-by: Micha Livne <mlivne@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed extension when loading files. Signed-off-by: Micha Livne <mlivne@nvidia.com> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix redundant building Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * PP > 2 for NMT Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Merge and fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Refactor multilingual again Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor and verify data formats Signed-off-by: 
MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * more fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix passing langs Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * More fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fixes for bart Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> Signed-off-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Co-authored-by: Micha Livne <mlivne@cs.toronto.edu> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * ONNX export working Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fixing unit test Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Update isort to the latest version (#5895) Update isort to the latest version Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> --------- Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Pin isort version (#5914) Signed-off-by: Vladimir Bataev 
<vbataev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Moved eval notebook data to aws (#5911) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * FilterbankFeaturesTA to match FilterbankFeatures (#5913) Signed-off-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * fixed missing long_description_content_type (#5909) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * added TPMLP for T5-based models (#5840) (#5841) Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: David <amosalla@asu.edu> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fixing 0-size issue and ONNX BS>1 trace Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fixing code scan alert Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * update container (#5917) Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * remove conda pynini install (#5921) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Merge release main (#5916) * update branch Signed-off-by: ericharper <complex451@gmail.com> * added TPMLP for T5-based models (#5840) Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> * remove notebook (#5859) Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> * update branch Signed-off-by: ericharper <complex451@gmail.com> --------- Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: David 
<amosalla@asu.edu> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Dynamic freezing in Nemo (#5879) * Initial commit for dynamic freezing logic Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated logic to handle lists and updated docs Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Transferred dynamic freezing logic to core from asr Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert asr config to original Signed-off-by: Daniel Egert <degert@nvidia.com> * Fixed tab indent in core.rst Signed-off-by: Daniel Egert <degert@nvidia.com> * Updated modelPT for latest from master Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed indents in docs Signed-off-by: Daniel Egert <degert@nvidia.com> --------- Signed-off-by: Daniel Egert <degert@nvidia.com> Co-authored-by: Daniel Egert <degert@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix Windows bug with save_restore_connector (#5919) * Initial commit for Windows bug with save_to Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <degert@nvidia.com> Co-authored-by: Daniel Egert <degert@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * add new lannguages to doc (#5939) Signed-off-by: Yang Zhang <yangzhang@nvidia.com> 
Signed-off-by: Jason <jasoli@nvidia.com> * Workarounds for ONNX export with autocast Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * fix val loss computation in megatron (#5871) * fix val loss computation in megatron * Fix NaN handling during validation --------- Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com> Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Restoring sigmas Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Add core classes and functions for online clustering diarizer part 2 (#5609) * Add core classes and functions for online clustering diarizer Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add audio to labels code Signed-off-by: Taejin Park <tango4j@gmail.com> * resolve type errors Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added unit=tests for very short audio Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Filled all missing docstrings Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved conflict and added missing docstrings Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed unit-test errors Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the wrongly added file - megatron_gpt_model.py Signed-off-by: Taejin Park 
<tango4j@gmail.com> * Fix wrongly included file - megatron_gpt_model.py Signed-off-by: Taejin Park <tango4j@gmail.com> * resolve code quality issue Signed-off-by: Taejin Park <tango4j@gmail.com> * Fixed unit-test errors and bugs Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changed total_sec for offline_clustering toy_data in unit-tests Signed-off-by: Taejin Park <tango4j@gmail.com> * fixed merging index offset bug Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only including part 1 files Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused function Signed-off-by: Taejin Park <tango4j@gmail.com> * fixed unused imports Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * divided nmesc_clustering.py into two and reflected first-pass comments Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding offline/online_clustering.py Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code QL autocomment Signed-off-by: Taejin Park <tango4j@gmail.com> * Removed unused imports Signed-off-by: Taejin Park <tango4j@gmail.com> * Update nemo/collections/asr/parts/utils/online_clustering.py Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> * Reflected comments Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 
resolved code scanning issue Signed-off-by: Taejin Park <tango4j@gmail.com> * Adding online_diarizer.py Signed-off-by: Taejin Park <tango4j@gmail.com> * updated tests and speaker_utils Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed the wrong test eval Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating online diarizer for varialbe name change Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reflected comments and some typo fixes in speaker_utils Signed-off-by: Taejin Park <tango4j@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Taejin Park <tango4j@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Distributed Adam optimizer overlaps param all-gather with forward compute (#5684) * Add distopt support for overlapping param all-gather with forward compute Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * [TTS][ZH] added new NGC model cards with polyphone disambiguation. (#5940) * [TTS][ZH] added new NGC model cards with polyphone disambiguation. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Moved truncation of context higher up Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * [TN] bugfix file handler is not closed. (#5955) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Added unit test for regulate_len. Unscripted sort_tensor for TRT Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fixed slice Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * [TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * bugfix: file handlers are not closed. (#5956) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * [TTS][G2P] deprecate add_symbols (#5961) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * fix broken link (#5968) Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix hybridasr bug (#5950) (#5957) Signed-off-by: Jason <jasoli@nvidia.com> * Added list_available_models (#5967) * Added list_available_models Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com> * Added to readme Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru> * added vits to docs Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru> * added vits to docs Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru> --------- Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com> Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru> Signed-off-by: Jason <jasoli@nvidia.com> * Move settings to `pyproject.toml`. 
Remove deprecated `pytest-runner` (#5947) * Move project settings to pyproject.toml Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Remove setup.cfg Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Remove deprecated pytest-runner Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Add comments Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Allow only registered markers for pytest Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> --------- Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Fix torchaudio installation (#5850) * Fail if torchaudio not installed Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Fix torchaudio matching version Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Warn if Pytorch major version changed Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> --------- Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Update fastpitch.py (#5969) Signed-off-by: Jason <jasoli@nvidia.com> * Review comments Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * per-micro-batch input loader (#5635) * per-micro-batch input loader * per-micro-batch input loader set arg default val * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix * apply per-microbatch-loader to only GPT * update docstring on micro-batch input loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed the default arg val * fix batch size to 1 at log stat registration * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update container for CI Signed-off-by: ericharper <complex451@gmail.com> * update container in jenkinsfile Signed-off-by: ericharper <complex451@gmail.com> * update container for CI Signed-off-by: ericharper <complex451@gmail.com> fix merge 
conflict * revert Jenkinsfile * Revert "revert Jenkinsfile" This reverts commit d23b7757e0f935dacde2840f234193c632a2b3be. * Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> * add GradScaler * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <complex451@gmail.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * update container in readme (#5981) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * Support Alignment Extraction for all RNNT Beam decoding methods (#5925) * Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <titu1994@gmail.com> * Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <titu1994@gmail.com> * Remove everything else Signed-off-by: smajumdar <titu1994@gmail.com> * Support dataclass in AbstractRNNTDecoding Signed-off-by: smajumdar <titu1994@gmail.com> * Add first draft unittest Signed-off-by: smajumdar <titu1994@gmail.com> * Correct the logic to more to the next timestep in the alignment Signed-off-by: smajumdar <titu1994@gmail.com> * Finalize ALSD alignment generation Signed-off-by: smajumdar <titu1994@gmail.com> * Add support for TSD greedy alignment extraction Signed-off-by: smajumdar <titu1994@gmail.com> * Add support for mAES greedy alignment extraction Signed-off-by: smajumdar <titu1994@gmail.com> * Finalize extraction of alignments from all beam algorithms for RNNT Signed-off-by: smajumdar <titu1994@gmail.com> * Style fixes Signed-off-by: smajumdar <titu1994@gmail.com> * Add copyright Signed-off-by: 
smajumdar <titu1994@gmail.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * Add AWS SageMaker ASR Examples (#5638) * Base code for AWS SageMaker example Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Remove format Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * wrap Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add a notebook with the code Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Setup Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Update notebook Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Remove space Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix spelling mistake Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add message to explain usage Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add CommonVoice esperanto example Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix path Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fixes Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Import sox locally, add documentation Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Address reviews Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Address reviews Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Address reviews Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add cell to download the SSL model Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Set max epochs to 300 Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fixes, introduce HF dataset instructions Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Upstream updates from other branch Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix warning Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Add README, add image Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Fix warning Signed-off-by: SeanNaren <snarenthiran@nvidia.com> * Address feedback Signed-off-by: SeanNaren 
<snarenthiran@nvidia.com> * Feedback Signed-off-by: SeanNaren <snarenthiran@nvidia.com> --------- Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> * Update PUBLICATIONS.md (#5963) * Add papers from 2022/2022 to PUBLICATIONS.md Signed-off-by: smajumdar <titu1994@gmail.com> * Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <titu1994@gmail.com> * Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <titu1994@gmail.com> * Add additional papers Signed-off-by: smajumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> * [G2P] fixed typos and broken import library. (#5978) (#5979) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> * [G2P] added backward compatibility for english tokenizer and fixed unit tests (#5980) (#5984) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Micha Livne <mlivne@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: Yi Dong <yidong@nvidia.com> Signed-off-by: Roman Korostik <rkorostik@nvidia.com> Signed-off-by: Roman Korostik <racoiaws@users.noreply.github.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org> Signed-off-by: ekmb 
<ebakhturina@nvidia.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: athitten <abhishreetm@gmail.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com> Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Micha Livne <mlivne@cs.toronto.edu> Signed-off-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Signed-off-by: Daniel Egert <degert@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com> Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Matvei Novikov <mattyson.so@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: Roman Korostik <racoiaws@users.noreply.github.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: Jean-Louis Queguiner 
<jean-louis.queguiner@gadz.org> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Mikyas Desta <miktekabi@gmail.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Gabriel Pirlogeanu <53811655+gabitza-tech@users.noreply.github.com> Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com> Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com> Co-authored-by: athitten <47577437+athitten@users.noreply.github.com> Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Micha Livne <mlivne@cs.toronto.edu> Co-authored-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: David <amosalla@asu.edu> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com> Co-authored-by: Daniel Egert <degert@nvidia.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com> Co-authored-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com> Co-authored-by: Jason <jasoli@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com>