[G2P] backward compatibility for english tokenizer and bugfix #5980

Merged
merged 1 commit into from
Feb 10, 2023

Conversation

XuesongYang
Collaborator

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

What does this PR do ?

Added backward compatibility for the English tokenizer and fixed the relevant unit tests.


Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

tests.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
@XuesongYang XuesongYang merged commit 1fbeb7b into r1.16.0 Feb 10, 2023
@XuesongYang XuesongYang deleted the bugfix-tokenize-english branch February 10, 2023 07:34
github-actions bot pushed a commit that referenced this pull request Feb 10, 2023
…it tests (#5980)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
XuesongYang added a commit that referenced this pull request Feb 10, 2023
…it tests (#5980) (#5984)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
blisc pushed a commit to borisfom/NeMo that referenced this pull request Feb 10, 2023
…it tests (NVIDIA#5980) (NVIDIA#5984)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>
blisc added a commit that referenced this pull request Feb 10, 2023
* Megatron positional encoding alibi fix (#5808) (#5863)

* 1. Debugging.

* 1. Debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

* 1. Debugging.

* 1. Fixed initialization.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

* 1. Removed scale from ALiBi.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated yaml and added support to control number of alibi heads.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Removed num_attention_heads_alibi from configs.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix segmenting for pcla inference (#5849)

* Fix segmenting for pcla inference

Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>

* Fix segmenting for pcla inference

Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* indentation fix (#5861) (#5862)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* add ambernet to readme (#5872) (#5873)

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix wrong label mapping in batch_inference for label_model (#5767) (#5870)

* fix batch inference

* add test for batch

* fix device

Signed-off-by: fayejf <fayejf07@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* WAR for https://github.com/pytorch/pytorch/pull/91526

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864)

* fix data simulator

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Adding noise_manifest handling for faster speed

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added multi-gpu feature

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added a parameter for noise source file number

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed noise_manifest error bug

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* RETRO model finetuning (#5800)

* add save and load dynamic index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add chunk stride feature

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add chunk stride feature

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add no pq index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added megatron lm compatible mode

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add config

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix position embedding

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added index factory

Signed-off-by: Yi Dong <yidong@nvidia.com>

* share neighbors and weights among strategies

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix bug

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added metric to faiss index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* set default to inner product

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added qa fine tune dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added fine tuning code

Signed-off-by: Yi Dong <yidong@nvidia.com>

* trim it

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix data issue

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added version

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix key error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure to overwrite the cfg

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make multiple sentence bert available

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the document

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the table

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix transformer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure to turn off the rope in chunked cross attention layer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the security issue

Signed-off-by: Yi Dong <yidong@nvidia.com>

* style fix

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix codeql issues

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use -1

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix empty index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* clean up

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the lower bound for repetition penalty

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add retro qa inference strategy

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added new inference logic

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix TP inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* revert requirement

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added file inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use string to prevent collision

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use NQ test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix prompt

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* set good defaults for demo

Signed-off-by: Yi Dong <yidong@nvidia.com>

* replicate adlr

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure to turn off attention reset for megatron lm compatible model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* style fix

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix typo

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix inference error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix logging

Signed-off-by: Yi Dong <yidong@nvidia.com>

* address comments

Signed-off-by: Yi Dong <yidong@nvidia.com>

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS] GAN-based spectrogram enhancer (#5565)

* [TTS] add SpectrogramEnhancer based on StyleGAN 2

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] some tests for spectrogram enhancer

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: a tiny clean up

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: log images during training

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* exp_manager: pass save_on_train_epoch_end to checkpointing callback

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add training script and config examples

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix comments

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: don't assume FastPitch

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: better input shapes handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix porting error

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix logging and .nemo saving

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: clean up scaling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: update examples

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: shape handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove LoggerCollection handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: copyright notice for tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: use process_batch helper

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: return empty list of available models

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: some docs

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: style --fix

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: chan_last -> channel_last

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove unused return value

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: losses are nn.Modules now

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: init optimizers from config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: typechecking

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: more tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix logging images

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: unclutter prepare_batch

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: init generator and discriminator from the config for consistency with other NeMo models

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: update spectrogram range in the example config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: comment on loss weights in the example config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: rename Conv2DMod to Conv2DModulated

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix CodeQL import warnings

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: type_as_recursive -> to_device_recursive

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: move to_device_recursive to helpers

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: move losses to a separate module, add comments

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add optimizers' entries to config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix test configs

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: support length masking for 3-dim tensors

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add masking to spectrogram normalization

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add spectrogram normalization tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix imports and formatting in tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix docstring typo

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: rename G and D to generator and discriminator

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: better argument naming in interfaces (condition -> input_spectograms, target -> target_spectrograms)

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [TTS] SpectrogramEnhancer: fix import warnings in modules

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] add resynthesize_dataset.py script

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] add PairedRealFakeSpectrogramsDataset

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: update example config to reflect new data setup

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: remove unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: use nemo manifest handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: remove unused import

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: underscores for .npy names

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove return value from a test

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] add length masking helper

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: use common tts length mask function

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] unused imports in tts helpers

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix an import

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: introduce computed upsample_factor to generator

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: clean up and clarify validation data setup

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove a hardcoded path in the example config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: configurize max_spectrogram_length in generator

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: consistent dashes and underscores in CLI args

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
Signed-off-by: Roman Korostik <racoiaws@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Optimizing distributed Adam when running with one work queue (#5560)

* Dist Adam constructs a single param bucket for each GPT layer

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure per-layer dist Adam buckets for BERT and T5

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove unused variables

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure GPT with one dist Adam bucket per virtual pipeline stage

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure BERT with one dist Adam bucket per virtual pipeline stage

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit in Dockerfile

Need recent updates to Apex distributed Adam optimizer.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove logic for per-virtual-pipeline distopt buckets from T5

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fix(readme): fix typo (#5883)

Signed-off-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Signed-off-by: Jason <jasoli@nvidia.com>

* TTS inference with Heteronym classification model, hc model inference refactoring (#5768)

* refactor inference, fix span detection

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix merge conflicts

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix merge conflicts

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused var

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, test update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* arg name update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert changes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docs, move heteronym to baseg2p

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change wordid file defaults to none

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add manifest check

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* replace homograph with heteronym, upper case wordid for riva, review feedback

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add log message, update comment

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename test manifest field

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* take out retro doc (#5885) (#5886)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add option to disable distributed parameters in distributed Adam optimizer (#5685)

* Add option to run dist Adam without distributed params

Similar to DDP, but leverages dist Adam's support for overlapping communication with backward compute

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix bug in grad clipping when dist Adam has redundant params

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774)

* Separate full BPE dataset construction

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix the case when the dataset is None

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix typos

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Separate char dataset construction. Fix DALI dataset usage.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* transformer duration added and IPA config files added

Signed-off-by: Jason <jasoli@nvidia.com>

* inference issue for pace resolved

Signed-off-by: Jason <jasoli@nvidia.com>

* Latest ONNX developments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Remove MCD_DTW tarball (#5889)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Block large files from being merged into NeMo main (#5898)

* Attempt to use large-file pre-commit ci hook

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Set defaults and enforce

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Set to 1000

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Remove enforcement

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

---------

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876)

* Updated offline_clustering.py, the getMultiScaleCosAffinityMatrix function, reduced memory usage

Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com>

* torch.cuda.empty_cache() outside forward_infer()

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Removed unnecessary lines

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Speed up for non torch.jit.script

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* parallelism is default off

Signed-off-by: Taejin Park <tango4j@gmail.com>

* nme_mat_size is unified as 512, removing redundant docstring

Signed-off-by: Taejin Park <tango4j@gmail.com>

---------

Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* set max_steps for lr decay through config (#5780)

* set max_steps for lr decay through config

* added warning for optim sched max_steps config option

* reverted changes to modelPT and updated megatron_base_model

* added the experimental cosine annealing scheduler class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update decay_steps for cosine annealing exp class

* added copyright

---------

Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>
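
The commits above make the `max_steps` used for learning-rate decay configurable and add an experimental cosine annealing scheduler. As a rough illustration of the idea only, the sketch below computes a cosine-annealed learning rate whose decay horizon is passed in explicitly. It is not the NeMo scheduler class; the function name `cosine_annealing_lr` and its parameters are hypothetical.

```python
import math


def cosine_annealing_lr(step: int, max_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate (hypothetical helper, not NeMo's API).

    `max_steps` is the decay horizon; in this sketch it stands in for a value
    read from the optimizer/scheduler config rather than from the trainer.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Hold the rate at min_lr once the decay horizon has been reached.
    step = min(max(step, 0), max_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / max_steps))
    return min_lr + (base_lr - min_lr) * cosine
```

With this shape, shortening `max_steps` relative to the total training length makes the rate reach `min_lr` earlier and stay there for the remaining steps.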

* Fix transducer and question answering tutorial bugs (#5809) (#5810)

Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* update apex install instructions (#5901) (#5902)

Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Hybrid ASR-TTS models (#5659)

Add hybrid ASR-TTS models and text-to-text dataset

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Set providers for ORT inference session (#5903)

Signed-off-by: athitten <abhishreetm@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [ASR] Configurable metrics for audio-to-audio + removed experimental decorators (#5827)

* Added an option to configure metrics for audio-to-audio models
Removed experimental decorators

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Addressed review comments

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

---------

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Correct doc for RNNT transcribe() function (#5904)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add segmentation export to Audacity label file (#5857)

* Save the segmentation as label file for Audacity

Audacity is a free, open-source audio editor that can import label files to quickly assess segmentation quality. This commit adds export to the [Audacity label format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html) so that, directly after running the segmentation tool, the segmentation quality can be assessed or the segmentation can be shared easily.

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
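
For readers unfamiliar with the format, an Audacity label track is a plain-text file with one label per line: start time and end time in seconds, then the label text, separated by tabs. The sketch below shows that layout under stated assumptions. It is not the code added by this commit; the function name `to_audacity_labels` and the `(start, end, text)` tuple shape are hypothetical.

```python
from typing import Iterable, Tuple


def to_audacity_labels(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) segments as Audacity label-track text.

    Hypothetical helper for illustration; each output line is
    "<start>\t<end>\t<label>".
    """
    lines = []
    for start, end, text in segments:
        # Tabs and newlines would break the column layout, so flatten them.
        label = text.replace("\t", " ").replace("\n", " ")
        lines.append(f"{start:.6f}\t{end:.6f}\t{label}\n")
    return "".join(lines)


if __name__ == "__main__":
    demo = [(0.0, 1.5, "first segment"), (1.5, 3.25, "second segment")]
    print(to_audacity_labels(demo), end="")
```

The resulting text can be saved to a `.txt` file and loaded in Audacity through its label import menu.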

* Fix styling

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused score in audacity export

The score is not written to the Audacity label file, so it does not need to be loaded from the segment.

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>

---------

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Cross-Lingual objectives (XLM) and multilingual (many-many) support for Megatron-NMT (#5026)

* Update blendable dataset, and refactor seq2seq data

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Blendable dataset with binarized mmap working

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Pass seed from cfg to dataset

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix multilingual setup

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add on epoch start reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update tokenizer creation for multilingual

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Tmp

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update NMT script

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update training script

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Log consumed samples

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Logging on val epoch end

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove redundant print

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Ckpt averaging for non model parallel megatron models

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Empty

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update error message

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove check

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Restore fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove ipdb

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Move to classmethods

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Initial

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* Refactor masking to add skip_masking_id and working xlm bert and t5 datasets

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Testing a simple solution

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed. Seems to work. Need to validate.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Added CSV and text memmap support to Megatron encoder-decoder

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Added support in CSV.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.
2. Fixed bugs.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed bugs.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Updated yaml.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Fixed warnings.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed a bug.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* Tmp

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Updates

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix minor data things

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Lang ids for validation datasets

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes for lang id code at inference

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove pdb

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix prepend ID and bleu logging

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for many-many NMT

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Reset o2 default

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore dataset utils

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Allreduce bleu scores

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Loading index file into memmap object.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed extension when loading files.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix redundant building

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* PP > 2 for NMT

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Merge and fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor multilingual again

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor and verify data formats

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* more fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix passing langs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for bart

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@cs.toronto.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* ONNX export working

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixing unit test

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Update isort to the latest version (#5895)

Update isort to the latest version

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Pin isort version (#5914)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Moved eval notebook data to aws (#5911)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* FilterbankFeaturesTA to match FilterbankFeatures (#5913)

Signed-off-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fixed missing long_description_content_type (#5909)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* added TPMLP for T5-based models (#5840) (#5841)

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixing 0-size issue and ONNX BS>1 trace

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixing code scan alert

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* update container (#5917)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* remove conda pynini install (#5921)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Merge release main (#5916)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* added TPMLP for T5-based models (#5840)

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* remove notebook (#5859)

Signed-off-by: ericharper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Dynamic freezing in Nemo (#5879)

* Initial commit for dynamic freezing logic

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated logic to handle lists and updated docs

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Transferred dynamic freezing logic to core from asr

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert asr config to original

Signed-off-by: Daniel Egert <degert@nvidia.com>

* Fixed tab indent in core.rst

Signed-off-by: Daniel Egert <degert@nvidia.com>

* Updated modelPT for latest from master

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed indents in docs

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix Windows bug with save_restore_connector (#5919)

* Initial commit for Windows bug with save_to

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* add new languages to doc (#5939)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Workarounds for ONNX export with autocast

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fix val loss computation in megatron (#5871)

* fix val loss computation in megatron

* Fix NaN handling during validation

---------

Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Restoring sigmas

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add core classes and functions for online clustering diarizer part 2 (#5609)

* Add core classes and functions for online clustering diarizer

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add audio to labels code

Signed-off-by: Taejin Park <tango4j@gmail.com>

* resolve type errors

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added unit-tests for very short audio

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Filled all missing docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved conflict and added missing docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed unit-test errors

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the wrongly added file - megatron_gpt_model.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fix wrongly included file - megatron_gpt_model.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* resolve code quality issue

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed unit-test errors and bugs

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changed total_sec for offline_clustering toy_data in unit-tests

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed merging index offset bug

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* only including part 1 files

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused function

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed unused imports

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* divided nmesc_clustering.py into two and reflected first-pass comments

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding offline/online_clustering.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code QL autocomment

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Removed unused imports

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update nemo/collections/asr/parts/utils/online_clustering.py

Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>

* Reflected comments

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved code scanning issue

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Adding online_diarizer.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* updated tests and speaker_utils

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the wrong test eval

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updating online diarizer for variable name change

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reflected comments and some typo fixes in speaker_utils

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Distributed Adam optimizer overlaps param all-gather with forward compute (#5684)

* Add distopt support for overlapping param all-gather with forward compute

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS][ZH] added new NGC model cards with polyphone disambiguation. (#5940)

* [TTS][ZH] added new NGC model cards with polyphone disambiguation.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Moved truncation of context higher up

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TN] bugfix file handler is not closed. (#5955)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Added unit test for regulate_len. Unscripted sort_tensor for TRT

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixed slice

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* bugfix: file handlers are not closed. (#5956)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS][G2P] deprecate add_symbols (#5961)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fix broken link (#5968)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix hybridasr bug (#5950) (#5957)

Signed-off-by: Jason <jasoli@nvidia.com>

* Added list_available_models (#5967)

* Added list_available_models

Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>

* Added to readme

Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>

* added vits to docs

Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>

* added vits to docs

Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>

---------

Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>
Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>
Signed-off-by: Jason <jasoli@nvidia.com>

* Move settings to `pyproject.toml`. Remove deprecated `pytest-runner` (#5947)

* Move project settings to pyproject.toml

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove setup.cfg

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove deprecated pytest-runner

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Allow only registered markers for pytest

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix torchaudio installation (#5850)

* Fail if torchaudio not installed

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix torchaudio matching version

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Warn if Pytorch major version changed

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Update fastpitch.py (#5969)

Signed-off-by: Jason <jasoli@nvidia.com>

* Review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* per-micro-batch input loader (#5635)

* per-micro-batch input loader

* per-micro-batch input loader

set arg default val

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* apply per-microbatch-loader to only GPT

* update docstring on micro-batch input loader

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed the default arg val

* fix batch size to 1 at log stat registration

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update container for CI

Signed-off-by: ericharper <complex451@gmail.com>

* update container in jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* update container for CI

Signed-off-by: ericharper <complex451@gmail.com>

fix merge conflict

* revert Jenkinsfile

* Revert "revert Jenkinsfile"

This reverts commit d23b7757e0f935dacde2840f234193c632a2b3be.

* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* add GradScaler

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* update container in readme (#5981)

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Support Alignment Extraction for all RNNT Beam decoding methods (#5925)

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove everything else

Signed-off-by: smajumdar <titu1994@gmail.com>

* Support dataclass in AbstractRNNTDecoding

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add first draft unittest

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct the logic to move to the next timestep in the alignment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize ALSD alignment generation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for TSD greedy alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for mAES greedy alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize extraction of alignments from all beam algorithms for RNNT

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add copyright

Signed-off-by: smajumdar <titu1994@gmail.com>

* Address comments

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add AWS SageMaker ASR Examples (#5638)

* Base code for AWS SageMaker example

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Remove format

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* wrap

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add a notebook with the code

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Setup

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Update notebook

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Remove space

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix spelling mistake

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add message to explain usage

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add CommonVoice esperanto example

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix path

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fixes

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Import sox locally, add documentation

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address reviews

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address reviews

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address reviews

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add cell to download the SSL model

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Set max epochs to 300

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fixes, introduce HF dataset instructions

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Upstream updates from other branch

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix warning

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add README, add image

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix warning

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address feedback

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Feedback

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

---------

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Update PUBLICATIONS.md (#5963)

* Add papers from 2022/2022 to PUBLICATIONS.md

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add additional papers

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [G2P] fixed typos and broken import library. (#5978) (#5979)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [G2P] added backward compatibility for english tokenizer and fixed unit tests (#5980) (#5984)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

---------

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
Signed-off-by: Roman Korostik <racoiaws@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: athitten <abhishreetm@gmail.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
Signed-off-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>
Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Matvei Novikov <mattyson.so@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Roman Korostik <racoiaws@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Mikyas Desta <miktekabi@gmail.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Co-authored-by: Gabriel Pirlogeanu <53811655+gabitza-tech@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: athitten <47577437+athitten@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Micha Livne <mlivne@cs.toronto.edu>
Co-authored-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com>
Co-authored-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023
…it tests (NVIDIA#5980) (NVIDIA#5984)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023
* Megatron positional encoding alibi fix (#5808) (#5863)

* 1. Debugging.

* 1. Debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

* 1. Debugging.

* 1. Fixed initialization.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

* 1. Removed scale from ALiBi.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated yaml and added support to control number of alibi heads.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Removed num_attention_heads_alibi from configs.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix segmenting for pcla inference (#5849)

* Fix segmenting for pcla inference

Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>

* Fix segmenting for pcla inference

Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* indentation fix (#5861) (#5862)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* add ambernet to readme (#5872) (#5873)

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix wrong label mapping in batch_inference for label_model (#5767) (#5870)

* fix batch inference

* add test for batch

* fix device

Signed-off-by: fayejf <fayejf07@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* WAR for https://github.com/pytorch/pytorch/pull/91526

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864)

* fix data simulator

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Adding noise_manifest handling for faster speed

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added multi-gpu feature

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added a parameter for noise source file number

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed noise_manifest error bug

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* RETRO model finetuning (#5800)

* add save and load dynamic index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add chunk stride feature

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add chunk stride feature

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add no pq index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added megatron lm compatible mode

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add config

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix position embedding

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added index factory

Signed-off-by: Yi Dong <yidong@nvidia.com>

* share neighbors and weights among strategies

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix bug

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added metric to faiss index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* set default to inner product

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added qa fine tune dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added fine tuning code

Signed-off-by: Yi Dong <yidong@nvidia.com>

* trim it

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix data issue

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added version

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix key error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure to overwrite the cfg

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make multiple sentence bert available

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the document

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the table

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix transformer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure to turn off the rope in chunked cross attention layer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the security issue

Signed-off-by: Yi Dong <yidong@nvidia.com>

* style fix

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix codeql issues

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use -1

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix empty index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* clean up

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the lower bound for repetition penalty

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add retro qa inference strategy

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added new inference logic

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix TP inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* revert requirement

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added file inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use string to prevent collision

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use NQ test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix prompt

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix inference

Signed-off-by: Yi Dong <yidong@nvidia.com>

* set good defaults for demo

Signed-off-by: Yi Dong <yidong@nvidia.com>

* replicate adlr

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make sure to turn off attention reset for megatron lm compatible model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* style fix

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix typo

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix inference error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix logging

Signed-off-by: Yi Dong <yidong@nvidia.com>

* address comments

Signed-off-by: Yi Dong <yidong@nvidia.com>

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS] GAN-based spectrogram enhancer (#5565)

* [TTS] add SpectrogramEnhancer based on StyleGAN 2

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] some tests for spectrogram enhancer

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: a tiny clean up

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: log images during training

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* exp_manager: pass save_on_train_epoch_end to checkpointing callback

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add training script and config examples

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix comments

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: don't assume FastPitch

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: better input shapes handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix porting error

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix logging and .nemo saving

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: clean up scaling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: update examples

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: shape handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove LoggerCollection handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: copyright notice for tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: use process_batch helper

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: return empty list of available models

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: some docs

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: style --fix

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: chan_last -> channel_last

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove unused return value

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: losses are nn.Modules now

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: init optimizers from config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: typechecking

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: more tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix logging images

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: unclutter prepare_batch

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: init generator and discriminator from the config for consistency with other NeMo models

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: update spectrogram range in the example config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: comment on loss weights in the example config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: rename Conv2DMod to Conv2DModulated

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix CodeQL import warnings

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: type_as_recursive -> to_device_recursive

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: move to_device_recursive to helpers

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: move losses to a separate module, add comments

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add optimizers' entries to config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix test configs

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: support length masking for 3-dim tensors

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add masking to spectrogram normalization

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: add spectrogram normalization tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix imports and formatting in tests

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix docstring typo

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: rename G and D to generator and discriminator

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: better argument naming in interfaces (condition -> input_spectograms, target -> target_spectrograms)

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [TTS] SpectrogramEnhancer: fix import warnings in modules

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] add resynthesize_dataset.py script

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] add PairedRealFakeSpectrogramsDataset

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: update example config to reflect new data setup

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: remove unused imports

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: use nemo manifest handling

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: remove unused import

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: underscores for .npy names

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove return value from a test

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] add length masking helper

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: use common tts length mask function

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] unused imports in tts helpers

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: fix an import

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: introduce computed upsample_factor to generator

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: clean up and clarify validation data setup

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: remove a hardcoded path in the example config

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] SpectrogramEnhancer: configurize max_spectrogram_length in generator

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [TTS] resynthesize_dataset.py: consistent dashes and underscores in CLI args

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
Signed-off-by: Roman Korostik <racoiaws@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Optimizing distributed Adam when running with one work queue (#5560)

* Dist Adam constructs a single param bucket for each GPT layer

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure per-layer dist Adam buckets for BERT and T5

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove unused variables

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure GPT with one dist Adam bucket per virtual pipeline stage

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure BERT with one dist Adam bucket per virtual pipeline stage

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit in Dockerfile

Need recent updates to Apex distributed Adam optimizer.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove logic for per-virtual-pipeline distopt buckets from T5

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fix(readme): fix typo (#5883)

Signed-off-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Signed-off-by: Jason <jasoli@nvidia.com>

* TTS inference with Heteronym classification model, hc model inference refactoring (#5768)

* refactor inference, fix span detection

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix merge conflicts

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix merge conflicts

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused var

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, test update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* arg name update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert changes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docs, move heteronym to baseg2p

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change wordid file defaults to none

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add manifest check

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* replace homograph with heteronym, upper case wordid for riva, review feedback

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add log message, update comment

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename test manifest field

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* take out retro doc (#5885) (#5886)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add option to disable distributed parameters in distributed Adam optimizer (#5685)

* Add option to run dist Adam without distributed params

Similar to DDP, but leverages dist Adam's support for overlapping communication with backward compute

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix bug in grad clipping when dist Adam has redundant params

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774)

* Separate full BPE dataset construction

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix the case when the dataset is None

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix typos

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Separate char dataset construction. Fix DALI dataset usage.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* transformer duration added and IPA config files added

Signed-off-by: Jason <jasoli@nvidia.com>

* inference issue for pace resolved

Signed-off-by: Jason <jasoli@nvidia.com>

* Latest ONNX developments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Remove MCD_DTW tarball (#5889)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Block large files from being merged into NeMo main (#5898)

* Attempt to use large-file pre-commit ci hook

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Set defaults and enforce

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Set to 1000

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Remove enforcement

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

---------

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876)

* Updated offline_clustering.py, the getMultiScaleCosAffinityMatrix function, reduced memory usage

Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com>

* torch.cuda.empty_cache() outside forward_infer()

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Removed unnecessary lines

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Speed up for non torch.jit.script

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* parallelism is default off

Signed-off-by: Taejin Park <tango4j@gmail.com>

* nme_mat_size is unified as 512, removing redundant docstring

Signed-off-by: Taejin Park <tango4j@gmail.com>

---------

Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* set max_steps for lr decay through config (#5780)

* set max_steps for lr decay through config

* added warning for optim sched max_steps config option

* reverted changes to modelPT and updated megatron_base_model

* added the experimental cosine annealing scheduler class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update decay_steps for cosine annealing exp class

* added copyright

---------

Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix transducer and question answering tutorial bugs (#5809) (#5810)

Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* update apex install instructions (#5901) (#5902)

Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Hybrid ASR-TTS models (#5659)

Add hybrid ASR-TTS models and text-to-text dataset

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Set providers for ORT inference session (#5903)

Signed-off-by: athitten <abhishreetm@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [ASR] Configurable metrics for audio-to-audio + removed experimental decorators (#5827)

* Added an option to configure metrics for audio-to-audio models
Removed experimental decorators

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Addressed review comments

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

---------

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Correct doc for RNNT transcribe() function (#5904)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add segmentation export to Audacity label file (#5857)

* Save the segmentation as label file for Audacity

Audacity is a free, open-source audio editor that can import a label file, which makes it quick to assess segmentation quality. This commit adds an export to the [Audacity label format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html) so that, directly after running the segmentation tool, the segmentation quality can be assessed or the segmentation can be shared easily.

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>

* Fix styling

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused score in audacity export

The score is not written to the Audacity label file, so we can safely skip loading it from the segment.

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>

---------

Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>
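
For readers unfamiliar with the Audacity label format mentioned in #5857 above: each label is one line of tab-separated text holding a start time in seconds, an end time in seconds, and the label text. The following is a minimal sketch of such an export, not the code from that PR; the function name `format_audacity_labels` and the `(start, end, text)` tuple layout are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def format_audacity_labels(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) segments as Audacity label-track text.

    Hypothetical helper for illustration; it is not part of NeMo.
    """
    lines = []
    for start, end, text in segments:
        # Tabs and newlines would break the tab-separated layout, so flatten them.
        clean = " ".join(text.split())
        lines.append(f"{start:.6f}\t{end:.6f}\t{clean}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [(0.0, 1.25, "hello world"), (1.25, 3.5, "second segment")]
    print(format_audacity_labels(example), end="")
```

The resulting text can be saved to a `.txt` file and loaded through Audacity's label import. As in the commit above, no alignment score is written, because the format has no field for it.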

* Cross-Lingual objectives (XLM) and multilingual (many-many) support for Megatron-NMT (#5026)

* Update blendable dataset, and refactor seq2seq data

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Blendable dataset with binarized mmap working

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Pass seed from cfg to dataset

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix multilingual setup

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add on epoch start reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update tokenizer creation for multilingual

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Tmp

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update NMT script

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update training script

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Log consumed samples

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Logging on val epoch end

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove redundant print

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Ckpt averaging for non model parallel megatron models

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Empty

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update error message

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove check

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Restore fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove ipdb

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Move to classmethods

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Initial

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* Refactor masking to add skip_masking_id and working xlm bert and t5 datasets

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Testing a simple solution

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed. Seems to work. Need to validate.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Added support in CSV and text memmap to Megatron encoder-decoder

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Added support in CSV.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.
2. Fixed bugs.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed bugs.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Updated yaml.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Fixed warnings.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* 1. Fixed a bug.

Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>

* Tmp

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Updates

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix minor data things

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Lang ids for validation datasets

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes for lang id code at inference

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove pdb

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix prepend ID and bleu logging

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for many-many NMT

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Reset o2 default

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore dataset utils

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Allreduce bleu scores

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Loading index file into memmap object.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed extension when loading files.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix redundant building

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* PP > 2 for NMT

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Merge and fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor multilingual again

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor and verify data formats

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* more fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix passing langs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for bart

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@cs.toronto.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* ONNX export working

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixing unit test

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Update isort to the latest version (#5895)

Update isort to the latest version

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Pin isort version (#5914)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Moved eval notebook data to aws (#5911)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* FilterbankFeaturesTA to match FilterbankFeatures (#5913)

Signed-off-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fixed missing long_description_content_type (#5909)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* added TPMLP for T5-based models (#5840) (#5841)

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixing 0-size issue and ONNX BS>1 trace

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixing code scan alert

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* update container (#5917)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* remove conda pynini install (#5921)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Merge release main (#5916)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* added TPMLP for T5-based models (#5840)

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>

* remove notebook (#5859)

Signed-off-by: ericharper <complex451@gmail.com>

Signed-off-by: ericharper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Dynamic freezing in Nemo (#5879)

* Initial commit for dynamic freezing logic

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated logic to handle lists and updated docs

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Transferred dynamic freezing logic to core from asr

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert asr config to original

Signed-off-by: Daniel Egert <degert@nvidia.com>

* Fixed tab indent in core.rst

Signed-off-by: Daniel Egert <degert@nvidia.com>

* Updated modelPT for latest from master

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed indents in docs

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix Windows bug with save_restore_connector (#5919)

* Initial commit for Windows bug with save_to

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* add new languages to doc (#5939)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Workarounds for ONNX export with autocast

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fix val loss computation in megatron (#5871)

* fix val loss computation in megatron

* Fix NaN handling during validation

---------

Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Restoring sigmas

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add core classes and functions for online clustering diarizer part 2 (#5609)

* Add core classes and functions for online clustering diarizer

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add audio to labels code

Signed-off-by: Taejin Park <tango4j@gmail.com>

* resolve type errors

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added unit-tests for very short audio

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Filled all missing docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved conflict and added missing docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed unit-test errors

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the wrongly added file - megatron_gpt_model.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fix wrongly included file - megatron_gpt_model.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* resolve code quality issue

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed unit-test errors and bugs

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changed total_sec for offline_clustering toy_data in unit-tests

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed merging index offset bug

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* only including part 1 files

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused function

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed unused imports

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* divided nmesc_clustering.py into two and reflected first-pass comments

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding offline/online_clustering.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code QL autocomment

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Removed unused imports

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update nemo/collections/asr/parts/utils/online_clustering.py

Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>

* Reflected comments

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved code scanning issue

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Adding online_diarizer.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* updated tests and speaker_utils

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the wrong test eval

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updating online diarizer for variable name change

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reflected comments and some typo fixes in speaker_utils

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Distributed Adam optimizer overlaps param all-gather with forward compute (#5684)

* Add distopt support for overlapping param all-gather with forward compute

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS][ZH] added new NGC model cards with polyphone disambiguation. (#5940)

* [TTS][ZH] added new NGC model cards with polyphone disambiguation.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Moved truncation of context higher up

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TN] bugfix file handler is not closed. (#5955)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Added unit test for regulate_len. Unscripted sort_tensor for TRT

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fixed slice

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* bugfix: file handlers are not closed. (#5956)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [TTS][G2P] deprecate add_symbols (#5961)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* fix broken link (#5968)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix hybridasr bug (#5950) (#5957)

Signed-off-by: Jason <jasoli@nvidia.com>

* Added list_available_models (#5967)

* Added list_available_models

Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>

* Added to readme

Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>

* added vits to docs

Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>

* added vits to docs

Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>

---------

Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>
Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>
Signed-off-by: Jason <jasoli@nvidia.com>

* Move settings to `pyproject.toml`. Remove deprecated `pytest-runner` (#5947)

* Move project settings to pyproject.toml

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove setup.cfg

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove deprecated pytest-runner

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Allow only registered markers for pytest

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Fix torchaudio installation (#5850)

* Fail if torchaudio not installed

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix torchaudio matching version

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Warn if Pytorch major version changed

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Update fastpitch.py (#5969)

Signed-off-by: Jason <jasoli@nvidia.com>

* Review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* per-micro-batch input loader (#5635)

* per-micro-batch input loader

* per-micro-batch input loader

set arg default val

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* apply per-microbatch-loader to only GPT

* update docstring on micro-batch input loader

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed the default arg val

* fix batch size to 1 at log stat registration

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update container for CI

Signed-off-by: ericharper <complex451@gmail.com>

* update container in jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* update container for CI

Signed-off-by: ericharper <complex451@gmail.com>

fix merge conflict

* revert Jenkinsfile

* Revert "revert Jenkinsfile"

This reverts commit d23b7757e0f935dacde2840f234193c632a2b3be.

* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* add GradScaler

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* update container in readme (#5981)

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Support Alignment Extraction for all RNNT Beam decoding methods (#5925)

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove everything else

Signed-off-by: smajumdar <titu1994@gmail.com>

* Support dataclass in AbstractRNNTDecoding

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add first draft unittest

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct the logic to move to the next timestep in the alignment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize ALSD alignment generation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for TSD greedy alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for mAES greedy alignment extraction

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize extraction of alignments from all beam algorithms for RNNT

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add copyright

Signed-off-by: smajumdar <titu1994@gmail.com>

* Address comments

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Add AWS SageMaker ASR Examples (#5638)

* Base code for AWS SageMaker example

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Remove format

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* wrap

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add a notebook with the code

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Setup

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Update notebook

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Remove space

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix spelling mistake

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add message to explain usage

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add CommonVoice esperanto example

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix path

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fixes

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Import sox locally, add documentation

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address reviews

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address reviews

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address reviews

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add cell to download the SSL model

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Set max epochs to 300

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fixes, introduce HF dataset instructions

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Upstream updates from other branch

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix warning

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Add README, add image

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Fix warning

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Address feedback

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

* Feedback

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

---------

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* Update PUBLICATIONS.md (#5963)

* Add papers from 2022/2022 to PUBLICATIONS.md

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add additional papers

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [G2P] fixed typos and broken import library. (#5978) (#5979)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

* [G2P] added backward compatibility for english tokenizer and fixed unit tests (#5980) (#5984)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jason <jasoli@nvidia.com>

---------

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
Signed-off-by: Roman Korostik <racoiaws@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: gabitza-tech <gabriel.pirlogeanu@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: athitten <abhishreetm@gmail.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
Signed-off-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>
Signed-off-by: Evgeniy Shabalin <baah1999@yandex.ru>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Matvei Novikov <mattyson.so@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Roman Korostik <racoiaws@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Mikyas Desta <miktekabi@gmail.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Co-authored-by: Gabriel Pirlogeanu <53811655+gabitza-tech@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: ANMOL GUPTA <anmolg@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: athitten <47577437+athitten@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: CaraDuf <91517923+Ca-ressemble-a-du-fake@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Micha Livne <mlivne@cs.toronto.edu>
Co-authored-by: Mohamed Saad Ibn Seddik <ms.ibnseddik@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com>
Co-authored-by: Evgeniy Shabalin <36159472+treacker@users.noreply.github.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>