Adding RETRO tests to Action Tests (cicd-main.yml) #8942
Merged
Signed-off-by: eharper <eharper@nvidia.com>
* Add dist ckpt support for regular optimizers Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix imports Signed-off-by: dimapihtar <dpihtar@gmail.com> * imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* add notebook Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename old notebook to Buffered_Streaming Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * call setup_streaming_params in set_default_att_context_size method Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links to tutorials in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove hard-coding Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename var Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
* add deallocate pipeline output optimization Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…megaconf (#8299) * save cp_size to self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * use parallel_state instead of self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update peft doc Signed-off-by: Chen Cui <chcui@nvidia.com> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * Merge branch 'r1.23.0' into chcui/update_peft_doc Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com>
…8242) (#8324) * Rebasing canary changes at current main Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move the changes from asr transformer to nlp transformer as originally intended Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update eval to strip spaces before punctuations Signed-off-by: stevehuang52 <heh@nvidia.com> * update pc strip Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247) * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`. Signed-off-by: Piotr Żelasko <petezor@gmail.com> * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252) * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move tokenization into `prompt_format_fn`, fix usage, add docs Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Backward-compatible utterance validation Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Improve type annotations Signed-off-by: Piotr Żelasko <petezor@gmail.com> * config and prompt_fn registration changes from review Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix transcribe config Signed-off-by: stevehuang52 <heh@nvidia.com> * Refactor Canary to follow schema of remaining ASR models (#8260) * Initial draft of multi task beam decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Stabilize 
inference Signed-off-by: smajumdar <titu1994@gmail.com> * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add change decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Remove redundant imports Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * remove asr transformer dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * copy token_classifier from nlp to asr Signed-off-by: stevehuang52 <heh@nvidia.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Add typing to beam decoding Signed-off-by: smajumdar <titu1994@gmail.com> * Make prompt format configurable Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * drop asr dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <heh@nvidia.com> * fix transcribe, update asr evaluator Signed-off-by: stevehuang52 <heh@nvidia.com> * Extend the docs for the canary prompt_fn Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Incorporate changes from Nithin's code review Signed-off-by: Piotr Żelasko <petezor@gmail.com> * training bug fix and adding launch script for speech_multitask (#8270) * bug fix and adding launch script for speech_multitask Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> * update launch script 
example in speech_to_text_aed.py Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> --------- Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * Fix: drop_last must be true in validation/test otherwise the training will hang Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> * revert to current transcribe API Signed-off-by: stevehuang52 <heh@nvidia.com> * revert changes to NLP, update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * Remove DALI; rename compute_audio_loss to compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * set default use_model_transcribe=False Signed-off-by: stevehuang52 <heh@nvidia.com> * change os.path.dirname to pathlib Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Test for CanaryTokenizer + refactoring (#8285) * Test for CanaryTokenizer Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Attempt at refactor... 
Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Update config for AED models (#8294) Signed-off-by: smajumdar <titu1994@gmail.com> * set default calculate_wer=False in transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 1 Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 2 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Document compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * add docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: stevehuang52 <heh@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> (cherry picked from commit d10726d) Co-authored-by: Piotr Żelasko <petezor@gmail.com>
…kdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com>
jenkins |
huvunvidia force-pushed the huvu/mcore_retro branch from 07af44d to dfea71e (April 17, 2024 01:11)
pablo-garay approved these changes Apr 17, 2024
xingyaoww pushed a commit to xingyaoww/NeMo that referenced this pull request Apr 23, 2024
* update branch Signed-off-by: eharper <eharper@nvidia.com> * Add dist ckpt support for regular optimizers (NVIDIA#7749) * Add dist ckpt support for regular optimizers Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix imports Signed-off-by: dimapihtar <dpihtar@gmail.com> * imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303) Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Cache Aware Streaming tutorial notebook (NVIDIA#8296) * add notebook Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename old notebook to Buffered_Streaming Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * call setup_streaming_params in set_default_att_context_size method Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links to tutorials in docs Signed-off-by: Elena Rastorgueva 
<erastorgueva@nvidia.com> * remove hard-coding Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename var Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * fix path location and branch (NVIDIA#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * add deallocate pipeline output optimization (NVIDIA#8279) * add deallocate pipeline output optimization Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299) * save cp_size to self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * use parallel_state instead of self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * remove assertion (NVIDIA#8302) Signed-off-by: dimapihtar <dpihtar@gmail.com> * Update PEFT Doc (NVIDIA#8262) * update peft doc Signed-off-by: Chen Cui <chcui@nvidia.com> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * Merge branch 'r1.23.0' into chcui/update_peft_doc Signed-off-by: Chen Cui 
<chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324) * Rebasing canary changes at current main Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move the changes from asr transformer to nlp transformer as originally intended Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update eval to strip spaces before punctuations Signed-off-by: stevehuang52 <heh@nvidia.com> * update pc strip Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247) * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`. Signed-off-by: Piotr Żelasko <petezor@gmail.com> * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252) * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move tokenization into `prompt_format_fn`, fix usage, add docs Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Backward-compatible utterance validation Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Improve type annotations Signed-off-by: Piotr Żelasko <petezor@gmail.com> * config and prompt_fn registration changes from review Signed-off-by: Piotr Żelasko <petezor@gmail.com> 
--------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix transcribe config Signed-off-by: stevehuang52 <heh@nvidia.com> * Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260) * Initial draft of multi task beam decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Stabilize inference Signed-off-by: smajumdar <titu1994@gmail.com> * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add change decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Remove redundant imports Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * remove asr transformer dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * copy token_classifier from nlp to asr Signed-off-by: stevehuang52 <heh@nvidia.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Add typing to beam decoding Signed-off-by: smajumdar <titu1994@gmail.com> * Make prompt format configurable Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * drop asr dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <heh@nvidia.com> * fix transcribe, update asr evaluator Signed-off-by: stevehuang52 <heh@nvidia.com> * Extend the docs for the canary prompt_fn Signed-off-by: Piotr Żelasko 
<petezor@gmail.com> * Incorporate changes from Nithin's code review Signed-off-by: Piotr Żelasko <petezor@gmail.com> * training bug fix and adding launch script for speech_multitask (NVIDIA#8270) * bug fix and adding launch script for speech_multitask Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> * update launch script example in speech_to_text_aed.py Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> --------- Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * Fix: drop_last must be true in validation/test otherwise the training will hang Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> * revert to current transcribe API Signed-off-by: stevehuang52 <heh@nvidia.com> * revert changes to NLP, update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * Remove DALI; rename compute_audio_loss to compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * set default use_model_transcribe=False Signed-off-by: stevehuang52 <heh@nvidia.com> * change os.path.dirname to pathlib Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285) * Test for CanaryTokenizer Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Attempt at refactor... 
Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Update config for AED models (NVIDIA#8294) Signed-off-by: smajumdar <titu1994@gmail.com> * set default calculate_wer=False in transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 1 Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 2 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Document compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * add docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: stevehuang52 <heh@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> (cherry picked from commit d10726d) Co-authored-by: Piotr Żelasko <petezor@gmail.com> * add code for calling mcore_retro in NeMo * add code for calling mcore_retro in NeMo * runnable, training curve match retro mcore and nemo * working on retro inference * working on 
megatron_retro_eval.py and megatron_retro_inference.yaml * refactoring text_generation_utils code and retro inference relevant files * clean PR * resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers) * clean repository * revert changes to inference/eval code to original in main * clean code * runable training code, with already implemented eval code * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (NVIDIA#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * revert to original eval code files * revert to original eval code files 2 * revert to original eval code files 3 * revert to original eval code files 4 * clean code * clean code * update my code to support changes from lastest main * 
commit before rebase r1.23.0 * Multimodal r1.23.0 bug fix (NVIDIA#8315) * Rename quick-gelu Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * ddpm config guard Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix ddpm edit api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix insert_image_token cfg issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * neva updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add back jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update default neva template Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * copy paste files from r1.23.0 * clean PR * Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. 
(NVIDIA#8272) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334) Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Remove asr webapp (NVIDIA#8347) Signed-off-by: smajumdar <titu1994@gmail.com> * remove _target_ at model level in aed config (NVIDIA#8351) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * revert changes for tts and asr * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357) * Add change_vocabulary and save_tokenizers() support Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py Co-authored-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> * Change default (NVIDIA#8371) Signed-off-by: smajumdar <titu1994@gmail.com> * implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support * adding megatron compile_helpers(), in future can be fixed with correct MLM commit * bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * Enable megatron core loggers for GPT pretraining (NVIDIA#8354) * Logging changes tested for gpt_pretraining Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * Additional 
args Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * mcore ds fix (NVIDIA#8283) * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits Signed-off-by: dimapihtar 
<dpihtar@gmail.com> * revert apex installation Signed-off-by: dimapihtar <dpihtar@gmail.com> * turn off the fusion for jenkins Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * addressing Eric's reviews * adding existing implementation RETRO files * adding existing implementation RETRO files * Add Finetuning tutorial with HF Datasets (NVIDIA#8356) * Add Finetuning tutorial with HF Datasets Signed-off-by: Nithin Rao Koluguri <nithinraok> * update on Som comments Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * release updates (NVIDIA#8378) * [tutorial] fixed missing RIR scripts file. 
(NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * add mock ds test Signed-off-by: dimapihtar <dpihtar@gmail.com> * add test for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * mcore ds fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * data input fix Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar 
<dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * MCore dataset compatibility for tokenizers (NVIDIA#8390) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer Signed-off-by: Valerie Sarge <vsarge@nvidia.com> * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. Signed-off-by: Valerie Sarge <vsarge@nvidia.com> --------- Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * Mcore customization doc (NVIDIA#8298) * [tutorial] fixed missing RIR scripts file. 
(NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (NVIDIA#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * initial placeholder Signed-off-by: Huiying Li <huiyingl@nvidia.com> * add to intro/index.rst Signed-off-by: Huiying Li <huiyingl@nvidia.com> * initial content update Signed-off-by: Huiying Li <willwin.lee@gmail.com> * add diff images Signed-off-by: Huiying Li <willwin.lee@gmail.com> size Signed-off-by: Huiying Li <willwin.lee@gmail.com> * minor fixes * minor language change Signed-off-by: Chen Cui <chcui@nvidia.com> * clean changes --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> 
Signed-off-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> * wer fix (NVIDIA#8404) Signed-off-by: Travis Bartley <tbartley@nvidia.com> * updated link to pubmed (NVIDIA#8402) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * Update NFA video download link (NVIDIA#8406) * update nfa nasa video link Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update link in markdown Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * revert changes (NVIDIA#8410) Signed-off-by: Chen Cui <chcui@nvidia.com> * Fix dreambooth data sampler issue (NVIDIA#8400) * Turn on drop last Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Some neva fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fixed errors in the CTM gen functions (NVIDIA#8416) Signed-off-by: Taejin Park <tango4j@gmail.com> * add ensemble decoding fix (NVIDIA#8427) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * SDE bugfix log (NVIDIA#8430) Signed-off-by: George <gzelenfroind@nvidia.com> * mcore customization doc minor fix (NVIDIA#8421) Signed-off-by: Huiying Li <willwin.lee@gmail.com> * NeMo-Mistral to HF converter bugfix. 
(NVIDIA#8353) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Fixing mcore bert for TP, PP and SP (NVIDIA#8336) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> --------- Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481) * Add settings to suppress bf16 compile errors in CI on V100 Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * MoE parameter passing (NVIDIA#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Pass EP/MoE params in consumer scripts. 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * PR fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * CI fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update k2 version (NVIDIA#8478) (NVIDIA#8492) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Add fp8 support for SD/Update notebook paths (NVIDIA#8489) * Add fp8 support for SD/Update notebook paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * pin to 0.5.0 (NVIDIA#8465) Signed-off-by: eharper <eharper@nvidia.com> * Update NeMo Multimodal Requirements (NVIDIA#8515) * Update requirements_multimodal.txt Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update github raw content link (NVIDIA#8517) Signed-off-by: Chen Cui <chcui@nvidia.com> * Add dep notice for notebooks (NVIDIA#8522) * add dep notice Signed-off-by: eharper <eharper@nvidia.com> * revert Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> * Revert FP8 
integration (NVIDIA#8520) * Revert FP8 integration Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update data prep notebook (NVIDIA#8532) Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * before update branch with latest r1.23.0 * update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint) * remove compile_helpers * reverse changes from main branch to r1.23.0 * adding *_legacy files * update MLM commit in Jenkinsfile to latest * debugging Jenkinstest: test different mcore import in retro_dataset * update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py * removing all mcore RETRO to pass the Jenkinstest * fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py * update Jenkinsfile file to use TE v0.7 * update NeMo to work with latest mcore RETRO (solving TE problems) * update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile * update commit for MLM * jenkinstest debugging * temporary fix RETRO's __init__ for jenkinstest * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * add model.data.dataloader_type=cyclic to jenkinsfile * update code to work with latest megatron-lm main 81dab6067 * update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067 * fix to by pass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files) * isort and black * adjusting 
model.micro_batch_size to 1 * fix BRANCH = 'r1.23.0' * replace tutorials dir from main branch to huvu/mcore_retro * fix minor merges conflict * update Jenkinsfile * runnable with a temporary fix from Jacek (unfound -unfinished problem) * runnable with a temporary fix from Jacek (unfound -unfinished problem) * modified nlp_overrides.py back to original * fix checkpoint from Jacek Bieniusiewicz * config Jenkinsfile test * set RETRO Jenkins MBS to 1 * black fix * isort fix * update TE commit * update to latest Jenkinsfile with latest container and commits * remove new RETRO jenkinstest * merge latest main * put RETRO Jenkinstest to the right place * update code for megatron_retro_pretraining_legacy.py * untrack ipa_cmudict-0.7b_nv23.01.txt * untrack ipa_cmudict-0.7b_nv23.01.txt * set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy * update new RETRO jenkinstest to run faster * merging latest main, and edit Jenkinstest * update Jenkinstest for new RETRO to run faster * fix isort * adding RETRO tests to cicd-main.yml action tests * update ipa_cmudict-0.7b_nv23.01.txt * remove quotes for model.data for legacy RETRO action tests --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Somshubra Majumdar 
<titu1994@gmail.com> Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: George <gzelenfroind@nvidia.com> Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: eharper <eharper@nvidia.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com> Co-authored-by: Sangkug Lym 
<slym@nvidia.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request on Apr 26, 2024
* update branch Signed-off-by: eharper <eharper@nvidia.com> * Add dist ckpt support for regular optimizers (NVIDIA#7749) * Add dist ckpt support for regular optimizers Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix imports Signed-off-by: dimapihtar <dpihtar@gmail.com> * imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303) Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Cache Aware Streaming tutorial notebook (NVIDIA#8296) * add notebook Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename old notebook to Buffered_Streaming Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * call setup_streaming_params in set_default_att_context_size method Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links to tutorials in docs Signed-off-by: Elena Rastorgueva 
<erastorgueva@nvidia.com> * remove hard-coding Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename var Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * fix path location and branch (NVIDIA#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * add deallocate pipeline output optimization (NVIDIA#8279) * add deallocate pipeline output optimization Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299) * save cp_size to self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * use parallel_state instead of self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * remove assertion (NVIDIA#8302) Signed-off-by: dimapihtar <dpihtar@gmail.com> * Update PEFT Doc (NVIDIA#8262) * update peft doc Signed-off-by: Chen Cui <chcui@nvidia.com> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * Merge branch 'r1.23.0' into chcui/update_peft_doc Signed-off-by: Chen Cui 
<chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324) * Rebasing canary changes at current main Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move the changes from asr transformer to nlp transformer as originally intended Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update eval to strip spaces before punctuations Signed-off-by: stevehuang52 <heh@nvidia.com> * update pc strip Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247) * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`. Signed-off-by: Piotr Żelasko <petezor@gmail.com> * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252) * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move tokenization into `prompt_format_fn`, fix usage, add docs Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Backward-compatible utterance validation Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Improve type annotations Signed-off-by: Piotr Żelasko <petezor@gmail.com> * config and prompt_fn registration changes from review Signed-off-by: Piotr Żelasko <petezor@gmail.com> 
--------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix transcribe config Signed-off-by: stevehuang52 <heh@nvidia.com> * Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260) * Initial draft of multi task beam decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Stabilize inference Signed-off-by: smajumdar <titu1994@gmail.com> * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add change decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Remove redundant imports Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * remove asr transformer dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * copy token_classifier from nlp to asr Signed-off-by: stevehuang52 <heh@nvidia.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Add typing to beam decoding Signed-off-by: smajumdar <titu1994@gmail.com> * Make prompt format configurable Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * drop asr dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <heh@nvidia.com> * fix transcribe, update asr evaluator Signed-off-by: stevehuang52 <heh@nvidia.com> * Extend the docs for the canary prompt_fn Signed-off-by: Piotr Żelasko 
<petezor@gmail.com> * Incorporate changes from Nithin's code review Signed-off-by: Piotr Żelasko <petezor@gmail.com> * training bug fix and adding launch script for speech_multitask (NVIDIA#8270) * bug fix and adding launch script for speech_multitask Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> * update launch script example in speech_to_text_aed.py Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> --------- Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * Fix: drop_last must be true in validation/test otherwise the training will hang Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> * revert to current transcribe API Signed-off-by: stevehuang52 <heh@nvidia.com> * revert changes to NLP, update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * Remove DALI; rename compute_audio_loss to compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * set default use_model_transcribe=False Signed-off-by: stevehuang52 <heh@nvidia.com> * change os.path.dirname to pathlib Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285) * Test for CanaryTokenizer Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Attempt at refactor... 
Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Update config for AED models (NVIDIA#8294) Signed-off-by: smajumdar <titu1994@gmail.com> * set default calculate_wer=False in transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 1 Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 2 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Document compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * add docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: stevehuang52 <heh@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> (cherry picked from commit d10726d) Co-authored-by: Piotr Żelasko <petezor@gmail.com> * add code for calling mcore_retro in NeMo * add code for calling mcore_retro in NeMo * runnable, training curve match retro mcore and nemo * working on retro inference * working on 
megatron_retro_eval.py and megatron_retro_inference.yaml * refactoring text_generation_utils code and retro inference relevant files * clean PR * resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers) * clean repository * revert changes to inference/eval code to original in main * clean code * runable training code, with already implemented eval code * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (NVIDIA#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * revert to original eval code files * revert to original eval code files 2 * revert to original eval code files 3 * revert to original eval code files 4 * clean code * clean code * update my code to support changes from lastest main * 
commit before rebase r1.23.0 * Multimodal r1.23.0 bug fix (NVIDIA#8315) * Rename quick-gelu Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * ddpm config guard Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix ddpm edit api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix insert_image_token cfg issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * neva updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add back jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update default neva template Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * copy paste files from r1.23.0 * clean PR * Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. 
(NVIDIA#8272) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334) Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Remove asr webapp (NVIDIA#8347) Signed-off-by: smajumdar <titu1994@gmail.com> * remove _target_ at model level in aed config (NVIDIA#8351) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * revert changes for tts and asr * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357) * Add change_vocabulary and save_tokenizers() support Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py Co-authored-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> * Change default (NVIDIA#8371) Signed-off-by: smajumdar <titu1994@gmail.com> * implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support * adding megatron compile_helpers(), in future can be fixed with correct MLM commit * bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * Enable megatron core loggers for GPT pretraining (NVIDIA#8354) * Logging changes tested for gpt_pretraining Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * Additional 
args Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * mcore ds fix (NVIDIA#8283) * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits Signed-off-by: dimapihtar 
<dpihtar@gmail.com> * revert apex installation Signed-off-by: dimapihtar <dpihtar@gmail.com> * turn off the fusion for jenkins Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * addressing Eric's reviews * adding existing implementation RETRO files * adding existing implementation RETRO files * Add Finetuning tutorial with HF Datasets (NVIDIA#8356) * Add Finetuning tutorial with HF Datasets Signed-off-by: Nithin Rao Koluguri <nithinraok> * update on Som comments Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * release updates (NVIDIA#8378) * [tutorial] fixed missing RIR scripts file. 
(NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * add mock ds test Signed-off-by: dimapihtar <dpihtar@gmail.com> * add test for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * mcore ds fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * data input fix Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar 
<dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * MCore dataset compatibility for tokenizers (NVIDIA#8390) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer Signed-off-by: Valerie Sarge <vsarge@nvidia.com> * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. Signed-off-by: Valerie Sarge <vsarge@nvidia.com> --------- Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * Mcore customization doc (NVIDIA#8298) * [tutorial] fixed missing RIR scripts file. 
(NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (NVIDIA#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * initial placeholder Signed-off-by: Huiying Li <huiyingl@nvidia.com> * add to intro/index.rst Signed-off-by: Huiying Li <huiyingl@nvidia.com> * initial content update Signed-off-by: Huiying Li <willwin.lee@gmail.com> * add diff images Signed-off-by: Huiying Li <willwin.lee@gmail.com> size Signed-off-by: Huiying Li <willwin.lee@gmail.com> * minor fixes * minor language change Signed-off-by: Chen Cui <chcui@nvidia.com> * clean changes --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> 
Signed-off-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> * wer fix (NVIDIA#8404) Signed-off-by: Travis Bartley <tbartley@nvidia.com> * updated link to pubmed (NVIDIA#8402) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * Update NFA video download link (NVIDIA#8406) * update nfa nasa video link Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update link in markdown Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * revert changes (NVIDIA#8410) Signed-off-by: Chen Cui <chcui@nvidia.com> * Fix dreambooth data sampler issue (NVIDIA#8400) * Turn on drop last Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Some neva fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fixed errors in the CTM gen functions (NVIDIA#8416) Signed-off-by: Taejin Park <tango4j@gmail.com> * add ensemble decoding fix (NVIDIA#8427) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * SDE bugfix log (NVIDIA#8430) Signed-off-by: George <gzelenfroind@nvidia.com> * mcore customization doc minor fix (NVIDIA#8421) Signed-off-by: Huiying Li <willwin.lee@gmail.com> * NeMo-Mistral to HF converter bugfix. 
(NVIDIA#8353) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Fixing mcore bert for TP, PP and SP (NVIDIA#8336) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> --------- Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481) * Add settings to suppress bf16 compile errors in CI on V100 Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * MoE parameter passing (NVIDIA#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Pass EP/MoE params in consumer scripts. 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * PR fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * CI fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update k2 version (NVIDIA#8478) (NVIDIA#8492) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Add fp8 support for SD/Update notebook paths (NVIDIA#8489) * Add fp8 support for SD/Update notebook paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * pin to 0.5.0 (NVIDIA#8465) Signed-off-by: eharper <eharper@nvidia.com> * Update NeMo Multimodal Requirements (NVIDIA#8515) * Update requirements_multimodal.txt Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update github raw content link (NVIDIA#8517) Signed-off-by: Chen Cui <chcui@nvidia.com> * Add dep notice for notebooks (NVIDIA#8522) * add dep notice Signed-off-by: eharper <eharper@nvidia.com> * revert Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> * Revert FP8 
integration (NVIDIA#8520) * Revert FP8 integration Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update data prep notebook (NVIDIA#8532) Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * before update branch with latest r1.23.0 * update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint) * remove compile_helpers * reverse changes from main branch to r1.23.0 * adding *_legacy files * update MLM commit in Jenkinsfile to latest * debugging Jenkinstest: test different mcore import in retro_dataset * update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py * removing all mcore RETRO to pass the Jenkinstest * fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py * update Jenkinsfile file to use TE v0.7 * update NeMo to work with latest mcore RETRO (solving TE problems) * update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile * update commit for MLM * jenkinstest debugging * temporary fix RETRO's __init__ for jenkinstest * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * add model.data.dataloader_type=cyclic to jenkinsfile * update code to work with latest megatron-lm main 81dab6067 * update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067 * fix to by pass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files) * isort and black * adjusting 
model.micro_batch_size to 1 * fix BRANCH = 'r1.23.0' * replace tutorials dir from main branch to huvu/mcore_retro * fix minor merges conflict * update Jenkinsfile * runnable with a temporary fix from Jacek (unfound -unfinished problem) * runnable with a temporary fix from Jacek (unfound -unfinished problem) * modified nlp_overrides.py back to original * fix checkpoint from Jacek Bieniusiewicz * config Jenkinsfile test * set RETRO Jenkins MBS to 1 * black fix * isort fix * update TE commit * update to latest Jenkinsfile with latest container and commits * remove new RETRO jenkinstest * merge latest main * put RETRO Jenkinstest to the right place * update code for megatron_retro_pretraining_legacy.py * untrack ipa_cmudict-0.7b_nv23.01.txt * untrack ipa_cmudict-0.7b_nv23.01.txt * set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy * update new RETRO jenkinstest to run faster * merging latest main, and edit Jenkinstest * update Jenkinstest for new RETRO to run faster * fix isort * adding RETRO tests to cicd-main.yml action tests * update ipa_cmudict-0.7b_nv23.01.txt * remove quotes for model.data for legacy RETRO action tests --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Somshubra Majumdar 
<titu1994@gmail.com> Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: George <gzelenfroind@nvidia.com> Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: eharper <eharper@nvidia.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com> Co-authored-by: Sangkug Lym 
<slym@nvidia.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
suiyoubi pushed a commit that referenced this pull request on May 2, 2024
* update branch Signed-off-by: eharper <eharper@nvidia.com> * Add dist ckpt support for regular optimizers (#7749) * Add dist ckpt support for regular optimizers Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix imports Signed-off-by: dimapihtar <dpihtar@gmail.com> * imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pin lhotse=1.19.2 in r1.23.0 (#8303) Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Cache Aware Streaming tutorial notebook (#8296) * add notebook Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename old notebook to Buffered_Streaming Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * call setup_streaming_params in set_default_att_context_size method Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links to tutorials in docs Signed-off-by: Elena Rastorgueva 
<erastorgueva@nvidia.com> * remove hard-coding Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename var Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * add deallocate pipeline output optimization (#8279) * add deallocate pipeline output optimization Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix memory leak caused by context parallelism hanging references by omegaconf (#8299) * save cp_size to self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * use parallel_state instead of self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * remove assertion (#8302) Signed-off-by: dimapihtar <dpihtar@gmail.com> * Update PEFT Doc (#8262) * update peft doc Signed-off-by: Chen Cui <chcui@nvidia.com> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * Merge branch 'r1.23.0' into chcui/update_peft_doc Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental 
changes Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324) * Rebasing canary changes at current main Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move the changes from asr transformer to nlp transformer as originally intended Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update eval to strip spaces before punctuations Signed-off-by: stevehuang52 <heh@nvidia.com> * update pc strip Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247) * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`. Signed-off-by: Piotr Żelasko <petezor@gmail.com> * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252) * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move tokenization into `prompt_format_fn`, fix usage, add docs Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Backward-compatible utterance validation Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Improve type annotations Signed-off-by: Piotr Żelasko <petezor@gmail.com> * config and prompt_fn registration changes from review Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix 
transcribe config Signed-off-by: stevehuang52 <heh@nvidia.com> * Refactor Canary to follow schema of remaining ASR models (#8260) * Initial draft of multi task beam decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Stabilize inference Signed-off-by: smajumdar <titu1994@gmail.com> * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add change decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Remove redundant imports Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * remove asr transformer dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * copy token_classifier from nlp to asr Signed-off-by: stevehuang52 <heh@nvidia.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Add typing to beam decoding Signed-off-by: smajumdar <titu1994@gmail.com> * Make prompt format configurable Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * drop asr dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <heh@nvidia.com> * fix transcribe, update asr evaluator Signed-off-by: stevehuang52 <heh@nvidia.com> * Extend the docs for the canary prompt_fn Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Incorporate changes from Nithin's code review 
Signed-off-by: Piotr Żelasko <petezor@gmail.com> * training bug fix and adding launch script for speech_multitask (#8270) * bug fix and adding launch script for speech_multitask Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> * update launch script example in speech_to_text_aed.py Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> --------- Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * Fix: drop_last must be true in validation/test otherwise the training will hang Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> * revert to current transcribe API Signed-off-by: stevehuang52 <heh@nvidia.com> * revert changes to NLP, update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * Remove DALI; rename compute_audio_loss to compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * set default use_model_transcribe=False Signed-off-by: stevehuang52 <heh@nvidia.com> * change os.path.dirname to pathlib Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Test for CanaryTokenizer + refactoring (#8285) * Test for CanaryTokenizer Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Attempt at refactor... 
Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Update config for AED models (#8294) Signed-off-by: smajumdar <titu1994@gmail.com> * set default calculate_wer=False in transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 1 Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 2 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Document compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * add docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: stevehuang52 <heh@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> (cherry picked from commit d10726d) Co-authored-by: Piotr Żelasko <petezor@gmail.com> * add code for calling mcore_retro in NeMo * add code for calling mcore_retro in NeMo * runnable, training curve match retro mcore and nemo * working on retro inference * working on 
megatron_retro_eval.py and megatron_retro_inference.yaml * refactoring text_generation_utils code and retro inference relevant files * clean PR * resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers) * clean repository * revert changes to inference/eval code to original in main * clean code * runable training code, with already implemented eval code * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * revert to original eval code files * revert to original eval code files 2 * revert to original eval code files 3 * revert to original eval code files 4 * clean code * clean code * update my code to support changes from lastest main * commit before rebase 
r1.23.0 * Multimodal r1.23.0 bug fix (#8315) * Rename quick-gelu Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * ddpm config guard Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix ddpm edit api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix insert_image_token cfg issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * neva updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add back jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update default neva template Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * copy paste files from r1.23.0 * clean PR * Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. 
(#8272) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Remove asr webapp (#8347) Signed-off-by: smajumdar <titu1994@gmail.com> * remove _target_ at model level in aed config (#8351) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * revert changes for tts and asr * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) * Add change_vocabulary and save_tokenizers() support Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py Co-authored-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> * Change default (#8371) Signed-off-by: smajumdar <titu1994@gmail.com> * implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support * adding megatron compile_helpers(), in future can be fixed with correct MLM commit * bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * Enable megatron core loggers for GPT pretraining (#8354) * Logging changes tested for gpt_pretraining Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * Additional args Signed-off-by: Aishwarya Bhandare 
<abhandare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * mcore ds fix (#8283) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert apex installation 
Signed-off-by: dimapihtar <dpihtar@gmail.com> * turn off the fusion for jenkins Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * addressing Eric's reviews * adding existing implementation RETRO files * adding existing implementation RETRO files * Add Finetuning tutorial with HF Datasets (#8356) * Add Finetuning tutorial with HF Datasets Signed-off-by: Nithin Rao Koluguri <nithinraok> * update on Som comments Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * release updates (#8378) * [tutorial] fixed missing RIR scripts file. 
(#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * add mock ds test Signed-off-by: dimapihtar <dpihtar@gmail.com> * add test for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * mcore ds fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * data input fix Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar 
<dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * MCore dataset compatibility for tokenizers (#8390) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer Signed-off-by: Valerie Sarge <vsarge@nvidia.com> * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. Signed-off-by: Valerie Sarge <vsarge@nvidia.com> --------- Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * Mcore customization doc (#8298) * [tutorial] fixed missing RIR scripts file. 
(#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * initial placeholder Signed-off-by: Huiying Li <huiyingl@nvidia.com> * add to intro/index.rst Signed-off-by: Huiying Li <huiyingl@nvidia.com> * initial content update Signed-off-by: Huiying Li <willwin.lee@gmail.com> * add diff images Signed-off-by: Huiying Li <willwin.lee@gmail.com> size Signed-off-by: Huiying Li <willwin.lee@gmail.com> * minor fixes * minor language change Signed-off-by: Chen Cui <chcui@nvidia.com> * clean changes --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: Chen Cui 
<chcui@nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> * wer fix (#8404) Signed-off-by: Travis Bartley <tbartley@nvidia.com> * updated link to pubmed (#8402) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * Update NFA video download link (#8406) * update nfa nasa video link Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update link in markdown Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * revert changes (#8410) Signed-off-by: Chen Cui <chcui@nvidia.com> * Fix dreambooth data sampler issue (#8400) * Turn on drop last Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Some neva fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fixed errors in the CTM gen functions (#8416) Signed-off-by: Taejin Park <tango4j@gmail.com> * add ensemble decoding fix (#8427) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * SDE bugfix log (#8430) Signed-off-by: George <gzelenfroind@nvidia.com> * mcore customization doc minor fix (#8421) Signed-off-by: Huiying Li <willwin.lee@gmail.com> * NeMo-Mistral to HF converter bugfix. 
(#8353) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Fixing mcore bert for TP, PP and SP (#8336) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> --------- Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Add settings to suppress bf16 compile errors in CI on V100 (#8481) * Add settings to suppress bf16 compile errors in CI on V100 Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Pass EP/MoE params in consumer scripts. 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * PR fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * CI fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update k2 version (#8478) (#8492) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Add fp8 support for SD/Update notebook paths (#8489) * Add fp8 support for SD/Update notebook paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * pin to 0.5.0 (#8465) Signed-off-by: eharper <eharper@nvidia.com> * Update NeMo Multimodal Requirements (#8515) * Update requirements_multimodal.txt Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update github raw content link (#8517) Signed-off-by: Chen Cui <chcui@nvidia.com> * Add dep notice for notebooks (#8522) * add dep notice Signed-off-by: eharper <eharper@nvidia.com> * revert Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> * Revert FP8 integration (#8520) * Revert FP8 integration 
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update data prep notebook (#8532) Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * before update branch with latest r1.23.0 * update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint) * remove compile_helpers * reverse changes from main branch to r1.23.0 * adding *_legacy files * update MLM commit in Jenkinsfile to latest * debugging Jenkinstest: test different mcore import in retro_dataset * update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py * removing all mcore RETRO to pass the Jenkinstest * fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py * update Jenkinsfile file to use TE v0.7 * update NeMo to work with latest mcore RETRO (solving TE problems) * update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile * update commit for MLM * jenkinstest debugging * temporary fix RETRO's __init__ for jenkinstest * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * add model.data.dataloader_type=cyclic to jenkinsfile * update code to work with latest megatron-lm main 81dab6067 * update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067 * fix to by pass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files) * isort and black * adjusting model.micro_batch_size to 1 * fix BRANCH = 'r1.23.0' * replace 
tutorials dir from main branch to huvu/mcore_retro * fix minor merges conflict * update Jenkinsfile * runnable with a temporary fix from Jacek (unfound -unfinished problem) * runnable with a temporary fix from Jacek (unfound -unfinished problem) * modified nlp_overrides.py back to original * fix checkpoint from Jacek Bieniusiewicz * config Jenkinsfile test * set RETRO Jenkins MBS to 1 * black fix * isort fix * update TE commit * update to latest Jenkinsfile with latest container and commits * remove new RETRO jenkinstest * merge latest main * put RETRO Jenkinstest to the right place * update code for megatron_retro_pretraining_legacy.py * untrack ipa_cmudict-0.7b_nv23.01.txt * untrack ipa_cmudict-0.7b_nv23.01.txt * set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy * update new RETRO jenkinstest to run faster * merging latest main, and edit Jenkinstest * update Jenkinstest for new RETRO to run faster * fix isort * adding RETRO tests to cicd-main.yml action tests * update ipa_cmudict-0.7b_nv23.01.txt * remove quotes for model.data for legacy RETRO action tests --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Aishwarya Bhandare 
<abhandare@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: George <gzelenfroind@nvidia.com> Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: eharper <eharper@nvidia.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Krishna Puvvada 
<93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com> Signed-off-by: Ao Tang <aot@nvidia.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* update branch Signed-off-by: eharper <eharper@nvidia.com> * Add dist ckpt support for regular optimizers (NVIDIA#7749) * Add dist ckpt support for regular optimizers Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix imports Signed-off-by: dimapihtar <dpihtar@gmail.com> * imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303) Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Cache Aware Streaming tutorial notebook (NVIDIA#8296) * add notebook Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename old notebook to Buffered_Streaming Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * call setup_streaming_params in set_default_att_context_size method Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links to tutorials in docs Signed-off-by: Elena Rastorgueva 
<erastorgueva@nvidia.com> * remove hard-coding Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename var Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * fix path location and branch (NVIDIA#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * add deallocate pipeline output optimization (NVIDIA#8279) * add deallocate pipeline output optimization Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299) * save cp_size to self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * use parallel_state instead of self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * remove assertion (NVIDIA#8302) Signed-off-by: dimapihtar <dpihtar@gmail.com> * Update PEFT Doc (NVIDIA#8262) * update peft doc Signed-off-by: Chen Cui <chcui@nvidia.com> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * Merge branch 'r1.23.0' into chcui/update_peft_doc Signed-off-by: Chen Cui 
<chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324) * Rebasing canary changes at current main Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move the changes from asr transformer to nlp transformer as originally intended Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update eval to strip spaces before punctuations Signed-off-by: stevehuang52 <heh@nvidia.com> * update pc strip Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247) * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`. Signed-off-by: Piotr Żelasko <petezor@gmail.com> * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252) * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move tokenization into `prompt_format_fn`, fix usage, add docs Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Backward-compatible utterance validation Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Improve type annotations Signed-off-by: Piotr Żelasko <petezor@gmail.com> * config and prompt_fn registration changes from review Signed-off-by: Piotr Żelasko <petezor@gmail.com> 
--------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix transcribe config Signed-off-by: stevehuang52 <heh@nvidia.com> * Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260) * Initial draft of multi task beam decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Stabilize inference Signed-off-by: smajumdar <titu1994@gmail.com> * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add change decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Remove redundant imports Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * remove asr transformer dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * copy token_classifier from nlp to asr Signed-off-by: stevehuang52 <heh@nvidia.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Add typing to beam decoding Signed-off-by: smajumdar <titu1994@gmail.com> * Make prompt format configurable Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * drop asr dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <heh@nvidia.com> * fix transcribe, update asr evaluator Signed-off-by: stevehuang52 <heh@nvidia.com> * Extend the docs for the canary prompt_fn Signed-off-by: Piotr Żelasko 
<petezor@gmail.com> * Incorporate changes from Nithin's code review Signed-off-by: Piotr Żelasko <petezor@gmail.com> * training bug fix and adding launch script for speech_multitask (NVIDIA#8270) * bug fix and adding launch script for speech_multitask Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> * update launch script example in speech_to_text_aed.py Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> --------- Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * Fix: drop_last must be true in validation/test otherwise the training will hang Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> * revert to current transcribe API Signed-off-by: stevehuang52 <heh@nvidia.com> * revert changes to NLP, update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * Remove DALI; rename compute_audio_loss to compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * set default use_model_transcribe=False Signed-off-by: stevehuang52 <heh@nvidia.com> * change os.path.dirname to pathlib Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285) * Test for CanaryTokenizer Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Attempt at refactor... 
Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Update config for AED models (NVIDIA#8294) Signed-off-by: smajumdar <titu1994@gmail.com> * set default calculate_wer=False in transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 1 Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 2 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Document compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * add docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: stevehuang52 <heh@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> (cherry picked from commit 86efc4e) Co-authored-by: Piotr Żelasko <petezor@gmail.com> * add code for calling mcore_retro in NeMo * add code for calling mcore_retro in NeMo * runnable, training curve match retro mcore and nemo * working on retro inference * working on 
megatron_retro_eval.py and megatron_retro_inference.yaml * refactoring text_generation_utils code and retro inference relevant files * clean PR * resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers) * clean repository * revert changes to inference/eval code to original in main * clean code * runable training code, with already implemented eval code * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (NVIDIA#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * revert to original eval code files * revert to original eval code files 2 * revert to original eval code files 3 * revert to original eval code files 4 * clean code * clean code * update my code to support changes from lastest main * 
commit before rebase r1.23.0 * Multimodal r1.23.0 bug fix (NVIDIA#8315) * Rename quick-gelu Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * ddpm config guard Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix ddpm edit api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix insert_image_token cfg issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * neva updates Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add back jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update default neva template Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * copy paste files from r1.23.0 * clean PR * Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. 
(NVIDIA#8272) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334) Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Remove asr webapp (NVIDIA#8347) Signed-off-by: smajumdar <titu1994@gmail.com> * remove _target_ at model level in aed config (NVIDIA#8351) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * revert changes for tts and asr * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357) * Add change_vocabulary and save_tokenizers() support Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py Co-authored-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> * Change default (NVIDIA#8371) Signed-off-by: smajumdar <titu1994@gmail.com> * implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support * adding megatron compile_helpers(), in future can be fixed with correct MLM commit * bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368) Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> * Enable megatron core loggers for GPT pretraining (NVIDIA#8354) * Logging changes tested for gpt_pretraining Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * Additional 
args Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * mcore ds fix (NVIDIA#8283) * [tutorial] fixed missing RIR scripts file. (NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits Signed-off-by: dimapihtar 
<dpihtar@gmail.com> * revert apex installation Signed-off-by: dimapihtar <dpihtar@gmail.com> * turn off the fusion for jenkins Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * addressing Eric's reviews * adding existing implementation RETRO files * adding existing implementation RETRO files * Add Finetuning tutorial with HF Datasets (NVIDIA#8356) * Add Finetuning tutorial with HF Datasets Signed-off-by: Nithin Rao Koluguri <nithinraok> * update on Som comments Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * release updates (NVIDIA#8378) * [tutorial] fixed missing RIR scripts file. 
(NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * mcore ds fix Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr files Signed-off-by: dimapihtar <dpihtar@gmail.com> * add comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore version Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <dpihtar@gmail.com> * update mcore commit Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix Bert unit tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * update bert tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix bert mcore test Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix gpt jenkins tests Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * add mock ds test Signed-off-by: dimapihtar <dpihtar@gmail.com> * add test for dict data input type Signed-off-by: dimapihtar <dpihtar@gmail.com> * mcore ds fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * data input fix Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Dmytro Pykhtar 
<dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * MCore dataset compatibility for tokenizers (NVIDIA#8390) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer Signed-off-by: Valerie Sarge <vsarge@nvidia.com> * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. Signed-off-by: Valerie Sarge <vsarge@nvidia.com> --------- Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> * Mcore customization doc (NVIDIA#8298) * [tutorial] fixed missing RIR scripts file. 
(NVIDIA#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add values to en tts dict (NVIDIA#7879) Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add Bert HF checkpoint converter (NVIDIA#8088) * Add Bert HF checkpoint converter Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reformat Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add BERT ONNX export * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add NeMo BERT to HF BERT script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Clean code Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update argument names Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update build_transformer_config in Bert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> * initial placeholder Signed-off-by: Huiying Li <huiyingl@nvidia.com> * add to intro/index.rst Signed-off-by: Huiying Li <huiyingl@nvidia.com> * initial content update Signed-off-by: Huiying Li <willwin.lee@gmail.com> * add diff images Signed-off-by: Huiying Li <willwin.lee@gmail.com> size Signed-off-by: Huiying Li <willwin.lee@gmail.com> * minor fixes * minor language change Signed-off-by: Chen Cui <chcui@nvidia.com> * clean changes --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> 
Signed-off-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> * wer fix (NVIDIA#8404) Signed-off-by: Travis Bartley <tbartley@nvidia.com> * updated link to pubmed (NVIDIA#8402) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * Update NFA video download link (NVIDIA#8406) * update nfa nasa video link Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update link in markdown Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * revert changes (NVIDIA#8410) Signed-off-by: Chen Cui <chcui@nvidia.com> * Fix dreambooth data sampler issue (NVIDIA#8400) * Turn on drop last Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Some neva fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fixed errors in the CTM gen functions (NVIDIA#8416) Signed-off-by: Taejin Park <tango4j@gmail.com> * add ensemble decoding fix (NVIDIA#8427) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * SDE bugfix log (NVIDIA#8430) Signed-off-by: George <gzelenfroind@nvidia.com> * mcore customization doc minor fix (NVIDIA#8421) Signed-off-by: Huiying Li <willwin.lee@gmail.com> * NeMo-Mistral to HF converter bugfix. 
(NVIDIA#8353) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Fixing mcore bert for TP, PP and SP (NVIDIA#8336) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> --------- Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481) * Add settings to suppress bf16 compile errors in CI on V100 Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * MoE parameter passing (NVIDIA#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Pass EP/MoE params in consumer scripts. 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * PR fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * CI fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update k2 version (NVIDIA#8478) (NVIDIA#8492) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * Add fp8 support for SD/Update notebook paths (NVIDIA#8489) * Add fp8 support for SD/Update notebook paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * pin to 0.5.0 (NVIDIA#8465) Signed-off-by: eharper <eharper@nvidia.com> * Update NeMo Multimodal Requirements (NVIDIA#8515) * Update requirements_multimodal.txt Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update github raw content link (NVIDIA#8517) Signed-off-by: Chen Cui <chcui@nvidia.com> * Add dep notice for notebooks (NVIDIA#8522) * add dep notice Signed-off-by: eharper <eharper@nvidia.com> * revert Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> * Revert FP8 
integration (NVIDIA#8520) * Revert FP8 integration Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update data prep notebook (NVIDIA#8532) Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * before update branch with latest r1.23.0 * update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint) * remove compile_helpers * reverse changes from main branch to r1.23.0 * adding *_legacy files * update MLM commit in Jenkinsfile to latest * debugging Jenkinstest: test different mcore import in retro_dataset * update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py * removing all mcore RETRO to pass the Jenkinstest * fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py * update Jenkinsfile file to use TE v0.7 * update NeMo to work with latest mcore RETRO (solving TE problems) * update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile * update commit for MLM * jenkinstest debugging * temporary fix RETRO's __init__ for jenkinstest * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster * add model.data.dataloader_type=cyclic to jenkinsfile * update code to work with latest megatron-lm main 81dab6067 * update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067 * fix to by pass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files) * isort and black * adjusting 
model.micro_batch_size to 1 * fix BRANCH = 'r1.23.0' * replace tutorials dir from main branch to huvu/mcore_retro * fix minor merges conflict * update Jenkinsfile * runnable with a temporary fix from Jacek (unfound -unfinished problem) * runnable with a temporary fix from Jacek (unfound -unfinished problem) * modified nlp_overrides.py back to original * fix checkpoint from Jacek Bieniusiewicz * config Jenkinsfile test * set RETRO Jenkins MBS to 1 * black fix * isort fix * update TE commit * update to latest Jenkinsfile with latest container and commits * remove new RETRO jenkinstest * merge latest main * put RETRO Jenkinstest to the right place * update code for megatron_retro_pretraining_legacy.py * untrack ipa_cmudict-0.7b_nv23.01.txt * untrack ipa_cmudict-0.7b_nv23.01.txt * set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy * update new RETRO jenkinstest to run faster * merging latest main, and edit Jenkinstest * update Jenkinstest for new RETRO to run faster * fix isort * adding RETRO tests to cicd-main.yml action tests * update ipa_cmudict-0.7b_nv23.01.txt * remove quotes for model.data for legacy RETRO action tests --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Somshubra Majumdar 
<titu1994@gmail.com> Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Valerie Sarge <vsarge@nvidia.com> Signed-off-by: Huiying Li <huiyingl@nvidia.com> Signed-off-by: Huiying Li <willwin.lee@gmail.com> Signed-off-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: George <gzelenfroind@nvidia.com> Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: eharper <eharper@nvidia.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com> Co-authored-by: Sangkug Lym 
<slym@nvidia.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: ashbhandare <ash.bhandare@gmail.com> Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Valerie Sarge <vsarge@nvidia.com> Co-authored-by: Huiying <willwin.lee@gmail.com> Co-authored-by: Huiying Li <huiyingl@nvidia.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
What does this PR do?
Adding RETRO tests to Action Tests (cicd-main.yml)
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
Jenkins CI
To run Jenkins, a NeMo User with write access must comment "jenkins" on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information