Highlights
Large Language Models & Multimodal
- Training
  - Long context recipe
  - PyTorch Native FSDP 1
- Models
  - Llama 3
  - Mixtral
  - Nemotron
- NeMo 1.0
Export
- TensorRT-LLM v0.12 integration
- LoRA support for vLLM
- FP8 checkpoint
ASR
- Parakeet large (ASR with PnC model); see the usage sketch below
- Added Uzbek offline and Georgian streaming models
- Optimization feature for efficient bucketing to improve batch size consumption on GPUs
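As a quick orientation for the ASR highlights, below is a minimal sketch of transcribing audio with the new Parakeet model through NeMo's ASR API. The checkpoint id and audio path are illustrative placeholders; any compatible pretrained ASR model can be substituted.

```python
import nemo.collections.asr as nemo_asr

# Load a pretrained Parakeet checkpoint (placeholder id; the PnC-capable
# parakeet-tdt_ctc-110m model is part of this release).
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt_ctc-110m"
)

# transcribe() accepts a list of audio file paths and returns one result per
# file; PnC models emit punctuation and capitalization in the output text.
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```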
Detailed Changelogs
ASR
Changelog
- add parakeet-tdt_ctc-110m model by @nithinraok :: PR: #10461
- fix asr finetune by @stevehuang52 :: PR: #10508
- replace unbiased with correction by @nithinraok :: PR: #10555
- Update Multi_Task_Adapters.ipynb by @pzelasko :: PR: #10600
- Fix asr warnings by @nithinraok :: PR: #10469
- Fix typo in ASR RNNT BPE model by @pzelasko :: PR: #10742
- TestEncDecMultiTaskModel for canary parallel by @karpnv :: PR: #10740
- fix chunked infer by @stevehuang52 :: PR: #10581
- training code for hybrid-autoregressive inference model by @hainan-xv :: PR: #10841
- remove stacking operation from batched functions by @lilithgrigoryan :: PR: #10524
- Add lhotse fixes for rnnt model training and WER hanging issue with f… by @nithinraok :: PR: #10821
- Fix ASR tests by @artbataev :: PR: #10794
- [Fix] Fixed sampler override and audio_key in prepare_audio_data by @anteju :: PR: #10980
- [WIP] Add docs for NEST SSL by @stevehuang52 :: PR: #10804
- Akoumparouli/mixtral recipe fix r2.0.0 by @akoumpa :: PR: #10994
- TDT compute timestamps option and Extra Whitespace handling for SPE by @monica-sekoyan :: PR: #10875
- ci: Switch to CPU only runner by @ko3n1g :: PR: #11035
- Fix timestamps tests by @monica-sekoyan :: PR: #11053
- ci: Pin release freeze by @ko3n1g :: PR: #11143
- Fix RNN-T loss memory usage by @artbataev :: PR: #11144
- Added deprecation notice by @Ssofja :: PR: #11133
- Fixes for Canary adapters tutorial by @pzelasko :: PR: #11184
- add ipython import guard by @nithinraok :: PR: #11191
- Self Supervised Pre-Training tutorial Fix by @monica-sekoyan :: PR: #11206
- update the return type by @nithinraok :: PR: #11210
- Timestamps to transcribe by @nithinraok :: PR: #10950 (see the sketch below)
- [Doc fixes] update file names, installation instructions, bad links by @erastorgueva-nv :: PR: #11045
- Beam search algorithm implementation for TDT models by @lilithgrigoryan :: PR: #10903
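A minimal sketch of the "Timestamps to transcribe" change (PR #10950), assuming transcribe() accepts a timestamps flag and the returned hypotheses expose word- and segment-level timing; the model id and audio path are placeholders.

```python
import nemo.collections.asr as nemo_asr

# Placeholder checkpoint; any timestamp-capable ASR model should behave similarly.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_fastconformer_hybrid_large_pc"
)

# Request timing information alongside the decoded text (assumed flag from PR #10950).
hypotheses = asr_model.transcribe(["sample.wav"], timestamps=True)

# Word-level entries for the first utterance; segment/char granularities are
# assumed to be available under the same timestamp mapping.
for word_entry in hypotheses[0].timestamp["word"]:
    print(word_entry)
```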
TTS
Changelog
- Fix asr warnings by @nithinraok :: PR: #10469
- Make nemo text processing optional in TTS by @blisc :: PR: #10584
- [Doc fixes] update file names, installation instructions, bad links by @erastorgueva-nv :: PR: #11045
NLP / NMT
Changelog
- MCORE interface for TP-only FP8 AMAX reduction by @erhoo82 :: PR: #10437
- Remove Apex dependency if not using MixedFusedLayerNorm by @cuichenx :: PR: #10468
- Add missing import guards for causal_conv1d and mamba_ssm dependencies by @janekl :: PR: #10429
- Update doc for fp8 trt-llm export by @Laplasjan107 :: PR: #10444
- Remove running validating after finetuning by @huvunvidia :: PR: #10560
- Extending modelopt spec for TEDotProductAttention by @janekl :: PR: #10523
- Fix mb_calculator import in lora tutorial by @BoxiangW :: PR: #10624
- .nemo conversion bug fix by @dimapihtar :: PR: #10598
- Require setuptools>=70 and update deprecated api by @thomasdhc :: PR: #10659
- Akoumparouli/fix get tokenizer list by @akoumpa :: PR: #10596
- [McoreDistOptim] fix the naming to match apex.dist by @gdengk :: PR: #10707
- [fix] Ensures disabling exp_manager with exp_manager=null does not error by @terrykong :: PR: #10651
- [feat] Update get_model_parallel_src_rank to support tp-pp-dp ordering by @terrykong :: PR: #10652
- feat: Migrate GPTSession refit path in Nemo export to ModelRunner for Aligner by @terrykong :: PR: #10654
- [MCoreDistOptim] Add assertions for McoreDistOptim and fix fp8 arg specs by @gdengk :: PR: #10748
- Fix for crashes with tensorboard_logger=false and VP + LoRA by @vysarge :: PR: #10792
- Adding init_model_parallel to FabricMegatronStrategy by @marcromeyn :: PR: #10733
- Moving steps to MegatronParallel to improve UX for Fabric by @marcromeyn :: PR: #10732
- Adding setup_megatron_optimizer to FabricMegatronStrategy by @marcromeyn :: PR: #10833
- Make FabricMegatronMixedPrecision match MegatronMixedPrecision by @marcromeyn :: PR: #10835
- Fix VPP bug in MegatronStep by @marcromeyn :: PR: #10847
- Expose drop_last in MegatronDataSampler by @farhadrgh :: PR: #10837
- Move collectiob.nlp imports inline for t5 by @marcromeyn :: PR: #10877
- Use a context-manager when opening files by @akoumpa :: PR: #10895
- ckpt convert bug fixes by @dimapihtar :: PR: #10878
- remove deprecated ci tests by @dimapihtar :: PR: #10922
- Update T5 tokenizer (adding additional tokens to tokenizer config) by @huvunvidia :: PR: #10972
- Add support and recipes for HF models via AutoModelForCausalLM by @akoumpa :: PR: #10962
- gpt3 175b cli by @malay-nagda :: PR: #10985
- Fix for crash with LoRA + tp_overlap_comm=false + sequence_parallel=true by @vysarge :: PR: #10920
- Update BaseMegatronSampler for compatibility with PTL's _BatchProgress by @ashors1 :: PR: #11016
- add deprecation note by @dimapihtar :: PR: #11024
- Update ModelOpt Width Pruning example defaults by @kevalmorabia97 :: PR: #10902
- switch to NeMo 2.0 recipes by @dimapihtar :: PR: #10948
- NeMo 1.0: upcycle dense to moe by @akoumpa :: PR: #11002
- Update mcore parallelism initialization in nemo2 by @yaoyu-33 :: PR: #10643
- Gemma2 in Nemo2 with Recipes by @suiyoubi :: PR: #11037
- Add Packed Seq option to GPT based models by @suiyoubi :: PR: #11100
- Fix MCoreGPTModel import in llm.gpt.model.base by @hemildesai :: PR: #11109
- TP+MoE peft fix by @akoumpa :: PR: #11114
- GPT recipes to use full te spec by @JimmyZhang12 :: PR: #11119
- Virtual pipeline parallel support for LoRA in NLPAdapterModelMixin by @vysarge :: PR: #11128
- update nemo args for mcore flash decode arg change by @HuiyingLi :: PR: #11138
- Call ckpt_to_weights_subdir from MegatronCheckpointIO by @ashors1 :: PR: #10897
- fix typo by @dimapihtar :: PR: #11234
- [Doc fixes] update file names, installation instructions, bad links by @erastorgueva-nv :: PR: #11045
- fix(export): GPT models w/ bias=False convert properly by @terrykong :: PR: #11255
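Several entries above touch the export path (the fp8 trt-llm export doc update, the GPTSession-to-ModelRunner migration, the bias=False export fix), so here is a rough sketch of exporting a .nemo checkpoint to a TensorRT-LLM engine with the nemo.export exporter. Paths are placeholders, and the keyword arguments (model_type, tensor_parallelism_size) are assumptions that may differ between releases.

```python
from nemo.export.tensorrt_llm import TensorRTLLM

# Directory where the TensorRT-LLM engine will be written (placeholder path).
exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")

# Build the engine from a NeMo checkpoint; argument names are assumptions
# and may vary across NeMo/TensorRT-LLM versions.
exporter.export(
    nemo_checkpoint_path="/path/to/model.nemo",
    model_type="llama",
    tensor_parallelism_size=1,
)
```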