Adds Tiktoken tokenizer for Nemotron-Mistral 12B (#9797)
* Adding context- & expert-parallelism to MegatronStrategy (#9525) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add CICD test for Stable Diffusion (#9464) * Add CICD test for Stable Diffusion Signed-off-by: Michal Futrega <mfutrega@nvidia.com> * Update cicd-main.yml Signed-off-by: Michal Futrega <mfutrega@nvidia.com> * Use single gpu runner Signed-off-by: Michal Futrega <mfutrega@nvidia.com> --------- Signed-off-by: Michal Futrega <mfutrega@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Akoumparouli/nemo ux mixtral (#9446) * use default collate if dataset does not have one Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * mixtral config Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add convert_state Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix StateDictTransform for 2D layers, e.g. MoE Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * pass num_moe_experts to specs Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * update MixtralModel Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * mini docstring Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * update mcoreddp call (#9345) * update mcoreddp call Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * update mcore commits Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> 
Co-authored-by: Pablo Garay <palenq@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Llama and Gemma (#9528) * add llama Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * add llama Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * add llama3 Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fix typo Signed-off-by: Chen Cui <chcui@nvidia.com> * enable importers with multiple models Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * add gemma Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * checks Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] minor logging bug fixes (#9529) * minor exp_manager bug fixes * remove print statement * fix docstring * fix AppState defaults --------- Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * mcore distOpt restore fix (#9421) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Custom Tiktoken tokenizer. Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Fixed the tokenizer decoding on special tokens. 
Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Apply isort and black reformatting Signed-off-by: ertkonuk <ertkonuk@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Added token_to_id() method. Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Update neva conversion script from and to HF (#9296) * Update NeMo script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix example scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * Update convert_llava_nemo_to_hf.py Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> * address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * vLLM Export Support (#9381) * Export implementation for vLLM 0.4.3. Supports LLAMA2, Mistral, Mixtral (unverified), Gemma and StarCoder2 models. The nemo.export.tensorrt_llm alias was removed to avoid initializing TRT-LLM when importing anything from nemo.export. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Fixed some CodeQL warnings. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Removed empty files. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Updated the integration for vLLM 0.5.0. 
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Updated the vLLM deployment interface to use max_output_len instead of max_output_token. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Moved the Exporter class to nemo/export and renamed its file to vllm_exporter.py, to be more similar to TRT-LLM. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Implemented vLLM support in the export tests, added functional testing, implemented forward evaluation on vLLM without Triton. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Moved the vLLM deployment functionality to the common deploy_triton.py script. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Fixed the CodeQL discovered issues. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Fixed one more return of a wrong dimensionality... Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * More wrong dimensionality returns. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> --------- Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> Co-authored-by: apanteleev <apanteleev@users.noreply.github.com> Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * PL: Delete precision if using plugin. 
TODO switch to MegatronTrainerBuilder (#9535) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add page context fmha (#9526) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * extend get_gpt_layer_modelopt_spec to support MoE (#9532) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * fix mock data generation for legacy dataset (#9530) Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [Nemo-UX] IO fixes (#9512) * Improve IOMixin.io_transform_args to handle dataclasses better * Dump task json + img inside NeMoLogger * Adding store_io to train task * Update opt.connect to also propagate to __io__ * Rename opt to optim for consistency * Moving to using safe serialization using fiddle, only use cloudpickle when needed * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Using Config from fiddle instead of sdk for now * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Move enable_nemo_ckpt_io from MegatronStrategy to ModelCheckpoint * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Move nemo-ckpt to _get_finalize_save_checkpoint_callback * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Update TrainerContext & io.load_ckpt * Use renamed TrainerContext inside ModelCheckpoint * Remove double io saving * Rename lightning.pytorch.opt -> optim * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Remove store_io from train-task * Adding fiddle-extension for torch * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Move fdl_torch import * Apply isort and black reformatting Signed-off-by: marcromeyn 
<marcromeyn@users.noreply.github.com> * Adding dtype to serialization * Some fixes * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Make TransformerConfig inherit from IOMixin to fix serialization error * Make TransformerConfig inherit from IOMixin to fix serialization error * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Add support for BuiltinFunctionType * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Add missing import * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Fix dataclass fields --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Test C++ runtime on demand in nemo_export.py to avoid possible OOMs (#9544) * Add test_cpp_runtime flag Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Co-authored-by: janekl <janekl@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Fix lhotse tests for v1.24.2 (#9546) * Fix lhotse tests for v1.24.0 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Fix RIR test Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * gpu_unitTests_notOptional (#9551) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * add reset learning rate functionality (#9372) * add reset_lr functionality Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix reset_lr logic Signed-off-by: dimapihtar <dpihtar@gmail.com> * Apply isort and black reformatting Signed-off-by: 
dimapihtar <dimapihtar@users.noreply.github.com> * move reset_lr from optim section Signed-off-by: dimapihtar <dpihtar@gmail.com> * Apply isort and black reformatting Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> * add reset_lr value to config Signed-off-by: dimapihtar <dpihtar@gmail.com> * set reset_lr False by default Signed-off-by: dimapihtar <dpihtar@gmail.com> * remove extra line Signed-off-by: dimapihtar <dpihtar@gmail.com> * add reset_lr test Signed-off-by: dimapihtar <dpihtar@gmail.com> * add reset_lr test Signed-off-by: dimapihtar <dpihtar@gmail.com> * remove extra quote Signed-off-by: dimapihtar <dpihtar@gmail.com> * add ability to reset schedule's max_steps and decay_steps Signed-off-by: dimapihtar <dpihtar@gmail.com> * Apply isort and black reformatting Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> * change scheduler's first step logic when using reset_lr Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert config Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix reset_lr logic Signed-off-by: dimapihtar <dpihtar@gmail.com> * Apply isort and black reformatting Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> * revert config Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert config Signed-off-by: dimapihtar <dpihtar@gmail.com> * update reset_lr comments Signed-off-by: dimapihtar <dpihtar@gmail.com> * add use cases for reset_lr feature Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add Python AIStore SDK to container and bump min Lhotse version (#9537) * Add Python AIStore SDK to requirements and bump min Lhotse version Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move AIStore Python SDK to Dockerfile, remove matplotlib/ipywidgets deps Signed-off-by: Piotr 
Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Adding 'use_dynamo' option for export to use onnx.dynamo_export() instead of onnx.export() (#9147) * Initial WARs to implement dynamo option for export Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * including weights in .onnx Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * dynamo_export works for many small models Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * External weights behaviour fixed Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Cleanup Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Apply isort and black reformatting Signed-off-by: borisfom <borisfom@users.noreply.github.com> * print cleaned up Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Added overloadable dynamic_shapes_for_export Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Addressing code review Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixing CI issues Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixing CI test failure Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Eliminated test cross-contamination Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> --------- Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: borisfom <borisfom@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Fix tokenizer IO (#9555) * Adding tokenizer to io-test + making it pass * Handling tokenizer correctly inside dump_io * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Removing not used import --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk 
<ertkonuk@gmail.com> * [NeMo UX] Move mistral_7b.py to mistral.py (#9545) * Move mistral_7b.py to mistral.py Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * rename MixtralConfig to MixtralConfig8x7B Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * mistral rename: mistralconfig7b & mistralmodel Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Use closed-formula to round by multiple (#9307) * Use closed-formula to round by multiple Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * ci: Do not attempt to send slack on fork (#9556) * ci: Do not attempt to send slack on fork Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * test Signed-off-by: Oliver Koenig <okoenig@nvidia.com> --------- Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Fix nemo export test (#9547) * fix minor import bug Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * fix export test Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> --------- Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Signed-off-by: Tugrul Konuk 
<ertkonuk@gmail.com> * Fix SDXL incorrect name in docs (#9534) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * GPU unit tests: Mark flaky tests to be fixed (#9559) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Bump PTL version (#9557) Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [Resiliency] Straggler detection (#9473) * Initial straggler det impl Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Fixed CI code checks Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Removed unused import Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * remove submodule Signed-off-by: Maanu Grover <maanug@nvidia.com> * Updated documentation; Updated callback params; Cosmetic changes Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Fixed straggler det config; Added basic test Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Fixes in test_straggler_det.py Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Updated straggler callback API Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * stop_if_detected=False by default Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> --------- Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Co-authored-by: jbieniusiewi 
<jbieniusiewi@users.noreply.github.com> Co-authored-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * switch to torch_dist as default dist checkpointing backend (#9541) Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Checkpointing bug fixes (#9562) * fix checkpoint loading * fix * fixes * another fix * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> --------- Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add tps and pps params to the export script (#9558) * fix minor import bug Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * fix export test Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> * remove n_gpus param Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * add and fix parameters Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * fix deploy script Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> * rename tps and pps params Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> --------- Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Consolidate gpt continue training script into pretraining script (#9413) * Consolidate gpt continue training with pretraining Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix default 
config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add github action cicd Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * extract _integrate_original_checkpoint_data as a method Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix getattr Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Revert "Add github action cicd" This reverts commit a453f16ba2be6413db932623009da893208acdd5. * Update comments in nlp_overrides.py Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add support to change Multi task model prompt (#9542) * Add support to change Multi task model prompt Signed-off-by: smajumdar <titu1994@gmail.com> * Add support to change Multi task model prompt Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Update nemo/collections/common/prompts/formatter.py Co-authored-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: titu1994 <titu1994@users.noreply.github.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add Multimodal Exporter (#9256) * Add video-neva TRT export * Add TRT inference * Change config * Apply isort and 
black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Change export params * Remove unused import * Add neva export * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Change unpack nemo * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Add trt infer config * Fix neva trt inference * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Add exporter * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Fix infer * Add PyTriton * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Fix deploy wrong dim * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Change to pass PIL Image * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Fix video neva deploy * Change query * Change deploy * Remove unused import * Change ptuning * Change to mm exporter * Add script * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Fix script --------- Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Enable encoder adapters for Canary and MultiTaskAED models (#9409) * Fix assertions for adapter types Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Finalize support for decoder adapters Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 
<titu1994@users.noreply.github.com> * fix the freeze/unfreeze problem by replacing as_frozen with torch.inference_mode * Apply isort and black reformatting Signed-off-by: weiqingw4ng <weiqingw4ng@users.noreply.github.com> * Update tests to new generic way of module update Signed-off-by: smajumdar <titu1994@gmail.com> * Finalize code for update module Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Fix variable name Signed-off-by: smajumdar <titu1994@gmail.com> * Finalize projection support for transformer mha adapters Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Correct implementation of freeze restore Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Corrects the implementation of replace_adapter_modules to limit to just the top level modules Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Remove registration of Transformer MHA Signed-off-by: smajumdar <titu1994@gmail.com> * Remove registration of Transformer MHA Signed-off-by: smajumdar <titu1994@gmail.com> * Address reviewer comments Signed-off-by: smajumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: titu1994 <titu1994@users.noreply.github.com> Signed-off-by: weiqingw4ng <weiqingw4ng@users.noreply.github.com> Co-authored-by: Weiqing Wang <weiqingw@nvidia.com> Co-authored-by: weiqingw4ng <weiqingw4ng@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * pass option through (#9570) Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * PTQ refinements (#9574) * Rename megatron_gpt_quantization -> megatron_gpt_ptq 
Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Configure export.save_path as dir or tarball Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * PTQ docs update Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Make model_type optional in case of quantized checkpoints Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Drop unused save_nemo_model_config argument Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Audio model collection (#9263) * Audio model collection Signed-off-by: Ante Jukić <ajukic@nvidia.com> * Apply isort and black reformatting Signed-off-by: anteju <anteju@users.noreply.github.com> * Fix imports Signed-off-by: Ante Jukić <ajukic@nvidia.com> * Addressed PR comments Signed-off-by: Ante Jukić <ajukic@nvidia.com> * Apply isort and black reformatting Signed-off-by: anteju <anteju@users.noreply.github.com> --------- Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: anteju <anteju@users.noreply.github.com> Co-authored-by: anteju <anteju@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Fix Trainer serialization (#9571) * Fix Trainer serialization * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Update click version requirement (#9580) Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [Fault tolerance] Heartbeat detection (#9352) * Fault tolerance related changes Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Cosmetic changes in documentation Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black 
reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Doc update round2 Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> --------- Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Co-authored-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> Co-authored-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Co-authored-by: jbieniusiewi <152396322+jbieniusiewi@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add ModelOpt QAT example for Llama2 SFT model (#9326) * add INT4 QAT example for Llama2 SFT model Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> * Add config parameter to control kv cache quantization Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> * Fix typo in cicd-main.yml for QAT test Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> * fix nlp_overrides.py Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> * address reviewer feedback Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> * quantize unwrapped model Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> * add compress export argument for qat config Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Set TE flag in legacy -> mcore conversion script (#9585) * set TE flag Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Tugrul Konuk 
<ertkonuk@gmail.com> * [Nemo-UX] Add fabric-API for manual forward-pass (#9577) * First pass over fabric-API * Adding Trainer -> Fabric conversion * Some small fixes to get a forward-pass in Fabric working * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Adding doc-string to Fabric.import_model * Adding track_io to io_init of Fabric * Fix Fabric.load_model + add doc-string * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Remove unused import * Some small fixes * Fix failing test --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [Nemo-UX] Add SDK-factories to llm-collection (#9589) * Adding sdk-factories to llm-collection * Removing _model from mistral + mixtral * Expose lr_scheduler inside lightning * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Multimodal projection layer adapter fix for PP>1 (#9445) * enabling multimodal adapters to load in PP>1 Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * Apply isort and black reformatting Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> * parameterizing validate_access_integrity, set to false when PP>1 Signed-off-by: paul-gibbons <paul@gibbonspaul.com> formatting fix Signed-off-by: paul-gibbons <paul@gibbonspaul.com> Apply isort and black reformatting Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> * update nlp_model.py Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * Apply isort and black 
reformatting Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> * update modelPT with validate_access_integrity Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * Apply isort and black reformatting Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> * updating save_restore_connector w/ validate_access_integrity Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * Apply isort and black reformatting Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> * addressing comment Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * adding validate_access_integrity to super().load_config_and_state_dict() Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * testing reorder of validate_access_integrity for CI failures Signed-off-by: paul-gibbons <paul@gibbonspaul.com> --------- Signed-off-by: paul-gibbons <paul@gibbonspaul.com> Signed-off-by: paul-gibbons <paul-gibbons@users.noreply.github.com> Co-authored-by: paul-gibbons <paul-gibbons@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add offline quantization script for QLoRA deployment (#9455) * add qlora offline quantization script Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * clean Signed-off-by: Chen Cui <chcui@nvidia.com> * docstring Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * qlora support more models (#9488) Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Some improvements to NeMoLogger (#9591) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Set n_gpu to None in nemo export (#9593) * fix minor import bug Signed-off-by: 
Onur Yilmaz <oyilmaz@nvidia.com> * set ngpus to None Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> --------- Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Inflight nemo model export support (#9527) * online model conversion and refit Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * clean code Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * cleanup Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * add refit, cleanup code Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * combine weight conversion functions Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * cleanup code Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com> * remove debug print Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * cleanup code Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * fix single gpu and cleanup code Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com> --------- Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * vLLM Export Improvements (#9596) * Separated the vLLM export functionality from the common deployment script into deploy_vllm_triton.py. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Fixed vocab_size for LLAMA3. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Export test: fixed deployment testing w/o Megatron, made functional tests optional, added --gpu_memory_utilization. Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> * Apply isort and black reformatting Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> * Addressing review and CodeQL comments. 
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> --------- Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com> Signed-off-by: apanteleev <apanteleev@users.noreply.github.com> Co-authored-by: apanteleev <apanteleev@users.noreply.github.com> Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Set finalize_model_grads_func in on_fit_start instead to make sure it's being called (#9599) Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Set no_sync_func & grad_sync_fucn (#9601) * Set no_sync_func & grad_sync_fucn Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * set overlap_param_sync Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * small nemo logger bug fix (#9607) Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * fix the dict format returned by scheduler method (#9609) Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Dataloading enhancements and bug fixes (#9595) * fix dataloading + checkpoint restore * clean up data sampler * fix typo * support passing multiple paths to data module * fix validation dataloader * fix dataloader len when using gradient accumulation * fix progress bar * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * fix step count in loggers * fix blended dataset * address comments * address comment * move step logging into strategy * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> --------- 
Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Fix serialization of AutoResume (#9616) * fix serialization of autoresume * update undefined variables Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Chat template support for megatron_gpt_eval.py (#9354) * Bump PTL version (#9557) Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [Resiliency] Straggler detection (#9473) * Initial straggler det impl Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Fixed CI code checks Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Removed unused import Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * remove submodule Signed-off-by: Maanu Grover <maanug@nvidia.com> * Updated documentation; Updated callback params; Cosmetic changes Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Fixed straggler det config; Added basic test Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * Fixes in test_straggler_det.py Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Updated straggler callback API Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> * Apply isort and black reformatting Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> * stop_if_detected=False by default Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> --------- 
Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Co-authored-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Co-authored-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move model loading to separate function; call toContainer once; pad using closed formula Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * read prompts from file Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * If input prompt contains dict, apply model.tokenizer.chat_template Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * apply @Gal Leibovich's patch Taken from: https://github.com/NVIDIA/NeMo/commit/17572905344db4692583e72799d55801a8860f35 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * rename prompts_file to prompts_jsonl Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add chat_template param Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Add ChatTemplateMixin to SentencePieceTokenizer Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add chat-template to text-gen-strat Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move load prompts to separate file Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove chat-template from text-gen-utils Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * make chat-template more generic Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add assert message 
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * small refactor for chat_template_mixin Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * undo ckpt conv changes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * move rounding to function Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Jacek Bieniusiewicz <jbieniusiewi@nvidia.com> Signed-off-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: jbieniusiewi <152396322+jbieniusiewi@users.noreply.github.com> Co-authored-by: jbieniusiewi <jbieniusiewi@users.noreply.github.com> Co-authored-by: Maanu Grover <maanug@nvidia.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Jsonl support (#9611) * Adding support to preprocess .jsonl and .jsonl.gz files in input directory Signed-off-by: adityavavre <avavre@nvidia.com> * Adding support to preprocess .jsonl and .jsonl.gz files in input directory Signed-off-by: adityavavre <avavre@nvidia.com> * Apply isort and black reformatting Signed-off-by: adityavavre <adityavavre@users.noreply.github.com> --------- Signed-off-by: adityavavre <avavre@nvidia.com> Signed-off-by: adityavavre 
<adityavavre@users.noreply.github.com> Co-authored-by: adityavavre <adityavavre@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] Add PEFT (#9490) * initial commit for PEFT in nemo2 * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * address comments Signed-off-by: Chen Cui <chcui@nvidia.com> * make import easier Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * address comments Signed-off-by: Chen Cui <chcui@nvidia.com> * Update nemo/collections/llm/peft/lora.py Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> * Some small fixes + adding more doc-strings * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Adding ModelTransform callback * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Fixing type-hint for model_transform * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * fix import Signed-off-by: Chen Cui <chcui@nvidia.com> * model transform for gemma llama Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * fix model transform Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * change lora target default to all linear modules Signed-off-by: Chen Cui <chcui@nvidia.com> * Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> * Small fix in mixtral * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Integrating PEFT to the public-API + some fixes * Big refactor to allow to load adapter-states * Some fixes to support adapter_path * Apply isort and black reformatting 
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Disabling ckpt reloading when adapter_path is passed * Fix CLI * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Remove commented-out code * Remove commented-out code * Remove un-used import * Fix callback imports * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Fixing llm.pretrain * Some small fixes * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Fix missing import + type-hint in finetune * Adding PreemptionCallback + some more tests * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Clean up imports & clean up llm.api * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Trying to fix failing tests * Remove __init__.py 2 * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Fix failing test * Trying to fix last failing test --------- Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Marc Romeyn <marcromeyn@gmail.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Akoumparouli/mistral import instruct chat template fix (#9567) * use bf16 by defualt mistral conv Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add chat template Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * use capitalized role names Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> 
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Remove .cuda calls, use device isntead (#9602) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * fix converter defautl args (#9565) * fix converter defautl args Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * mixtral export (#9603) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * fix: remove non_blocking from PTL's .cuda call (#9618) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Alit/mamba tmp (#9612) * adding mamba support * fix import mixins * rm convert jamba * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * more cleanups * use GPT text gen * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * fixing gbs in TP convetor * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * add reqs * add tutorial * minor fix to tutorial * moving finetuning files Signed-off-by: arendu <adithya.r@gmail.com> * moving finetuning files Signed-off-by: arendu <adithya.r@gmail.com> * address comments * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * address comments * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * add mamba_tmp * remove mamba import * Apply isort and black reformatting 
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> --------- Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> Signed-off-by: arendu <adithya.r@gmail.com> Co-authored-by: Ali Taghibakhshi <ataghibakhsh@login-eos01.eos.clusters.nvidia.com> Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com> Co-authored-by: arendu <adithya.r@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * TitaNet Batch Verify Speaker (#9337) * add batch_inference for verify_speakers method Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * remove not used package Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * change batch inference logic Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * fixup Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * requested changes Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * add verify_speakers_batch to docs Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * handle None durations in manifest Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * change logging text Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> * check duration presence Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> * add channel_selector to dataset configs Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> --------- Signed-off-by: msekoyan@nvidia.com <msekoyan@nvidia.com> Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Enable MCore checkpointing optimizations (#9505) * Expose num processes in PyT Dist Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Add parallel save/load optimizations from MCore Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Remove async 
utils from MCore Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Enable DistOpt paralell R/W Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Enable PyT Dist caching Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Small fixes Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Make sure DistCkptIO is instantiated from config Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Bump MCore version to v0.7 Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Print load strategy Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Forward MCore to model space DistOpt Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Add separate flag to control DistOpt paralell R/W Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Turn off parallel save by default Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Change mixtral moe key name for trt-llm (#9620) * fix minor import bug Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * change moe key values Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> * add weight to the key Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> --------- Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * fix ckpt load bug (#9621) * fix ckpt load bug Signed-off-by: dimapihtar <dpihtar@gmail.com> * Apply isort and black reformatting Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> --------- Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * NeVA Minor Fixes (#9608) * fix neva resume with empty param loaded for some pp stage Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix crop size check Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> 
--------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * fix pretrianing data sizes and weights (#9627) Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Alit/mamba (#9575) * adding mamba support * fix import mixins * rm convert jamba * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * more cleanups * use GPT text gen * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * fixing gbs in TP convetor * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * add reqs * add tutorial * minor fix to tutorial * moving finetuning files Signed-off-by: arendu <adithya.r@gmail.com> * moving finetuning files Signed-off-by: arendu <adithya.r@gmail.com> * address comments * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * address comments * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * address comments * add mamba dependancies * add mcore tag * modify dockerfile ci * modify dockerfile ci --------- Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> Signed-off-by: arendu <adithya.r@gmail.com> Co-authored-by: Ali Taghibakhshi <ataghibakhsh@login-eos01.eos.clusters.nvidia.com> Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com> Co-authored-by: arendu <adithya.r@gmail.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [NeMo-UX] async checkpointing support (#9466) * add async checkpointing support * fixes * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * add parallel read/write support and other optimizations * Apply isort and black reformatting Signed-off-by: 
ashors1 <ashors1@users.noreply.github.com> * address comments, make dist checkpointing args configurable * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * fix small typo Signed-off-by: ashors1 <ashors@nvidia.com> * Update default sharding type Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Signed-off-by: Anna Shors <71393111+ashors1@users.noreply.github.com> * Update default sharding type Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Signed-off-by: Anna Shors <71393111+ashors1@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> --------- Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Fix the arguments of forward_for_export function in msdd_models (#9624) * Fix the arguments of forward_for_export function Signed-off-by: Taejin Park <tango4j@gmail.com> * Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com> --------- Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: tango4j <tango4j@users.noreply.github.com> Co-authored-by: tango4j <tango4j@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Change default parallel_save to False (#9632) Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Unwrap ckpt_io for model opt (async save) (#9622) Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * MCore T5 support for NeMo - Training (#9432) * huvu/mcore_t5 first commit from local * removing DEBUGGING prints * cleaning megatron_lm_encoder_decoder_model.py code * 
cleaning code * adding Github action test * only run mcore T5 test * only run mcore T5 test * only run mcore T5 test * only run mcore T5 test * reset .github/workflows/cicd-main.yml * reset .github/workflows/cicd-main.yml * adding condition self.mcore_t5 when running self.build_transformer_config() * refractor megatron_lm_encoder_decoder_model.py to not use self.model * only run T5-related tests * remove all self.model * reset cicd file * reset cicd file * updating codes remove duplicate if/else; adding mcore/transformer_engine to config file * adjust +model.mcore_t5=True * Apply isort and black reformatting Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com> --------- Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com> Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * [Nemo-UX] Expose transformer_layer_spec inside GPTConfig (#9592) * Expose transformer_layer_spec inside GPTConfig * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> * Expose layer-specs * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Update NeMo Clip to Use MCore Modules (#9594) * update clip model and config file Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update clip for mcore Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * MCore CLIP Fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix no mask Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * few neva fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update siglip module Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * add siglip loss Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix 
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix collate fn Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update siglip conversion script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update siglip convert Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clip fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * clean up script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * clip fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> * fix code styles Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update siglip_loss.py Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Add REST API to deploy module (#9539) * Add REST API and FastAPI to deploy module Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add NemoQuery and requirements Signed-off-by: Abhishree <abhishreetm@gmail.com> * Edit path for config.json Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add modifications for REST API for the correct functionality Move service dir under deploy Use NeMoQueryLLM instead of NemoQuery Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply isort and black reformatting Signed-off-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com> * Change default port for REST Service Change default port for REST service as Triton server also used the same port as default. 
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: athitten <athitten@users.noreply.github.com> --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: athitten <athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: athitten <athitten@users.noreply.github.com> Signed-off-by: Tugrul Konuk <ertkonuk@gmail.com> * Mistral + Mixtral Support for NeVa (#9459) * mistral template support Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * get_specs neva fix Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * mistral update Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * fixed mistral tokenization Signed-off-by: paul-gibbons <paul@gibbonspaul.com> * t…