text_generation_utils memory reduction if no logprob needed #6773
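As we read the title and the commit note about preventing "softmax on the entire output embedding", the saving comes from not materializing log-probabilities for every position when nobody asked for them. The pure-Python sketch below illustrates that idea on nested lists. The function names are hypothetical and this is not NeMo's implementation; it also counts the bytes of the log-prob output in each mode, assuming fp32.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_log_probs(logits, compute_logprob):
    """Return (last-position log-probs, full log-prob matrix or None).

    logits is a nested list shaped [batch][seq_len][vocab]. Choosing the next
    token only needs the last position, so the full [batch][seq_len][vocab]
    log-prob matrix is built only when compute_logprob is True.
    """
    if compute_logprob:
        full = [[log_softmax(pos) for pos in seq] for seq in logits]
        return [seq[-1] for seq in full], full
    # Cheap path: one log-softmax per sequence instead of one per position.
    return [log_softmax(seq[-1]) for seq in logits], None


def log_prob_buffer_bytes(batch, seq_len, vocab, compute_logprob, bytes_per_elem=4):
    """Size of the log-prob output in each mode (fp32 by default)."""
    positions = seq_len if compute_logprob else 1
    return batch * positions * vocab * bytes_per_elem
```

For example, with a hypothetical batch of 1, 2,048 positions and a 50,304-token vocabulary, `log_prob_buffer_bytes` gives 412,090,368 bytes with `compute_logprob=True` versus 201,216 bytes without, a factor of `seq_len`.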
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
…, now memory bottlenecked by attention softmax which needs to be solved with FA or long attention
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
for more information, see https://pre-commit.ci
…into gpt_predict_mp_mem_issue
for more information, see https://pre-commit.ci
…into gpt_predict_mp_mem_issue
@@ -317,18 +323,19 @@ def receive_generate_info():
     """
     model_parallel_group = parallel_state.get_model_parallel_group()
     src = get_model_parallel_src_rank()
-    input_info_tensor = torch.empty(10, dtype=torch.float32, device=torch.cuda.current_device())
+    input_info_tensor = torch.empty(11, dtype=torch.float32, device=torch.cuda.current_device())
Could you add a comment here? Why change it to 11?
Added compute_logprob as a new entry in input_info_tensor, so its size has to grow by one. I also added a comment explaining what compute_logprob does.
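To illustrate why one extra flag grows the buffer from 10 to 11 entries: the source rank packs the generation parameters into a flat float32 tensor and the other model-parallel ranks decode it by position, so both sides must agree on its length and order. The sketch below mimics that with the stdlib `array` module instead of a collective such as `torch.distributed.broadcast`. The field names, their order, and the helper functions are assumptions for illustration, not necessarily what `text_generation_utils` uses.

```python
from array import array

# Assumed layout for illustration only; compute_logprob is the extra entry
# that takes the buffer from 10 to 11 values.
GENERATE_INFO_FIELDS = (
    "batch_size",
    "seq_len",
    "tokens_to_generate",
    "all_probs",
    "compute_logprob",  # whether to compute the full log-prob matrix
    "temperature",
    "top_k",
    "top_p",
    "greedy",
    "repetition_penalty",
    "min_tokens_to_generate",
)
INT_FIELDS = {"batch_size", "seq_len", "tokens_to_generate", "top_k", "min_tokens_to_generate"}
BOOL_FIELDS = {"all_probs", "compute_logprob", "greedy"}


def pack_generate_info(**params):
    """Sender side: flatten parameters into a float32 buffer, in field order."""
    missing = set(GENERATE_INFO_FIELDS) - set(params)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Booleans become 0.0/1.0 and ints become floats because the buffer is float32.
    return array("f", (float(params[name]) for name in GENERATE_INFO_FIELDS))


def unpack_generate_info(buf):
    """Receiver side: decode by position; the length must match the sender's."""
    if len(buf) != len(GENERATE_INFO_FIELDS):
        raise ValueError(f"expected {len(GENERATE_INFO_FIELDS)} values, got {len(buf)}")
    info = {}
    for name, value in zip(GENERATE_INFO_FIELDS, buf):
        if name in BOOL_FIELDS:
            info[name] = bool(value)
        elif name in INT_FIELDS:
            info[name] = int(value)
        else:
            info[name] = value
    return info
```

Because every value travels as float32, integer fields only round-trip exactly below 2**24, and the length check on the receiving side catches a sender and receiver that disagree on the layout.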
@@ -67,7 +66,7 @@ def forward_step(self, batch, tensor_shape):
 
         return output_tensor
 
-    def tokenize_batch(self, sentences, max_len, add_BOS):
+    def tokenize_batch(self, sentences, max_len, add_BOS, truncate_prompt_length):
Can you add this to the docstring?
done
for more information, see https://pre-commit.ci
…into gpt_predict_mp_mem_issue
for more information, see https://pre-commit.ci
…into gpt_predict_mp_mem_issue
Looks good overall.
Why do we need the truncate_prompt_length argument?
@@ -265,9 +267,10 @@ def main(cfg) -> None:
 
     # Second method of running text generation, call trainer.predict
     ds = RequestDataSet(OmegaConf.to_container(cfg.prompts))
-    request_dl = DataLoader(dataset=ds, batch_size=2)
+    request_dl = DataLoader(dataset=ds, batch_size=1)
Is this change necessary?
> Looks good overall. Why do we need the truncate_prompt_length argument?
In case someone wants to truncate their prompt, e.g. truncating the context as in p-tuning. But this is not a deal breaker, so it's your call whether to remove it.
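The truncation use case described above could look roughly like the following. This is a hypothetical sketch on pre-tokenized ids: `tokenize_batch_sketch`, keeping the last N prompt tokens, and truncating before the BOS token is prepended are all assumptions. The padding of every row to the longest context plus `max_len` is modeled on the usual pad-for-generation pattern rather than taken from this PR, and the truncation feature itself was later removed from the PR.

```python
def tokenize_batch_sketch(token_lists, max_len, add_bos, bos_id, pad_id, truncate_prompt_length=None):
    """Hypothetical tokenize_batch stand-in that works on pre-tokenized ids.

    truncate_prompt_length (assumed semantics): keep only the last N prompt
    tokens so the most recent context survives; None disables truncation.
    Returns (padded rows, original context lengths).
    """
    if truncate_prompt_length is not None and truncate_prompt_length < 1:
        raise ValueError("truncate_prompt_length must be a positive integer or None")
    contexts = []
    for ids in token_lists:
        ids = list(ids)
        if truncate_prompt_length is not None:
            ids = ids[-truncate_prompt_length:]
        # Prepend BOS after truncation so it is never cut off.
        if add_bos:
            ids = [bos_id] + ids
        contexts.append(ids)
    lengths = [len(ids) for ids in contexts]
    # Pad every row to the longest context plus room for max_len new tokens.
    width = max(lengths) + max_len
    padded = [ids + [pad_id] * (width - len(ids)) for ids in contexts]
    return padded, lengths
```

Truncating before adding BOS is a design choice in this sketch: slicing after prepending would silently drop the BOS token whenever a prompt is long enough to be cut.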
OK, I can revert the batch size change.
The log prob changes look great! I just suggested that @yzhang123 not include the truncation logic in this PR.
for more information, see https://pre-commit.ci
This has been removed from the PR.
LGTM!
* text_generation_utils memory reduction if no logprob needed (#6773)
  * repro for gpt eval mp mem issue
  * add print statements for memory allocation
  * adjusted hot fix that prevents softmax on the entire output embedding, now memory bottlenecked by attention softmax which needs to be solved with FA or long attention
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * using compute_logprob to configure inference
  * enable compute logprob for peft
  * remove print statements
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * fix ci
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * added docstrings
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * add missing config
  * remove truncate prompt length feature
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * tensor before all gather needs to be contiguous
  ---------
  Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
  Co-authored-by: pre-commit-ci[bot]
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> * Fixed bug in MaskedSpecAug that overestimates samples. (#6775) Signed-off-by: tbartley94 <tbartley@nvidia.com> * update core version (#6817) (#6819) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> * lora pp2 (#6818) Signed-off-by: arendu <adithya.r@gmail.com> * Add optional index mapping dir in mmap text datasets (#6683) If datasets are stored on a read-only medium, index files cannot be created into adjacent files and an alternative directory must be specified for index mapping files. This commit adds an optional `index_mapping_dir` to the constructors. Unit tests are also added. [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Update path formatting for relative paths Signed-off-by: Greg Heinrich <gheinrich@nvidia.com> * Add inference kv cache support for transformer TE path (#6627) * Add kv cache support for transformer TE path Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Mark get_data_parallel_group as WAR Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Initialize process group for FP8 training Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Megatron GPT eval script for non-FP8 path Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: Yen-Shi Wang <6960565+yen-shi@users.noreply.github.com> Co-authored-by: Yen-Shi Wang 
<yenshiw@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Support large inputs to Conformer and Fast Conformer (#6556) * initial commit Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * typos Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * tweaks to padding Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * comments Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * attempt at first working version Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * typos and fixed p calculation Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing merge artifacts Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * typo Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unnessary imports Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * if batch split succeeded no need to conv again Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding channel wise split Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding reference to pytorch issue 80020 Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * removing time chunking methods Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * accounting for the actual self._stride value Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * limiting the fix to dw_striding subsampling Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * renamed methods Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * one more accounting for the actual 
self._stride value Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * support for causal convs Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * option to set conv chunking size manually * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing imports * subsampling test Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename variable Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * imports in test Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * more runtime checks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * a more careful test Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * bug in causal Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix in causal Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * change_conv_chunking_factor methods Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * renamed methods Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disabling chunking by default Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * changing default chunking to auto Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * only split if needed Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only split if needed 
Signed-off-by: Dima Rekesh <bmwshop@gmail.com> --------- Signed-off-by: Dima Rekesh <bmwshop@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * sharded_manifests updated docs (#6833) Signed-off-by: Dima Rekesh <bmwshop@gmail.com> * added fc-xl, xxl and titanet-s models (#6832) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> * add reference to our paper (#6821) * add reference to our paper Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> * add paper reference to docs Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> --------- Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> * Upperbound Numpy to < 1.24 (#6829) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Multi-lookahead cache-aware streaming models (#6711) * added methods. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added methods. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added initial code. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added initial code. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added initial code. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added config files. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed bugs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated confs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated confs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated confs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated confs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * improved f.conv1d Signed-off-by: Vahid <vnoroozi@nvidia.com> * pulled from main. Signed-off-by: Vahid <vnoroozi@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * pulled from main. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added postpostnorm. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed the target continiouse bug. 
Signed-off-by: Vahid <vnoroozi@nvidia.com> * added dw_striding causal. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added print for debugging. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added print for debugging. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed causal convolutions. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added _midnorm. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed transcribe. Signed-off-by: Vahid <vnoroozi@nvidia.com> * cleaned code. Signed-off-by: Vahid <vnoroozi@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * moved back configs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * moved back configs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated fast emit for FC models. Signed-off-by: Vahid <vnoroozi@nvidia.com> * updated fast emit for FC models. Signed-off-by: Vahid <vnoroozi@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed bug. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed bug and addressed comments. Signed-off-by: Vahid <vnoroozi@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed configs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fixed configs. Signed-off-by: Vahid <vnoroozi@nvidia.com> * dropped the test. 
Signed-off-by: Vahid <vnoroozi@nvidia.com> --------- Signed-off-by: Vahid <vnoroozi@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * added changes to ramp up bs (#6799) * rampup bs changes Signed-off-by: dimapihtar <dpykhtar@nvidia.com> * rampup bs changes Signed-off-by: dimapihtar <dpykhtar@nvidia.com> * fixed styling Signed-off-by: dimapihtar <dpykhtar@nvidia.com> * fix bug Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> --------- Signed-off-by: dimapihtar <dpykhtar@nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Fix typo in core.rst (#6838) Signed-off-by: Dounx <imdounx@gmail.com> * add back ptuning pp2 test (#6394) Signed-off-by: arendu <adithya.r@gmail.com> * t5 lora tuning (#6612) * t5 lora Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * eval lora t5 Signed-off-by: arendu <adithya.r@gmail.com> * adjust differernt lora dims Signed-off-by: arendu <adithya.r@gmail.com> * minor changes Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bugfix for state_dict Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> --------- Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: David Mosallanezhad <dmosallanezh@nvidia.com> Co-authored-by: David <amosalla@asu.edu> * NFA updates (#6695) * update V_NEGATIVE_NUM constant to make better use of torch.float32 range Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * adjust backpointers dtype if U_max too large Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> 
* Remove print statements Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Remove need for user to specify model_downsample_factor Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * change model.cfg.sample_rate to model.cfg.preprocessor.sample_rate Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add check to make sure that window_stride is in model.cfg.preprocessor Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * reduce memory consumption of backpointers by making them relative instead of absolute Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update librosa.get_duration() 'filename' param to 'path' Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Do not throw error if 'text' or 'pred_text' are empty and make sure CTM filepaths in the output manifest are null Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * preprocess input text by removing any duplicate spaces and converting any newlines to spaces Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Use Utterance dataclass instead of dictionaries for keeping track of token/word/segment alignments Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * refactor so can save alignments as ctm and ass format files Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * fix bugs for saving character based ASS files and for using pred_text to do alignment Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Make token level .ass file use tokens with recovered capitalization Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Do not try to generate alignment files if text or pred text is empty, or if number of tokens is too large for T Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename output manifest file to say '_with_output_file_paths.json' Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add flag to resegment ass subtitle file to fill available text space Signed-off-by: 
Elena Rastorgueva <erastorgueva@nvidia.com> * Fix bug in resegmentation code Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Fix bug which skipped some utterances if batch_size more than 1 Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * reduce memory requirements by doing torch.gather on a slice of the log probs when they are needed Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * reduce memory requirements by not saving whole v_matrix Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove any extra spaces in pred_text Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused list pred_text_all_lines Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * support using hybrid Transducer-CTC models for alignment Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * fix typo - add brackets to torch.cuda.is_available() Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * make sure token case restoration will work if superscript or subscript num is in text Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove any BOM from input text Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * pick out 1st hypotheses if there is a tuple of them Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Remove print statement Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add detail to error message if fail to recover capitalization of tokens Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add flag use_local_attention Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename additional_ctm_grouping_separator -> additional_segment_grouping_separator Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update description of additional_segment_grouping_separator Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add 
simple docstring to get_utt_obj function Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Make docstring for add_t_start_end_to_utt_obj Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update docstrings for add_t_start_end_to_utt_obj and get_batch_variables Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update README and comments in align.py Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * change 'ground truth' -> 'reference text' in documentation Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add header Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add comments to get_utt_obj function Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * move constants so they are after imports Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add file description for make_ass_files Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * get rid of Utterance object's S attribute, and correct tests so they pass now Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove some unused variables Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove unused variable model from functions saving output files Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove unused var minimum_timestamp_duration from make_ass_files functions and return utt_obj Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * move minimum_timestamp_duration param to CTMFileConfig Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove unused enumerate and unused import Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * switch reading duration from librosa to soundfile to avoid filename/path deprecation message Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Added rouge monitoring support for T5 (#6737) * Added rouge monitoring support for t5 Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention (#6666) * move to nvidia megatron repo (#6465) (#6475) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Megatron KERPLE positional embeddings (#6478) (#6480) * [TTS] FastPitch adapter fine-tune and conditional layer normalization (#6416) [TTS] FastPitch adapter fine-tune and conditional layer normalization (#6416) --------- * [TTS] whitelist broken path fix. (#6412) * [TTS] whitelist broken path fix. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- * [TTS] FastPitch speaker encoder (#6417) * Add initial codes * Remove wemb * Fix import * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore aligner loss * Add ConditionalInput * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error and support pre-trained config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change copyright and random weight test * Add initial codes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix import error * Add initial codes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix dataset error * Remove reference speaker embedding * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove SV encoder * Follow comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix length type * Fix append * Move error msg * Add look-up into speaker encoder * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add valueerror msg * Move lookup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error * Rebase and Fix error * Fix spk encoder * Rename n_speakers * Follow comments * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix n_speakers None error --------- * Sharded manifests for tarred datasets (#6395) * testing sharded manifests * compatibility * proper fixes * adding flag tot convert_to_tarred_audio_dataset * shard_manifests conf param * propagating the shard_manifests param * propagating the shard_manifests param * distributed checks * typo * typo * fixes * fixes * fixes * fixes * fixes * fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes based on PR comments and tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes to convert_to_tarred_audio_dataset.py * reversing manifest shards flag * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tests * excluding manifests from webdataset url expansion * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * expand manifest paths before attempting to cache from datastore * explicit use of UTF-8 for manifest i/o --------- * Update wfst_text_normalization.rst (#6374) Add Hungarian (incoming in NeMo-text-processing) * Support Swiglu in TP PP Conversion (#6437) (#6451) * Support Swiglu in TP PP Conversion * Guard activation * Guard activation --------- * Update NeMo_TTS_Primer.ipynb (#6436) * Update NeMo_TTS_Primer.ipynb Changed a mistake in line 782. Instead of frequency band (ie. pitch) we should write frequency bin. Note that frequency bins in FFT are not related to pitch. * Update NeMo_TTS_Primer.ipynb Corrected the description of spectrogram and mel spectrogram calculations in lines 782 & 783 and added a fourth point to the description and added a reference for more mathematical details at the end of this point. 
--------- * add rampup batch size support for Megatron GPT (#6424) * added rampup batch size support * added tests for rampup batch size * fixed the typos * added assertions * changed assertion rules * deleted unused imports * changed tests for rampup batch size * updated rampup batch size tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed styling * rampup batch size tests changes --------- * Meagtron encoder decoder fix for empty validation outputs (#6459) (#6461) * 1. Meagtron encoder decoder fix for empty validation outputs. * 1. Debugging. --------- * Code-Switching dataset creation - upgrading to aggregate tokenizer manifest format (#6448) * added functionality to create agg tokenizer compatible manifest for CS, flag to use this mode by default * updated README with the new agg_tokenizer_manifest flag * fixed typo in scripts/speech_recognition/code_switching/README.md * changed agg_tokenizer_manifest to is_lid_manifest --------- * Added/updated new Conformer configs (#6426) (#6467) * Update script for ngram rnnt and hat beam search decoding (#6370) * add rnnt ngram beamsearch script * add return encoding embedding option * update script * add rnnt and hat ngram decoding script * add some parameters * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add return_encoder_embeddings parameter to RNNTDecodingConfig * replace return_encoder_embeddings parameter * generalization of scipt behavior * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove return_encoder_embeddings parameter * remove return_encoder_embeddings parameter * add manual encoder_embeddings calculation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix beam_width value to 8 * fix rescoring description --------- * BERT pre-training mp fork to spawn (#6442) (#6454) * 
change bert fork to spawn * num_workers=0 fix --------- * fix replace_bos_with_pad not found (#6443) (#6450) * reduce workers on NMT CI (#6472) (#6474) * 1. Added KERPLE positional embeddings to encoder-decoder. * 1. Added a missing file. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Fixing commits. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. --------- Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Dima Rekesh <bmwshop@gmail.com> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com> Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Micha Livne <mlivne@nvidia.com> Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com> Signed-off-by: andrusenkoau <andrusenkoau@gmail.com> Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com> Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dima Rekesh <bmwshop@gmail.com> Co-authored-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <dpykhtar@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: 
Micha Livne <mlivne@nvidia.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Fix an invalid link in get_data.py of ljspeech (#6456) Usage of the link in line 63 leads to downloading a html file not a tsv file, so we need to change it to a raw link. Signed-off-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * 1. Added external index sample. (#6462) (#6483) Signed-off-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Update README to add core installation (#6488) (#6489) * update README for megatron-core * fix --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Fix cache aware hybrid bugs (#6466) (#6484) Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Fix typos (#6494) (#6495) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Add disclaimer about dataset for ASR (#6496) Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * fix (#6502) datastore_path_to_webdataset_url(p) if is_datastore_path(p) and is_tarred_path(p) else p NameError: name 'is_tarred_path' is not defined Co-authored-by: George <gzelenfroind@nvidia.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * fix broken links r1.18.0 (#6501) (#6504) * fix broken links * fix broken links --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * [TTS] Create functions for TTS preprocessing without dataloader 
(#6317) * [TTS] Create functions for TTS preprocessing without dataloader Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * Cache aware streaming nfa (#6209) * add cache aware streaming to nemo aligner Signed-off-by: Slyne Deng <slyned@nvidia.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> * [BugFix] Force _get_batch_preds() to keep logits in decoder timestamps generator (#6499) * [BugFix] _get_batch_preds() is forced to keep logits in decoder timestamps generators Signed-off-by: Taejin Park <tango4j@gmail.com> * Ingnore keep_logits boolean in FrameASRBatchLogits Signed-off-by: Taejin Park <tango4j@gmail.com> --------- Signed-off-by: Taejin Park <tango4j@gmail.com> Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com> Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu> …
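What does this PR do?
Makes text_generation_utils.py more memory-efficient during inference, especially for long-context sequences. When log-probabilities are not requested (`compute_logprob` is disabled), generation no longer applies a softmax over the entire output vocabulary for every position in the sequence.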
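A minimal sketch of the idea, assuming a greedy decoding step that receives raw logits of shape [batch, seq_len, vocab]. `next_token_step` is a hypothetical helper for illustration, not the actual `text_generation_utils` code. With `compute_logprob` off, only the last position's logits are read; the full log-softmax (a seq_len × vocab allocation per sample) is computed only when per-token log-probabilities are requested.

```python
import torch
import torch.nn.functional as F


def next_token_step(logits: torch.Tensor, tokens: torch.Tensor, compute_logprob: bool):
    """Hypothetical greedy decoding step (illustration only).

    logits: [batch, seq_len, vocab] raw model outputs
    tokens: [batch, seq_len] token ids that were fed to the model
    Returns (next_token [batch], token_logprobs [batch, seq_len - 1] or None).
    """
    # Only the last position is needed to choose the next token. Softmax is
    # monotonic, so argmax over raw logits equals argmax over probabilities.
    next_token = logits[:, -1, :].argmax(dim=-1)

    if not compute_logprob:
        # Skip materializing a [batch, seq_len, vocab] log-softmax tensor.
        return next_token, None

    # Full log-softmax over the vocabulary: memory grows with seq_len * vocab.
    log_probs = F.log_softmax(logits.float(), dim=-1)
    # The output at position t scores token t + 1, so shift targets by one.
    target = tokens[:, 1:].unsqueeze(-1)
    token_logprobs = torch.gather(log_probs[:, :-1, :], 2, target).squeeze(-1)
    return next_token, token_logprobs
```

Greedy token selection is identical in both modes; only the optional log-probability output, and the memory needed to produce it, differs.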
Collection: nlp
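Changelog
- Add a `compute_logprob` option to configure inference; when it is disabled, the softmax over the entire output vocabulary is skipped.
- Enable `compute_logprob` for PEFT.
- Add `compute_logprob` as a new entry in the `input_info_tensor` broadcast from the model-parallel source rank (10 → 11 entries).
- Make the tensor contiguous before all-gather.
- Remove the truncate-prompt-length feature.
- Add docstrings and the missing config entry.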
Usage
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
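The contributor guidelines list specific people who can review PRs in various areas.
Additional Information
With the vocabulary softmax removed, peak memory for long contexts is now bottlenecked by the attention softmax, which would need to be addressed with Flash Attention or a long-attention variant.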