Fix gpt trainer test #6915
Looks good overall. Please don't change the default configuration behavior, as the old code depends on it.
drop_last: True
context_key: 'input'
label_key: 'output'
drop_last: False
Don't change the default behavior.
Why would we drop the last batch when evaluating on the validation dataset?
I have one use case where I have to drop the last batch: I am preparing a dataset for computing a contrastive loss, so I need to make sure every batch has the same batch size.
If we set drop_last=False, we can use pad_samples_to_global_batch_size=True, right? Or does that not cover your use case? https://github.com/NVIDIA/NeMo/blob/fix-gpt-trainer-test/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py#L788
If we make drop_last=True the default, we may drop some samples during evaluation, which could produce incorrect results for comparison.
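To make the trade-off in this thread concrete, here is a minimal sketch in plain Python. It is not NeMo's implementation: `make_batches`, `count_real`, the `pad_to_full_batch` flag, and the `PAD` marker are hypothetical names used only for illustration, and padding to the global batch size is simplified to padding the last batch up to `batch_size`.

```python
from typing import List

PAD = -1  # hypothetical padding marker, not a NeMo constant


def make_batches(
    samples: List[int],
    batch_size: int,
    drop_last: bool,
    pad_to_full_batch: bool = False,
) -> List[List[int]]:
    """Split samples into batches, then drop or pad an incomplete last batch."""
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    if batches and len(batches[-1]) < batch_size:
        if drop_last:
            # The incomplete batch is discarded, so its samples are never evaluated.
            batches.pop()
        elif pad_to_full_batch:
            # The incomplete batch is filled with padding so every batch has equal size.
            missing = batch_size - len(batches[-1])
            batches[-1] = batches[-1] + [PAD] * missing
    return batches


def count_real(batches: List[List[int]]) -> int:
    """Count the non-padding samples that would actually be evaluated."""
    return sum(1 for batch in batches for sample in batch if sample != PAD)


samples = list(range(10))
dropped = make_batches(samples, batch_size=4, drop_last=True)
padded = make_batches(samples, batch_size=4, drop_last=False, pad_to_full_batch=True)

print(count_real(dropped), [len(b) for b in dropped])  # 8 [4, 4]
print(count_real(padded), [len(b) for b in padded])    # 10 [4, 4, 4]
```

Both variants give uniform batch sizes, but only the padded one keeps all 10 samples in the evaluation; any metric computed on the padded batches would still need to ignore the padding entries.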
* Add trainer.test() for GPT
* Remove unused part
* Add trainer.test() for GPT
* Remove unused part
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fix training part
* Fix config
* Fix references and add CI
* Fix config error
* Fix dataset
* Add metadata
* Fix config
* Fix empty batch
* Fix config
* Fix config
* Fix max seq length
* Fix dataset
* Fix dataset
* Add token f1
* Add FA in sft
* Add inference config
* Fix bug
* Fix pad
* Fix num batch
* Add query_key
* Remove pdb
* Fix write json
* Fix dataset bug and refactor
* Add logging for prediction
* Fix retrain
* Add query_key in config
* Fix bug
* Fix config
* Fix bug
* Add inference config
* Fix bug
* Fix mask
* Fix mask
* Split PR
* Undo commit
* Add query_key to doc_string
* Adjust yzhang123 comments
* Fix error and follow comments
* Remove query key
* Remove logic and query
* Remove query from model
* Remove query_key
* Fix error
* Fix pdb
* Add default tokens_to_generate
* Revert prompt truncate re-prompt
* Revert
* skip generation with metric loss
* Revert
* Fix bug
* support GPTSFTChatDataset
* Add comment
---------
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
What does this PR do ?
Collection: [NLP]