Upgrade NeMo to latest mcore and TE #7862
@@ -3150,51 +3160,51 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
sh "rm -rf examples/nlp/language_modeling/gpt_index_mappings"
}
}
stage('L2: Megatron GPT with Rope Pretraining and Resume Training TP=2') {
Do we need to remove this test?
@ericharper this one uses position_embedding_type=rope, which causes an error (TE bug).
This test is not even using the mcore_gpt or transformer_engine flags, though...
LGTM. Thanks!
* mcore upgrade
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove GPTEmbedding import (deprecated)
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* switch to LanguageModelEmbedding
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* reset config
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* pass attn_mask-type through the forward method
  Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
* reset conf
  Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
* add more attn_mask_type params
  Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* attn_,ask_type fixes
  Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* remove attn_mask_type param
  Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
* t5/mt5 fix
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert configs
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert configs
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove attn_mask_type param
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update mcore commit
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* change mcore commit
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add TE installation
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add env var
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* comment out rope test
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* change mcore installation
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update mcore commit
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* change mcore installation
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* change mcore commit
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add -e
  Signed-off-by: eharper <eharper@nvidia.com>
* revert
  Signed-off-by: eharper <eharper@nvidia.com>
* revert jenkins test comment
  Signed-off-by: eharper <eharper@nvidia.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: eharper <eharper@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
What does this PR do?
Upgrades NeMo to the latest Megatron Core (mcore) and Transformer Engine (TE) versions.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
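The PR description does not include a usage snippet. As a minimal sketch of how one might confirm which mcore and TE versions an environment actually has after the upgrade, the following uses only the Python standard library. The distribution names `megatron-core` and `transformer-engine` are assumptions (source installs may register different names), and the helper names `installed_version` and `report_versions` are hypothetical, not part of NeMo.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed distribution names; adjust them if your environment
# installs mcore or TE under different package names.
ASSUMED_DISTRIBUTIONS = ("megatron-core", "transformer-engine")


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report_versions(
    dist_names: Iterable[str] = ASSUMED_DISTRIBUTIONS,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running the script prints one line per package. A `not installed` entry may only mean the package was installed under a different distribution name.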
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information