SFT patch: (1) enable sequence parallelism and (2) enable profile #7963
Signed-off-by: Sangkug Lym <slym@nvidia.com>
self._reset_activation_checkpointing_args()
self._reset_sequence_parallelism_args()
This is good; we need to remove these calls from the constructor. But we probably need conditionals for the other calls.
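The review suggests taking the reset calls out of the constructor and guarding the remaining call sites. Below is a minimal sketch of that pattern, assuming (as the later discussion about generation mode suggests) that sequence parallelism and activation checkpointing are switched off only while generating. Only the two `_reset_*` method names come from the diff; `SFTModelSketch`, `ParallelConfig`, the `_restore_*` methods, and all method bodies are hypothetical and are not NeMo's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParallelConfig:
    # Hypothetical subset of a model config; field names are assumptions.
    sequence_parallel: bool = False
    activations_checkpoint_granularity: Optional[str] = None


class SFTModelSketch:
    """Hypothetical model showing where the reset/restore calls could live."""

    def __init__(self, cfg: ParallelConfig) -> None:
        self.cfg = cfg
        # No reset calls in the constructor, so training keeps the
        # user's sequence-parallel and checkpointing settings.
        self._saved_sp: Optional[bool] = None
        self._saved_granularity: Optional[str] = None

    def _reset_sequence_parallelism_args(self) -> None:
        self._saved_sp = self.cfg.sequence_parallel
        self.cfg.sequence_parallel = False

    def _restore_sequence_parallelism_args(self) -> None:
        self.cfg.sequence_parallel = bool(self._saved_sp)

    def _reset_activation_checkpointing_args(self) -> None:
        self._saved_granularity = self.cfg.activations_checkpoint_granularity
        self.cfg.activations_checkpoint_granularity = None

    def _restore_activation_checkpointing_args(self) -> None:
        self.cfg.activations_checkpoint_granularity = self._saved_granularity

    def generate(self) -> Tuple[bool, Optional[str]]:
        # Guard each reset so it only runs when the feature is enabled.
        reset_sp = self.cfg.sequence_parallel
        reset_ckpt = self.cfg.activations_checkpoint_granularity is not None
        if reset_sp:
            self._reset_sequence_parallelism_args()
        if reset_ckpt:
            self._reset_activation_checkpointing_args()
        try:
            # Stand-in for the real generation step: report the settings
            # that would be in effect while generating.
            return (self.cfg.sequence_parallel,
                    self.cfg.activations_checkpoint_granularity)
        finally:
            # Restore only what was reset, so training resumes unchanged.
            if reset_ckpt:
                self._restore_activation_checkpointing_args()
            if reset_sp:
                self._restore_sequence_parallelism_args()
```

Because the constructor no longer resets anything, a model built with `sequence_parallel=True` trains with it enabled, and the guarded calls in `generate` leave configurations that never enabled either feature untouched.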
@hsiehjackson could you please review?
jenkins
@hsiehjackson
@erhoo82, for inference, do you mean generation mode rather than calculating the validation loss? This assert needs config
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
LGTM
jenkins
…IDIA#7963)
* SFT profile start and end step fix
Signed-off-by: Sangkug Lym <slym@nvidia.com>
* Removed sequence parallelism assertion check
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
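The commit "SFT profile start and end step fix" indicates that profiling is bounded by a start step and an end step. The sketch below shows one way to make such a window inclusive at both ends; the `ProfileWindow` class and its hook names are hypothetical and are not the patch's actual code. In a real GPU run, `start_fn` and `stop_fn` could wrap `torch.cuda.profiler.start()` and `torch.cuda.profiler.stop()`, assuming Nsight Systems is launched with `--capture-range=cudaProfilerApi` so that only that range is recorded.

```python
from typing import Callable


class ProfileWindow:
    """Hypothetical helper: start profiling at the beginning of `start_step`
    and stop at the end of `end_step` (both inclusive)."""

    def __init__(
        self,
        start_step: int,
        end_step: int,
        enabled: bool = True,
        start_fn: Callable[[], None] = lambda: None,
        stop_fn: Callable[[], None] = lambda: None,
    ) -> None:
        if end_step < start_step:
            raise ValueError("end_step must be >= start_step")
        self.start_step = start_step
        self.end_step = end_step
        self.enabled = enabled
        self.start_fn = start_fn
        self.stop_fn = stop_fn
        self.active = False  # True while the profiler is running

    def on_train_batch_start(self, step: int) -> None:
        # Begin capturing before the first profiled step does any work.
        if self.enabled and not self.active and step == self.start_step:
            self.start_fn()
            self.active = True

    def on_train_batch_end(self, step: int) -> None:
        # Stop only after the last profiled step has finished.
        if self.active and step == self.end_step:
            self.stop_fn()
            self.active = False
```

Starting in the batch-start hook and stopping in the batch-end hook means a window with `start_step == end_step` still captures exactly one full step.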
What does this PR do?
SFT patch: (1) enable sequence parallelism and (2) enable profile
Changelog
Usage
# Add a code snippet demonstrating how to use this
Jenkins CI
To run Jenkins, a NeMo user with write access must comment "jenkins" on the PR.
Before your PR is "Ready for review"
Pre-checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information