Add ability to enable/disable act ckpt and seq parallelism in GPT #6327

Merged: 14 commits, Apr 13, 2023

markelsanz14
Contributor

What does this PR do?

Adds functions to enable and disable activation checkpointing and sequence parallelism. This way, we can enable them during training and disable them during inference/generation.

Collection: nemo/collections/nlp/language_modeling

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
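
As an illustration of the enable/disable pattern described above, here is a minimal, self-contained sketch. It does not use NeMo: `ToyConfig`, `ToyGPT`, `inference_mode`, and every method and field name are hypothetical stand-ins chosen for this example, not the functions added in this PR. The idea is to save the training-time settings, switch both features off for inference/generation, and restore them afterwards.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical field names, used for illustration only.
    activations_checkpoint_granularity: Optional[str] = "selective"
    sequence_parallel: bool = True


class ToyGPT:
    """Toy stand-in for a model whose config toggles both features."""

    def __init__(self, cfg: ToyConfig):
        self.cfg = cfg
        self._saved = None  # training-time values, set while disabled

    def reset_for_inference(self) -> None:
        """Remember the training-time values, then switch both features off."""
        if self._saved is None:  # do not overwrite values saved earlier
            self._saved = {
                "activations_checkpoint_granularity": self.cfg.activations_checkpoint_granularity,
                "sequence_parallel": self.cfg.sequence_parallel,
            }
        self.cfg.activations_checkpoint_granularity = None
        self.cfg.sequence_parallel = False

    def restore_for_training(self) -> None:
        """Put back whatever values were active before the reset."""
        if self._saved is not None:
            for name, value in self._saved.items():
                setattr(self.cfg, name, value)
            self._saved = None


@contextmanager
def inference_mode(model: ToyGPT):
    """Disable both features inside the block and restore them on exit."""
    model.reset_for_inference()
    try:
        yield model
    finally:
        model.restore_for_training()


model = ToyGPT(ToyConfig())
with inference_mode(model):
    # Validation or generation would run here with both features off.
    pass
```

Wrapping the toggle in a context manager is one possible design choice: it restores the training settings even if validation raises an exception.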

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
pre-commit-ci bot and others added 2 commits March 29, 2023 23:55
@MaximumEntropy
Contributor

This looks good to me. I've been using it for SFT models to turn these features on and off between training and validation. I'll let @ericharper do a final review.

markelsanz14 and others added 2 commits April 11, 2023 14:30
Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
ericharper previously approved these changes Apr 13, 2023
Collaborator

@ericharper ericharper left a comment

LGTM. Thanks!

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
markelsanz14 and others added 2 commits April 13, 2023 13:59
…t function.

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Collaborator

@ericharper ericharper left a comment

LGTM. Thanks for the changes!

@markelsanz14 markelsanz14 merged commit 11a37b3 into main Apr 13, 2023
@markelsanz14 markelsanz14 deleted the markelsanz14/disable_act_ckpt branch April 13, 2023 22:55
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023
…IDIA#6327)

* Add ability to enable/disable act ckpt and seq parallelism

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove num_micro_batches_with_partial_activation_checkpoints

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* Added property to self.model and added restore/reset config values.

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use self.model property

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed original_act_ckpt

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstrings to reset/restore act ckpt

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* Property removed from self.model and replaced with get_gpt_module_list function.

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
4 participants