[Random] Random state management #38

Merged: 10 commits merged into awslabs:main on Feb 3, 2023

Conversation

@comaniac (Contributor) commented Feb 2, 2023

Description

  • Add random state management to handle the requirement of using the same or different random seeds within a tensor-parallel (TP) group. The implementation is based on the one in Megatron-LM (see the sketch after this list).
  • Add activation checkpointing that takes the random states into account.
  • Add an op DropoutWithTensorParallel that users can use to replace dropout when writing a schedule.
  • Add unit tests.
  • Disable the randomly plugin in pytest; it makes the random seed setup in our test fixture useless.
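
The following is a minimal, illustrative sketch of the Megatron-LM-style CUDA RNG state tracking referenced above; it is not the code in this PR, and the names (CudaRNGStateTracker, the "global"/"tensor-parallel" mode names, tp_rank) are hypothetical. The idea is to keep one CUDA RNG state per named mode and fork into the per-rank state only inside tensor-parallel regions, so dropout masks differ across TP shards while everything else stays reproducible.

```python
import contextlib

import torch


class CudaRNGStateTracker:
    """Illustrative sketch (not this PR's code) of Megatron-LM-style RNG tracking."""

    def __init__(self):
        self.states = {}

    def add(self, name, seed):
        # Seed the device RNG, snapshot its state under `name`, then restore
        # the previous state so registering a mode does not disturb callers.
        orig = torch.cuda.get_rng_state()
        torch.cuda.manual_seed(seed)
        self.states[name] = torch.cuda.get_rng_state()
        torch.cuda.set_rng_state(orig)

    @contextlib.contextmanager
    def fork(self, name="tensor-parallel"):
        # Temporarily switch to the tracked state; write it back on exit so
        # the random-number stream of `name` stays contiguous across forks.
        orig = torch.cuda.get_rng_state()
        torch.cuda.set_rng_state(self.states[name])
        try:
            yield
        finally:
            self.states[name] = torch.cuda.get_rng_state()
            torch.cuda.set_rng_state(orig)


# Hypothetical usage: every TP rank gets the same "global" seed but a
# rank-specific seed for the tensor-parallel mode, so dropout masks differ
# across shards while non-sharded randomness stays identical.
#   tracker = CudaRNGStateTracker()
#   tracker.add("global", 2013)
#   tracker.add("tensor-parallel", 2013 + 1 + tp_rank)
#   with tracker.fork():
#       out = torch.nn.functional.dropout(x, p=0.1)
```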

Notes:

  1. We now offer an API, set_random_seed, for users to call in the training script. Users have to call it manually and specify the ranks of 3D parallelism (see the sketch after these notes).
  2. All changes in this PR have no effect if set_random_seed is not called in advance.
  3. Fidelity testing shows that the updated GPT schedule with 3D parallelism can align the loss with ZeRO-3 (with and without activation checkpointing), but flash attention has to be disabled.
  4. I'll update flash attention to the latest version and see whether the problem goes away.
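
A minimal usage sketch for Note 1, assuming the signature suggested by the benchmark snippet quoted later in this thread, set_random_seed(seed, dp_rank, pp_rank, tp_rank), and a setup with tensor parallelism only (no DP or PP); how tp_rank is derived here is an assumption for illustration.

```python
import torch.distributed as dist

import slapo

# Assumed setup: a single tensor-parallel group spanning all ranks, with no
# data or pipeline parallelism, so the unused dimensions are passed as None
# (mirroring the benchmark script quoted later in this thread).
dist.init_process_group(backend="nccl")
tp_rank = dist.get_rank()  # assumption: global rank == TP rank in this setup

# Call once in the training script, before building the model (Note 1).
slapo.set_random_seed(2013, None, None, tp_rank)
```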

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

cc @szhengac @chhzh123

@chhzh123 (Contributor) left a comment

LGTM. Thanks @comaniac.

tests/test_shard_sync_op.py (thread resolved)
conftest.py (thread resolved)
# Note 1: We assume no DP and PP in this script.
# Note 2: This overrides Megatron random seed management, so we only use
# this script for benchmarking.
slapo.set_random_seed(2013, None, None, sch.rank)

Review comment (Contributor):
If I understand correctly, all the DP ranks also use the same seed, so the loss wouldn't be right; but we only use this script for benchmarking.

@szhengac (Contributor) commented Feb 3, 2023

LGTM

@szhengac merged commit d93764c into awslabs:main on Feb 3, 2023.

Successfully merging this pull request may close these issues:

[Feature] Random seed management for dropout layers in distributed environment