Extendability refactors #1290
Conversation
A few comments.
Looks good pending the GPU test fix:
```
tests/models/layers/test_dmoe.py:135: in test_dmoe
    expert_parallel_group = device_mesh['expert_parallel'].get_group(0)
/usr/lib/python3/dist-packages/composer/trainer/_patch_pytorch.py:1041: in device_mesh__getitem__
    submesh = _mesh_resources.create_child_mesh(self, mesh_dim_names)
E   NameError: name '_mesh_resources' is not defined
```
@milocress the GPU test is unrelated. It will be fixed by the next composer release (which is why that test isn't marked as required yet).
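For context on the failing line: the test slices a 2-D PyTorch `DeviceMesh` by dimension name and asks the resulting 1-D submesh for its process group, which routes through the `DeviceMesh.__getitem__` that composer's `_patch_pytorch.py` monkeypatches. A minimal sketch of that pattern, assuming a 4-GPU 2x2 mesh (the shape and the `weight_parallel` name are illustrative assumptions; `test_dmoe.py` builds its own mesh):

```python
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

# Minimal sketch, assuming torch.distributed is already initialized
# (e.g. under torchrun with world_size == 4). The 2x2 shape and the
# 'weight_parallel' dim name are illustrative assumptions; only
# 'expert_parallel' appears in the failing test.
device_mesh = init_device_mesh(
    "cuda",
    (2, 2),
    mesh_dim_names=("weight_parallel", "expert_parallel"),
)

# Slicing the mesh by dim name goes through DeviceMesh.__getitem__,
# which composer's patch intercepts; the submesh's get_group() then
# returns this rank's expert-parallel ProcessGroup.
expert_parallel_group = device_mesh["expert_parallel"].get_group()
print(dist.get_rank(group=expert_parallel_group))
```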
This PR includes a few changes for increased extendability of the code (a sketch of the resulting override pattern follows the list):
- `slice_attention_mask`
- to `MPTBlock`
- `configuration_mpt.py`
- just for HF checkpointing (`MPTModel`)
- `TrainConfig`
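To illustrate the kind of extendability these refactors aim at, here is a minimal, self-contained sketch of the hook pattern implied by the fragments above: attention-mask slicing lives in a standalone `slice_attention_mask` function and is exposed as an overridable method on the block, so a subclass can change it without reimplementing `forward`. All class names, signatures, and the toy "attention" below are illustrative assumptions, not llm-foundry's actual API.

```python
import torch
import torch.nn as nn

def slice_attention_mask(attention_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
    # Hypothetical default: keep only the last seq_len positions of the mask.
    return attention_mask[..., -seq_len:]

class Block(nn.Module):
    # Toy stand-in for an MPTBlock-style transformer block; the hook is
    # a method so subclasses can change slicing without touching forward().
    def slice_attention_mask(self, attention_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
        return slice_attention_mask(attention_mask, seq_len)

    def forward(self, x: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        mask = self.slice_attention_mask(attention_mask, x.shape[1])
        # Stand-in "attention": zero out masked positions.
        return x * mask.unsqueeze(-1).to(x.dtype)

class MyBlock(Block):
    # An extension only needs to override the hook, not the forward pass.
    def slice_attention_mask(self, attention_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
        return attention_mask[..., :seq_len]  # e.g., take the prefix instead

x = torch.randn(2, 4, 8)
mask = torch.ones(2, 6, dtype=torch.bool)
print(Block()(x, mask).shape, MyBlock()(x, mask).shape)
```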
Loss before and after: