Errors with nn.RMSNorm in DeepSpeed #33176

Closed
4 tasks
loadams opened this issue Aug 28, 2024 · 3 comments

@loadams
Contributor

loadams commented Aug 28, 2024

System Info

With the latest transformers from source (newer than the latest 4.44.2 release tag), the changes to pytorch_utils from this PR add nn.RMSNorm to the list of modules. However, nn.RMSNorm was not added to torch until the 2.4 release, so this causes CI failures when using DeepSpeed with an older torch unless we either update torch or pin the transformers version.
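A minimal sketch of one way to avoid this kind of failure, assuming the list is built at import time: look up RMSNorm defensively rather than referencing it unconditionally. The function name `collect_norm_layers` and the stand-in namespaces are hypothetical and are not taken from transformers; the module is passed as a parameter only so the sketch runs without torch installed.

```python
from types import SimpleNamespace


def collect_norm_layers(nn_module):
    """Return the normalization classes that `nn_module` actually provides.

    `nn_module` is expected to be `torch.nn`; it is a parameter here so the
    sketch can be exercised without torch installed.
    """
    layers = [nn_module.LayerNorm]
    # nn.RMSNorm only exists in torch >= 2.4, so look it up defensively
    # instead of referencing the attribute unconditionally.
    rms_norm = getattr(nn_module, "RMSNorm", None)
    if rms_norm is not None:
        layers.append(rms_norm)
    return layers


# Hypothetical stand-ins for an older and a newer `torch.nn`.
old_nn = SimpleNamespace(LayerNorm=type("LayerNorm", (), {}))
new_nn = SimpleNamespace(
    LayerNorm=type("LayerNorm", (), {}),
    RMSNorm=type("RMSNorm", (), {}),
)

print(len(collect_norm_layers(old_nn)))  # 1
print(len(collect_norm_layers(new_nn)))  # 2
```

In real use the call would be `collect_norm_layers(torch.nn)`; on torch older than 2.4 the list would then simply omit RMSNorm instead of raising an AttributeError at import.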

Who can help?

@muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Clone the latest DeepSpeed or run CI from the hpu_gaudi2.yml workflow; the failure is here.

Expected behavior

Raise a clear error when a feature requires a specific torch version that isn't installed, or similar.
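As a rough illustration of that behavior, the sketch below fails early with an explicit message when the installed version is too old. The helpers `parse_release` and `require_min_version` are hypothetical names for illustration, not transformers or DeepSpeed APIs, and the version string is passed in so the example does not depend on torch being installed.

```python
import re


def parse_release(version: str) -> tuple:
    """Turn a version string such as '2.3.1+cu121' into (2, 3, 1)."""
    public = version.split("+", 1)[0]  # drop a local suffix like '+cu121'
    numbers = []
    for part in public.split("."):
        match = re.match(r"\d+", part)
        if match is None:  # stop at non-numeric parts such as 'dev2024'
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def require_min_version(installed: str, minimum: str, feature: str) -> None:
    """Raise ImportError with a readable message if `installed` is too old."""
    if parse_release(installed) < parse_release(minimum):
        raise ImportError(
            f"{feature} requires torch>={minimum}, but torch {installed} "
            "is installed. Upgrade torch or pin an older transformers."
        )


# Assumed usage: the first argument would be `torch.__version__`.
require_min_version("2.4.0", "2.4", "nn.RMSNorm")  # passes silently
try:
    require_min_version("2.3.1+cu121", "2.4", "nn.RMSNorm")
except ImportError as err:
    print(err)
```

A check like this trades graceful degradation for a direct message, which can be easier to diagnose in CI logs than an AttributeError raised deep inside an import.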

@NielsRogge
Contributor

Fix at #33177

@loadams
Contributor Author

loadams commented Aug 28, 2024

Thanks @NielsRogge!

@loadams
Contributor Author

loadams commented Sep 3, 2024

Fixed in linked PR, thanks!

@loadams loadams closed this as completed Sep 3, 2024