Errors with nn.RMSNorm in DeepSpeed #33176

loadams · 2024-08-28T18:49:51Z

System Info

With the latest transformers built from source (newer than the 4.44.2 release tag), a recent PR changed pytorch_utils to add nn.RMSNorm to the list of modules. However, nn.RMSNorm was only added to torch in the 2.4 release, so CI runs that use DeepSpeed with an older torch fail unless we either update torch or pin the transformers version.
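To illustrate the failure mode, here is a minimal sketch of building such a list without assuming nn.RMSNorm exists. The helper name collect_norm_layers and the stand-in namespaces are hypothetical, used only so the example runs without torch installed. This is not necessarily how the linked fix works.

```python
from types import SimpleNamespace


def collect_norm_layers(nn_module, names=("LayerNorm", "RMSNorm")):
    """Return the norm layer classes that exist on nn_module, skipping missing ones."""
    return [getattr(nn_module, name) for name in names if hasattr(nn_module, name)]


# Stand-ins for torch.nn on an older torch (no RMSNorm) and a newer one.
old_nn = SimpleNamespace(LayerNorm=type("LayerNorm", (), {}))
new_nn = SimpleNamespace(
    LayerNorm=type("LayerNorm", (), {}),
    RMSNorm=type("RMSNorm", (), {}),
)

old_layers = collect_norm_layers(old_nn)
new_layers = collect_norm_layers(new_nn)
print([cls.__name__ for cls in old_layers])
print([cls.__name__ for cls in new_layers])
```

In real code you would pass torch.nn as nn_module. Referencing nn.RMSNorm directly raises AttributeError on torch older than 2.4, which is the failure described above.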

Who can help?

@muellerzr

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Clone latest DeepSpeed or run CI from hpu_gaudi2.yml workflow, failure here.

Expected behavior

Raise a clear error when a feature requires a specific torch version that is not installed, or something similar.
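One possible shape for such a check is sketched below. The function names parse_version and require_torch are hypothetical and not part of transformers or torch. The 2.4 minimum comes from the report above.

```python
import re


def parse_version(version):
    """Extract the leading (major, minor) pair from a string such as '2.3.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def require_torch(installed, minimum="2.4", feature="nn.RMSNorm"):
    """Raise ImportError with an explicit message if the installed torch is too old."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"{feature} requires torch>={minimum}, but torch {installed} is installed."
        )
```

A caller would pass torch.__version__ as installed. Comparing integer tuples avoids the string-comparison pitfall where "2.10" sorts before "2.4".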


NielsRogge · 2024-08-28T19:26:02Z

Fix at #33177

loadams · 2024-08-28T19:38:46Z

Thanks @NielsRogge!

loadams · 2024-09-03T16:42:19Z

Fixed in linked PR, thanks!

loadams added the bug label Aug 28, 2024

loadams mentioned this issue Aug 28, 2024

Fix transformers/torch errors on nn.RMSNorm by pinning transformers. microsoft/DeepSpeed#6458

Closed

loadams closed this as completed Sep 3, 2024
