LlamaRMSNorm() Dtype Casting Error #30236
Comments
Why does class LlamaRMSNorm do "hidden_states = hidden_states.to(torch.float32)"? Why not follow the type promotion rules of PyTorch ops?
Here self.weight is bf16 and hidden_states is fp32.
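As an illustration (not from the thread), here is a minimal snippet showing PyTorch's promotion behavior when a bf16 weight multiplies an fp32 activation; the tensor shapes and values are made up for the example:

```python
import torch

# Hypothetical example mirroring the mismatch described above:
# a bf16 parameter multiplied by fp32 activations.
w = torch.ones(4, dtype=torch.bfloat16)   # stands in for self.weight loaded in bf16
h = torch.randn(4, dtype=torch.float32)   # stands in for hidden_states upcast to fp32

out = w * h
print(out.dtype)  # torch.float32 -- PyTorch promotes bf16 * fp32 to fp32
```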
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers==4.37.2
Who can help?
@ArthurZucker @younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
@ArthurZucker @younesbelkada
Hi~ I found a bug in LlamaRMSNorm(nn.Module) (lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py).
On the last line, if input_dtype is bfloat16, the returned tensor will still be float32 because self.weight has been initialized as float32. Thus the last line should be modified to:
return (self.weight * hidden_states).to(input_dtype)
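For context, here is a minimal sketch of the forward pass as described in the report, with the proposed cast moved to after the multiplication. This paraphrases the module rather than quoting transformers 4.37.2 verbatim, so details may differ from the shipped code:

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        # The norm is computed in float32 for numerical stability.
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # Original last line: return self.weight * hidden_states.to(input_dtype)
        # If self.weight is float32, type promotion keeps the output in float32
        # even when input_dtype is bfloat16; casting after the multiply avoids that.
        return (self.weight * hidden_states).to(input_dtype)
```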
Expected behavior
See above. Looking forward to your reply, thank you~