
Extra linear scaling in LlamaRotaryEmbedding classes #29765

Closed
gtebbutt opened this issue Mar 20, 2024 · 3 comments · Fixed by #29808

@gtebbutt

I've been doing some work around NTK and YaRN scaling, and I noticed that PR #29198 adds linear scaling (`t = t / self.scaling_factor`) to the constructor of the `LlamaRotaryEmbedding` base class, possibly pulled in from `GPTNeoXLinearScalingRotaryEmbedding` (which uses a different style of forward function from `LlamaLinearScalingRotaryEmbedding`). If I'm reading this correctly, it's a behavioural change: the Llama RoPE subclasses are now scaled twice, once linearly in the base class constructor and again in their respective forward functions.

Should be a simple fix to just take out this one line, assuming I haven't missed anything: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L108
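
To make the concern concrete, here's a trimmed-down sketch of the structure I mean (illustrative only, not the exact transformers source; signatures and caching details are simplified):

```python
import torch


class LlamaRotaryEmbedding:
    def __init__(self, dim, max_position_embeddings=2048, base=10000, scaling_factor=1.0):
        self.scaling_factor = scaling_factor
        self.inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_position_embeddings).float()
        t = t / self.scaling_factor  # <- the line added in #29198 (linear scaling in the base constructor)
        freqs = torch.outer(t, self.inv_freq)
        self._cos_cached, self._sin_cached = freqs.cos(), freqs.sin()


class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
    def forward(self, position_ids):
        # The subclass divides by the scaling factor again in forward(), which is
        # where the worry about double scaling comes from.
        position_ids = position_ids.float() / self.scaling_factor
        freqs = torch.outer(position_ids, self.inv_freq)
        return freqs.cos(), freqs.sin()
```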

@amyeroberts
Collaborator

cc @ArthurZucker @gante

@gante
Member

gante commented Mar 22, 2024

Hi @gtebbutt 👋

It is not a bug; the line is there for backward compatibility. If you look at the code, `t = t / self.scaling_factor` is used exclusively to build the backward-compatible sin/cos caches, which are currently unused in the model's forward pass. As such, there is no duplication :)

In any case, it's a fair question -- I had to double-check the code myself, despite being familiar with it! To avoid regressions, I'm opening a PR that adds a test confirming the correctness of RoPE scaling; a duplicated scaling factor would cause that test to fail.
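
A minimal sketch of the distinction (illustrative only, not the actual transformers code or test): the constructor-scaled `t` only feeds the backward-compatible caches, while the forward pass recomputes the angles from `position_ids` and applies the factor exactly once, so nothing compounds.

```python
import torch

dim, base, max_pos, scaling_factor = 8, 10000, 16, 2.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

# Path 1: backward-compatible caches built in the constructor (unused by the model forward).
t = torch.arange(max_pos).float() / scaling_factor
cos_cached = torch.outer(t, inv_freq).cos()

# Path 2: the forward pass, which scales the incoming position_ids once.
position_ids = torch.arange(4).float()
cos_forward = torch.outer(position_ids / scaling_factor, inv_freq).cos()

# Each path applies the factor exactly once, so they agree -- no double scaling.
print(torch.allclose(cos_forward, cos_cached[:4]))  # True
```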

@gtebbutt
Author

Thanks @gante - I appreciate you taking the time to check; I'd missed that subtlety around the caching!
