Extra linear scaling in LlamaRotaryEmbedding classes #29765
Hi @gtebbutt 👋 It is not a bug, but a backward-compatibility-intended duplication. If you look at the code, the line in question only affects the cached sin/cos buffers that are kept for backward compatibility and are no longer read by the forward pass. In any case, it is a fair doubt -- I had to double-check the code, despite being familiar with it! To avoid regressions, I'm opening a PR with a test to confirm the correctness of RoPE scaling, where a duplicated scaling factor would cause the test to fail.
Thanks @gante -- appreciate you taking the time to check; I'd missed that subtlety around the caching!
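To make the caching subtlety concrete, here is a simplified sketch of the structure being discussed. This is not the actual transformers source (the class names `RotaryEmbeddingSketch` and `LinearScalingSketch` are stand-ins), and it assumes, per the discussion above, that the constructor-time `t = t / self.scaling_factor` only builds backward-compatibility buffers that `forward` never reads, so the live path applies the scaling factor exactly once:

```python
import torch
import torch.nn as nn


class RotaryEmbeddingSketch(nn.Module):
    """Simplified stand-in for the LlamaRotaryEmbedding base class at the time."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000.0, scaling_factor=1.0):
        super().__init__()
        self.scaling_factor = scaling_factor
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

        # Backward-compatibility caches: this is where `t = t / self.scaling_factor`
        # appears. These buffers are never read by forward() below.
        t = torch.arange(max_position_embeddings).float()
        t = t / self.scaling_factor
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("_cos_cached", emb.cos(), persistent=False)
        self.register_buffer("_sin_cached", emb.sin(), persistent=False)

    @torch.no_grad()
    def forward(self, x, position_ids):
        # Live path: cos/sin are recomputed from position_ids on every call,
        # without touching the cached buffers above.
        freqs = position_ids[..., None].float() * self.inv_freq
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()


class LinearScalingSketch(RotaryEmbeddingSketch):
    """Stand-in for LlamaLinearScalingRotaryEmbedding."""

    def forward(self, x, position_ids):
        # On the live path, the linear scaling factor is applied exactly once, here.
        position_ids = position_ids.float() / self.scaling_factor
        return super().forward(x, position_ids)
```

Under this reading, the duplication is cosmetic: removing the constructor-time division would change the deprecated caches, not the values the model actually uses.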
I've been doing some work around NTK and YaRN scaling, and I noticed that PR #29198 adds linear scaling

`t = t / self.scaling_factor`

to the constructor of the `LlamaRotaryEmbedding` base class, possibly pulled in from `GPTNeoXLinearScalingRotaryEmbedding` (which uses a different style of `forward` function to `LlamaLinearScalingRotaryEmbedding`). If I'm reading this correctly, it's a behavioural change that means the Llama RoPE subclasses are now being scaled twice: once linearly by the base class constructor, and then again by their respective `forward` functions.

Should be a simple fix to just take out this one line, assuming I haven't missed anything: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L108
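For illustration, here is a sketch of the kind of regression check @gante describes, reusing the simplified classes from the earlier snippet (hypothetical, not the actual PR's test): a linearly scaled module queried at positions `p` should match the unscaled module queried at `p / factor`, and if the factor were applied twice the comparison would fail.

```python
import torch

factor = 2.0
dim = 64
base_rope = RotaryEmbeddingSketch(dim)
scaled_rope = LinearScalingSketch(dim, scaling_factor=factor)

position_ids = torch.arange(16)[None, :]  # [1, seq_len]
x = torch.zeros(1, 1, 16, dim)            # unused by the sketch; signature parity only

cos_scaled, sin_scaled = scaled_rope(x, position_ids)
cos_ref, sin_ref = base_rope(x, position_ids.float() / factor)

# Passes when scaling is applied once; a duplicated factor (effectively
# factor**2 on the live path) would make these assertions fail.
torch.testing.assert_close(cos_scaled, cos_ref)
torch.testing.assert_close(sin_scaled, sin_ref)
```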