[Bugfix] In LongRoPE, decide short vs long based on max_model_len #27431
Conversation
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Code Review
This pull request addresses a bug in the LongRoPE implementation for Phi-3 models. Previously, the selection between short and long RoPE factors was dynamic, which could lead to KV cache invalidation during generation. The proposed fix makes this decision static at model initialization time, based on max_model_len, ensuring consistency. My review includes a suggestion to improve the implementation by removing a dependency on global configuration, which enhances the component's modularity and testability.
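The fix the review describes can be illustrated with a minimal sketch. The function name and signature below are hypothetical, not vLLM's actual API; it only shows the core idea of committing to either the short or long scaling factors once, at initialization, based on the configured maximum length:

```python
def select_rope_factors(
    rope_scaling: dict,
    max_model_len: int,
    original_max_position_embeddings: int,
) -> list[float]:
    """Hypothetical sketch: pick LongRoPE factors statically at init time.

    If the engine will never serve sequences beyond the original context
    window, the "short" factors apply to every position. Otherwise the
    "long" factors must be used for all positions, so that the choice
    never flips mid-generation and invalidates the KV cache.
    """
    if max_model_len > original_max_position_embeddings:
        return rope_scaling["long_factor"]
    return rope_scaling["short_factor"]
```

This mirrors the behavior tested below: at `max-model-len=4096` the short factors are used throughout, and at `4097` the long factors are.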
vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py
Can you please provide gsm8k with a
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@LucasWilkinson done!
vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py
```python
# For LongRoPE, default to original_max_position_embeddings to avoid
# performance degradation for shorter sequences
if rope_scaling is not None and rope_scaling["rope_type"] == "longrope":
    max_model_len = int(
        getattr(
            hf_config, "original_max_position_embeddings", derived_max_model_len
        )
    )
```
Why does this not belong above where derived_max_model_len is calculated?
It could come after the first rope_scaling check?
We want to override the default max_model_len to original_max_position_embeddings while still allowing the user to manually specify a max_model_len which is larger than this value. I had originally written the code as modifying derived_max_model_len, but (as I read it) derived_max_model_len is really intended to be max_position_embeddings, because of the later check in line 2143:
If max_model_len is manually set to a value exceeding derived_max_model_len, it will throw an error. Hopefully I'm understanding your question correctly?
Thank you for explaining, I think I understand now!
Purpose
Per suggestion by @LucasWilkinson, fixes #27414 by choosing short vs long RoPE factors at init time based on `max_model_len`.

Test Plan
Functionality
Accuracy
Test Result
Functionality
No longer produces random tokens after threshold
Accuracy
with `max-model-len=4096` (new default):
with `max-model-len=4097` (to trigger long rope):