[RoPE] explicit factor > implicit factor in YaRN #40320
Conversation
```python
# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
```
here the implicit factor is taking precedence, which shouldn't happen
(validation is added below, in the validation function)
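To make the intended precedence concrete, here is a minimal sketch of the resolution logic this comment describes (the function name and surrounding structure are illustrative, not the actual transformers code):

```python
def resolve_yarn_factor(config) -> float:
    """Illustrative only: the explicit factor always wins over the implicit one."""
    rope_scaling = config.rope_scaling
    factor = rope_scaling["factor"]  # explicit, user-set factor
    if "original_max_position_embeddings" in rope_scaling:
        # Derived ratio: post-yarn context length / pre-yarn context length.
        implicit_factor = (
            config.max_position_embeddings
            / rope_scaling["original_max_position_embeddings"]
        )
        # Only validate against it; never let it override the explicit factor.
        if implicit_factor != factor:
            print(f"warning: explicit factor {factor} != implicit factor {implicit_factor}")
    return factor
```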
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
thanks!
```python
# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
```
this deletes `factor`, and it still seems to be used below with `attention_factor` 👀
@zucchini-nlp `factor` is defined a few lines above (L219). The line deleted here is a redefinition, where `factor` is implicitly derived from other parameters.
We should use explicit parameterization and warn when the defined parameters don't match as a whole (which is what this PR does 🤗)
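(As background for the comment at the top of the hunk: `factor` also feeds the default attention scaling in YaRN. A minimal sketch of that relationship, assuming the standard YaRN "mscale" heuristic is what the library applies:)

```python
import math

def default_attention_factor(factor: float) -> float:
    # Standard YaRN "mscale" heuristic: no extra attention scaling when the
    # context is not extended, a logarithmic boost otherwise.
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0
```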
```python
if implicit_factor != factor:
    logger.warning_once(
        f"The explicitly set RoPE scaling factor (config.rope_scaling['factor'] = {factor}) does not match "
        "the ratio implicitly set by other parameters (implicit factor = "
        "post-yarn context length / pre-yarn context length = "
        "config.max_position_embeddings / config.rope_scaling['original_max_position_embeddings'] = "
        f"{implicit_factor}). Using the explicit factor ({factor}) in YaRN. This may cause unexpected "
        "behaviour in model usage, please correct the 'max_position_embeddings' fields in the model config."
    )
```
should we throw an error instead, if users explicitly set mismatching config values?
Agreed in theory, but it could be breaking for some models on the Hub 😢 As such, I believe a warning is more appropriate.
@zucchini-nlp I see likes on my replies, so I'm assuming you're happy with the PR :p merging
I think this PR is the root cause of #40461. In …
What does this PR do?
Fixes #38224
This PR: