Conversation

@gante gante (Member) commented Aug 20, 2025

What does this PR do?

Fixes #38224

This PR:

  • Makes the explicit RoPE factor take precedence over the implicit RoPE factor
  • Adds warnings during YaRN validation when the parameterization is missing or inconsistent (related to the implicit RoPE factor); a sketch of the intended precedence follows below
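For reference, the intended precedence can be sketched roughly as follows. This is only an illustrative sketch, not the code added by this PR; the helper name resolve_yarn_factor and the plain keyword arguments are assumptions made for the example.

import logging

logger = logging.getLogger(__name__)


def resolve_yarn_factor(max_position_embeddings, rope_scaling):
    """Sketch: the explicit `factor` takes precedence over the ratio implied by the context lengths."""
    explicit_factor = rope_scaling.get("factor")
    original = rope_scaling.get("original_max_position_embeddings")
    implicit_factor = max_position_embeddings / original if original else None

    if explicit_factor is not None:
        if implicit_factor is not None and implicit_factor != explicit_factor:
            logger.warning(
                "Explicit YaRN factor %s does not match the implicit factor %s; using the explicit one.",
                explicit_factor,
                implicit_factor,
            )
        return explicit_factor
    # No explicit factor given: fall back to the implicit ratio (or 1.0 if neither is set)
    return implicit_factor if implicit_factor is not None else 1.0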

# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
@gante gante (Member Author) commented:

here the implicit factor is taking precedence, which shouldn't happen

(validation is added below, in the validation function)
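For concreteness, a hypothetical set of config values (invented for this example, not taken from a real checkpoint) that would hit this branch:

max_position_embeddings = 32768
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                             # explicit factor set by the user
    "original_max_position_embeddings": 4096,  # pre-YaRN context length
}

# The deleted line recomputed the factor from the context lengths:
implicit_factor = max_position_embeddings / rope_scaling["original_max_position_embeddings"]
print(implicit_factor)  # 8.0 -- this silently replaced the explicit 4.0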

@gante gante changed the title from "[RoPE] explicit factor > implicit factor in Yarn" to "[RoPE] explicit factor > implicit factor in YaRN" on Aug 20, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker (Collaborator) left a comment:

thanks!

# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
@zucchini-nlp zucchini-nlp (Member) commented:

this deletes factor, and it still seems to be used below with attention_factor 👀

@gante gante (Member Author) commented Aug 21, 2025:

@zucchini-nlp factor is defined a few lines above (L219). The line deleted here is a redefinition, where factor is implicitly derived from other parameters.

We should use explicit parameterization and warn when the defined parameters don't match as a whole (which is what this PR does 🤗 )

Comment on lines +500 to +507
if implicit_factor != factor:
    logger.warning_once(
        f"The explicitly set RoPE scaling factor (config.rope_scaling['factor'] = {factor}) does not match "
        "the ratio implicitly set by other parameters (implicit factor = "
        "post-yarn context length / pre-yarn context length = "
        "config.max_position_embeddings / config.rope_scaling['original_max_position_embeddings'] = "
        f"{implicit_factor}). Using the explicit factor ({factor}) in YaRN. This may cause unexpected "
        "behaviour in model usage, please correct the 'max_position_embeddings' fields in the model config."
@zucchini-nlp zucchini-nlp (Member) commented:

should we throw an error instead if users explicitly set mismatching config values?

@gante gante (Member Author) commented Aug 21, 2025:

Agreed in theory, but it would be breaking for some models on the Hub 😢 As such, I believe a warning is more appropriate.

@gante gante (Member Author) commented Aug 26, 2025

@zucchini-nlp I see likes in my replies, so I'm assuming you're happy with the PR :p merging

@gante gante merged commit 6451294 into huggingface:main Aug 26, 2025
24 checks passed
@gante gante deleted the rope_check_implicit_factor branch August 26, 2025 13:58
@zifeitong zifeitong (Contributor) commented:

I think this PR is the root cause of #40461.

  File "/opt/venv/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1319, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 808, in from_dict
    config = cls(**config_dict)
             ^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/configuration_gpt_oss.py", line 105, in __init__
    rope_config_validation(self)
  File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_rope_utils.py", line 630, in rope_config_validation
    validation_fn(config, ignore_keys=ignore_keys)
  File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_rope_utils.py", line 499, in _validate_yarn_parameters
    implicit_factor = config.max_position_embeddings / original_max_position_embeddings
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 207, in __getattribute__
    return super().__getattribute__(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'GptOssConfig' object has no attribute 'max_position_embeddings'

In GptOssConfig.__init__, max_position_embeddings is assigned only after rope_config_validation() is called, so the new YaRN validation reads an attribute that does not exist yet.
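A minimal, self-contained sketch of the ordering issue (class and method names are simplified stand-ins, not the actual GptOssConfig code; the config values are invented):

class BrokenConfig:
    """Simplified: validation runs before the attribute it reads is assigned."""

    def __init__(self, max_position_embeddings=131072, rope_scaling=None):
        self.rope_scaling = rope_scaling
        self._validate_rope()  # reads self.max_position_embeddings ...
        self.max_position_embeddings = max_position_embeddings  # ... which is only assigned here

    def _validate_rope(self):
        if self.rope_scaling and "original_max_position_embeddings" in self.rope_scaling:
            # Mirrors the failing line in _validate_yarn_parameters
            implicit_factor = self.max_position_embeddings / self.rope_scaling["original_max_position_embeddings"]


BrokenConfig(rope_scaling={"rope_type": "yarn", "factor": 32.0, "original_max_position_embeddings": 4096})
# AttributeError: 'BrokenConfig' object has no attribute 'max_position_embeddings'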


Successfully merging this pull request may close these issues:

YaRN: factor is not effective with original_max_position_embeddings
