[RoPE] explicit factor > implicit factor in YaRN #40320
Conversation
```python
# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
```
here the implicit factor is taking precedence, which shouldn't happen
(validation is added below, in the validation function)
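To make the intended precedence concrete, here is a minimal sketch of the resolution logic this comment describes (the function name and surrounding structure are illustrative, not the actual transformers code):

```python
def resolve_yarn_factor(config) -> float:
    """Illustrative only: the explicit factor always wins over the implicit one."""
    rope_scaling = config.rope_scaling
    factor = rope_scaling["factor"]  # explicit, user-set factor
    if "original_max_position_embeddings" in rope_scaling:
        # Derived ratio: post-yarn context length / pre-yarn context length.
        implicit_factor = (
            config.max_position_embeddings
            / rope_scaling["original_max_position_embeddings"]
        )
        # Only validate against it; never let it override the explicit factor.
        if implicit_factor != factor:
            print(f"warning: explicit factor {factor} != implicit factor {implicit_factor}")
    return factor
```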
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
thanks!
```python
# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
```
this deletes `factor`, and it still seems to be used below with `attention_factor` 👀
@zucchini-nlp `factor` is defined a few lines above (L219). The line deleted here is a redefinition, where `factor` is implicitly derived from other parameters.
We should use explicit parameterization and warn when the defined parameters don't match as a whole (which is what this PR does 🤗)
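(As background for the comment at the top of the hunk: `factor` also feeds the default attention scaling in YaRN. A minimal sketch of that relationship, assuming the standard YaRN "mscale" heuristic is what the library applies:)

```python
import math

def default_attention_factor(factor: float) -> float:
    # Standard YaRN "mscale" heuristic: no extra attention scaling when the
    # context is not extended, a logarithmic boost otherwise.
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0
```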
```python
if implicit_factor != factor:
    logger.warning_once(
        f"The explicitly set RoPE scaling factor (config.rope_scaling['factor'] = {factor}) does not match "
        "the ratio implicitly set by other parameters (implicit factor = "
        "post-yarn context length / pre-yarn context length = "
        "config.max_position_embeddings / config.rope_scaling['original_max_position_embeddings'] = "
        f"{implicit_factor}). Using the explicit factor ({factor}) in YaRN. This may cause unexpected "
        "behaviour in model usage, please correct the 'max_position_embeddings' fields in the model config."
    )
```
should we throw an error instead, if users explicitly set mismatching config values?
Agreed in theory, but it could be breaking for some models on the Hub 😢 As such, I believe a warning is more appropriate.
@zucchini-nlp I see likes on my replies, so I'm assuming you're happy with the PR :p merging
I think this PR is the root cause of #40461. In …
What does this PR do?
Fixes #38224
This PR: