Cohere: update RoPE structure #33408
Conversation
@@ -79,6 +80,43 @@ class CohereConfig(PretrainedConfig):
            Whether to tie weight embeddings
        rope_theta (`float`, *optional*, defaults to 10000.0):
            The base period of the RoPE embeddings.
        rope_scaling (`Dict`, *optional*):
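For context, here is a hedged sketch (not code from this PR) of how the two documented fields are typically set together. The `rope_type`/`factor` keys follow the llama-style `rope_scaling` dict this docstring is copied from; the concrete values are illustrative.

```python
# Illustrative only: configuring RoPE on a Cohere config, assuming the
# llama-style `rope_scaling` dict documented above. Values are made up.
from transformers import CohereConfig

config = CohereConfig(
    rope_theta=10000.0,  # base period of the RoPE embeddings
    rope_scaling={"rope_type": "linear", "factor": 2.0},  # hypothetical scaling
)
print(config.rope_scaling)
```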
This is copy/paste from llama
# Note: the forward pass of this RoPE is slightly different from Llama's, resulting in different `sin`/`cos` for
# the same parameterization. The differences are highlighted with a comment.
Aside from the line highlighted with a comment, this is copy/paste from llama
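To make the highlighted difference concrete, here is a small sketch (my reading of the diff, not code from this PR): Cohere lays the frequencies out interleaved, whereas Llama concatenates the two halves, so the same `inv_freq` yields differently ordered `sin`/`cos` tables.

```python
import torch

dim = 8
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, dim, 2).float() / dim))
freqs = torch.outer(torch.arange(4).float(), inv_freq)  # (seq_len, dim // 2)

# Llama-style layout: [f0, f1, f2, f3, f0, f1, f2, f3]
emb_llama = torch.cat((freqs, freqs), dim=-1)

# Cohere-style layout (the line highlighted in the diff, as I read it):
# [f0, f0, f1, f1, f2, f2, f3, f3]
emb_cohere = torch.repeat_interleave(freqs, 2, dim=-1)

# Same parameterization, different element order -> different sin/cos tables.
assert not torch.equal(emb_llama.cos(), emb_cohere.cos())
```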
@LysandreJik a PR like this one will be opened for a few more modern models. Since part of the changes consists of having a global view of the model to update the
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This looks clean, nice to reuse the llama code.
No need to update the import structure for now!
What does this PR do?
This PR propagates the updates to the RoPE structure to `cohere` -- the logic for RoPE was abstracted into a separate module for `llama3.1` (#32135). Using the new structure, a model has access to all RoPE scaling strategies (see the sketch after this list).

While touching the modeling code, I've taken the liberty to:
- re-enable `copied from` statements, which were disabled in previous PRs.

✅ all slow tests passing
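As a sketch of what "access to all RoPE scaling strategies" means in practice: with the shared module introduced in #32135, the strategy is looked up by name from the config instead of being hard-coded per model. The module path below is my assumption based on that PR and may move.

```python
# Hedged sketch: list the RoPE initialization strategies registered by the
# shared module from #32135 (module path assumed from that PR).
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

print(sorted(ROPE_INIT_FUNCTIONS))
# expected to include names such as "default", "dynamic", "linear", "yarn"
```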
Note: #31999 was originally opened to migrate all modern RoPE models into the upgraded structure. However, while working on `cohere`, I noticed that there may be important implementation differences in RoPE. As such, I'll be opening multiple PRs, batching similar RoPE implementations together.