Llama: RoPE refactor #32135

Merged · 30 commits · Jul 23, 2024

Conversation

gante (Member) commented on Jul 22, 2024

What does this PR do?

Same as #31999, but with llama being the only changed model.


Confirmed: slow tests are "passing" (same failures as main)
👉 RUN_SLOW=1 py.test -vv tests/models/llama/test_modeling_llama.py
👉 RUN_SLOW=1 py.test -vv tests/utils/test_cache_utils.py
👉 RUN_SLOW=1 py.test -vv tests/utils/test_modeling_rope_utils.py (new tests)


Throughput benchmarks: No changes vs previous main 💔
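
For context, a rough sketch of the shape of the refactor, pieced together from the diff excerpts quoted below; the exact call signature, return values, and accepted config keys are assumptions here, not guaranteed API:

from transformers import LlamaConfig
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

# RoPE scaling now lives as a flat dict on the config; 'rope_type' selects the variant.
config = LlamaConfig(rope_scaling={"rope_type": "yarn", "factor": 2.0})

# A single registry of parameter-computing functions replaces per-model RotaryEmbedding subclasses.
rope_init_fn = ROPE_INIT_FUNCTIONS[config.rope_scaling["rope_type"]]
inv_freq, attention_scaling = rope_init_fn(config)  # assumed return signature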

Comment on lines +83 to +84
# copied from transformers.models.llama.modeling_llama.LlamaRotaryEmbedding with Llama->Chameleon
# TODO(joao): add me back asap :)
Member Author

#31999, which propagates the changes to all models, will fix this.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator) left a comment

Thanks for all the work consolidating the rope logic!

Mostly some small questions and nits. The main comment is about the testing for all the compute functions.

src/transformers/modeling_rope_utils.py (4 outdated review threads, resolved)
Dictionary containing the scaling configuration for the RoPE embeddings. IMPORTANT: RoPE scaling expects
`max_position_embeddings` to remain unchanged -- some methods, like 'longrope', require the original value to
determine which scaling to apply.
Expected contents:
Collaborator

Are all of the arguments expected, even if optional?

Member Author

No, not at all :) The validation function exists to (among other things) detect incorrect parameter configurations.
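
(For illustration only, a minimal sketch of the kind of check such a validation function can do; the function name and exact messages are made up here, not the PR's implementation:)

from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

def _validate_rope_scaling(rope_scaling):
    # Hypothetical validator: accept None, reject unknown variants and missing/odd factors.
    if rope_scaling is None:
        return
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type", "default"))
    if rope_type not in ROPE_INIT_FUNCTIONS:
        raise ValueError(f"Unknown `rope_type`: {rope_type}. Expected one of {list(ROPE_INIT_FUNCTIONS)}")
    if rope_type != "default" and not isinstance(rope_scaling.get("factor"), float):
        raise ValueError("`rope_scaling['factor']` must be a float for scaled RoPE variants")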

Comment on lines +323 to +304
"default": _compute_default_rope_parameters,
"linear": _compute_linear_scaling_rope_parameters,
"dynamic": _compute_dynamic_ntk_parameters,
"yarn": _compute_yarn_parameters,
"longrope": _compute_longrope_parameters,
Collaborator

All of these should be tested in a rope utils test module, including checks that passing `rope_kwargs` and passing a config are equivalent.

Member Author

Added "rope_kwargs and config and their equivalence" ✅

Numerical checks will be a todo for the post-release follow-up PR (#31999)
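
(A sketch of what the config vs. `rope_kwargs` equivalence check can look like; the keyword names `base` and `dim` for the kwargs path are assumptions here:)

import torch
from transformers import LlamaConfig
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

config = LlamaConfig()
head_dim = config.hidden_size // config.num_attention_heads

compute_default = ROPE_INIT_FUNCTIONS["default"]
inv_freq_from_config, _ = compute_default(config=config, device="cpu")
inv_freq_from_kwargs, _ = compute_default(device="cpu", base=config.rope_theta, dim=head_dim)

# Both paths should produce identical inverse frequencies.
torch.testing.assert_close(inv_freq_from_config, inv_freq_from_kwargs)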

src/transformers/models/llama/configuration_llama.py Outdated Show resolved Hide resolved
Comment on lines 477 to 490
config.rope_scaling = {"type": "yarn", "factor": scaling_factor}
yarn_scaling_rope = LlamaRotaryEmbedding(config=config).to(torch_device)
yarn_cos_short, yarn_sin_short = yarn_scaling_rope(x, position_ids_short)
yarn_cos_long, yarn_sin_long = yarn_scaling_rope(x, position_ids_long)
torch.testing.assert_close(yarn_cos_short, yarn_cos_long[:, :short_input_length, :])
torch.testing.assert_close(yarn_sin_short, yarn_sin_long[:, :short_input_length, :])
with self.assertRaises(AssertionError):
torch.testing.assert_close(yarn_cos_short, original_cos_short)
with self.assertRaises(AssertionError):
torch.testing.assert_close(yarn_sin_short, original_sin_short)
with self.assertRaises(AssertionError):
torch.testing.assert_close(yarn_cos_long, original_cos_long)
with self.assertRaises(AssertionError):
torch.testing.assert_close(yarn_sin_long, original_sin_long)
Collaborator

This works and is consistent with the other checks above. We should really make sure to check the rescaling values against specific numerical values in the tests for the compute methods as well. This test tells us things have changed, but not whether the change is in the right direction or magnitude.

Member Author

Fair, but that is a test that requires some numerical diving. Given our release goals -- would it be okay for me to add a todo/open an issue?

Collaborator

As long as it's actually done, then yes ;)
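
(For the record, the kind of numerical check being deferred here could be as small as the following, taking linear scaling as the easiest case and assuming it is implemented by dividing the inverse frequencies by the factor:)

import torch
from transformers import LlamaConfig
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

config = LlamaConfig()
factor = 4.0

default_inv_freq, _ = ROPE_INIT_FUNCTIONS["default"](config, device="cpu")

config.rope_scaling = {"rope_type": "linear", "factor": factor}
linear_inv_freq, _ = ROPE_INIT_FUNCTIONS["linear"](config, device="cpu")

# Linear scaling stretches positions by `factor`, so each inverse frequency shrinks by `factor`.
torch.testing.assert_close(linear_inv_freq, default_inv_freq / factor)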

src/transformers/models/llama/modeling_llama.py (2 outdated review threads, resolved)
@ArthurZucker (Collaborator) left a comment

LGTM

self.original_max_seq_len = config.max_position_embeddings

self.config = config
self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
Collaborator

should it be rope scaling rather than rope init? nit!

Member Author

I'd rather go with init -- the default rope (i.e. not scaled) uses this path as well
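
(To illustrate the point: the unscaled 'default' type flows through the same registry lookup, so "init" is the more accurate name. A simplified sketch of that init path, not the verbatim module code:)

import torch
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS

class SimplifiedRotaryEmbedding(torch.nn.Module):
    def __init__(self, config):
        super().__init__()
        rope_scaling = getattr(config, "rope_scaling", None)
        # Both the plain and the scaled cases resolve to an entry in the same registry.
        self.rope_type = rope_scaling["rope_type"] if rope_scaling is not None else "default"
        self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
        inv_freq, self.attention_scaling = self.rope_init_fn(config)
        self.register_buffer("inv_freq", inv_freq, persistent=False)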

Comment on lines 84 to 112
Dictionary containing the scaling configuration for the RoPE embeddings. IMPORTANT: RoPE scaling expects
`max_position_embeddings` to remain unchanged -- some methods, like 'longrope', require the original value
to determine which scaling to apply.
Expected contents:
`rope_type` (`str`):
The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope'],
with 'default' being the original RoPE implementation.
`factor` (`float`, *optional*):
Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
most scaling types, a `factor` of x will enable the model to handle sequences of length x *
`max_position_embeddings`.
`attention_factor` (`float`, *optional*):
Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
computation. If unspecified, it defaults to value recommended by the implementation, using the
`factor` field to infer the suggested value.
`beta_fast` (`float`, *optional*):
Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
ramp function. If unspecified, it defaults to 32.
`beta_slow` (`float`, *optional*):
Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
ramp function. If unspecified, it defaults to 1.
`short_factor` (`List[float]`, *optional*):
Only used with 'longrope'. The scaling factor to be applied to short contexts (<
`max_position_embeddings` * `factor`). Must be a list of numbers with the same length as the hidden
size divided by the number of attention heads divided by 2.
`long_factor` (`List[float]`, *optional*):
Only used with 'longrope'. The scaling factor to be applied to long contexts (>
`max_position_embeddings` * `factor`). Must be a list of numbers with the same length as the hidden
size divided by the number of attention heads divided by 2.
Collaborator

Ok this should leave enough freedom

Collaborator

Though, the fact that we don't have a nested config makes it simpler; the checks are run somewhere else, so it's pretty much equivalent.
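
(As a concrete illustration of the flat, non-nested dict described in the docstring above; the values below are arbitrary examples:)

from transformers import LlamaConfig

# 'linear' only needs the scaling factor.
linear_config = LlamaConfig(rope_scaling={"rope_type": "linear", "factor": 2.0})

# 'yarn' requires `factor`; `attention_factor`, `beta_fast` and `beta_slow` fall back to defaults.
yarn_config = LlamaConfig(rope_scaling={"rope_type": "yarn", "factor": 4.0})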

Comment on lines 173 to 202

def _rope_scaling_validation(self):
"""
Validate the `rope_scaling` configuration.
"""
if self.rope_scaling is None:
return

if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
raise ValueError(
"`rope_scaling` must be a dictionary with two fields, `type` and `factor`, " f"got {self.rope_scaling}"
)
rope_scaling_type = self.rope_scaling.get("type", None)
rope_scaling_factor = self.rope_scaling.get("factor", None)
if rope_scaling_type is None or rope_scaling_type not in ["linear", "dynamic"]:
raise ValueError(
f"`rope_scaling`'s type field must be one of ['linear', 'dynamic'], got {rope_scaling_type}"
)
if rope_scaling_factor is None or not isinstance(rope_scaling_factor, float) or rope_scaling_factor <= 1.0:
raise ValueError(f"`rope_scaling`'s factor field must be a float > 1, got {rope_scaling_factor}")
Collaborator

Nice to see that go away!

src/transformers/models/llama/modeling_llama.py (2 outdated review threads, resolved)
@amyeroberts (Collaborator) left a comment

Beautiful - thanks for adding and iterating!



@require_torch
class RopeTest(unittest.TestCase):
Collaborator

🤗

mig-mfreitas and others added 21 commits July 23, 2024 08:58
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.

Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.

We implement YaRN and Dynamic-YaRN for the following list of models:

 - LLaMA
 - Falcon
 - GPT-NeoX
 - Olmo
 - Persimmon
 - Phi
 - StableLM
 - OpenLLaMA

New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.

For more details, please refer to https://arxiv.org/abs/2309.00071.

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.

This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
  from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies

Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
- Comply with the tensor building logic introduced in huggingface#30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment

Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
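
(For reference, the attention-scaling piece of YaRN described in the first commit message above amounts to a logarithmic temperature on the attention computation; a minimal sketch using the formula from the YaRN paper, with an illustrative function name:)

import math

def yarn_attention_factor(scaling_factor: float) -> float:
    # YaRN's recommended attention scaling grows logarithmically with the context-extension factor.
    if scaling_factor <= 1.0:
        return 1.0
    return 0.1 * math.log(scaling_factor) + 1.0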
gante and others added 7 commits July 23, 2024 08:58
@gante force-pushed the llama_rope_refactor branch from 1416972 to c824be0 on July 23, 2024, 08:58
@gante (Member Author) commented on Jul 23, 2024

Merged the YaRN PR (precursor); now merging this one as soon as CI goes green.

@amyeroberts (Collaborator)

The YaRN PR is failing code quality checks on main. Could you make sure to rebase and then run make fix-copies etc. here before merging?

6 participants