
Generic RotaryEmbedding Layer #1180

Merged (11 commits, Aug 1, 2023)

Conversation

shivance
Collaborator

RotaryEmbedding is in the air: new SOTA models like Falcon, GPT-J, and LLaMA use it. I already contributed a RotaryEmbedding layer to KerasNLP.

This is the perfect time to move it into the modeling layers and expose it publicly; it would be a good addition to our API.

Closes #1087
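For reference, the core rotary computation from the RoFormer paper can be sketched in plain NumPy. This is a minimal illustration of the math, not the KerasNLP implementation; the helper name `rotate_half` is just the conventional one.

```python
import numpy as np

def rotate_half(x):
    # Split the feature dim in half and rotate pairs: (x1, x2) -> (-x2, x1).
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def rotary_embedding(x, max_wavelength=10000):
    # x: [batch, seq_len, feature_dim]; feature_dim must be even.
    seq_len, dim = x.shape[1], x.shape[2]
    # Geometrically increasing wavelengths, as in the RoFormer paper.
    inv_freq = 1.0 / max_wavelength ** (np.arange(0, dim, 2) / dim)
    freqs = np.outer(np.arange(seq_len), inv_freq)  # [seq_len, dim // 2]
    emb = np.concatenate([freqs, freqs], axis=-1)   # [seq_len, dim]
    return x * np.cos(emb) + rotate_half(x) * np.sin(emb)
```

Because each feature pair is rotated by a pure rotation, position 0 passes through unchanged and per-token vector norms are preserved.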

Member

@mattdangerw mattdangerw left a comment

Thanks! A few comments.

from keras_nlp.tests.test_case import TestCase


class RotaryEmbeddingTest(TestCase):
Member

We should definitely fill out the test case now that we are exposing this standalone. Maybe look at the SinePositionEncoding layer tests as a starting point.

Collaborator Author

Yup, added tests modeled on the SinePositionEncoding layer tests; they all now pass on all backends.

matrix. It calculates the rotary encoding with a mix of sine and
cosine functions with geometrically increasing wavelengths.
Defined and formulated in [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864v4).
Takes as input the query and key tensors. The input must have shape
Member

I think we should also allow this layer to take inputs of shape [batch_size, sequence_length, feature_dim]. I'll leave some more comments below.

Collaborator Author

sgtm.

Member

This still needs updating in the docstring.

References:
- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864v4)
"""

def __init__(self, max_wavelength=10000, **kwargs):
Member

We should probably add two arguments here, sequence_axis=1 and feature_axis=-1, that users can set as desired.

Collaborator Author

This would be cool. Added.
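A rough sketch of what the constructor looks like after adding those arguments (plain Python for illustration; the real layer subclasses a Keras layer and accepts **kwargs):

```python
class RotaryEmbedding:
    """Sketch only: mirrors the constructor arguments discussed above."""

    def __init__(self, max_wavelength=10000, sequence_axis=1, feature_axis=-1):
        self.max_wavelength = max_wavelength
        # Axis of the input that indexes positions in the sequence.
        self.sequence_axis = sequence_axis
        # Axis of the input that holds the (even-sized) feature dimension.
        self.feature_axis = feature_axis
```

With the defaults, a standard [batch, seq_len, feature_dim] input needs no extra configuration.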

References:
- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864v4)
"""

def __init__(self, max_wavelength=10000, **kwargs):
super().__init__(**kwargs)
self.max_wavelength = max_wavelength
Member

A few comments for below (GitHub won't let me comment inline). Why is the following necessary?

cos_emb = cos_emb[:, : ops.shape(tensor)[1], :, :]
sin_emb = sin_emb[:, : ops.shape(tensor)[1], :, :]

The cos/sin embeddings should already have seq_len shape when you compute them.

Lastly, if you wanted to make _compute_cos_sin_embedding work with any number of dimensions, you would need to update it. Here's a draft of a change, but I haven't tested it yet:

embedding = ops.concatenate((freqs, freqs), axis=-1)
for axis in range(len(x.shape)):
    if axis != self.sequence_axis and axis != self.feature_axis:
        embedding = ops.expand_dims(embedding, axis)

Collaborator Author

Thanks for the pointer; got it working with some tweaks!
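The axis-expansion trick from the draft above can be checked in isolation with NumPy. This is a hypothetical standalone helper illustrating the same idea: insert size-1 axes everywhere except the sequence and feature axes, so the 2D embedding broadcasts against an input of any rank.

```python
import numpy as np

def broadcast_embedding(embedding, x_ndim, sequence_axis=1, feature_axis=-1):
    # embedding: [seq_len, dim]. Normalize negative axes against the input
    # rank, then expand every other axis to size 1 for broadcasting.
    sequence_axis = sequence_axis % x_ndim
    feature_axis = feature_axis % x_ndim
    for axis in range(x_ndim):
        if axis != sequence_axis and axis != feature_axis:
            embedding = np.expand_dims(embedding, axis)
    return embedding
```

For a rank-4 input like [batch, seq_len, num_heads, head_dim] this yields a [1, seq_len, 1, dim] embedding, which broadcasts cleanly.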

@shivance shivance changed the title to Move RotaryEmbedding to Modeling Layers on Jul 29, 2023
@shivance shivance changed the title from Move RotaryEmbedding to Modeling Layers to General purpose RotaryEmbedding Layer on Jul 29, 2023
@shivance
Collaborator Author

shivance commented Jul 29, 2023

test_float16_dtype fails for tf.keras 👀

ValueError: Unsupported dtype19 for '{{node rotary_embedding/range}} = Range[Tidx=DT_HALF](rotary_embedding/range/start, rotary_embedding/range/Cast, rotary_embedding/range/delta)' with input shapes: [], [], [] and with computed input tensors: input[0] = <0>, input[1] = <32>, input[2] = <2>.

self.max_wavelength = max_wavelength
self.sequence_axis = sequence_axis
self.feature_axis = feature_axis
self.scaling_factor = scaling_factor
Collaborator Author

@mattdangerw LLaMA's rotary embedding layers use a scaling factor; I've added that too!
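The effect of the scaling factor can be sketched in NumPy. This assumes the factor divides the position indices (LLaMA-style position scaling); it is an illustration, not the merged code.

```python
import numpy as np

def rotary_angles(seq_len, dim, max_wavelength=10000, scaling_factor=1.0):
    # Dividing positions by scaling_factor slows the rotation per step,
    # which is the trick LLaMA variants use to stretch usable context.
    positions = np.arange(seq_len) / scaling_factor
    inv_freq = 1.0 / max_wavelength ** (np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)  # shape [seq_len, dim // 2]
```

A scaling factor of 2 halves every rotation angle, so a sequence of length 2N occupies the angular range that length N did before.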

@shivance
Collaborator Author

/gcbrun

@shivance shivance requested a review from mattdangerw July 29, 2023 17:06
@shivance shivance changed the title from General purpose RotaryEmbedding Layer to Generic RotaryEmbedding Layer on Jul 29, 2023
@mattdangerw
Member

ValueError: Unsupported dtype19 for '{{node rotary_embedding/range}} = Range[Tidx=DT_HALF](rotary_embedding/range/start, rotary_embedding/range/Cast, rotary_embedding/range/delta)' with input shapes: [], [], [] and with computed input tensors: input[0] = <0>, input[1] = <32>, input[2] = <2>.

Potentially this is just arange not supporting half floats? You could maybe just do:

freq_range = ops.arange(0, rotary_dim, 2, "float32")
freq_range = ops.cast(freq_range, self.compute_dtype)
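The suggested workaround (compute the range in float32, then cast down to the compute dtype) can be mimicked in NumPy. This is illustrative only, since NumPy's arange has no such restriction; the restriction is in some backends' range op.

```python
import numpy as np

rotary_dim = 32
# Build the range in float32 first, since some backend range ops reject
# half-precision dtypes, then cast to the half-precision compute dtype.
freq_range = np.arange(0, rotary_dim, 2, dtype="float32")
freq_range = freq_range.astype("float16")  # stand-in for self.compute_dtype
```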

Member

@mattdangerw mattdangerw left a comment

Thanks! Looking generally good in terms of the call method, but some polish is needed.

matrix. It calculates the rotary encoding with a mix of sine and
cosine functions with geometrically increasing wavelengths.
Defined and formulated in [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864v4).
Takes as input the query and key tensors. The input must have shape
Member

This still needs updating in the docstring.

@shivance
Collaborator Author

shivance commented Aug 1, 2023

Potentially this is just arange not supporting half floats? You could maybe just do

freq_range = ops.arange(0, rotary_dim, 2, "float32")
freq_range = ops.cast(freq_range, self.compute_dtype)

The error persists either way. I've tried this and other typecasting approaches.

@shivance
Collaborator Author

shivance commented Aug 1, 2023

/gcbrun

Fix dtypes with arange.
@shivance
Collaborator Author

shivance commented Aug 1, 2023

/gcbrun

@mattdangerw
Member

/gcbrun

Member

@mattdangerw mattdangerw left a comment

Nice work! This ended up quite clean.

@mattdangerw mattdangerw merged commit 272ba83 into keras-team:master Aug 1, 2023
@shivance shivance deleted the move_rot_emb branch August 5, 2023 14:19
Successfully merging this pull request may close these issues.

Move RotaryEmbedding to layers