Move RotaryEmbedding layer from gpt_neo_x to layers #1092
Conversation
/gcbrun
This needs a full tests file, similar to other layers in keras_nlp/layers.
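For concreteness, a tests file along these lines might be a starting point. This is only a rough sketch: the import path assumes the layer lands in keras_nlp/layers/rotary_embedding.py, and the call convention (query and key in, rotated query and key out) and shapes are taken from the docstring hunks quoted later in this review, not from a finalized API.

```python
import tensorflow as tf

from keras_nlp.layers.rotary_embedding import RotaryEmbedding


class RotaryEmbeddingTest(tf.test.TestCase):
    def test_shapes_are_preserved(self):
        # Rotary encoding only rotates features, so output shapes should
        # match input shapes.
        layer = RotaryEmbedding(rotary_percentage=0.25)
        query = tf.random.uniform((2, 8, 16, 64))
        key = tf.random.uniform((2, 8, 16, 64))
        rotated_query, rotated_key = layer(query, key)
        self.assertEqual(rotated_query.shape, query.shape)
        self.assertEqual(rotated_key.shape, key.shape)

    def test_config_round_trip(self):
        # Assumes the layer implements `get_config()` for its constructor
        # arguments.
        layer = RotaryEmbedding(rotary_percentage=0.5, max_wavelength=1000)
        restored = RotaryEmbedding.from_config(layer.get_config())
        self.assertEqual(restored.get_config(), layer.get_config())
```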
keras_nlp/layers/rotary_embedding.py
Outdated
def __init__(self, rotary_percentage, max_wavelength=10000):
    """Rotary positional encoding layer.

    Tbjs layer encodes absolute positional information with rotation matrix and naturally
tbjs -> This. The alignment of this whole docstring looks wrong, just four spaces in should do it.
Also, please make sure to check everything besides links is <= 80 characters.
Is there any automated way to work this out for line length? I make sure to run ./shell/format.sh every time.
keras_nlp/layers/rotary_embedding.py
Outdated
](https://arxiv.org/abs/2104.09864v4).

Takes as input the query and key tensors. The input must have shape
[batch_size, num_heads, sequence_length, query_length]. This layer will return
We might want to consider the general form of this. I don't think we want to require a head axis, that seems a little too special-cased.
We can safely assume that the batch axis is 0, most layers do this. We can also assume the feature axis is -1. If we need to take in the sequence dim axis, maybe let's add that as an argument sequence_axis=1 and allow it to be specified. Then we should test this layer with "multi-head" inputs, and simpler (batch_size, sequence_length, feature_dim) inputs.
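As a sketch of how that proposal might read in user code (the `sequence_axis` argument and the single-tensor call are only the suggestion above, not an existing API):

```python
import tensorflow as tf

from keras_nlp.layers.rotary_embedding import RotaryEmbedding

# Hypothetical `sequence_axis` argument from the proposal above.
layer = RotaryEmbedding(rotary_percentage=1.0, sequence_axis=1)

# Simple (batch_size, sequence_length, feature_dim) input; the batch axis
# is assumed to be 0 and the feature axis -1.
simple_outputs = layer(tf.ones((2, 16, 64)))

# "Multi-head" (batch_size, sequence_length, num_heads, head_dim) input;
# the sequence axis is still 1, so the same layer instance works.
multi_head_outputs = layer(tf.ones((2, 16, 8, 64)))
```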
keras_nlp/layers/rotary_embedding.py
Outdated
incorporates explicit relative position dependency in self-attention formulation.
It layer calculates the rotary encoding with a mix of sine and cosine
functions with geometrically increasing wavelengths. Defined and formulized
in [RoFormer: Enhanced Transformer with Rotary Position Embedding
Keep the link on one line, this can exceed 80 chars.
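For reference, the computation that docstring describes, sinusoids with geometrically increasing wavelengths applied as a rotation of feature pairs, can be sketched roughly as follows. This is a plain restatement of the RoFormer formulation for a (batch_size, sequence_length, feature_dim) input, not the layer's actual implementation.

```python
import tensorflow as tf


def rotary_encode(x, max_wavelength=10000):
    """Rough sketch: rotate (even, odd) feature pairs by position angles."""
    # x: float tensor of shape (batch_size, sequence_length, feature_dim),
    # with an even feature_dim.
    seq_len, dim = tf.shape(x)[1], x.shape[-1]
    positions = tf.range(seq_len, dtype=tf.float32)
    # Geometrically increasing wavelengths across the feature dimension.
    freqs = 1.0 / (
        max_wavelength ** (tf.range(0, dim, 2, dtype=tf.float32) / dim)
    )
    angles = tf.einsum("i,j->ij", positions, freqs)  # (seq_len, dim // 2)
    cos = tf.repeat(tf.cos(angles), 2, axis=-1)  # (seq_len, dim)
    sin = tf.repeat(tf.sin(angles), 2, axis=-1)
    # Build [-x1, x0, -x3, x2, ...] so each (even, odd) pair is rotated.
    even, odd = x[..., ::2], x[..., 1::2]
    rotated = tf.reshape(tf.stack([-odd, even], axis=-1), tf.shape(x))
    return x * cos + rotated * sin
```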
keras_nlp/layers/rotary_embedding.py
Outdated
num_heads = 8
sequence_length = 256
query_length = key_length = 256
query = tf.ones((batch_size, num_heads, sequence_length, query_length))
This does not match our general ordering for dims, I think? After projecting to multi-headed space, I believe our shapes will look like (batch_size, sequence_length, num_heads, head_dim). It's important to follow KerasNLP conventions here, not the ones we picked up from gpt-neox.
Also, query_length is a bit of an odd term here, should that be head_dim? Or if this is the token length of the query, how is this different from sequence_length?
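To make the convention concrete, this is roughly how the per-head projection lays out its output; the dims here are made up for illustration.

```python
import tensorflow as tf
from tensorflow import keras

batch_size, sequence_length, hidden_dim = 2, 16, 512
num_heads, head_dim = 8, 64

hidden_states = tf.ones((batch_size, sequence_length, hidden_dim))

# Per-head projection in the style of `keras.layers.MultiHeadAttention`:
# the head axis comes after the sequence axis.
query_dense = keras.layers.EinsumDense(
    "abc,cde->abde", output_shape=(None, num_heads, head_dim)
)
query = query_dense(hidden_states)
print(query.shape)  # (2, 16, 8, 64) == (batch, sequence, heads, head_dim)
```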
@@ -14,7 +14,7 @@
import tensorflow as tf
from tensorflow import keras

from keras_nlp.models.gpt_neo_x.rotary_embedding import RotaryEmbedding
one more nit, let's pass the arguments to RotaryEmbedding via keyword args, not positionally, below.
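Spelled out, with placeholder argument values and the new import path from this PR assumed:

```python
from keras_nlp.layers.rotary_embedding import RotaryEmbedding

# Positional arguments are easy to misread:
# rotary_embedding = RotaryEmbedding(0.25, 10000)

# Keyword arguments instead (values here are placeholders):
rotary_embedding = RotaryEmbedding(rotary_percentage=0.25, max_wavelength=10000)
```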
968c71d to f68c256
* fix rotary emb
* refactor + remove unnecessary typecast
* fix formatting
* refactor
* formatting fix
* refactoring rotary emb
* added a kwarg in super().__init__()
Let's merge #1111 first, we will need that anyway.
Closing this for now (due to some reasons), will open a follow-up PR.
Closes #1087