
Add LlamaBackbone #1203

Merged: 2 commits into keras-team:master on Dec 22, 2023

Conversation

@shivance (Collaborator) commented Aug 9, 2023

The Keras team has accepted #1162. This PR adds the attention, decoder, and backbone layers for Llama.

Here is the colab

This PR is still a work in progress!

@mattdangerw @fchollet

@shivance shivance requested a review from mattdangerw August 9, 2023 03:46
@shivance (Collaborator Author) commented Aug 9, 2023

LLaMA uses the SiLU (sigmoid linear unit) activation function; we don't seem to have it in Keras yet (?)

ref: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/blob/main/config.json

[Update]: We do have it, but it's not listed on the docs page.
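
For reference, SiLU (also called swish) is just x * sigmoid(x). A minimal sketch, assuming the silu alias is exposed by the installed Keras version:

import numpy as np
import keras

# SiLU (a.k.a. swish): silu(x) = x * sigmoid(x).
# Assumes keras.activations.silu is available in the installed Keras version.
x = np.array([-1.0, 0.0, 1.0], dtype="float32")
print(keras.activations.silu(x))
print(x * (1.0 / (1.0 + np.exp(-x))))  # same values, computed by hand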

@shivance shivance marked this pull request as draft August 9, 2023 04:18
@shivance shivance marked this pull request as ready for review August 9, 2023 15:52
@mattdangerw (Member) left a comment

Overall structure looks good! Left some high level comments for now.

@shivance shivance requested a review from mattdangerw August 15, 2023 06:07
@mattdangerw (Member) left a comment

Few more comments.

@shivance (Collaborator Author) commented

TODO: Conversion script

@awsaf49 commented Oct 28, 2023

As GroupedQueryAttention has been added to Keras, I think it would be nice to have it in Llama v2. PR: keras-team/keras#18488.
@fchollet @mattdangerw

@shivance (Collaborator Author) commented Nov 3, 2023

@mattdangerw I've added a checkpoint conversion script. The output matching is almost there; I just can't figure out one thing.
The output of each decoder layer matches that of the Hugging Face model (to high precision), but something happens after the last decoder layer and the final outputs differ by a margin. Not sure why that is.

@mattdangerw (Member) commented Nov 3, 2023

> As GroupedQueryAttention has been added to Keras, I think it would be nice to have it in Llama v2. PR: keras-team/keras#18488.

@awsaf49 We absolutely should! But we first need to figure out when we drop Keras 2 support from KerasCV and KerasNLP. Until we do, we can't rely on symbols that only exist in Keras 3.

But long term, no question, we should use the grouped query attention layer to cut a lot of code from KerasNLP!
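
For context, a minimal sketch of how the Keras 3 layer from keras-team/keras#18488 can be used; argument names follow the Keras 3 API and may change, and this is not code from this PR:

import numpy as np
import keras

# Grouped-query attention: fewer key/value heads than query heads.
layer = keras.layers.GroupedQueryAttention(
    head_dim=64,
    num_query_heads=8,
    num_key_value_heads=2,  # each KV head is shared by 4 query heads
)
x = np.random.rand(1, 16, 512).astype("float32")  # (batch, sequence, hidden)
out = layer(query=x, value=x)  # self-attention
print(out.shape)  # (1, 16, 512): output is projected back to the query feature dim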

@shivance shivance requested a review from mattdangerw November 4, 2023 10:14
@shivance (Collaborator Author) commented Nov 4, 2023

Hey @mattdangerw!
This PR is ready to merge. The outputs are matching (finally).

Hugging Face Llama outputs: [screenshot: post_layer_norm_hf]

KerasNLP Llama outputs: [screenshot: keras_outputs]

@shivance (Collaborator Author) commented Nov 4, 2023

/gcbrun

@shivance (Collaborator Author) commented Nov 4, 2023

All checks pass. Nice.

@mattdangerw (Member) left a comment

Awesome this is working! Left a few comments.

@mattdangerw (Member) left a comment

Oops, didn't mean to approve till we have the updated tests and docstrings.

@tirthasheshpatel (Contributor) left a comment

Mistral shares almost exactly the same backbone as Llama, so just a few comments to make it easy to reuse in Mistral.

@shivance (Collaborator Author) commented

@mattdangerw I've added caching as well, outputs continue to match.

@shivance (Collaborator Author) commented

/gcbrun

@shivance (Collaborator Author) commented

Added docstrings too.

Comment on lines +188 to +165
mask_expansion_axis = -3
for _ in range(
    len(attention_scores.shape) - len(attention_mask.shape)
):
    attention_mask = ops.expand_dims(
        attention_mask, axis=mask_expansion_axis
    )
@tirthasheshpatel (Contributor) commented Nov 12, 2023

Since the inputs are constrained to be 3 dimensional, we can simplify this as:

Suggested change:

- mask_expansion_axis = -3
- for _ in range(
-     len(attention_scores.shape) - len(attention_mask.shape)
- ):
-     attention_mask = ops.expand_dims(
-         attention_mask, axis=mask_expansion_axis
-     )
+ attention_mask = attention_mask[:, None, :, :]

@shivance (Collaborator Author) replied

IIRC @mattdangerw and I had a conversation about it. Let's keep this as is.

@mattdangerw (Member) replied

No strong feeling. The thing to keep in mind here is what is public API and what's internal to the model.

RotaryEmbedding is public, that's the one we want to support with multiple different call ranks/configurations.

Llama attention is unexposed, so it's ok to make assumptions about the input shape as long as it's valid for llama models.
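
For illustration, both forms discussed above add the same heads axis when the mask is rank 3; a quick sketch, not code from the PR:

import numpy as np
from keras import ops

# A (batch, query_len, key_len) mask gains a heads axis so it broadcasts
# against (batch, num_heads, query_len, key_len) attention scores.
mask = np.ones((2, 5, 5), dtype="bool")
expanded_generic = ops.expand_dims(mask, axis=-3)  # loop version: works for any rank gap
expanded_sliced = mask[:, None, :, :]              # shortcut: assumes a rank-3 mask
print(expanded_generic.shape, expanded_sliced.shape)  # (2, 1, 5, 5) (2, 1, 5, 5)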

@shivance (Collaborator Author) commented

CI failures seem unrelated

@mattdangerw (Member) commented

/gcbrun

@mattdangerw (Member) left a comment

Thanks! Mostly minor changes.

            if axis != sequence_axis and axis != feature_axis:
                embedding = ops.expand_dims(embedding, axis)

        return ops.cos(embedding), ops.sin(embedding)

    def _get_inverse_freq(self, rotary_dim):
        freq_range = ops.arange(0, rotary_dim, 2, dtype="float32")
@mattdangerw (Member) commented

We should still add that unit test I was mentioning, as it's clear we have been breaking the feature_axis and sequence_axis args without meaning to. Something like the below as a new unit test for this file.

inputs = random(batch, sequence, feature)
permuted_inputs = permute(inputs, (0, 2, 1))
outputs = RotaryEmbedding(inputs)
permuted_outputs = RotaryEmbedding(permuted_inputs, sequence_axis=-1, feature_axis=-2)
assertAllEqual(outputs, permute(permuted_outputs, (0, 2, 1)))
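
A runnable version of that sketch might look like the following; it assumes keras_nlp.layers.RotaryEmbedding takes sequence_axis and feature_axis as constructor arguments, the tolerance is arbitrary, and, per the follow-up below, this check was failing at the time:

import numpy as np
import keras_nlp
from keras import ops

# Rotary embeddings applied to permuted inputs should match the permuted
# outputs of the default layout.
inputs = np.random.rand(2, 8, 16).astype("float32")  # (batch, sequence, feature)
permuted_inputs = np.transpose(inputs, (0, 2, 1))     # (batch, feature, sequence)

outputs = keras_nlp.layers.RotaryEmbedding()(inputs)
permuted_outputs = keras_nlp.layers.RotaryEmbedding(
    sequence_axis=-1, feature_axis=-2
)(permuted_inputs)

np.testing.assert_allclose(
    ops.convert_to_numpy(outputs),
    np.transpose(ops.convert_to_numpy(permuted_outputs), (0, 2, 1)),
    atol=1e-6,
)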

@shivance (Collaborator Author) commented Nov 25, 2023

Thanks @mattdangerw for the suggestion here. Turns out that it does break.
I don't have bandwidth to fix this today.

@mattdangerw (Member) replied

All good! We have been heads down getting this Kaggle integration ready anyway, which will be needed before we can actually provide any llama 2 checkpoints.

Let's check in next week. If you are strapped for time I can just patch this in and fix here, I think this and comparing outputs in the conversion script are basically the last remaining issues?

        self.rope_scaling_factor = rope_scaling_factor
        self.rope_max_wavelength = rope_max_wavelength

    def build(self, inputs_shape):
@mattdangerw (Member) commented

Same comment as Mistral... Consider something like this, where we colocate all einsum equations in build and add a nice key at the top. Helps readability.

https://github.com/keras-team/keras/blob/master/keras/layers/attention/grouped_query_attention.py#L124-L167

(ok if we want to punt on this for this pr)
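
For illustration, a minimal sketch of that pattern with a toy layer (the class, names, and equations here are illustrative, not code from this PR):

import keras

# Toy layer: the einsum equation lives in build(), with a key explaining
# each letter, as in the linked GroupedQueryAttention implementation.
class TinyAttentionProjection(keras.layers.Layer):
    def __init__(self, num_heads=4, head_dim=16, **kwargs):
        super().__init__(**kwargs)
        self.num_heads = num_heads
        self.head_dim = head_dim

    def build(self, inputs_shape):
        # Einsum key: b = batch, q = query length, m = model dim,
        #             u = num heads, h = head dim.
        self._query_dense = keras.layers.EinsumDense(
            "bqm,muh->bquh",
            output_shape=(None, self.num_heads, self.head_dim),
        )
        self._query_dense.build(inputs_shape)
        self.built = True

    def call(self, inputs):
        return self._query_dense(inputs)  # (batch, query, heads, head_dim)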

@shivance (Collaborator Author) commented Nov 25, 2023

Thanks, this looks good! Added.


with torch.no_grad():
    keras_outputs = keras_model(keras_inputs)
    print("Keras output = ", keras_outputs.numpy())
@mattdangerw (Member) commented

Can we add a line that also runs output through the hf version and compares the difference? How close do we get?
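
One way such a comparison might look (a sketch only: hf_model and hf_inputs are hypothetical names assumed to be defined earlier in the conversion script, and keras_outputs comes from the snippet above):

import numpy as np
import torch

# Run the same inputs through the Hugging Face model and report the largest
# absolute difference against the KerasNLP outputs.
with torch.no_grad():
    hf_outputs = hf_model(**hf_inputs).last_hidden_state  # hypothetical names

print(
    "Max absolute difference:",
    np.max(np.abs(hf_outputs.numpy() - keras_outputs.numpy())),
)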

@mattdangerw (Member) commented

/gcbrun

@mattdangerw (Member) commented

Talked with @shivance, going to try to merge this in with some last fixes to the rotary embedding layer.

We will need to follow up and fix the conversion script so it actually validates the output.

@mattdangerw mattdangerw merged commit 5fd92c8 into keras-team:master Dec 22, 2023
11 checks passed
@ashmalvayani commented Jan 25, 2024

Can you please tell me how to load the Llama model in KerasNLP, just like how we load the BERT model as below?

classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_base_en_uncased",
    num_classes=2,
    activation="softmax",
)
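
At the time this PR was merged, no Llama presets had been published yet (the Kaggle integration mentioned above was still needed for that), so there is no from_preset shortcut like the BERT example; the backbone has to be constructed directly. A minimal sketch with illustrative, non-official hyperparameters, assuming the constructor arguments of the LlamaBackbone added in this PR:

import numpy as np
import keras_nlp

# Hyperparameters are illustrative only; check the merged code for the exact
# LlamaBackbone signature.
backbone = keras_nlp.models.LlamaBackbone(
    vocabulary_size=32000,
    num_layers=2,
    num_query_heads=8,
    num_key_value_heads=8,
    hidden_dim=512,
    intermediate_dim=1024,
)
outputs = backbone(
    {
        "token_ids": np.ones((1, 12), dtype="int32"),
        "padding_mask": np.ones((1, 12), dtype="int32"),
    }
)
print(outputs.shape)  # (1, 12, hidden_dim)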
