add open-llama model with ckpt #22795

Merged: 19 commits merged into huggingface:main on Apr 28, 2023

Conversation

@s-JoL (Contributor) commented Apr 16, 2023

This PR adds a new model called Open-Llama, which is based on Llama's implementation in Transformers.
In Open-Llama, memory-efficient attention has been added, resulting in a 30% improvement in training efficiency. Additionally, hidden dropout and attention dropout have been added for better generalization during training.

We have also added two optional features: stable embedding from BLOOM and shared input-output embeddings from PaLM, which have been tested and found to improve training stability and performance.
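
A minimal sketch (not the PR's actual code) of what these two options mean in practice, assuming plain PyTorch modules; the class and argument names below are illustrative:

import torch
import torch.nn as nn

class TiedEmbeddingLM(nn.Module):
    # Illustrative only: "stable embedding" adds a LayerNorm right after the token
    # embedding (as in BLOOM); "shared input-output embedding" ties the output
    # projection to the input embedding matrix (as in PaLM).
    def __init__(self, vocab_size, hidden_size, use_stable_embedding=True, shared_input_output_embedding=True):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.embed_layer_norm = nn.LayerNorm(hidden_size) if use_stable_embedding else nn.Identity()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if shared_input_output_embedding:
            # Reuse the input embedding weights as the output projection.
            self.lm_head.weight = self.embed_tokens.weight

    def forward(self, input_ids):
        hidden_states = self.embed_layer_norm(self.embed_tokens(input_ids))
        # ... transformer layers would run here ...
        return self.lm_head(hidden_states)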

The following code snippet shows the implementation of memory-efficient attention:

try:
    from xformers import ops as xops
except ImportError:
    xops = None
    print("xformers is not installed correctly.")

# Use xformers' memory-efficient attention during training when it is available.
if self.config.use_memorry_efficient_attention and xops is not None and self.training:
    attn_weights = None
    # xformers expects tensors shaped (batch, seq_len, num_heads, head_dim).
    query_states = query_states.transpose(1, 2)
    key_states = key_states.transpose(1, 2)
    value_states = value_states.transpose(1, 2)
    attn_output = xops.memory_efficient_attention(
        query_states, key_states, value_states, attn_bias=xops.LowerTriangularMask(), p=self.dropout_prob
    )

At the same time, for maximum compatibility, we have made xformers an optional dependency so that the original implementation can still be used for training and inference if it is not installed.
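
For reference, a hedged sketch (not the exact merged code) of the eager attention path that runs when xformers is not installed: standard scaled dot-product attention with a causal mask, as in the original Llama implementation.

import math
import torch

def eager_causal_attention(query_states, key_states, value_states, dropout_p=0.0, training=False):
    # All tensors are shaped (batch, num_heads, seq_len, head_dim).
    head_dim = query_states.size(-1)
    seq_len = query_states.size(-2)
    attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(head_dim)
    # Causal mask: position i may only attend to positions <= i.
    causal_mask = torch.full((seq_len, seq_len), float("-inf"), device=query_states.device).triu(1)
    attn_weights = attn_weights + causal_mask
    attn_weights = torch.nn.functional.softmax(attn_weights, dim=-1)
    attn_weights = torch.nn.functional.dropout(attn_weights, p=dropout_p, training=training)
    return torch.matmul(attn_weights, value_states)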

We implemented pre-training of the Llama model based on transformers + accelerate, incorporating the modifications described above: Open-Llama.

The pre-trained model has already been open-sourced on s-JoL/Open-Llama-V1.
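
Assuming that checkpoint id, loading it would look roughly like the following (a usage sketch only; as noted later in this thread, the repository may no longer be available on the Hub):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("s-JoL/Open-Llama-V1")
model = AutoModelForCausalLM.from_pretrained("s-JoL/Open-Llama-V1")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))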

ref: #22386

cc: @sgugger

@HuggingFaceDocBuilderDev commented Apr 20, 2023

The documentation is not available anymore as the PR was closed or merged.

@s-JoL changed the title from "Dev" to "add open-llama model with ckpt" on Apr 21, 2023
@sgugger (Collaborator) commented Apr 21, 2023

cc @ArthurZucker and @younesbelkada

@s-JoL (Contributor, Author) commented Apr 25, 2023

Please help me review this pull request. @ArthurZucker @younesbelkada

@ArthurZucker (Collaborator) commented:

Hey! Thanks, will review now.

@ArthurZucker (Collaborator) left a comment

Thanks for working on this! Seems like the model is overall very similar, so a bunch of "Copied from" statements are missing here and there. Most importantly, I don't think we need a new tokenizer; it's still the Llama tokenizer.

Outdated review threads (resolved) on: README.md, docs/source/en/_toctree.yml, docs/source/en/model_doc/open-llama.mdx, src/transformers/models/auto/tokenization_auto.py
Collaborator:

Not convinced that you need a new configuration file either. Args can be added kind of on the fly rather than living in the default Llama config. WDYT?

Contributor Author:

I'm concerned that using the default LlamaConfig directly may result in missing parameters and cause errors.
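
For illustration of that concern, a hedged sketch of a dedicated config carrying the extra knobs discussed in this PR (the class and attribute names here are assumptions, not the merged file):

from transformers import PretrainedConfig

class OpenLlamaConfigSketch(PretrainedConfig):
    model_type = "open-llama"

    def __init__(
        self,
        vocab_size=32000,
        hidden_size=4096,
        use_memory_efficient_attention=True,
        hidden_dropout_prob=0.1,
        attention_dropout_prob=0.1,
        use_stable_embedding=True,
        shared_input_output_embedding=True,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        # Extra attributes a plain LlamaConfig does not define; a dedicated class
        # guarantees they exist with sane defaults instead of raising AttributeError.
        self.use_memory_efficient_attention = use_memory_efficient_attention
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_dropout_prob = attention_dropout_prob
        self.use_stable_embedding = use_stable_embedding
        self.shared_input_output_embedding = shared_input_output_embedding
        super().__init__(**kwargs)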

src/transformers/models/open_llama/modeling_open_llama.py (outdated review thread, resolved)
"""


@add_start_docstrings(
Collaborator:

Missing "Copied from" statements.

Contributor Author:

Sorry, I didn't quite understand how to add the "Copied from" statements for this class; there are slight differences here.

Collaborator:

Ok you can keep it as is!
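
(For context, a "Copied from" marker in transformers is a comment placed right above a class or function duplicated from another model, which the repo-consistency tooling then keeps in sync. A hedged example, with the class name and a simplified RMSNorm body assumed for illustration:)

import torch
import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->OpenLlama
class OpenLlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # Root-mean-square normalization (simplified for illustration).
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states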

@ArthurZucker (Collaborator) left a comment

LGTM, waiting for @sgugger's review

Collaborator:

Same comment here, is this not the same as in the llama folder?

Contributor Author:

Thank you for the reminder. This file is identical to the one in Llama, and since I trained directly with Transformers, there is no need for any conversion. I will delete it.

"""


@add_start_docstrings(
Collaborator:

Ok you can keep it as is!

Comment on lines +745 to +756
loss = None
if labels is not None:
    # Shift so that tokens < n predict n
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    shift_logits = shift_logits.view(-1, self.config.vocab_size)
    shift_labels = shift_labels.view(-1)
    # Enable model parallelism
    shift_labels = shift_labels.to(shift_logits.device)
    loss = loss_fct(shift_logits, shift_labels)
Collaborator:

Suggested change

Remove:

loss = None
if labels is not None:
    # Shift so that tokens < n predict n
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    shift_logits = shift_logits.view(-1, self.config.vocab_size)
    shift_labels = shift_labels.view(-1)
    # Enable model parallelism
    shift_labels = shift_labels.to(shift_logits.device)
    loss = loss_fct(shift_logits, shift_labels)

Replace with:

lm_loss = None
if labels is not None:
    # we are doing next-token prediction; shift prediction scores and input ids by one
    shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()
    loss_fct = CrossEntropyLoss()
    lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

We usually just use this, but I am guessing the point of the PR is fast / model parallelism, so ignore my comment if this doesn't work (we leave parallelism to accelerate).
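
To make the shifting concrete, a small self-contained toy example (illustrative sizes only):

import torch
from torch.nn import CrossEntropyLoss

batch, seq_len, vocab_size = 2, 5, 11
logits = torch.randn(batch, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch, seq_len))

# Position i's logits predict the token at position i + 1.
shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
shift_labels = labels[..., 1:].contiguous().view(-1)
loss = CrossEntropyLoss()(shift_logits, shift_labels)
print(loss)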

Comment on lines +19 to +20
The model is mainly based on LLaMA with some modifications, incorporating memory-efficient attention from Xformers, stable embedding from BLOOM, and shared input-output embedding from PaLM. The model is also pre-trained on both Chinese and English, which gives it better performance on Chinese-language tasks.
Collaborator:

If you have them, would be cool to add the performance gains here!

Contributor Author:

This is a great suggestion, but I have not yet run a complete ablation experiment. I plan to gradually add the numbers to the documentation once the experiments are done.

@ArthurZucker requested a review from @sgugger on April 28, 2023 13:27
@sgugger (Collaborator) left a comment

Very clean, thanks a lot for adding this! I have just a comment on the config and default checkpoint.

Comment on lines 41 to 43
warnings.warn(
    "Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers\npip install xformers."
)
Collaborator:

Should use our logger here with logger.warn (so move this after the logger is defined below).
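
A minimal sketch of the usual transformers logging pattern being suggested (define the module-level logger first, then warn through it); the wording of the message is illustrative:

try:
    from xformers import ops as xops
except ImportError:
    xops = None

from transformers.utils import logging

logger = logging.get_logger(__name__)

if xops is None:
    logger.warning(
        "Xformers is not installed correctly. If you want to use memory_efficient_attention to "
        "accelerate training, install it with `pip install xformers`."
    )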

@@ -42,6 +42,7 @@
"VisionEncoderDecoderConfig",
"VisionTextDualEncoderConfig",
"LlamaConfig",
"OpenLlamaConfig",
Collaborator:

Should be removed as there is a checkpoint for OpenLlama.

r"""
This is the configuration class to store the configuration of a [`OpenLlamaModel`]. It is used to instantiate an
Open-Llama model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the Open-Llama-7B.
Collaborator:

Put the full checkpoint name here and link to the Hub. Example we have for GPT-2:

a similar configuration to that of the [gpt2](https://huggingface.co/gpt2) architecture.

It wasn't there for Llama since there is no official checkpoint on the Hub.

Contributor Author:

Thank you for the review. The three issues mentioned have been fixed.

@sgugger merged commit c2c99dc into huggingface:main on Apr 28, 2023
@sgugger (Collaborator) commented Apr 28, 2023

Thanks a lot for your contribution!

@s-JoL (Contributor, Author) commented May 11, 2023

> Thanks a lot for your contribution!

Hello, I have a question: why can't the Open-Llama model be found when searching the Transformers documentation? Is there something I forgot to add?


@amyeroberts (Collaborator) commented:

Hi @s-JoL, thanks for notifying.

There was an issue in the doc rendering (resolved with 1, 2) leading to some pages not being retrievable in search. Should be working now!

@PenutChen (Contributor) commented:

@s-JoL I noticed that the links pertaining to Open-LLaMA are currently leading to 404 errors. Could you please provide some information on what might have happened?

@heya5 (Contributor) commented May 24, 2023

@s-JoL Hi, I can't find an Open-LLaMA checkpoint, and I noticed you deleted your original repo. What happened? How can I try Open-LLaMA?

gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
* update Open-Llama model

* update

* update format

* update doc

* update

* update stable embedding test

* update test case

* update format

* update readme

* fix typo

* update name

* remove tokenizer and update format

* remove convert_open_llama_weights_to_hf

* update warning and doc_string

---------

Co-authored-by: songliang.bayesian <songliang.bayesian@bytedance.com>
@PenutChen (Contributor) commented:

@heya5 Possibly due to some controversies surrounding this project, the original author has closed the original project.
https://github.com/chenfeng357/open-Chinese-ChatLLaMA/issues/1

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023 (same commit message as above)
tomaarsen added a commit to tomaarsen/transformers that referenced this pull request Jul 19, 2023