-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OLMo: 1B & 7B #927
Add OLMo: 1B & 7B #927
Conversation
I'm a big stuck with the conversion and would appreciate your advice and ideas @carmocca or @Andrei-Aksionov! So, here are 3 special things about Olmo:
The HF version is like this, which is confusing, because the ops are not applied in that "sequential" order as far as I can tell:
Unless I'm wrong, I think what happens is that What I am thinking is that we have to split the fc weights, which would avoid us having to write some custom code in the GPT model class:
is this somehow possible with the lit-gpt SaveProxyTensor? |
Hey! All your suggestions make sense to me. You should be able to split the combined ff linear as you suggest, especially if load_param has ben called already. We also manipulate the qkv linears for llama2 checkpoints in a similar way. However, note that your workarounds will only work for inference. During training, wte and ff_out will not be tied and the layernorm parameters wont be frozen. |
Hello @rasbt Looks like you are correct. I just wanted to add a couple of things that I've noticed while reviewing their code. For posterity, sorta speak. I also don't like when the layers are initialized not in the order they are executed. Lit-GPT also does it: first we create So the order of execution should be as such:
Olmo has only two layer: def forward(self, x: torch.Tensor) -> torch.Tensor:
x, gate = x.chunk(2, dim=-1)
return F.silu(gate) * x and then they apply Important to note the way they split and then apply activation function on a chunk. That means that: |
Thanks so much for the feedback @carmocca and @Andrei-Aksionov , this was super helpful! After more tinkering, I went with a custom OLMoMLP (analogous to LLaMALMLP) because I thought this was easier than the other workarounds -- both from an implementation perspective but also code-readability in the future. The weights load ok now, but for some reason, the results are garbage. E.g., for python generate/base.py --checkpoint_dir ./checkpoints/allenai/OLMo-1b/
And python generate/base.py --checkpoint_dir ./checkpoints/allenai/OLMo-7b/
|
Actually, upon further inspection they only use weight tying for the 1B (https://huggingface.co/allenai/OLMo-1B/blob/main/config.json#L42) model not for the 7B model (https://huggingface.co/allenai/OLMo-7B/blob/main/config.json#L42). I adjusted the code accordingly. Still not working well though. |
I would strongly prefer that we don't add this new MLP class. To debug the output, you'll have to inspect the activations for both models layer by layer to see where they diverge |
Ok! Maybe let's leave it in there until we got it to work, and then we can refactor it into one of the existing classes somehow. |
Just to add a note about pinpointing the difference. With Carlos's help, we found that the difference currently is in how the QKV matrix is split into queries, keys, and values. https://github.com/Lightning-AI/lit-gpt/blob/main/lit_gpt/model.py#L195-L202 and https://github.com/allenai/OLMo/blob/main/olmo/model.py#L687 In Lit-GPT, the Q, K, and V are interleaved (to also support MQA) whereas in OLMo, QKV are not interleaved. We could potentially accommodate OLMo in Lit-GPT if we apply the steps here from Llama in the conversion script but in reverse: https://github.com/Lightning-AI/lit-gpt/blob/main/scripts/convert_hf_checkpoint.py#L182-L186 |
Hi there! |
Hello @jwkirchenbauer #1013 is a breaking change, so it might take time for reviewers to do their part. |
If this PR gets revived some time, we should check out the |
A tale of a PR that rose from ashes. 🙂 I tested both model, they generate normally looking text, though cannot answer a question. @rasbt I didn't try to finetune it, but it looks like there are no significant architectural changes, so it should be ok. cc @jwkirchenbauer Finally made time. |
Wow thanks for resurrecting it and pushing it forward! |
It looks like the instruct variant expects a prompt in a specific format: from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-Instruct-hf")
message = [{"role": "user", "content": "{prompt}"}]
display(tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True))
|
Hey |
Hello @aflah02 If you want to contribute (which is awesome), then you need to:
|
Thanks for the pointers @Andrei-Aksionov |
Fine by me.
Sebastian and I worked on it. What do you think, @rasbt ? |
Thanks for your interest in pushing this PR to the finish line @aflah02. Unfortunately I recently got swamped with other work responsibilities so I currently can't spend as much time as I wanted to on this and would definitely appreciate the contribution!
I personally also don't mind the lack of attribution for me here. Thanks for doing this tedious task of copying over the branch! |
Thanks @Andrei-Aksionov @rasbt |
I was looking at the chat template in the OLMo HF tokenizer config - https://huggingface.co/allenai/OLMo-7B-Instruct/blob/main/tokenizer_config.json#L242 and it looks much more complex than the one you've shared above partly due to the for loops etc. which I guess are used to keep applying the template to all messages in the conversation. I am not super familiar with how LitGPT adds these prompts but looking at prompts.py it seems that for some models the implementation is just to apply the template -
While for others multiple cases like when the prompt is a str v/s a list and what are the stop_tokens are handled explicitly -
How does one figure out which route is needed for their specific model? |
Also as I was copying over the changes I was also referring to this adding models guide (https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md) and I noticed that even without copying over the checkpoint conversion files it just worked out of the box. Is this expected? I thought it wouldn't given there was custom logic written to convert checkpoint from hf to lit and vice-a-versa |
Also raised a PR. Not all changes are done but some of the changes in this PR don't seem to be needed - https://github.com/Lightning-AI/litgpt/pull/1827/files |
Hello @aflah02 I believe only the prompt for Llama 3 supports multi-turn generation. I think for now it's fine to add a template for the basic template and then, in later PRs, extend it.
I didn't notice that, but the script, that I wrote for converting weights, is actually a special case of |
Thanks for the confirmation! |
Closing it as it was implemented in #1827 |
Adds the popular and fully open-source OLMo models by Allen AI.
generate.py
produces reasonable outputsFixes #925