
Add OLMo model family #29890

Merged: 71 commits into huggingface:main on Apr 17, 2024

Conversation

@2015aroras (Contributor) commented on Mar 27, 2024:

What does this PR do?

This PR adds the OLMo model family to transformers. A base OLMoModel and a causal LM OLMoForCausalLM are implemented. The models are already present on the HF Hub (e.g. allenai/OLMo-7B), and this implementation is compatible with the checkpoints in the Hub.

UPDATE: The current version of the PR is not compatible with the old OLMo models in HF Hub. New models have been uploaded to HF Hub (e.g. allenai/OLMo-7B-hf) to support this PR.
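
For reference, a minimal usage sketch against the new checkpoint referenced above, using the standard Auto classes (illustrative only, not part of the PR diff):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf", torch_dtype=torch.bfloat16)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))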

Fixes #29885

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@2015aroras (Contributor, Author):

@ArthurZucker and @younesbelkada for general guidance and/or review, since this is a PyTorch text model.

@ArthurZucker (Collaborator) left a comment:

Hey! Thanks for submitting the PR!
The code is still in a very research-oriented state; let's try to bring it in line with the transformers philosophy.
My main question: what are the main differences with MixtralMoe (Llama + MoE)? Only include the differences needed for trained checkpoints (I see both alibi and rotary; do all checkpoints use both, or was only one kept?)

The fast tokenizer also looks very similar to the GPTNeoX / Bloom one; is there any reason to have a new one?

@2015aroras (Contributor, Author) commented on Mar 27, 2024:

> Hey! Thanks for submitting the PR! The code is still in a very research-oriented state; let's try to bring it in line with the transformers philosophy. My main question: what are the main differences with MixtralMoe (Llama + MoE)? Only include the differences needed for trained checkpoints (I see both alibi and rotary; do all checkpoints use both, or was only one kept?)

> The fast tokenizer also looks very similar to the GPTNeoX / Bloom one; is there any reason to have a new one?

Thanks for having a look!

In a general sense, I tried to stay as faithful to OLMo as I could while following general transformers principles (all model code in 1 file, use "Copied from" with re-usable logic, etc.).

Regarding tokenization, OLMo adds an eos token to the end of inputs that don't have one. The GPTNeoX and Bloom tokenizers don't. Llama's tokenizer does, but it also has a slow tokenizer, and I couldn't figure out how to convert the fast OLMo tokenizer to a slow sentencepiece one. My current implementation copies a lot from the Llama fast tokenizer.
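
To illustrate the eos behaviour described above, a tiny sketch in plain Python (illustrative only; not the PR's tokenizer code):

def append_eos(input_ids: list[int], eos_token_id: int) -> list[int]:
    # OLMo-style behaviour: ensure the encoded sequence ends with the eos token.
    if not input_ids or input_ids[-1] != eos_token_id:
        return input_ids + [eos_token_id]
    return input_ids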

In terms of the model, OLMo doesn't have MoE whereas MixtralMoe does. I would have to figure out whether I could use MixtralMoe while avoiding the MoE functionality. OLMo maps more directly to Llama; the model architectures mostly map 1-to-1, and the main discrepancies in output come from the two models using different precision for some operations.

The official OLMo models use RoPE, and at most one of RoPE and ALiBi is used in a run (i.e. no ALiBi), but we hope to release checkpoints from our experiments at some point. Some of those would use ALiBi. I think it would be nice if transformers could support those checkpoints too.

@dirkgr FYI

@ArthurZucker (Collaborator):

Hey! Sorry, I might have misunderstood (I thought I saw some MoE things here).
Alright, by "research" I mean there are a lot of things here that we never do in transformers:

  1. Tokenizer: we can update Bloom / GPTNeoX to support this option, as IMO it makes a lot of sense! No need for a slow one 😉
  2. The modeling code does not seem to include a lot of # Copied from; I'd recommend looking at the Gemma modeling code for a good example! Basically we want to avoid all the autocasting code when it is not necessary. We also do checkpointing a bit differently, and we want to have as few code paths as possible.
  3. For the config: instead of the different enums for layer norms and activation functions, we already have `from ...activations import ACT2FN` and ALL_LAYER_NORMS, which should be used.
  4. BufferCache is not really something we use, and I am not sure we have had issues with the community since the last Llama fixes!
  5. k_norm and q_norm are used in Persimmon, so that's a good place to copy from as well!
  6. _scaled_dot_product_attention should follow the class naming we use, i.e. XxxSdpaAttention.
  7. If clip_qkv is always used, let's not have an if/else! (See the sketch after this list.)
  8. The attention mask logic should use _update_causal_mask from Llama and follow similar logic.
  9. We don't use reset_parameters or device; let's try to get closer to Llama!
  10. We don't include get_fsdp_wrap_policy or other FSDP specifics in transformers.
  11. Let's add some generation tests!
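
A minimal sketch of the unconditional clamping suggested in point 7 (the helper name and signature are illustrative, not the PR's code):

import torch

def clamp_qkv(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, clip_qkv: float):
    # Clamp the query/key/value activations into [-clip_qkv, clip_qkv] unconditionally,
    # rather than branching between a "clamped" and an "unclamped" code path.
    return (
        q.clamp(min=-clip_qkv, max=clip_qkv),
        k.clamp(min=-clip_qkv, max=clip_qkv),
        v.clamp(min=-clip_qkv, max=clip_qkv),
    )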

@2015aroras (Contributor, Author) commented on Mar 28, 2024:

Just want to clarify my thoughts before I try making changes, so I'm expressing my understanding of what I should do below. Please correct me where I am wrong.

Regarding the tokenizer, if possible I should update one of the existing tokenizers and then use that tokenizer directly (similar to how mamba appears to use GPTNeoXTokenizerFast).

The comments regarding modeling code suggest that I should aim to re-use existing code using # Copied from and any existing functions. More generally, I should aim to be more "transformers-like" than faithful to the original code. I should try to make the code run like OLMo (produce the same output) but look like transformers.

Assuming it were possible, the ideal solution would be for me to use some existing model and convert OLMo checkpoints to work with that model.


Regarding copying existing code vs being faithful, what should I do when some operation (say, RoPE) exists in transformers but is performed in a different precision (e.g. float32 vs bfloat16) in OLMo compared to the transformers code? Do I use the transformers code and disregard the "small" error, or write separate code that matches the original OLMo behavior?

@ArthurZucker (Collaborator) commented on Mar 28, 2024:

> Assuming it were possible, the ideal solution would be for me to use some existing model and convert OLMo checkpoints to work with that model.

Yes! This way it's easier for the whole community to use the model, and it keeps the diff very clear and clean!

For the tokenizer, if GPTNeoXTokenizerFast does not have add_eos or add_bos, we can add them, but in a backward-compatible way!
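
For illustration, one way an opt-in add_eos_token flag could be wired into a fast tokenizer's post-processor while keeping the default behaviour unchanged (hypothetical helper, not the code that was eventually merged):

from tokenizers.processors import TemplateProcessing
from transformers import PreTrainedTokenizerFast

def enable_eos(tokenizer: PreTrainedTokenizerFast, add_eos_token: bool = False) -> None:
    # Default (add_eos_token=False) leaves the tokenizer untouched, so existing behaviour
    # is preserved; only when the flag is set is eos appended to every encoded sequence.
    if not add_eos_token:
        return
    eos = tokenizer.eos_token
    tokenizer.backend_tokenizer.post_processor = TemplateProcessing(
        single=f"$A {eos}",
        pair=f"$A {eos} $B:1 {eos}:1",
        special_tokens=[(eos, tokenizer.eos_token_id)],
    )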

> The comments regarding modeling code suggest that I should aim to re-use existing code using # Copied from and any existing functions. More generally, I should aim to be more "transformers-like" than faithful to the original code. I should try to make the code run like OLMo (produce the same output) but look like transformers.

Exactly 💯

For RoPE, we should always compute it in float32; that comes from experience and long debugging.
Otherwise you can keep code that matches the original OLMo behavior where it is needed, of course! We aim for 1e-4 logits equivalence!
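
As an illustration of the float32 guidance, a simplified rotary-embedding sketch that keeps the cos/sin computation in float32 and only casts back at the end (a stripped-down version of the Llama-style approach, not the exact merged code):

import torch

def rope_cos_sin(position_ids: torch.Tensor, head_dim: int, base: float = 10000.0,
                 dtype: torch.dtype = torch.bfloat16):
    # Inverse frequencies and the angle products are kept in float32 for numerical
    # stability; only the final cos/sin tables are cast back to the model dtype.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = position_ids.float()[:, :, None] * inv_freq[None, None, :]  # (bsz, seq, dim/2)
    emb = torch.cat((angles, angles), dim=-1)                            # (bsz, seq, dim)
    return emb.cos().to(dtype), emb.sin().to(dtype)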

initializer_range=0.02,
use_cache=True,
pad_token_id=1,
bos_token_id=None,
Contributor:

Is it intended that the default config does not set bos_token_id?

Contributor (Author):

OLMo's config class (in the original GitHub repo) doesn't have a bos token, and so no OLMo checkpoints use bos tokens to my knowledge. Thus setting no bos_token_id is intentional.
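
For context, a quick hedged check one could run against a converted checkpoint (the Hub repo name is the one referenced earlier in this PR; the expected values follow from the discussion above):

from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("allenai/OLMo-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
# Per the discussion above, no bos token is defined for OLMo checkpoints.
print(config.bos_token_id, tokenizer.bos_token_id)  # expected: None None
print(config.eos_token_id, tokenizer.eos_token_id)  # eos is defined and used instead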

@molbap (Contributor) commented on Apr 13, 2024:

I think the PR is in a good state. Thanks for adding this! pinging @ArthurZucker for a final review :)

@2015aroras (Contributor, Author):

@molbap @ArthurZucker All checks are passing! Now it's just a matter of reviews

@ArthurZucker (Collaborator) left a comment:

Good state!

  • Camel casing needs to be used on all classes!
  • The main diff is the clamping; it should not be an if/else, because otherwise it's just a Llama model 🤗

@@ -174,6 +174,7 @@
("nllb-moe", "NllbMoeConfig"),
("nougat", "VisionEncoderDecoderConfig"),
("nystromformer", "NystromformerConfig"),
("olmo", "OLMoConfig"),
Collaborator:

Suggested change:
- ("olmo", "OLMoConfig"),
+ ("olmo", "OlmoConfig"),

we need camel casing on all classes


Collaborator:

Let's not modify an unrelated file as important as Llama! Remove the "Copied from", and for the forward only, use # Ignore copy.


Comment on lines 75 to 79
pretraining_tp (`int`, *optional*, defaults to 1):
    Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
    document](https://huggingface.co/docs/transformers/main/perf_train_gpu_many#tensor-parallelism) to understand more about it. This value is
    necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
    issue](https://github.com/pytorch/pytorch/issues/76232).
Collaborator:

Let's get rid of that, no?


src/transformers/models/olmo/modeling_olmo.py (conversation resolved)
Comment on lines 335 to 352
if self.config.pretraining_tp > 1:
    key_value_slicing = (self.num_key_value_heads * self.head_dim) // self.config.pretraining_tp
    query_slices = self.q_proj.weight.split(
        (self.num_heads * self.head_dim) // self.config.pretraining_tp, dim=0
    )
    key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
    value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)

    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
    query_states = torch.cat(query_states, dim=-1)

    key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]
    key_states = torch.cat(key_states, dim=-1)

    value_states = [F.linear(hidden_states, value_slices[i]) for i in range(self.config.pretraining_tp)]
    value_states = torch.cat(value_states, dim=-1)

else:
Collaborator:

pretraining_tp should be removed; it was mostly unused for Llama.


src/transformers/models/olmo/modeling_olmo.py (conversation resolved)
Comment on lines 399 to 403
if self.config.pretraining_tp > 1:
    attn_output = attn_output.split(self.hidden_size // self.config.pretraining_tp, dim=2)
    o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.config.pretraining_tp, dim=1)
    attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.config.pretraining_tp)])
else:
Collaborator:

same comment here


Comment on lines 952 to 957
# Register a causal mask to separate causal and padding mask creation. Merging happens in the attention class.
# NOTE: This is not friendly with TorchScript, ONNX, ExportedProgram serialization for very large `max_position_embeddings`.
causal_mask = torch.full(
    (config.max_position_embeddings, config.max_position_embeddings), fill_value=True, dtype=torch.bool
)
self.register_buffer("causal_mask", torch.triu(causal_mask, diagonal=1), persistent=False)
Collaborator:

To remove as it is not used anymore


Collaborator:

There is no new tokenizer, so this should be done in the modeling IMO, with just minimal tests to make sure bos and eos are added.
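
A minimal sketch of such a test, assuming the add_eos_token flag that this PR moves into GPTNeoXTokenizerFast (per the commit history below); not the test that was actually merged:

from transformers import GPTNeoXTokenizerFast

def test_eos_is_appended():
    # Illustrative check: with add_eos_token=True the encoded sequence should end with
    # the eos token id; the checkpoint name is the one referenced in this PR.
    tokenizer = GPTNeoXTokenizerFast.from_pretrained("allenai/OLMo-7B-hf", add_eos_token=True)
    ids = tokenizer("Hello OLMo").input_ids
    assert ids[-1] == tokenizer.eos_token_id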


@2015aroras (Contributor, Author):

Addressed all PR comments, and hopefully all checks will pass.

@ArthurZucker (Collaborator) left a comment:

Thanks for your great work! 🔥

@ArthurZucker ArthurZucker merged commit e4ea19b into huggingface:main Apr 17, 2024
21 checks passed
@2015aroras 2015aroras deleted the add-olmo-implementation branch April 17, 2024 16:39
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Apr 18, 2024
* Add OLMo using add-new-model-like with Llama

* Fix incorrect tokenizer for OLMo

* Copy-paste relevant OLMo methods and their imports

* Add OLMo config

* Modify OLMo config to follow HF conventions

* Remove unneeded Llama code from OLMo model

* Add ability for OLMo model to output attentions

* Add OLMoPreTrainedModel and OLMoModel

* Add OLMoForCausalLM

* Minor fixes to OLMo model for style and missing functions

* Implement OLMo tokenizer

* Implement OLMo to HF conversion script

* Add tests for OLMo model

* Add tests for OLMo fast tokenizer

* Add auto-generated dummy objects

* Remove unimplemented OLMo classes from auto and init classes and re-format

* Add README and associated auto-generated files

* Use OLMo names for common properties

* Run make fixup

* Remove `|` from OLMo typing

* Remove unneeded tokenization_olmo.py

* Revert model, config and converter to add-new-model-like Llama

* Move logic for adding bos/eos token into GPTNeoxTokenizerFast

* Change OLMoConfig defaults to match OLMo-7B

* Use GPTNeoXToknizerFast in OLMo tokenizer tests

* Modify auto-generated OLMoModelTests to work for OLMo

* Add non-parametric layer norm OLMoLayerNorm

* Update weight conversion script for OLMo

* Fix __init__ and auto structure for OLMo

* Fix errors from make fixup

* Remove OLMoTokenizerFast from documentation

* Add missing 'Copied from' for OLMoModel._update_causal_mask

* Run make fix-copies

* Rearrange string replacements in OLMoForCausalLM Copied from

* Move OLMo and Llama CausalLM.forward example into global constants

* Fix OLMO_GENERATION_EXAMPLE doc string typo

* Add option for qkv clipping to OLMo

* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf

* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf

* Fix OLMo tokenization bug using conversion script

* Keep model in full precision after conversion

* Do not add eos token automatically

* Update references to OLMo model in HF Hub

* Do not add eos token during encoding by default

* Fix Llama generation example

* Run make fixup

* OLMo 7B integration test fix

* Remove unneeded special case for OLMoConfig

* OLMo 7B Twin 2T integration test fix

* Fix test_model_7b_greedy_generation

* Remove test_compile_static_cache

* Fix OLMo and Llama generation example

* Run make fixup

* Revert "OLMo 7B integration test fix"

This reverts commit 4df56a4.

* Revert "OLMo 7B Twin 2T integration test fix"

This reverts commit 9ff65a4.

* Ungate 7B integration tests and fix greedy generation test

* Add retries for flaky test_eager_matches_sdpa_generate

* Fix output of doc example for OLMoForCausalLM.forward

* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model

* Try fix incorrect characters in OLMoForCausalLM.forward doct test

* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes

* Remove pretraining_tp from OLMo config and model

* Add missing 'Copied from' instances

* Remove unneeded causal_mask from OLMoModel

* Revert Llama changes

* Ignore copy for OLMoForCausalLM.forward

* Change 'OLMo' to 'Olmo' in classes

* Move minimal OLMo tokenization tests to model tests

* Add missed 'Copied from' for repeat_kv
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024 (same commit list as above)

ydshieh pushed a commit that referenced this pull request Apr 23, 2024 (same commit list as above)

itazap pushed a commit that referenced this pull request May 14, 2024 (same commit list as above)