Support for Llama-3.1 (8b) - inference #80
Comments
Hi @Bihan, we do not support Llama3.1 yet, but we will definitely work on supporting it soon.
@tengomucho That is good news. I found fine-tuning works with Transformers version 4.43.3 and a modification of --rope-scaling in the model's config.json (details below). Looking eagerly for an update.
What is the kind of support you are trying to achieve @Bihan, is that fine-tuning or inference/generation?
@tengomucho Inference/generation
In that case, I would suggest you start with this example instead. It should be much simpler.
@tengomucho Thank you for your support. I was able to serve Llama-3.1 (8b) inference by setting a default value for the missing rope_scaling key.
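For reference, here is a minimal sketch of what supplying a default for that key could look like, assuming the missing key is the "type" entry of rope_scaling that the traceback further below points at. The commenter's exact change is not shown in this thread, and the model id used here is only illustrative.

```python
# Hedged sketch: supply a default "type" entry in rope_scaling before loading,
# assuming the missing key is the one from the KeyError in the log below.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3.1-8B"  # illustrative checkpoint id

config = AutoConfig.from_pretrained(model_id)
if getattr(config, "rope_scaling", None):
    # Llama-3.1 configs carry "rope_type" instead of the legacy "type" key;
    # fall back to it (or to "linear") so rope_scaling["type"] resolves.
    # The downstream modeling code must still recognize the resulting type.
    config.rope_scaling.setdefault(
        "type", config.rope_scaling.get("rope_type", "linear")
    )

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```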
Contributed implementation: #85
Fine-tuning example works with Llama-3.1 (8b) with Transformers version 4.43.3 and a modification of --rope-scaling in the model's config.json. Below is the error log without the rope-scaling modification:
Traceback (most recent call last):
  File "/root/optimum-tpu/examples/custom/train.py", line 138, in <module>
    model, tokenizer = create_and_prepare_model(args)
  File "/root/optimum-tpu/examples/custom/train.py", line 72, in create_and_prepare_model
    model = AutoModelForCausalLM.from_pretrained(args.model_name, use_cache=False)
  File "/root/optimum-tpu/optimum/tpu/modeling.py", line 64, in from_pretrained
    model = cls.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "/opt/conda/envs/workflow/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3788, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 1180, in __init__
    self.model = LlamaModel(config, rank, world_size)
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx, rank, world_size) for layer_idx in range(config.num_hidden_layers)]
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 746, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 333, in __init__
    self._init_rope()
  File "/root/optimum-tpu/optimum/tpu/modeling_llama.py", line 343, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
KeyError: 'type'
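For context, a sketch of the kind of config.json edit described above. The exact rope_scaling values used are not given in this thread; the legacy two-key form below ("type" plus "factor") and the local path are assumptions, chosen only because that layout satisfies the rope_scaling["type"] lookup in the traceback.

```python
# Hedged sketch: rewrite rope_scaling in a local config.json to the legacy
# {"type": ..., "factor": ...} layout expected by the modeling code above.
import json

config_path = "Meta-Llama-3.1-8B/config.json"  # hypothetical local checkpoint path

with open(config_path) as f:
    cfg = json.load(f)

# Llama-3.1 ships rope_scaling with "rope_type" plus extra frequency factors;
# older code paths index rope_scaling["type"] directly and raise KeyError.
cfg["rope_scaling"] = {"type": "dynamic", "factor": 8.0}  # assumed values

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```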
I would really appreciate your suggestions/plans on the following: