What's your difficulty of supporting the model you want?
Llama 3 Instruct requires a different stop token than is specified in the `tokenizer.json` file. The `tokenizer.json` specifies `<|end_of_text|>` as the end-of-string token, which works for the base Llama 3 model, but it is not the right token for the instruct tune. The instruct tune uses `<|eot_id|>`. You can see this in the inference code on the Llama 3 8B Instruct model card, where this token is added to the list of terminators:
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# HERE is where they add the `<|eot_id|>` token, which is not the default
# end of string token, to the list of terminators.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
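For reference, the mismatch is easy to confirm directly from the tokenizer. A minimal sketch; the ids in the comments are the commonly reported values for these tokens and assume the tokenizer config described above:

```python
from transformers import AutoTokenizer

# Assumes access to the gated meta-llama repository on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(tokenizer.eos_token)                                 # '<|end_of_text|>' per the config described above
print(tokenizer.convert_tokens_to_ids("<|end_of_text|>"))  # 128001
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))       # 128009
```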
The model to consider.
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
The closest model vllm already supports.
Llama 1 & 2
Here is a discussion of this topic in the llama.cpp repository: ggerganov/llama.cpp#6751
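Until this is handled in the served model's config, one possible user-side workaround is to pass `<|eot_id|>` explicitly via `SamplingParams.stop_token_ids`. This is a sketch of vLLM's offline API under that assumption, not an official fix:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

llm = LLM(model=model_id)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
    # Stop on both the base EOS and the instruct-tune turn terminator,
    # mirroring the terminators list from the model card snippet above.
    stop_token_ids=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ],
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who are you?"}],
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```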