
[New Model]: Llama 3 8B Instruct #4297

Closed
K-Mistele opened this issue Apr 23, 2024 · 3 comments
Labels
new model

Comments

@K-Mistele
Contributor

The model to consider.

https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

The closest model vLLM already supports.

Llama 1 & 2

What's your difficulty of supporting the model you want?

Llama 3 Instruct requires a different stop token than the one specified in the tokenizer.json file.
The tokenizer.json specifies <|end_of_text|> as the end-of-string token, which works for the base Llama 3 model, but it is not the right token for the instruct tune. The instruct tune uses <|eot_id|>.

You can see this in the inference code on the Llama 3 8B Instruct model card, where this token is added to the list of terminators:

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# HERE is where the `<|eot_id|>` token, which is not the default end-of-string token, is added to the list of terminators.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

Here is a discussion of this topic in the llama.cpp repository:
ggerganov/llama.cpp#6751
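
For illustration, here is a minimal sketch of how the extra terminator could be supplied to vLLM at request time, assuming SamplingParams accepts a stop_token_ids argument; the sampling values mirror the model card, and this is a workaround sketch rather than a confirmed fix:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Look up the instruct-tune terminator id from the tokenizer instead of hardcoding it.
tokenizer = AutoTokenizer.from_pretrained(model_id)
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")

llm = LLM(model=model_id)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
    stop_token_ids=[eot_id],  # stop on <|eot_id|> in addition to the default EOS token
)

# Build the chat prompt with the model's chat template, as on the model card.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Who are you?"}],
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)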

K-Mistele added the new model label on Apr 23, 2024
@agt
Contributor

agt commented Apr 23, 2024

This will be resolved via #4182, which has been merged and will be released with 0.4.1.

In the meantime, #4180 has suggestions for workarounds, including manually editing config.json to set the correct stop token.
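
As an illustrative aside (not from #4180 itself), the token id that workaround would point config.json at can be looked up rather than hardcoded:

from transformers import AutoTokenizer

# The id printed here is the value the config.json workaround would use as eos_token_id.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))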

@JPonsa

JPonsa commented May 24, 2024

@agt is this still open? Is Llama 3 still not supported in vLLM?

@hmellor
Collaborator

hmellor commented May 31, 2024

Llama-3 is supported.

hmellor closed this as completed on May 31, 2024