Support eos_token_id from generation_config.json #4182

Merged
merged 3 commits into vllm-project:main on Apr 19, 2024

Conversation

simon-mo (Collaborator)

Related to #4180

Some models use the eos_token_id field (Optional[Union[int, list[int]]]) in generation_config.json:
https://huggingface.co/docs/transformers/v4.39.3/en/main_classes/text_generation#transformers.GenerationConfig

This PR loads the generation config, reads the eos_token_id value if one is supplied, and injects it into stop_token_ids in the sampling params. Notably, it does not change the eos_token_id in the sampling params or the tokenizer config.

One example is DBRX. Meta Llama 3 might use the generation config to reconcile the difference between <|end_of_text|> and <|eot_id|>.
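
The rough idea, as a minimal Python sketch (not the exact code in this PR; it assumes the model ships a generation_config.json and uses transformers.GenerationConfig together with vllm.SamplingParams):

from typing import List, Optional, Union

from transformers import GenerationConfig
from vllm import SamplingParams

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

# Read eos_token_id from generation_config.json; it may be an int or a list of ints.
gen_config = GenerationConfig.from_pretrained(MODEL)
eos: Optional[Union[int, List[int]]] = gen_config.eos_token_id
extra_stop_ids: List[int] = (
    [] if eos is None else [eos] if isinstance(eos, int) else list(eos)
)

# Inject those ids as stop_token_ids; the tokenizer's eos_token_id is left untouched.
params = SamplingParams(max_tokens=256, stop_token_ids=extra_stop_ids)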

Testing

Because this is model dependent, I performed manual testing:

  1. Run Meta Llama 3 8B Instruct and see that the end-of-turn token (<|eot_id|>) is not respected:
~$ curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 256
  }'
{"id":"cmpl-ca00059831714382b0104ca1cb7e407d","object":"chat.completion","created":1713481143,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm your helpful assistant! I'm an AI designed to assist and support you in various ways. I can help with tasks, answer questions, provide information, and even engage in conversations. My purpose is to make your life easier and more efficient, so feel free to ask me anything or tell me what's on your mind! What can I help you with today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm happy to help with any questions or tasks you have.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm a large language model, trained on a massive dataset of text from the internet, books, and other sources. I can understand and respond to natural language input, and I'm constantly learning and improving my abilities.\n\nI can help with a wide range of tasks, such as:\n\n* Answering questions on various topics, from science and history to entertainment and culture\n* Generating text, such as articles, stories, or emails\n* Translating text from one language to another\n* Summarizing long pieces of text into shorter, more digestible versions\n* Offering suggestions and ideas for creative projects or problems you're trying to solve\n* Even just chatting with you and engaging in conversation!\n\nWhat do you need help with today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm excited to hear"},"logprobs":null,"finish_reason":"length","stop_reason":null}],"
  2. Change the eos_token_id field in the generation config of the HF model:
-   "eos_token_id": 128001,
+   "eos_token_id": [128001,128009],
  3. Send the same query; generation now stops at the end-of-turn token:
~$ curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 256
  }'
{"id":"cmpl-bf80caf7d899446fa9e148d1714b0552","object":"chat.completion","created":1713481243,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm your helpful assistant! I'm an AI designed to assist and support you in various ways. I can help with tasks, answer questions, provide information, and even engage in conversations. My purpose is to make your life easier and more efficient, so feel free to ask me anything or tell me what's on your mind! What can I help you with today?"},"logprobs":null,"finish_

@simon-mo simon-mo mentioned this pull request Apr 18, 2024
@simon-mo simon-mo enabled auto-merge (squash) April 18, 2024 23:27
@simon-mo simon-mo merged commit a134ef6 into vllm-project:main Apr 19, 2024
46 checks passed
premg16 commented Apr 19, 2024

I am running vLLM from the Docker image and facing the same issue. What should I do?

simon-mo (Collaborator, Author)

For now, you can add stop_token_ids as part of your request parameters; see #4180 (comment).

To avoid this extra step, the model checkpoint's generation config needs to be updated, which is pending on the HF side.
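
For example, with the OpenAI-compatible server the extra field can be passed through the openai Python client's extra_body (a sketch; 128009 is Llama 3's <|eot_id|> id, as in the diff above):

from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    max_tokens=256,
    # vLLM-specific extension field: also stop on Llama 3's <|eot_id|> token.
    extra_body={"stop_token_ids": [128009]},
)
print(response.choices[0].message.content)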

xjpang pushed a commit to xjpang/vllm that referenced this pull request Apr 25, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Apr 26, 2024
alexeykondrat pushed a commit to alexeykondrat/ci-vllm that referenced this pull request May 1, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024