
# [Frontend] Automatic detection of chat content format from AST #9919

**Merged** — 27 commits, merged on Nov 16, 2024. Changes shown below are from 21 of the 27 commits.

## Commits
All commits are by DarkLight1337.

- `d484401` Write out skeleton code (Nov 1, 2024)
- `b91de40` Iterate (Nov 2, 2024)
- `569c076` Update docs and logs (Nov 2, 2024)
- `4a1b1e0` Improve detection (Nov 2, 2024)
- `0410d9f` Add and fix tests (Nov 2, 2024)
- `4856117` Merge branch 'main' into chat-template-content-format (Nov 5, 2024)
- `54c25c3` Improve error handling (Nov 5, 2024)
- `fea7481` Add example (Nov 5, 2024)
- `825ebe9` Merge branch 'main' into chat-template-content-format (Nov 7, 2024)
- `7e6772d` Merge branch 'main' into chat-template-content-format (Nov 8, 2024)
- `4afe254` Remove repeated definition (Nov 8, 2024)
- `67fcdad` Merge branch 'main' into chat-template-content-format (Nov 9, 2024)
- `77951a2` Merge branch 'main' into chat-template-content-format (Nov 13, 2024)
- `231b4d9` Remove unused attribute (Nov 13, 2024)
- `656b334` Consider variable reassignment (Nov 13, 2024)
- `3cec391` Cleanup (Nov 13, 2024)
- `6299788` Merge branch 'main' into chat-template-content-format (Nov 13, 2024)
- `f00419d` Fix (Nov 13, 2024)
- `7c594e1` format (Nov 13, 2024)
- `5b87baf` Simplify the code (Nov 13, 2024)
- `4b3dd75` Fix bug when chat_template is None (Nov 13, 2024)
- `a75d813` Merge branch 'main' into chat-template-content-format (Nov 13, 2024)
- `03f6e98` Recurse into var assignment (Nov 13, 2024)
- `d98735e` Merge branch 'main' into chat-template-content-format (Nov 14, 2024)
- `c8a6a75` Fix redundant check (Nov 15, 2024)
- `1ea0b37` Use iterative BFS (Nov 15, 2024)
- `ea474fa` Merge branch 'main' into chat-template-content-format (Nov 15, 2024)
## Files changed

18 changes: 13 additions & 5 deletions — `docs/source/serving/openai_compatible_server.md`

````diff
@@ -172,12 +172,20 @@ completion = client.chat.completions.create(
   ]
 )
 ```
-Most chat templates for LLMs expect the `content` to be a `string` but there are some newer models like
-`meta-llama/Llama-Guard-3-1B` that expect the content to be parsed with the new OpenAI spec. In order to choose which
-format the content needs to be parsed in by vLLM, please use the `--chat-template-text-format` argument to specify
-between `string` or `openai`. The default value is `string` and vLLM internally converts both spec formats to match
-this, unless explicitly specified.
 
+Most chat templates for LLMs expect the `content` field to be a string, but there are some newer models like
+`meta-llama/Llama-Guard-3-1B` that expect the content to be formatted according to the OpenAI schema in the
+request. vLLM provides best-effort support to detect this automatically, which is logged as a string like
+*"Detected the chat template content format to be..."*, and internally converts incoming requests to match
+the detected format, which can be one of:
+
+- `"string"`: A string.
+  - Example: `"Hello world"`
+- `"openai"`: A list of dictionaries, similar to OpenAI schema.
+  - Example: `[{"type": "text", "text": "Hello world!"}]`
+
+If the result is not what you expect, you can set the `--chat-template-content-format` CLI argument
+to override which format to use.
 
 ## Command line arguments for the server
````
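The two content formats named in the docs change above correspond to message payloads like the ones below. The `normalize_to_string` helper is a hypothetical sketch of the conversion direction described in the docs (collapsing list-of-parts content into a string), not vLLM's actual internal code:

```python
# "string" format: content is a plain string.
string_style = {
    "role": "user",
    "content": "Hello world!",
}

# "openai" format: content is a list of typed parts.
openai_style = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Hello world!"},
    ],
}


def normalize_to_string(message: dict) -> dict:
    """Collapse list-of-parts content into a single string (illustrative
    only; vLLM's internal conversion may differ in details)."""
    content = message["content"]
    if isinstance(content, list):
        content = "\n".join(
            part["text"] for part in content if part.get("type") == "text")
    return {**message, "content": content}
```

A template detected as `"openai"` would instead receive the list-of-parts form unchanged.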
3 changes: 2 additions & 1 deletion — `tests/entrypoints/openai/test_serving_chat.py`

```diff
@@ -26,7 +26,6 @@ class MockModelConfig:
     tokenizer = MODEL_NAME
     trust_remote_code = False
     tokenizer_mode = "auto"
-    chat_template_text_format = "string"
     max_model_len = 100
     tokenizer_revision = None
     multimodal_config = MultiModalConfig()
@@ -49,6 +48,7 @@ async def _async_serving_chat_init():
         BASE_MODEL_PATHS,
         response_role="assistant",
         chat_template=CHAT_TEMPLATE,
+        chat_template_content_format="auto",
         lora_modules=None,
         prompt_adapters=None,
         request_logger=None)
@@ -70,6 +70,7 @@ def test_serving_chat_should_set_correct_max_tokens():
         BASE_MODEL_PATHS,
         response_role="assistant",
         chat_template=CHAT_TEMPLATE,
+        chat_template_content_format="auto",
         lora_modules=None,
         prompt_adapters=None,
         request_logger=None)
```