
Add Chat Template Support to vLLM #1493

Closed · wants to merge 7 commits into main from chat_templates

Commits on Oct 28, 2023

  1. Add support for HF chat templates to OpenAI chat completions API. Add better documentation as well. (A sketch of the template mechanism follows this commit list.)

    Tostino committed Oct 28, 2023 · f641c76
  2. a2f6df1
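
The "HF chat templates" here are the Jinja templates shipped in a model's tokenizer configuration, which Transformers renders via tokenizer.apply_chat_template. A minimal sketch of what the endpoint does with an incoming message list (the model name and messages are illustrative):

```python
from transformers import AutoTokenizer

# Illustrative model; any chat-tuned HF model whose tokenizer config
# defines a chat_template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
]

# apply_chat_template renders the message list into the exact prompt
# string the model was trained on, replacing hand-rolled formatting.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the rendered string, not token ids
    add_generation_prompt=True,  # end with the assistant-turn header
)
print(prompt)
```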

Commits on Nov 8, 2023

  1. Merge branch 'main' into chat_templates

    # Conflicts:
    #	vllm/entrypoints/openai/api_server.py

    Tostino committed Nov 8, 2023 · 858c1ec

Commits on Nov 11, 2023

  1. 5e2fcba

Commits on Nov 14, 2023

  1. Added 'add_generation_prompt' Request Parameter (Default: True): This parameter controls whether the prompt ends with the tokens that indicate the start of an assistant message. When set to False (and the template/model supports it), the model completes the last message in the list instead of starting a new one. The default maintains compatibility with OpenAI's API behavior. (See the sketch after this commit list.)

    Fixed Role Determination in Responses: Resolved an issue where the role in responses defaulted to "assistant" regardless of context. This fix ensures the response role matches the intended conversational participant, making the API more versatile across chat scenarios.

    Introduced 'return_full_response' Request Parameter (Default: False): When set to True, this parameter removes the need for client-side response merging by returning the complete response, even when the client supplied the start of the response for the model to continue.

    Tostino committed Nov 14, 2023 · 9ca35c1
  2. 8e9875f
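
A rough sketch of what the add_generation_prompt flag changes, in terms of the Hugging Face call the endpoint delegates to (model and messages are illustrative; whether continuation actually works depends on the model's template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "Write a haiku about GPUs."},
    {"role": "assistant", "content": "Silicon rivers,"},  # partial response
]

# Default (True): the prompt ends with the assistant-turn header, so the
# model starts a brand-new assistant message, matching OpenAI behavior.
start_new_turn = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# False: no trailing assistant header; a model/template that supports this
# lets the model continue the partial assistant message above instead.
continue_turn = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False
)
```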

Commits on Nov 15, 2023

  1. Renamed return_full_response to echo, and made it also work with streaming responses after testing how the regular OpenAI completion API behaves. (An illustrative request using the new parameters follows below.)

    Fixed inconsistencies between the official OpenAI API and what we were returning in streaming responses from the chat completion API.

    Added error handling so that if there is an issue applying the template, it is reported to the user through an API error and logged.

    Tostino committed Nov 15, 2023 · f159a62
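
Putting the pieces together, a request against a locally running vLLM OpenAI-compatible server might look like the sketch below. The URL and model name are assumptions, and echo and add_generation_prompt are the extensions this PR introduces, not part of the upstream OpenAI chat API:

```python
import requests

# Assumes a vLLM OpenAI-compatible server on localhost:8000 (illustrative).
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "HuggingFaceH4/zephyr-7b-beta",
        "messages": [
            {"role": "user", "content": "Write a haiku about GPUs."},
            {"role": "assistant", "content": "Silicon rivers,"},  # partial
        ],
        # Extensions from this PR (not in the upstream OpenAI API):
        "add_generation_prompt": False,  # continue the partial message
        "echo": True,  # return the client-supplied prefix plus completion
    },
)
print(response.json()["choices"][0]["message"]["content"])
```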