
Add Chat Template Support to vLLM #1493

Closed · wants to merge 7 commits into main from chat_templates

Commits on Oct 28, 2023

  1. Add support for HF chat templates to OpenAI chat completions API. Add better documentation as well. (A sketch of the template mechanism follows this commit list.)

    Tostino committed Oct 28, 2023 · f641c76
  2. a2f6df1
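
The "HF chat templates" here are the Jinja templates shipped in a model's tokenizer configuration, which Transformers renders via tokenizer.apply_chat_template. A minimal sketch of what the endpoint does with an incoming message list (the model name and messages are illustrative):

```python
from transformers import AutoTokenizer

# Illustrative model; any chat-tuned HF model whose tokenizer config
# defines a chat_template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
]

# apply_chat_template renders the message list into the exact prompt
# string the model was trained on, replacing hand-rolled formatting.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return the rendered string, not token ids
    add_generation_prompt=True,  # end with the assistant-turn header
)
print(prompt)
```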

Commits on Nov 8, 2023

  1. Merge branch 'main' into chat_templates

    # Conflicts:
    #	vllm/entrypoints/openai/api_server.py

    Tostino committed Nov 8, 2023 · 858c1ec

Commits on Nov 11, 2023

  1. 5e2fcba

Commits on Nov 14, 2023

  1. Added 'add_generation_prompt' Request Parameter (Default: True): This parameter controls whether the prompt ends with the tokens that indicate the start of an assistant message. When set to False (and the template/model supports it), the model completes the last message in the list instead of starting a new one. The default maintains compatibility with OpenAI's API behavior. (See the sketch after this commit list.)

    Fixed Role Determination in Responses: Resolved an issue where the role in responses defaulted to "assistant" regardless of context. This fix ensures the response role matches the intended conversational participant, making the API more versatile across chat scenarios.

    Introduced 'return_full_response' Request Parameter (Default: False): When set to True, this parameter removes the need for client-side response merging by returning the complete response, even when the client supplied the start of the response for the model to continue.

    Tostino committed Nov 14, 2023 · 9ca35c1
  2. 8e9875f
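
A rough sketch of what the add_generation_prompt flag changes, in terms of the Hugging Face call the endpoint delegates to (model and messages are illustrative; whether continuation actually works depends on the model's template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "Write a haiku about GPUs."},
    {"role": "assistant", "content": "Silicon rivers,"},  # partial response
]

# Default (True): the prompt ends with the assistant-turn header, so the
# model starts a brand-new assistant message, matching OpenAI behavior.
start_new_turn = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# False: no trailing assistant header; a model/template that supports this
# lets the model continue the partial assistant message above instead.
continue_turn = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False
)
```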

Commits on Nov 15, 2023

  1. Renamed return_full_response to echo, and made it also work with streaming responses after testing how the regular OpenAI completion API behaves. (An illustrative request using the new parameters follows below.)

    Fixed inconsistencies between the official OpenAI API and what we were returning in streaming responses from the chat completion API.

    Added error handling so that if there is an issue applying the template, it is reported to the user through an API error and logged.

    Tostino committed Nov 15, 2023 · f159a62
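
Putting the pieces together, a request against a locally running vLLM OpenAI-compatible server might look like the sketch below. The URL and model name are assumptions, and echo and add_generation_prompt are the extensions this PR introduces, not part of the upstream OpenAI chat API:

```python
import requests

# Assumes a vLLM OpenAI-compatible server on localhost:8000 (illustrative).
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "HuggingFaceH4/zephyr-7b-beta",
        "messages": [
            {"role": "user", "content": "Write a haiku about GPUs."},
            {"role": "assistant", "content": "Silicon rivers,"},  # partial
        ],
        # Extensions from this PR (not in the upstream OpenAI API):
        "add_generation_prompt": False,  # continue the partial message
        "echo": True,  # return the client-supplied prefix plus completion
    },
)
print(response.json()["choices"][0]["message"]["content"])
```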