Fix OpenAI ChatCompletion ignoring stop from FastChat Conv Template #1503
When using vLLM's OpenAI API server to serve models, I found that ChatCompletion requests by default do not honor the `stop_token_ids` and `stop_str` set by FastChat conversation templates. This causes issues (the model keeps generating irrelevant content) when using the vLLM-served OpenAI API as an OpenAI-compatible server for FastChat's Gradio interface. This PR adds a check in the OpenAI ChatCompletion request handler so that the template's `stop_token_ids` and `stop_str` are merged into the request before it is sent to generation, similar to the implementation in FastChat's `vllm_worker`.
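For illustration, here is a minimal sketch (not the exact PR diff) of the merging logic: union the conversation template's stop settings with whatever the incoming request already specifies, de-duplicating along the way. The helper name `merge_template_stops` and the `conv` / request field shapes are assumptions modeled on FastChat and vLLM conventions.

```python
from types import SimpleNamespace


def merge_template_stops(request_stop, request_stop_token_ids, conv):
    """Union the template's stop_str / stop_token_ids with the request's.

    Sketch only: `conv` stands in for a FastChat conversation template,
    assumed to expose `stop_str` and `stop_token_ids` attributes.
    """
    stop = set()
    # The OpenAI API allows `stop` to be a single string or a list of strings.
    if isinstance(request_stop, str):
        stop.add(request_stop)
    elif request_stop is not None:
        stop.update(request_stop)
    if conv.stop_str:
        stop.add(conv.stop_str)

    # Merge and de-duplicate stop token ids from the template and the request.
    stop_token_ids = set(request_stop_token_ids or [])
    stop_token_ids.update(conv.stop_token_ids or [])
    return list(stop), list(stop_token_ids)


# Hypothetical usage with a stand-in template (values are illustrative).
conv = SimpleNamespace(stop_str="</s>", stop_token_ids=[2])
stop, stop_token_ids = merge_template_stops(["###"], None, conv)
# stop contains "###" and "</s>" (set order not guaranteed); stop_token_ids == [2]
```

Using set unions here means a client that already passes its own `stop` strings keeps them, while templates whose stop settings were previously dropped now take effect as well.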