Fix OpenAI ChatCompletion Ignore stop from FastChat Conv Template #1503

Conversation

xingyaoww

When using vLLM's OpenAI API server to serve models, I find that the ChatCompletion request by default does not honor the stop_token_ids and stop_str set by FastChat conversation templates. This causes issues (the model keeps generating irrelevant text) when using the vLLM-served OpenAI API as an OpenAI-compatible backend for FastChat's Gradio interface.

This PR adds a check in the OpenAI ChatCompletion request handler to make sure the stop_token_ids and stop_str are merged into the request before it is sent to generation, similar to the implementation in FastChat's vllm_worker.
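A minimal sketch of the merging described above, not the actual diff. It assumes FastChat's `get_conversation_template()` and a request object exposing `stop` and `stop_token_ids` fields (as in vLLM's OpenAI protocol models); the helper name `merge_conv_template_stops` is hypothetical.

```python
# Sketch only: merge FastChat conversation-template stops into a request.
# Assumes `request.stop` is None, a str, or a list of str, and
# `request.stop_token_ids` is None or a list of int.
from fastchat.model.model_adapter import get_conversation_template


def merge_conv_template_stops(request, model_name: str):
    """Merge stop_str / stop_token_ids from the FastChat conversation
    template for `model_name` into the incoming ChatCompletion request."""
    conv = get_conversation_template(model_name)

    # Normalize the request's stop strings into a set.
    if request.stop is None:
        stop = set()
    elif isinstance(request.stop, str):
        stop = {request.stop}
    else:
        stop = set(request.stop)

    # conv.stop_str may be a single string or a list of strings.
    if conv.stop_str:
        if isinstance(conv.stop_str, str):
            stop.add(conv.stop_str)
        else:
            stop.update(conv.stop_str)

    # Merge token-id stops as well.
    stop_token_ids = set(request.stop_token_ids or [])
    stop_token_ids.update(conv.stop_token_ids or [])

    request.stop = list(stop)
    request.stop_token_ids = list(stop_token_ids)
    return request
```

The merged values can then be passed on to the sampling parameters used for generation, mirroring what FastChat's vllm_worker does.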

@Tostino
Contributor

Tostino commented Oct 29, 2023

I already replaced the implementation of this function in #1493.

That should be the way to solve this anyway. FastChat is a hack for formatting.

WoosukKwon closed this Mar 13, 2024