fix stop detections #1392

Merged: 7 commits merged into lm-sys:main on May 22, 2023
Conversation

@mingfang (Contributor) commented:
The current stop detection works, but only after part of the stop sequence has already been streamed. This breaks ReAct-style agents, which rely on the stream stopping cleanly before the stop string appears.

This change adds partial stop detection and avoids streaming partial stop sequences.
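
The core idea is roughly the following (a minimal sketch for illustration, not the actual code in fastchat/serve/inference.py): before streaming a decoded chunk, hold back any trailing text that could still grow into one of the stop strings, and only emit it once it can no longer match.

def split_safe_text(text: str, stop_strings: list[str]) -> tuple[str, str]:
    # Return (safe_to_stream, held_back), where held_back is the longest
    # suffix of `text` that is also a prefix of some stop string.
    # (A full match of a stop string would end generation entirely; this
    # sketch only covers the partial-prefix case.)
    hold = 0
    for stop in stop_strings:
        for i in range(1, min(len(stop), len(text)) + 1):
            if text.endswith(stop[:i]):
                hold = max(hold, i)
    return text[: len(text) - hold], text[len(text) - hold :]

# Example: the tail "\nObs" could still become "\nObservation: ", so it is
# held back instead of being streamed to the client.
safe, held = split_safe_text("Action Input: 1+1\nObs", ["\nObservation: "])
assert safe == "Action Input: 1+1" and held == "\nObs"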

@suquark (Collaborator) commented on May 21, 2023:

@merrymercy any suggestions for testing this?

@mingfang (Contributor, Author) commented on May 21, 2023:

One way to test is with this curl command, which simulates the LangChain calculator-agent example: the prompt asks what 1+1 is. With the fix, the model should respond by asking the calculator tool for 1+1, and the stream should stop before any part of the "\nObservation: " stop string is emitted.

curl -d '{"model":"vicuna-13b-v1.1","temperature":0,"max_tokens":256,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"best_of":1,"stop":["\nObservation: "],"stream":true,"prompt":["Answer the following questions as best you can. You have access to the following tools:\n\ncalculator: Useful for getting the result of a math expression. The input to this tool should be a valid mathematical expression that could be executed by a simple calculator.\n\nUse the following format in your response:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [calculator]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: what is 1+1\nThought:"]}' localhost:8000/v1/completions -H 'content-type: application/json'

The output should end like this:

...more before this...
data: {"id": "cmpl-3ADpLjweTitvHHpddeV55R", "object": "text_completion", "model": "vicuna-13b-v1.1", "choices": [{"index": 0, "text": "+1", "logprobs": null, "finish_reason": null}]}

data: {"id": "cmpl-3ADpLjweTitvHHpddeV55R", "object": "text_completion", "model": "vicuna-13b-v1.1", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": "stop"}]}

data: [DONE]
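
For a scripted version of this check, something like the following should work (a rough sketch, assuming the OpenAI-compatible SSE format shown above; the abbreviated prompt and the assertions are illustrative, not part of this PR):

import json
import requests

payload = {
    "model": "vicuna-13b-v1.1",
    "temperature": 0,
    "max_tokens": 256,
    "stop": ["\nObservation: "],
    "stream": True,
    # Abbreviated ReAct prompt; the full prompt is in the curl command above.
    "prompt": ["Question: what is 1+1\nThought:"],
}

resp = requests.post("http://localhost:8000/v1/completions", json=payload, stream=True)
text, finish_reason = "", None
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    data = line[len(b"data: "):]
    if data == b"[DONE]":
        break
    choice = json.loads(data)["choices"][0]
    text += choice["text"]
    finish_reason = choice["finish_reason"] or finish_reason

# With the fix, the stream should finish with reason "stop" and no part of
# the stop string should have leaked into the streamed text.
assert finish_reason == "stop"
assert "\nObservation" not in text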

@merrymercy (Member) left a comment:

LGTM

@suquark (Collaborator) left a comment:

LGTM! Thanks!

@suquark merged commit 75d8ab2 into lm-sys:main on May 22, 2023
@plancktree commented:

> One way to test is with this curl command, which simulates the LangChain calculator-agent example. […]

But when I use vLLM on 4 V100 GPUs, the problem occurs and there is no "stop" finish reason, while in another run there is a "stop" reason. I wonder why.

[screenshot: stream ending without a "stop" finish reason]
[screenshot: stream ending with a "stop" finish reason]
