
feat: support guided decoding for vllm async engine #2391

Open · wxiwnd wants to merge 2 commits into main from feat/guided_generation
Conversation

@wxiwnd (Contributor) commented Oct 3, 2024

Support guided decoding for the vllm async engine.
Waiting for a vllm release; a version bump is needed.

#1562
vllm-project/vllm#8252

@XprobeBot XprobeBot added this to the v0.15 milestone Oct 3, 2024
@wxiwnd wxiwnd marked this pull request as draft October 3, 2024 06:18
@wxiwnd wxiwnd marked this pull request as ready for review October 5, 2024 07:55
@qinxuye (Contributor) commented Oct 11, 2024

Which version is required?

@wxiwnd (Contributor, Author) commented Oct 11, 2024

> Which version is required?

The latest version after 0.6.2; waiting for vllm to release a new version.

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from 2968700 to cd0812a Compare October 15, 2024 10:35
@qinxuye (Contributor) commented Oct 17, 2024

vllm has released v0.6.3, is this PR ready to work?

@wxiwnd (Contributor, Author) commented Oct 17, 2024 via email

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 7 times, most recently from 4d9e044 to 852c86c Compare October 22, 2024 09:30
@wxiwnd (Contributor, Author) commented Oct 22, 2024

> vllm has released v0.6.3, is this PR ready to work?

Works on my machine now.
This PR also implements the response_format call, e.g.:

curl --location --request POST 'http://ip:9997/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen1.5-32b-chat-int4",
    "messages": [
        {
            "role": "user",
            "content": "“give me a recipe in json format"
        }
    ],
    "temperature": 0,
    "max_tokens": 1000,
    "stream": true,
    "response_format": {"type": "json_object"}
}'
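
For reference, an equivalent request with the openai Python client could look like the sketch below (not part of this PR; the base_url, model name, and dummy api_key simply mirror the curl example above):

from openai import OpenAI

# Point the client at the Xinference endpoint from the curl example;
# api_key is a dummy placeholder.
client = OpenAI(base_url="http://ip:9997/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="qwen1.5-32b-chat-int4",
    messages=[{"role": "user", "content": "give me a recipe in json format"}],
    temperature=0,
    max_tokens=1000,
    stream=True,
    response_format={"type": "json_object"},
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)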

@qinxuye (Contributor) commented Oct 22, 2024

> Works on my machine now.
> This PR also implements the response_format call.

Can you confirm there is no exception if vllm is an old version?

@wxiwnd (Contributor, Author) commented Oct 26, 2024

> Can you confirm there is no exception if vllm is an old version?

It now works properly even if the vllm version is < 0.6.3.
All guided decoding parameters are ignored if the vllm version is below 0.6.3.
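
A minimal sketch of how such a version gate might look (hypothetical names, not the PR's actual code; it assumes the guided_* keys below are the relevant request parameters):

from packaging import version

import vllm

# Guided decoding is only available on vllm >= 0.6.3.
VLLM_SUPPORTS_GUIDED_DECODING = version.parse(vllm.__version__) >= version.parse("0.6.3")

# Hypothetical list of guided decoding parameters to strip on old vllm.
GUIDED_KEYS = ("guided_json", "guided_regex", "guided_choice", "guided_grammar")

def sanitize_generate_config(config: dict) -> dict:
    # Drop guided decoding parameters instead of raising on older vllm.
    if not VLLM_SUPPORTS_GUIDED_DECODING:
        for key in GUIDED_KEYS:
            config.pop(key, None)
    return config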

# FIXME schema replica error in Pydantic
# source: ResponseFormatJSONSchema in ResponseFormat
# use alias
# _response_format: Optional[ResponseFormat] = Field(alias="response_format")
A contributor commented on the diff:

Is there any solution to this?

@wxiwnd (Author) replied:

It appears to be the same issue as Xinference Issue #2032, and I have not yet found a solution.

@wxiwnd (Author) replied:

I have added a parameter parser in the RESTful API component so the functionality remains intact. These parts can therefore be safely ignored, even though parsing requests in the RESTful API layer is somewhat "dirty".
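
As a rough illustration (hypothetical helper, not the PR's actual code), the idea is to lift response_format out of the raw JSON body before Pydantic validation, so the conflicting field never has to live on the request model:

from typing import Any, Dict

def extract_response_format(raw_body: Dict[str, Any]) -> Dict[str, Any]:
    # Pull response_format out of the raw request body; the destination
    # key name "generate_config" is assumed here for illustration.
    response_format = raw_body.pop("response_format", None)
    if response_format is not None:
        raw_body.setdefault("generate_config", {})["response_format"] = response_format
    return raw_body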

A contributor replied:

IIRC, we solved this by copying the openai pydantic model into xinference; refer to https://github.com/xorbitsai/inference/pull/2231/files

@wxiwnd (Author) replied:

The error is: Field name "schema" shadows a BaseModel attribute; use a different field name with "alias='schema'".
I suspect https://github.com/openai/openai-python/blob/main/src/openai/types/shared_params/response_format_json_schema.py#L11 is the source of the conflict, because another declaration of the JSON schema type, https://github.com/openai/openai-python/blob/main/src/openai/types/shared/response_format_json_schema.py#L27, uses schema_ instead of schema.

I will try to fix this by defining ResponseFormat myself.
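
A minimal sketch of that workaround, following the openai-python pattern of schema_ plus an alias (Pydantic v1 style; v2 would use model_config = ConfigDict(populate_by_name=True) instead of the inner Config class):

from typing import Any, Dict, Optional

from pydantic import BaseModel, Field

class JSONSchemaSpec(BaseModel):
    name: str
    # A field literally named "schema" would shadow BaseModel.schema(),
    # so store it as schema_ and expose the wire name via an alias.
    schema_: Optional[Dict[str, Any]] = Field(default=None, alias="schema")

    class Config:
        allow_population_by_field_name = True

class ResponseFormat(BaseModel):
    type: str  # "text", "json_object", or "json_schema"
    json_schema: Optional[JSONSchemaSpec] = None

# Parses without the shadowing error:
rf = ResponseFormat.parse_obj(
    {"type": "json_schema",
     "json_schema": {"name": "recipe", "schema": {"type": "object"}}}
)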

@XprobeBot XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024