
feat: support guided decoding for vllm async engine #2391

Open · wxiwnd wants to merge 2 commits into main from feat/guided_generation
Conversation

@wxiwnd (Contributor) commented Oct 3, 2024

Support guided decoding for the vllm async engine.
Waiting for a vllm release; a version bump is needed.

#1562
vllm-project/vllm#8252

@XprobeBot XprobeBot added this to the v0.15 milestone Oct 3, 2024
@wxiwnd wxiwnd marked this pull request as draft October 3, 2024 06:18
@wxiwnd wxiwnd marked this pull request as ready for review October 5, 2024 07:55
@qinxuye (Contributor) commented Oct 11, 2024

Which version is required?

@wxiwnd (Contributor, Author) commented Oct 11, 2024

> Which version is required?

The latest version after 0.6.2; waiting for vllm to release a new version.

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from 2968700 to cd0812a Compare October 15, 2024 10:35
@qinxuye (Contributor) commented Oct 17, 2024

vllm has released v0.6.3, is this PR ready to work?

@wxiwnd (Contributor, Author) commented Oct 17, 2024 via email

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 7 times, most recently from 4d9e044 to 852c86c Compare October 22, 2024 09:30
@wxiwnd (Contributor, Author) commented Oct 22, 2024

> vllm has released v0.6.3, is this PR ready to work?

Works on my machine now.
This PR also implements the response_format call, e.g.:

curl --location --request POST 'http://ip:9997/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen1.5-32b-chat-int4",
    "messages": [
        {
            "role": "user",
            "content": "“give me a recipe in json format"
        }
    ],
    "temperature": 0,
    "max_tokens": 1000,
    "stream": true,
    "response_format": {"type": "json_object"}
}'
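
For reference, an equivalent request with the openai Python client could look like the sketch below (not part of this PR; the base_url, model name, and dummy api_key simply mirror the curl example above):

from openai import OpenAI

# Point the client at the Xinference endpoint from the curl example;
# api_key is a dummy placeholder.
client = OpenAI(base_url="http://ip:9997/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="qwen1.5-32b-chat-int4",
    messages=[{"role": "user", "content": "give me a recipe in json format"}],
    temperature=0,
    max_tokens=1000,
    stream=True,
    response_format={"type": "json_object"},
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)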

@qinxuye (Contributor) commented Oct 22, 2024

> Works on my machine now.
> This PR also implements the response_format call.

Can you confirm there is no exception if vllm is an old version?

@wxiwnd (Contributor, Author) commented Oct 26, 2024

> Can you confirm there is no exception if vllm is an old version?

It now works properly even if the vllm version is < 0.6.3.
All guided decoding parameters are ignored if the vllm version is below 0.6.3.
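
A minimal sketch of how such a version gate might look (hypothetical names, not the PR's actual code; it assumes the guided_* keys below are the relevant request parameters):

from packaging import version

import vllm

# Guided decoding is only available on vllm >= 0.6.3.
VLLM_SUPPORTS_GUIDED_DECODING = version.parse(vllm.__version__) >= version.parse("0.6.3")

# Hypothetical list of guided decoding parameters to strip on old vllm.
GUIDED_KEYS = ("guided_json", "guided_regex", "guided_choice", "guided_grammar")

def sanitize_generate_config(config: dict) -> dict:
    # Drop guided decoding parameters instead of raising on older vllm.
    if not VLLM_SUPPORTS_GUIDED_DECODING:
        for key in GUIDED_KEYS:
            config.pop(key, None)
    return config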

# FIXME schema replica error in Pydantic
# source: ResponseFormatJSONSchema in ResponseFormat
# use alias
# _response_format: Optional[ResponseFormat] = Field(alias="response_format")
A contributor commented on the diff:

Is there any solution to this?

@wxiwnd (Author) replied:

It appears to be the same issue as Xinference Issue #2032, and I have not yet found a solution.

@wxiwnd (Author) replied:

I have added a parameter parser in the RESTful API component so the functionality remains intact. These parts can therefore be safely ignored, even though parsing requests in the RESTful API layer is somewhat "dirty".
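
As a rough illustration (hypothetical helper, not the PR's actual code), the idea is to lift response_format out of the raw JSON body before Pydantic validation, so the conflicting field never has to live on the request model:

from typing import Any, Dict

def extract_response_format(raw_body: Dict[str, Any]) -> Dict[str, Any]:
    # Pull response_format out of the raw request body; the destination
    # key name "generate_config" is assumed here for illustration.
    response_format = raw_body.pop("response_format", None)
    if response_format is not None:
        raw_body.setdefault("generate_config", {})["response_format"] = response_format
    return raw_body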

A contributor replied:

IIRC, we solved this by copying the openai pydantic model into xinference; refer to https://github.com/xorbitsai/inference/pull/2231/files

@wxiwnd (Author) replied:

The error is: Field name "schema" shadows a BaseModel attribute; use a different field name with "alias='schema'".
I suspect https://github.com/openai/openai-python/blob/main/src/openai/types/shared_params/response_format_json_schema.py#L11 is the source of the conflict, because another declaration of the JSON schema type, https://github.com/openai/openai-python/blob/main/src/openai/types/shared/response_format_json_schema.py#L27, uses schema_ instead of schema.

I will try to fix this by defining ResponseFormat myself.
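
A minimal sketch of that workaround, following the openai-python pattern of schema_ plus an alias (Pydantic v1 style; v2 would use model_config = ConfigDict(populate_by_name=True) instead of the inner Config class):

from typing import Any, Dict, Optional

from pydantic import BaseModel, Field

class JSONSchemaSpec(BaseModel):
    name: str
    # A field literally named "schema" would shadow BaseModel.schema(),
    # so store it as schema_ and expose the wire name via an alias.
    schema_: Optional[Dict[str, Any]] = Field(default=None, alias="schema")

    class Config:
        allow_population_by_field_name = True

class ResponseFormat(BaseModel):
    type: str  # "text", "json_object", or "json_schema"
    json_schema: Optional[JSONSchemaSpec] = None

# Parses without the shadowing error:
rf = ResponseFormat.parse_obj(
    {"type": "json_schema",
     "json_schema": {"name": "recipe", "schema": {"type": "object"}}}
)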

@XprobeBot XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024