feat: add ChatCompletion endpoint in OpenAI demo server. #330
Conversation
merrymercy left a comment
See also lm-sys/FastChat#1835
The vLLM integration for the OpenAI API server has been fixed by lm-sys/FastChat#1835. With vLLM alone, you get simplicity and continuous batching.
I will try it tomorrow, thanks.
zhuohan123 left a comment
Thank you for your contribution! The changes look good to me in general. I left some small comments about adding advanced sampling functionality to the ChatCompletion API.
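For context, here is a minimal sketch of what exposing extra sampling knobs on the request model could look like. The review does not list the exact fields; the names below simply mirror vLLM's `SamplingParams` and are illustrative, not the PR's actual diff.

```python
# Hypothetical sketch only: extra sampling fields on the ChatCompletion request model.
# The exact fields requested in the review are not spelled out in this thread.
from typing import List, Optional, Union

from pydantic import BaseModel


class ChatCompletionRequest(BaseModel):
    model: str
    messages: Union[str, List[dict]]
    # Standard OpenAI-style parameters
    temperature: Optional[float] = 0.7
    top_p: Optional[float] = 1.0
    n: Optional[int] = 1
    max_tokens: Optional[int] = 16
    stop: Optional[Union[str, List[str]]] = None
    stream: Optional[bool] = False
    presence_penalty: Optional[float] = 0.0
    frequency_penalty: Optional[float] = 0.0
    # "Advanced" sampling options mirroring vLLM's SamplingParams (illustrative)
    best_of: Optional[int] = None
    top_k: Optional[int] = -1
    ignore_eos: Optional[bool] = False
    use_beam_search: Optional[bool] = False
```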
vllm/entrypoints/openai/protocol.py (Outdated)
object: str = "chat.completion.chunk"
created: int = Field(default_factory=lambda: int(time.time()))
model: str
choices: List[ChatCompletionResponseStreamChoice]
Nit: new line
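To make the snippet above easier to read in isolation, here is a self-contained sketch of the streaming-response models it belongs to. The `DeltaMessage` and `ChatCompletionResponseStreamChoice` definitions and the plain `id: str` field are assumptions about the surrounding code in protocol.py, not a verbatim copy of the PR.

```python
# Sketch of the stream-chunk models, under the assumptions noted above.
import time
from typing import List, Optional

from pydantic import BaseModel, Field


class DeltaMessage(BaseModel):
    # Incremental piece of the assistant message sent in each stream chunk.
    role: Optional[str] = None
    content: Optional[str] = None


class ChatCompletionResponseStreamChoice(BaseModel):
    index: int
    delta: DeltaMessage
    finish_reason: Optional[str] = None


class ChatCompletionStreamResponse(BaseModel):
    id: str  # the real model likely generates a "chatcmpl-..." id by default
    object: str = "chat.completion.chunk"
    created: int = Field(default_factory=lambda: int(time.time()))
    model: str
    choices: List[ChatCompletionResponseStreamChoice]
```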
Adapted from https://github.com/lm-sys/FastChat/blob/v0.2.14/fastchat/serve/openai_api_server.py.
Tested on vicuna-7b-v1.3 and WizardCoder.
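A quick way to exercise the new endpoint is a plain HTTP request. This is a minimal sketch assuming the demo server was started with something like `python -m vllm.entrypoints.openai.api_server --model <vicuna-7b-v1.3 checkpoint>` and is listening on localhost:8000; the host, port, and model name are assumptions to adjust for your setup.

```python
# Minimal smoke test for the /v1/chat/completions endpoint (host/port/model assumed).
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "vicuna-7b-v1.3",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello in one sentence."},
        ],
        "temperature": 0.7,
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
# Print the assistant's reply from the OpenAI-style response payload.
print(response.json()["choices"][0]["message"]["content"])
```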