Allow send list of str for the Prompt on openai demo endpoint /v1/completions #323
Conversation
Hi @ironpinguin! Thanks for the contribution! However, I believe this is not how the OpenAI API behaves. Can you take a look at the example below?

```python
import openai

stream = False  # not defined in the original snippet; assuming a non-streaming request
completion = openai.Completion.create(
    model="text-davinci-003", prompt=["Say", "this", "is", "a", "test"], echo=True, n=1,
    stream=stream)
print(completion)
```

Output: the OpenAI API treats the strings in the list as separate prompts. As a temp fix, when the
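A rough sketch of that behavior (illustrative only, assuming the pre-1.0 `openai` Python client used above): with a list prompt and `n=1`, the API returns one choice per list element, and callers map each choice back to its prompt via the `index` field.

```python
# Illustrative sketch: prompt=["Say", "this", "is", "a", "test"] with n=1
# yields len(prompt) choices, one independent completion per list element.
for choice in completion.choices:
    # choice.index identifies which prompt in the input list this choice belongs to
    print(choice.index, repr(choice.text))
```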
zhuohan123 left a comment
LGTM! Thank you for your contribution!
Hi, I just want to be sure that it's on my side and not a regression. Trying to use langchain with vllm, I am getting exactly this problem:

```python
from langchain.llms import VLLMOpenAI

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model_name="TheBloke/Llama-2-70B-chat-AWQ",
    model_kwargs={"stop": ["."]},
)
print(llm("Rome is"))
```

with a response showing this problem.
…pletions (vllm-project#323)

* allow str or List[str] for prompt
* Update vllm/entrypoints/openai/api_server.py

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Continuation of HabanaAI/vllm-hpu-extension#4. I've also removed is_tpu, as it was mistakenly restored in the rebase; it's not in the upstream.
* layernorm use vllm_xpu_kernels
* [ww34] switch silu_and_mul, reshape_and_cache_flash, rope to xpu kernel
* update activation kernels
* try remove ipex
* switch to xpu kernel for w8a16 gemm (vllm-project#323)
* enable cutlass chunked-prefill (vllm-project#330)
* add required pkg for xpu-kernels compilation
* enable topk/grouped_gemm based on llama4 (vllm-project#354)
* address comments
* enable CI
* replace lora kernels (vllm-project#347)
* remove ipex (vllm-project#370)
* update QA CI branch
* update QA CI yaml (×6)
* fix conflict

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Liu, Wenjun <wenjun.liu@intel.com>
The langchain implementation sends the prompt as an array of strings to the /v1/completions endpoint.
With this change, the prompt can be sent either as a single string or as an array of strings.
If the prompt is an array, all strings are concatenated into one string, so the downstream engine works with both prompt data types.
This is a solution for #186.
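A minimal sketch of the idea (the helper name `normalize_prompt` is illustrative and not the actual code in api_server.py):

```python
from typing import List, Union

def normalize_prompt(prompt: Union[str, List[str]]) -> str:
    """Accept either a plain string or a list of strings for the prompt.

    If a list is given, concatenate its elements into a single string so the
    downstream engine only ever sees one prompt string.
    """
    if isinstance(prompt, list):
        return "".join(prompt)
    return prompt

# e.g. normalize_prompt("Rome is")        -> "Rome is"
#      normalize_prompt(["Rome ", "is"])  -> "Rome is"
```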