Conversation

@ironpinguin
Contributor

The langchain implementation sends the prompt as an array of strings to the /v1/completions endpoint.

With this change, the prompt can be either a plain string or an array of strings.

If the prompt is an array, all of its strings are concatenated into a single string, so the downstream engine works with both prompt data types.

This is a solution for #186
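
For illustration, a minimal sketch of the behaviour described above (the helper name is an assumption, not the actual diff):

from typing import List, Union

def normalize_prompt(prompt: Union[str, List[str]]) -> str:
    # Hypothetical helper: accept either a plain string or a list of
    # strings; a list is concatenated into one prompt so the engine
    # always receives a single string.
    if isinstance(prompt, list):
        return "".join(prompt)
    return prompt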

@zhuohan123
Member

Hi @ironpinguin! Thanks for the contribution! However, I believe this is not how the OpenAI API behaves. Can you take a look at the example below?

import openai

completion = openai.Completion.create(
    model="text-davinci-003", prompt=["Say", "this", "is", "a", "test"], echo=True, n=1,
    stream=False)

print(completion)

Output:

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": "Say(\"T\");\n\t\t\tSingSay(\"S\");\n\t\t\t"
    },
    {
      "finish_reason": "length",
      "index": 1,
      "logprobs": null,
      "text": "this->get_setting_value('show_tax_totals_in"
    },
    {
      "finish_reason": "length",
      "index": 2,
      "logprobs": null,
      "text": "is_eof()) {\n\t\t<*ddc>\n\t\t"
    },
    {
      "finish_reason": "length",
      "index": 3,
      "logprobs": null,
      "text": "aient \u00e0 tous moment (avec leur million de pi\u00e8ces en"
    },
    {
      "finish_reason": "length",
      "index": 4,
      "logprobs": null,
      "text": "test.mp3','rb') as f: #rb \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd \ufffd\ufffd\ufffd"
    }
  ],
  "created": 1688142843,
  "id": "cmpl-7XBMJLEFOhWfBg5ngxoMBiD5vMSzh",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 79,
    "prompt_tokens": 5,
    "total_tokens": 84
  }
}

The OpenAI API treats the strings in the list as separate prompts.

As a temporary fix, when request.prompt is a list, can you only proceed when its length is 1?
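
Something along these lines could work; this is only a sketch of the suggested check, assuming a FastAPI handler, and is not the merged code:

from typing import List, Union

from fastapi import HTTPException

def resolve_prompt(prompt: Union[str, List[str]]) -> str:
    # Temporary behaviour: a list is accepted only if it holds exactly
    # one prompt, because the server does not yet return one completion
    # per prompt the way the OpenAI API does.
    if isinstance(prompt, list):
        if len(prompt) != 1:
            raise HTTPException(
                status_code=400,
                detail="Only a single prompt is supported in a list.")
        return prompt[0]
    return prompt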


@zhuohan123 zhuohan123 left a comment


LGTM! Thank you for your contribution!

@zhuohan123 zhuohan123 merged commit 0bd2a57 into vllm-project:main Jul 3, 2023
@XBeg9

XBeg9 commented Nov 20, 2023

Hi, I just want to be sure that this is on my side and not a regression. I'm trying to use langchain with vllm and I'm hitting exactly this problem:

from langchain.llms import VLLMOpenAI

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model_name="TheBloke/Llama-2-70B-chat-AWQ",
    model_kwargs={"stop": ["."]},
)
print(llm("Rome is"))

The response is: Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Invalid response object from API: '{"detail":[{"loc":["body","prompt"],"msg":"str type expected","type":"type_error.str"}]}' (HTTP response code was 422).
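
For context, the 422 is a request-validation error: the server's request schema expects prompt to be a plain string, while langchain's VLLMOpenAI sends a list. A minimal reproduction of that kind of validation failure, assuming a Pydantic v1 style model (the class and field names here are illustrative, not the server's actual schema):

from pydantic import BaseModel, ValidationError

class CompletionRequest(BaseModel):
    prompt: str  # a str-only field rejects list-typed prompts

try:
    CompletionRequest(prompt=["Rome is"])
except ValidationError as exc:
    # Pydantic v1 reports "str type expected", matching the 422 detail above
    print(exc)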
