Allow send list of str for the Prompt on openai demo endpoint /v1/completions #323
Conversation
Hi @ironpinguin! Thanks for the contribution! However, I believe this is not how the OpenAI API behaves. Can you take a look at the example below?

```python
import openai

stream = False  # not defined in the original snippet; assuming a non-streaming request
completion = openai.Completion.create(
    model="text-davinci-003", prompt=["Say", "this", "is", "a", "test"], echo=True, n=1,
    stream=stream)
print(completion)
```

Output: the OpenAI API treats the strings in the list as separate prompts. As a temp fix, when the
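A rough sketch of that behavior (illustrative only, assuming the pre-1.0 `openai` Python client used above): with a list prompt and `n=1`, the API returns one choice per list element, and callers map each choice back to its prompt via the `index` field.

```python
# Illustrative sketch: prompt=["Say", "this", "is", "a", "test"] with n=1
# yields len(prompt) choices, one independent completion per list element.
for choice in completion.choices:
    # choice.index identifies which prompt in the input list this choice belongs to
    print(choice.index, repr(choice.text))
```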
zhuohan123 left a comment
LGTM! Thank you for your contribution!
Hi, I just want to be sure that it's on my side and not a regression. Trying to use langchain with vllm, I am getting exactly this problem:

```python
from langchain.llms import VLLMOpenAI

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model_name="TheBloke/Llama-2-70B-chat-AWQ",
    model_kwargs={"stop": ["."]},
)
print(llm("Rome is"))
```

with a response showing this problem.
…pletions (vllm-project#323)

* allow str or List[str] for prompt
* Update vllm/entrypoints/openai/api_server.py

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Continuation of HabanaAI/vllm-hpu-extension#4. I've also removed is_tpu, as it was mistakenly restored in the rebase; it's not in the upstream.
* layernorm use vllm_xpu_kernels
* [ww34] switch silu_and_mul, reshape_and_cache_flash, rope to xpu kernel
* update activation kernels
* try remove ipex
* switch to xpu kernel for w8a16 gemm (vllm-project#323)
* enable cutlass chunked-prefill (vllm-project#330)
* add required pkg for xpu-kernels compilation
* enable topk/grouped_gemm based on llama4 (vllm-project#354)
* address comments
* enable CI
* replace lora kernels (vllm-project#347)
* remove ipex (vllm-project#370)
* update QA CI branch
* update QA CI yaml (×6)
* fix conflict

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Liu, Wenjun <wenjun.liu@intel.com>
The langchain implementation sends the prompt as an array of strings to the /v1/completions endpoint.
With this change, the prompt can be sent either as a single string or as an array of strings.
If the prompt is an array, all strings are concatenated into one string, so the downstream engine works with both prompt data types.
This is a solution for #186.
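A minimal sketch of the idea (the helper name `normalize_prompt` is illustrative and not the actual code in api_server.py):

```python
from typing import List, Union

def normalize_prompt(prompt: Union[str, List[str]]) -> str:
    """Accept either a plain string or a list of strings for the prompt.

    If a list is given, concatenate its elements into a single string so the
    downstream engine only ever sees one prompt string.
    """
    if isinstance(prompt, list):
        return "".join(prompt)
    return prompt

# e.g. normalize_prompt("Rome is")        -> "Rome is"
#      normalize_prompt(["Rome ", "is"])  -> "Rome is"
```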