
[Feature]: Add OpenAI server prompt_logprobs support #6508

Closed
Theodotus1243 opened this issue Jul 17, 2024 · 10 comments · Fixed by #7453

@Theodotus1243

🚀 The feature, motivation and pitch

As noted in the documentation, the OpenAI-compatible server does not support returning logprobs for the prompt tokens, only for the generated ones.
Being able to get logprobs for the prompt tokens would be a strong advantage over commercial models.

Alternatives

No response

Additional context

No response

@gabrielhuang

Hey, I'm doing the same thing. You can try echo=True and logprobs=1, which should return the prompt log probabilities. You have to disable prompt caching, and you may have to use max_tokens=1 as well. Let me know if it works and which parameters you end up using.

@Theodotus1243
Author

@gabrielhuang Thanks, it's working with vLLM 0.5.2 (0.4.0post1 does not work):

from openai import OpenAI

# Client pointed at the vLLM OpenAI-compatible server (URL and key are the usual local defaults).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

completion = client.completions.create(
    model=model_name,  # name of the served model
    prompt=prompt,     # text whose token logprobs you want
    echo=True,         # echo the prompt back so its tokens appear in the response
    logprobs=1,
    max_tokens=1,
)

# Echoed prompt tokens and their log probabilities:
completion.choices[0].logprobs.tokens
completion.choices[0].logprobs.token_logprobs

Switching to the Completions API is a great workaround.
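
As a follow-up, here is a minimal sketch of pairing each echoed prompt token with its log probability. It assumes the completion object and attribute names from the snippet above; the very first token may come back with a None logprob, so the loop just prints whatever the server returns.

# Walk the echoed prompt tokens alongside their log probabilities.
logprobs = completion.choices[0].logprobs
for token, logprob in zip(logprobs.tokens, logprobs.token_logprobs):
    print(f"{token!r}: {logprob}")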

@DarkLight1337
Member

Reopening this as the workaround is not really ideal to solve this problem. It would be better to add an option to explicitly return the logprobs of the input prompt.

@DarkLight1337 DarkLight1337 reopened this Jul 30, 2024
@DarkLight1337 DarkLight1337 added the good first issue label Jul 30, 2024
@cjfcsjt

cjfcsjt commented Jul 30, 2024

@DarkLight1337 According to the latest doc and sampling parameters

1. The Chat API allows "echo", "logprobs", and "max_tokens" as (extra) parameters, but it does not actually work for vision LLMs (e.g., I tried a served OpenGVLab/InternVL2-4B, which only returns output logprobs, not input ones). The following code reproduces the problem.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)
# sampling_params = SamplingParams(temperature=0.2, max_tokens=64, prompt_logprobs=1, stop=['<|eot_id|>'])
completion = client.chat.completions.create(
    model="OpenGVLab/InternVL2-8B",
    messages=[
        {
            "role": "system",
            "content": "You are chatting with a language model.",
        },
    ],
    extra_body={
        "stop": ['<|eot_id|>'],
        "echo": True,
        "max_tokens": 1,
        "logprobs": 1,
    },
)
2. The latest docs show that the Completions API does not allow "echo" as an extra parameter, but it actually does, as demonstrated by this issue, and the Completions API can provide input prompt logprobs. However, the Completions API seems to accept only strings as the input prompt, so how do we extend this to allow images?
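
For context, a hedged sketch (not from this thread) of how an image normally reaches a vision model through the Chat Completions API, via an image_url content part; the image URL is a placeholder and the client object is the one constructed above.

completion = client.chat.completions.create(
    model="OpenGVLab/InternVL2-8B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # Placeholder URL; base64 data URLs work the same way.
                {"type": "image_url", "image_url": {"url": "https://example.com/example.jpg"}},
            ],
        },
    ],
    max_tokens=64,
)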

@DarkLight1337
Member

DarkLight1337 commented Jul 30, 2024

1. The Chat API allows "echo", "logprobs", and "max_tokens" as (extra) parameters, but it does not actually work for vision LLMs (e.g., I tried a served OpenGVLab/InternVL2-4B, which only returns output logprobs, not input ones). The following code reproduces the problem.

There is currently no way to explicitly return logprobs for the input prompt in online inference, which is why I called the above solution a workaround (and reopened this issue). It would be great if this can be enabled similar to the offline SamplingParams.

2. The latest docs show that the Completions API does not allow "echo" as an extra parameter, but it actually does, as demonstrated by this issue, and the Completions API can provide input prompt logprobs. However, the Completions API seems to accept only strings as the input prompt, so how do we extend this to allow images?

The Completions API was never made for image input. Since it is now considered legacy by OpenAI, we should focus on adding this feature to the Chat Completions API instead.
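
For reference, a minimal sketch of the offline behavior being referred to, under the assumption of a small placeholder model: SamplingParams exposes prompt_logprobs, and each RequestOutput carries a per-prompt-token prompt_logprobs list.

from vllm import LLM, SamplingParams

# Placeholder model purely for illustration.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(max_tokens=1, prompt_logprobs=1)
outputs = llm.generate(["The capital of France is"], params)

# One entry per prompt token (the first entry is typically None).
print(outputs[0].prompt_logprobs)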

@cjfcsjt

cjfcsjt commented Jul 30, 2024

There is currently no way to explicitly return logprobs for the input prompt in online inference, which is why I called the above solution a workaround (and reopened this issue). It would be great if this can be enabled similar to the offline SamplingParams.

Thanks for your patience. It would be great to see an option to return the logprobs of the input prompt (with image) in online inference.

@gnpinkert
Contributor

gnpinkert commented Aug 5, 2024

Could I take this issue since it has been reopened? It is unclear if anyone is working on it, so apologies if someone already is. @cjfcsjt @DarkLight1337

@DarkLight1337
Member

Could I take this issue since it has been reopened? It is unclear if anyone is working on it, so apologies if someone already is. @cjfcsjt @DarkLight1337

Not that I'm aware of. Thanks for helping out!

@gnpinkert
Contributor

Great, thanks! Will get right on it @DarkLight1337

@ashgold

ashgold commented Aug 9, 2024

Hey, I'm doing the same thing. You can try echo=True and logprobs=1, which should return the prompt log probabilities. You have to disable prompt caching, and you may have to use max_tokens=1 as well. Let me know if it works and which parameters you end up using.

Hi @gabrielhuang ,
Why should I disable prompt caching?
If the prompt changes by even one character, the engine only reuses the KV cache up to the block before the change, so I thought I could leave prefix caching enabled.

gnpinkert added a commit to gnpinkert/vllm that referenced this issue Aug 13, 2024
This commit adds a prompt_logprobs option in the extra body field of the
chat completions API. When set to true, it will return the log
probabilities of the decoded input tokens.

This option was not included in the streaming API. Since streaming is meant
for real-time feedback with reduced latency, it doesn't make much sense to
include the same prompt log probabilities with every chunk. This can be added
later if it is deemed useful.

Currently, the server will report an error if stream and prompt_logprobs
are both enabled.

The return value in the chat completions API was modeled after the
prompt_logprobs return value during offline inference to reduce coding
complexity if switching between online/offline.

It was possible to get the prompt_logprobs earlier if echo and
top_logprobs were enabled. This behavior was kept the same to not break
any existing configurations.

FIX vllm-project#6508
gnpinkert added a commit to gnpinkert/vllm that referenced this issue Aug 16, 2024
This commit adds a prompt_logprobs option in the extra body field of the
chat completions API. When set to a value higher than 0, the response
will return the log probabilities of the decoded input tokens.

The same option has been included for the Completions API. Note that
prompt_logprobs will be included for every prompt that the completions
request contains, which is why prompt_logprobs in the completions
response is nested one level deeper than in the chat completions response.

This option was not included in the streaming API. Since streaming is meant
for real-time feedback with reduced latency, it doesn't make much sense to
include the same prompt log probabilities with every chunk. This can be added
later if it is deemed useful.

Currently, the server will report an error if stream is enabled and
prompt_logprobs is set to a value higher than 0.

The return value in the chat completions API was modeled after the
prompt_logprobs return value during offline inference to reduce coding
complexity if switching between online/offline.

It was possible to get the prompt_logprobs earlier if echo and
top_logprobs were enabled. This behavior was kept the same to not break
any existing configurations.

FIX vllm-project#6508
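
Based on the commit message above, a hedged sketch of what a request could look like once the option lands. The field name and its placement in extra_body are taken from the commit description; the exact shape of the returned prompt logprobs is assumed to follow the offline format.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

completion = client.chat.completions.create(
    model="OpenGVLab/InternVL2-8B",  # whichever model the server is serving
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1,
    # Per the commit: a value greater than 0 requests log probabilities
    # for the decoded input tokens.
    extra_body={"prompt_logprobs": 1},
)
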
Alvant pushed a commit to compressa-ai/vllm that referenced this issue Oct 26, 2024