
[Feature]: Add OpenAI server prompt_logprobs support #6508

Closed
Theodotus1243 opened this issue Jul 17, 2024 · 10 comments · Fixed by #7453

@Theodotus1243

🚀 The feature, motivation and pitch

As noted in the documentation, the OpenAI-compatible server does not support returning logprobs for the prompt tokens, only for the generated ones.
Being able to get logprobs for the prompt tokens would be a strong advantage over commercial models.

Alternatives

No response

Additional context

No response

@gabrielhuang

Hey, I'm doing the same thing. You can try echo=True and logprobs=1, which should return the prompt log probabilities. You have to disable prompt caching, and you may have to use max_tokens=1 as well. Let me know if it works and which parameters you end up using.

@Theodotus1243
Author

@gabrielhuang Thanks, it's working with vLLM 0.5.2 (0.4.0post1 does not work):

from openai import OpenAI

# Client pointed at the vLLM OpenAI-compatible server (URL and key are the usual local defaults).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

completion = client.completions.create(
    model=model_name,  # name of the served model
    prompt=prompt,     # text whose token logprobs you want
    echo=True,         # echo the prompt back so its tokens appear in the response
    logprobs=1,
    max_tokens=1,
)

# Echoed prompt tokens and their log probabilities:
completion.choices[0].logprobs.tokens
completion.choices[0].logprobs.token_logprobs

Switching to the Completions API is a great workaround.
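
As a follow-up, here is a minimal sketch of pairing each echoed prompt token with its log probability. It assumes the completion object and attribute names from the snippet above; the very first token may come back with a None logprob, so the loop just prints whatever the server returns.

# Walk the echoed prompt tokens alongside their log probabilities.
logprobs = completion.choices[0].logprobs
for token, logprob in zip(logprobs.tokens, logprobs.token_logprobs):
    print(f"{token!r}: {logprob}")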

@DarkLight1337
Member

Reopening this as the workaround is not really ideal to solve this problem. It would be better to add an option to explicitly return the logprobs of the input prompt.

@DarkLight1337 DarkLight1337 reopened this Jul 30, 2024
@DarkLight1337 DarkLight1337 added the good first issue label Jul 30, 2024
@cjfcsjt

cjfcsjt commented Jul 30, 2024

@DarkLight1337 According to the latest doc and sampling parameters

1. The Chat API allows "echo", "logprobs", and "max_tokens" as (extra) parameters, but it does not actually work for vision LLMs (e.g., I tried a served OpenGVLab/InternVL2-4B, which only returns output logprobs, not input ones). The following code reproduces the problem.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)
# sampling_params = SamplingParams(temperature=0.2, max_tokens=64, prompt_logprobs=1, stop=['<|eot_id|>'])
completion = client.chat.completions.create(
    model="OpenGVLab/InternVL2-8B",
    messages=[
        {
            "role": "system",
            "content": "You are chatting with a language model.",
        },
    ],
    extra_body={
        "stop": ['<|eot_id|>'],
        "echo": True,
        "max_tokens": 1,
        "logprobs": 1,
    },
)
2. The latest docs show that the Completions API does not allow "echo" as an extra parameter, but it actually does, as demonstrated by this issue, and the Completions API can provide input prompt logprobs. However, the Completions API seems to accept only strings as the input prompt, so how do we extend this to allow images?
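
For context, a hedged sketch (not from this thread) of how an image normally reaches a vision model through the Chat Completions API, via an image_url content part; the image URL is a placeholder and the client object is the one constructed above.

completion = client.chat.completions.create(
    model="OpenGVLab/InternVL2-8B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # Placeholder URL; base64 data URLs work the same way.
                {"type": "image_url", "image_url": {"url": "https://example.com/example.jpg"}},
            ],
        },
    ],
    max_tokens=64,
)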

@DarkLight1337
Member

DarkLight1337 commented Jul 30, 2024

1. The Chat API allows "echo", "logprobs", and "max_tokens" as (extra) parameters, but it does not actually work for vision LLMs (e.g., I tried a served OpenGVLab/InternVL2-4B, which only returns output logprobs, not input ones). The following code reproduces the problem.

There is currently no way to explicitly return logprobs for the input prompt in online inference, which is why I called the above solution a workaround (and reopened this issue). It would be great if this can be enabled similar to the offline SamplingParams.

2. The latest docs show that the Completions API does not allow "echo" as an extra parameter, but it actually does, as demonstrated by this issue, and the Completions API can provide input prompt logprobs. However, the Completions API seems to accept only strings as the input prompt, so how do we extend this to allow images?

The Completions API was never made for image input. Since it is now considered legacy by OpenAI, we should focus on adding this feature to the Chat Completions API instead.
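
For reference, a minimal sketch of the offline behavior being referred to, under the assumption of a small placeholder model: SamplingParams exposes prompt_logprobs, and each RequestOutput carries a per-prompt-token prompt_logprobs list.

from vllm import LLM, SamplingParams

# Placeholder model purely for illustration.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(max_tokens=1, prompt_logprobs=1)
outputs = llm.generate(["The capital of France is"], params)

# One entry per prompt token (the first entry is typically None).
print(outputs[0].prompt_logprobs)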

@cjfcsjt

cjfcsjt commented Jul 30, 2024

There is currently no way to explicitly return logprobs for the input prompt in online inference, which is why I called the above solution a workaround (and reopened this issue). It would be great if this can be enabled similar to the offline SamplingParams.

Thanks for your patience. It would be great to see an option to return the logprobs of the input prompt (with image) in online inference.

@gnpinkert
Contributor

gnpinkert commented Aug 5, 2024

Could I take this issue since it has been reopened? It is unclear if anyone is working on it, so apologies if someone already is. @cjfcsjt @DarkLight1337

@DarkLight1337
Member

Could I take this issue since it has been reopened? It is unclear if anyone is working on it, so apologies if someone already is. @cjfcsjt @DarkLight1337

Not that I'm aware of. Thanks for helping out!

@gnpinkert
Contributor

Great, thanks! Will get right on it @DarkLight1337

@ashgold

ashgold commented Aug 9, 2024

Hey, I'm doing the same thing. You can try echo=True and logprobs=1, which should return the prompt log probabilities. You have to disable prompt caching, and you may have to use max_tokens=1 as well. Let me know if it works and which parameters you end up using.

Hi @gabrielhuang ,
Why should I disable prompt caching?
If the prompt changes by even one character, the engine only reuses the KV cache up to the block before the change, so I thought I could leave prefix caching enabled.

gnpinkert added a commit to gnpinkert/vllm that referenced this issue Aug 13, 2024
This commit adds a prompt_logprobs option in the extra body field of the
chat completions API. When set to true, it will return the log
probabilities of the decoded input tokens.

This option was not included in the streaming API. Since streaming is meant
for real-time feedback with reduced latency, it doesn't make much sense to
include the same prompt log probabilities with every chunk. This can be added
later if it is deemed useful.

Currently, the server will report an error if stream and prompt_logprobs
are both enabled.

The return value in the chat completions API was modeled after the
prompt_logprobs return value during offline inference to reduce coding
complexity if switching between online/offline.

It was possible to get the prompt_logprobs earlier if echo and
top_logprobs were enabled. This behavior was kept the same to not break
any existing configurations.

FIX vllm-project#6508
gnpinkert added a commit to gnpinkert/vllm that referenced this issue Aug 16, 2024
This commit adds a prompt_logprobs option in the extra body field of the
chat completions API. When set to a value higher than 0, the response
will return the log probabilities of the decoded input tokens.

The same option has been included for the Completions API. Note that
prompt_logprobs will be included for every prompt that the completions
request contains, which is why prompt_logprobs in the completions
response is nested one level deeper than in the chat completions response.

This option was not included in the streaming API. Since streaming is meant
for real-time feedback with reduced latency, it doesn't make much sense to
include the same prompt log probabilities with every chunk. This can be added
later if it is deemed useful.

Currently, the server will report an error if stream is enabled and
prompt_logprobs is set to a value higher than 0.

The return value in the chat completions API was modeled after the
prompt_logprobs return value during offline inference to reduce coding
complexity if switching between online/offline.

It was possible to get the prompt_logprobs earlier if echo and
top_logprobs were enabled. This behavior was kept the same to not break
any existing configurations.

FIX vllm-project#6508
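
Based on the commit message above, a hedged sketch of what a request could look like once the option lands. The field name and its placement in extra_body are taken from the commit description; the exact shape of the returned prompt logprobs is assumed to follow the offline format.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

completion = client.chat.completions.create(
    model="OpenGVLab/InternVL2-8B",  # whichever model the server is serving
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1,
    # Per the commit: a value greater than 0 requests log probabilities
    # for the decoded input tokens.
    extra_body={"prompt_logprobs": 1},
)
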
Alvant pushed a commit to compressa-ai/vllm that referenced this issue Oct 26, 2024