[Feature]: Add OpenAI server prompt_logprobs support #6508
Comments
Hey, I'm doing the same thing actually. You can try switching to the Completions API with `echo=True` and `logprobs` set so the prompt tokens come back with their log probabilities.
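For illustration, a minimal sketch of that workaround against a local vLLM OpenAI-compatible server; the base URL and model name are placeholders:

```python
# Sketch of the Completions API workaround: echo the prompt back together
# with per-token log probabilities. Assumes a vLLM server at localhost:8000
# and a placeholder model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",   # placeholder model
    prompt="The capital of France is",
    max_tokens=1,    # we only care about the prompt tokens
    echo=True,       # echo the prompt back in the response
    logprobs=1,      # attach log probabilities to each returned token
)

# The echoed prompt tokens and their logprobs are in choices[0].logprobs.
logprobs = response.choices[0].logprobs
for token, logprob in zip(logprobs.tokens, logprobs.token_logprobs):
    print(token, logprob)
```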
@gabrielhuang Thanks, it's working
Great workaround to switch to the Completions API.
Reopening this as the workaround is not really ideal to solve this problem. It would be better to add an option to explicitly return the logprobs of the input prompt.
@DarkLight1337 According to the latest doc and sampling parameters:
There is currently no way to explicitly return logprobs for the input prompt in online inference, which is why I called the above solution a workaround (and reopened this issue). It would be great if this could be enabled similar to the offline inference API.
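For reference, a short sketch of how prompt logprobs are already exposed in offline inference via `SamplingParams` (model name is a placeholder):

```python
# Offline inference already supports prompt logprobs through SamplingParams.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
sampling_params = SamplingParams(max_tokens=1, prompt_logprobs=1)

outputs = llm.generate(["The capital of France is"], sampling_params)
# Each RequestOutput carries per-token log probabilities for the prompt.
print(outputs[0].prompt_logprobs)
```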
The Completions API was never made for image input. Since it is now considered legacy by OpenAI, we should focus on adding this feature to the Chat Completions API instead.
Thanks for your patience. It would be great to see an option to return the logprobs of the input prompt (with image) in online inference.
Could I take this issue since it has been reopened? It is unclear if anyone is working on it, so apologies if someone already is. @cjfcsjt @DarkLight1337
Not that I'm aware of. Thanks for helping out!
Great, thanks! Will get right on it @DarkLight1337
Hi @gabrielhuang,
This commit adds a prompt_logprobs option in the extra body field of the Chat Completions API. When set to true, it returns the log probabilities of the decoded input tokens.

This option was not included in the streaming API. Since streaming is meant for real-time feedback with reduced latency, it doesn't make much sense to include the same prompt log probabilities in every chunk; this can be added if it is also deemed useful. Currently, the server reports an error if stream and prompt_logprobs are both enabled.

The return value in the Chat Completions API was modeled after the prompt_logprobs return value of offline inference, to reduce coding complexity when switching between online and offline. It was already possible to get the prompt logprobs by enabling echo and top_logprobs; this behavior was kept the same so as not to break any existing configurations.

FIX vllm-project#6508
This commit adds a prompt_logprobs option in the extra body field of the Chat Completions API. When set to a value higher than 0, the response returns the log probabilities of the decoded input tokens. The same option has been added to the Completions API. Note that prompt_logprobs are included for every prompt the completions request contains, which is why prompt_logprobs in the completions response is nested one level deeper than in the chat completions response.

This option was not included in the streaming API. Since streaming is meant for real-time feedback with reduced latency, it doesn't make much sense to include the same prompt log probabilities in every chunk; this can be added if it is also deemed useful. Currently, the server reports an error if stream is enabled and prompt_logprobs is set to a value higher than 0.

The return value in the Chat Completions API was modeled after the prompt_logprobs return value of offline inference, to reduce coding complexity when switching between online and offline. It was already possible to get the prompt logprobs by enabling echo and top_logprobs; this behavior was kept the same so as not to break any existing configurations.

FIX vllm-project#6508
(vllm-project#7453) Signed-off-by: Alvant <alvasian@yandex.ru>
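For illustration, a minimal sketch of how the option described in the commit message might be used from the OpenAI Python client once merged, assuming a local vLLM server and a placeholder model name; the exact location of prompt_logprobs in the response is an assumption based on the description above:

```python
# prompt_logprobs is not a standard OpenAI field, so it is passed through
# extra_body, which the client forwards as additional JSON to the server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-hf",   # placeholder model
    messages=[{"role": "user", "content": "The capital of France is"}],
    max_tokens=1,
    extra_body={"prompt_logprobs": 1},  # number of logprobs per prompt token
)

# Per the commit message, the response mirrors the offline prompt_logprobs
# structure; the field name/placement here is an assumption.
print(getattr(response, "prompt_logprobs", None))
```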
🚀 The feature, motivation and pitch
As noted in the documentation, the OpenAI API doesn't support outputting only one token. But being able to get logits for the prompt tokens is a very strong advantage over commercial models.
Alternatives
No response
Additional context
No response