Attempt to pipe logit_bias to sampler's embedding_bias #1279
Conversation
I think I'm passing an array of arrays of logit biases, so that will probably need to be changed.
Please rebase this branch to current main, then I will build and test this. I also want to get LMQL working with vLLM, because performance is not good with any of the other backends.
Sure thing. I'll rebase tomorrow.
No idea how this got closed. Reopened and rebased. Finally got it built too!
Looks good, added minor comments.
The client must know the vocabulary and the vocab_size in order to pass a logit_bias array that works with the loaded model. Right now the client has to load the tokenizer corresponding to the model loaded into the vLLM server to achieve this. This makes clients more complex, and there is a risk of using the wrong tokenizer for the model, ending up with errors.
I suggest exposing two new REST API calls:
- vocabulary: Returns the vocabulary (array of strings). The vocab_size is the length of this array. It may also return the special tokens, stop tokens, start/end of conversation tokens, etc.
- tokenize: Accepts text and returns an array of integers (token values).
This way no tokenizer needs to be built on the client side and there is no way to use the wrong vocabulary. The client can build the logit_bias based on the information returned (the client only needs to fetch the vocabulary once on initialization).
Having these would allow for a pretty straightforward test case and easier integration with LMQL.
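A minimal sketch of what these two endpoints could look like, assuming a FastAPI-style server and a Hugging Face tokenizer; the paths, response fields, and example model name are illustrative, not part of vLLM's actual API:

```python
# Hypothetical sketch of the proposed endpoints. Paths and response fields are
# assumptions; a real implementation would reuse the tokenizer the vLLM server
# has already loaded instead of loading one here.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model


class TokenizeRequest(BaseModel):
    text: str


@app.get("/vocabulary")
def vocabulary() -> dict:
    # Returned as a token -> id map here; vocab_size is simply its length.
    vocab = tokenizer.get_vocab()
    return {
        "vocab": vocab,
        "vocab_size": len(vocab),
        "eos_token_id": tokenizer.eos_token_id,
    }


@app.post("/tokenize")
def tokenize(req: TokenizeRequest) -> dict:
    # Plain token ids, without special tokens.
    return {"token_ids": tokenizer.encode(req.text, add_special_tokens=False)}
```

With these two calls a client could fetch the vocabulary once at startup and build a logit_bias array locally, without bundling a tokenizer of its own.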
stop_token_ids: Optional[List[int]] = None,
ignore_eos: bool = False,
max_tokens: int = 16,
logit_bias: float = [],
Proper type is: Optional[List[float]]
self.max_tokens = max_tokens
self.logprobs = logprobs
self.prompt_logprobs = prompt_logprobs
self.logit_bias = logit_bias
Define the type here as well.
Also add self.logit_bias to __repr__ below.
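Taken together, the type and __repr__ comments could look roughly like this; a sketch under the assumption that None (rather than a shared mutable []) is an acceptable default, not the actual vLLM class:

```python
from typing import List, Optional


class SamplingParams:
    # Sketch of the reviewed changes only: typed parameter, typed attribute,
    # and logit_bias included in __repr__.
    def __init__(
        self,
        max_tokens: int = 16,
        logit_bias: Optional[List[float]] = None,
    ) -> None:
        self.max_tokens = max_tokens
        self.logit_bias: Optional[List[float]] = logit_bias

    def __repr__(self) -> str:
        return (
            f"SamplingParams(max_tokens={self.max_tokens}, "
            f"logit_bias={self.logit_bias})"
        )
```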
logit_biases: any = []
for seq_group in input_metadata.seq_groups:
    set_ids, sampling_params = seq_group
    logit_biases += [sampling_params.logit_bias]
Add validation of the size of the logit_bias array received from sampling_params. Having an explicit error with a clear explanation here is better than ending up with a cryptic PyTorch error message while trying to add arrays of mismatching sizes later.
(It cannot be validated inside SamplingParams, because the right size is not known there.)
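A sketch of the kind of check meant here; the helper name is mine, and it assumes the sampler knows vocab_size:

```python
from typing import List, Optional


def validate_logit_bias(logit_bias: Optional[List[float]], vocab_size: int) -> None:
    """Fail with a clear message instead of a cryptic PyTorch size-mismatch error."""
    if logit_bias and len(logit_bias) != vocab_size:
        raise ValueError(
            f"logit_bias has {len(logit_bias)} entries, but the model's "
            f"vocabulary size is {vocab_size}"
        )


# Usage inside the loop shown in the diff above (sketch):
# for seq_group in input_metadata.seq_groups:
#     seq_ids, sampling_params = seq_group
#     validate_logit_bias(sampling_params.logit_bias, vocab_size)
#     logit_biases += [sampling_params.logit_bias]
```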
@viktor-ferenczi I've been trying to test this locally, but it doesn't seem like the logit_bias parameter is actually working :( Were you able to see the logit_bias taking effect?
@viktor-ferenczi I may be missing something, but why do they need to know the vocab size? Isn't logit_bias a sparse mapping of token ids to bias to apply?
I did not know it is a sparse mapping; I expected it to be an array of floats with vocab_size elements.

Anyway, the client must be able to retrieve the vocabulary of the currently loaded model for the solution to be fully usable. Otherwise the client must instantiate the tokenizer of the exact same model loaded into the vLLM server just to get the vocabulary, which means a lot more dependencies and room for error, not to mention the added loading time.
Looks like on OAI's API it's a map of token_ids to bias values: https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias
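For comparison, the OpenAI-style logit_bias is a sparse {token_id: bias} map; if this PR keeps a dense per-vocabulary array internally, the conversion could look like this (a sketch, not code from the PR; the example ids and biases are arbitrary):

```python
from typing import Dict, List


def sparse_to_dense_bias(logit_bias: Dict[int, float], vocab_size: int) -> List[float]:
    """Expand an OpenAI-style {token_id: bias} map into a dense per-token array."""
    dense = [0.0] * vocab_size
    for token_id, bias in logit_bias.items():
        if not 0 <= token_id < vocab_size:
            raise ValueError(f"token id {token_id} is outside the vocabulary")
        dense[token_id] = bias
    return dense


# Example: strongly discourage token 50256 (GPT-2's EOS id), mildly favor token 11.
dense_bias = sparse_to_dense_bias({50256: -100.0, 11: 2.0}, vocab_size=50257)
```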
@viktor-ferenczi @benbot did you guys come to a consensus on how to go about implementing this? Loading the tokenizer vocab will be tricky, but then again this feature is meant for rather advanced use cases, and we could just leave it to the API consumer to figure out the mapping between token_ids and tokens (that's what OpenAI did anyway).
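If the mapping is left to the API consumer, the client-side step could look like this; a sketch that assumes the client has transformers installed and knows which model the server is running (the model name and bias values are examples):

```python
# Client-side sketch: build an OpenAI-style {token_id: bias} map using the same
# tokenizer as the model served by vLLM.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # replace with the served model

# Discourage " sorry" and encourage " yes"; the leading space matters for many
# BPE tokenizers, since words are usually tokenized together with it.
logit_bias = {}
for text, bias in [(" sorry", -100.0), (" yes", 5.0)]:
    for token_id in tokenizer.encode(text, add_special_tokens=False):
        logit_bias[token_id] = bias
```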
There is a
Depending on your use case, using a grammar would be a much cleaner alternative than trying to manipulate logits; it also moves the problem of handling tokens entirely into the server. See #2105: Add grammars.
Covered by #3027
logit_bias is an important feature of the OpenAI API that vLLM seems to have implemented but not exposed in the actual API. This is me taking a crack at exposing that functionality.
For the life of me, I can't get my CUDA versions to all agree to build this locally, so while I try to do that I'm opening the PR for others to try out.
Should resolve: #379