Feature/vllm/input embedding completion api #17590
Conversation
@DarkLight1337 any ideas about this? Do you think it is a blocker for this PR?

I'm fine with not supporting LoRA for now, unless LoRA is a very important use case for this.

Can you add an example script to the documentation for both offline and online inference?
I don't think this is an important use case at this time. I think it only came up because the existing completion tests checked for LoRA compatibility and @Nan2018 tried to use both of them together.

I added the […]

Yeah, they should be added automatically.

@DarkLight1337 It looks like the docs build timed out. All of the fast checks are passing. I do think this PR is ready for review. Thanks for your help with this!

Regarding the subprocess issue, it may be related to #18308 (comment).
Let's merge this first, though.
@DarkLight1337 will this make it into the v0.9.0 release?

Yes.
```python
@pytest.fixture(scope="module")
def zephyr_lora_added_tokens_files(zephyr_lora_files):
    ...
```
What is this LoRA module used for?
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Adds support for passing `prompt_embeds` as base64-encoded bytes to the Completions API.
Start the server with:
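A plausible launch command, assuming the `--enable-prompt-embeds` engine flag that this feature gates on; the model name is a placeholder:

`vllm serve meta-llama/Llama-3.2-1B-Instruct --enable-prompt-embeds`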
Query example:
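A minimal client-side sketch, assuming an OpenAI-compatible server at `localhost:8000` and a placeholder model name; the random tensor stands in for real embeddings, which would normally come from the model's input embedding layer:

```python
import base64
import io

import torch
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder embeddings of shape (num_tokens, hidden_size); real inputs
# would come from the model's embedding layer or an upstream pipeline.
prompt_embeds = torch.randn(16, 2048, dtype=torch.float16)

# Serialize the tensor with torch.save and base64-encode the raw bytes,
# the wire format this PR accepts for `prompt_embeds`.
buffer = io.BytesIO()
torch.save(prompt_embeds, buffer)
encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")

completion = client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    prompt="",  # the text prompt may be empty when embeddings are supplied
    max_tokens=32,
    # `prompt_embeds` is not part of the OpenAI schema, so it is passed
    # through `extra_body` as a vLLM-specific extension.
    extra_body={"prompt_embeds": encoded},
)
print(completion.choices[0].text)
```

Sending the embeddings through `extra_body` keeps the request valid for the stock OpenAI client while letting the server consume the vLLM-specific field.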
Note: this does not work with LoRA or prompt adapters.
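The review discussion also asked for an offline-inference example. A minimal sketch, assuming `LLM(..., enable_prompt_embeds=True)` accepts dict prompts of the form `{"prompt_embeds": tensor}`; the model name is again a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-3.2-1B-Instruct"

# Compute prompt embeddings with the HF model's input embedding layer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
token_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
with torch.no_grad():
    prompt_embeds = hf_model.get_input_embeddings()(token_ids).squeeze(0)

llm = LLM(model=model_name, enable_prompt_embeds=True)
outputs = llm.generate(
    {"prompt_embeds": prompt_embeds},  # embeddings in place of token ids
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```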