[Frontend] support matryoshka representation / support embedding API dimensions #16331
Currently the following tests can be passed locally:
potential problems:
|
|
Pass is_matryoshka to PoolerHead via pooler_config to make the logic clearer. The logic is now fully controlled by is_matryoshka: always normalize when is_matryoshka is set. |
|
Split normalization from changing the output dimension. How about now? |
|
I have closed all the previous conversation threads. Do you have any suggestions for the latest version? |
|
How about “overwrite normalize in _init_pooler_config”?
I think we should make this a user-facing error instead of silently overwriting the user's configuration. |
|
Raise ValueError when the model is matryoshka and normalize is disabled. |
LGTM now, thanks for bearing with me!
|
Thanks for reviewing. I think I should submit more code to the open source community to improve my coding skills. |
|
We should also update the docs for Pooling Models to tell users how to use |
Head branch was pushed to by a user without write access
I will try. |
|
CI gets stuck every time the PR reaches the CI stage. I don’t know what to do. |
|
Force merging |
QvQ |
Pass pooling_metadata to pooler head in gritlm. This was broken by PR vllm-project#16331. PR vllm-project#14516 broke gritlm tests by changing xformers to flash_attn. Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
…dimensions (vllm-project#16331) Signed-off-by: Yang Wang <elainewy@meta.com>
…dimensions (vllm-project#16331) Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Summary
Matryoshka Embeddings, or Matryoshka Representation Learning (MRL), is a technique used in training embedding models. It allows users to trade off between performance and cost by choosing the output dimension of the embedding.
Not all embedding models support MRL. Changing the output dimension for a model that does not support MRL leads to poor results, so vLLM returns an error for requests that attempt to change the output dimension of such a model.
We hope that the open source community will adopt the terms “is_matryoshka” or “matryoshka_dimensions” to denote whether a model is compatible with Matryoshka Representation Learning (MRL).
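The behaviour described above can be illustrated with a minimal sketch in plain Python. It is not the code from this PR: `truncate_embedding` and its parameters are hypothetical names, and the sketch assumes that changing the output dimension means keeping the leading values of the vector and then L2-normalizing the result, which is the usual way MRL embeddings are shortened.

```python
import math
from typing import Optional, Sequence


def truncate_embedding(
    embedding: Sequence[float],
    dimensions: Optional[int],
    is_matryoshka: bool,
) -> list[float]:
    """Hypothetical helper, not part of vLLM.

    Keeps the first `dimensions` values of `embedding` and L2-normalizes
    them. Requests that change the dimension of a non-MRL model are rejected.
    """
    if dimensions is None:
        # No dimension change requested: return the vector unchanged.
        return list(embedding)
    if not is_matryoshka:
        # Mirrors the rule in the summary: a user-facing error, not a silent fallback.
        raise ValueError(
            "model does not support matryoshka representation; "
            "changing the output dimension is not allowed"
        )
    if not 1 <= dimensions <= len(embedding):
        raise ValueError(
            f"dimensions must be between 1 and {len(embedding)}, got {dimensions}"
        )
    head = list(embedding[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head
```

Normalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors, which is consistent with the review outcome that normalization cannot be disabled for a matryoshka model.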
Usage
offline
online
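The original online example is not preserved here. As a rough sketch only, the request body for an OpenAI-compatible embeddings endpoint could be built as below. The `dimensions` field follows the OpenAI embeddings API that the PR title refers to; the helper name `build_embedding_request` and the model name are placeholders, and the snippet only builds the JSON body without sending it.

```python
import json
from typing import Optional


def build_embedding_request(
    model: str, text: str, dimensions: Optional[int] = None
) -> str:
    """Hypothetical helper: build the JSON body for an embeddings request."""
    payload: dict[str, object] = {"model": model, "input": text}
    if dimensions is not None:
        # Only include the field when a smaller output dimension is wanted.
        payload["dimensions"] = dimensions
    return json.dumps(payload)


# "my-matryoshka-model" is a placeholder, not a real model name.
body = build_embedding_request("my-matryoshka-model", "Hello, world!", dimensions=32)
```

The resulting string would be sent as the POST body to the server's embeddings route; for a model that does not support MRL, the summary above says the server answers with an error instead of a truncated vector.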
expected output
FIX #15465