[Frontend] support matryoshka representation / support embedding API dimensions #16331

noooop · 2025-04-09T10:27:22Z

Summary

Matryoshka Embeddings, or Matryoshka Representation Learning (MRL), is a technique used in training embedding models. It allows users to trade off performance against cost by keeping only a leading subset of an embedding's dimensions.
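On the cost side of that trade-off, vector size matters most. As a rough illustration, raw float32 storage scales linearly with the output dimension. The dimension values below are arbitrary examples, not figures from this PR:

```python
# Illustrative storage cost of float32 embeddings at several output sizes.
# The dimension values are arbitrary examples chosen for illustration.
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw storage for num_vectors float32 embeddings of the given size."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


for dims in (1024, 256, 64):
    mib = index_size_bytes(1_000_000, dims) / 2**20
    print(f"{dims:>4} dims: {mib:,.0f} MiB per million vectors")
```

Halving the dimension halves storage and similarity-search work. MRL training aims to make the shorter prefixes still useful, so the quality loss is smaller than naive truncation would cause.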

Not all embedding models support MRL. Changing the output dimension of a model that does not support MRL leads to poor results, so vLLM returns an error for any request that attempts to do so:

  raise ValueError(
      f'Model "{model_config.served_model_name}" does not '
      f'support matryoshka representation, '
      f'changing output dimensions will lead to poor results.')
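Here is a minimal sketch of where such a check could sit. It assumes a per-request `dimensions` value and a boolean `is_matryoshka` flag on the model config. `check_dimensions_request` is a hypothetical helper, not vLLM's actual function, and the positive-integer check is an added assumption:

```python
from typing import Optional


def check_dimensions_request(
    served_model_name: str,
    is_matryoshka: bool,
    dimensions: Optional[int],
) -> None:
    """Hypothetical helper: reject output-size changes for non-MRL models."""
    if dimensions is None:
        return  # no change requested, nothing to validate
    if not is_matryoshka:
        raise ValueError(
            f'Model "{served_model_name}" does not '
            f'support matryoshka representation, '
            f'changing output dimensions will lead to poor results.')
    # Assumption: a non-positive size is also rejected.
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
```

Requests that omit `dimensions` pass through unchanged. Only an explicit size change triggers the MRL check.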

We hope that the open-source community will adopt the terms “is_matryoshka” or “matryoshka_dimensions” to denote whether a model is compatible with Matryoshka Representation Learning (MRL).
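If a model's config.json carried one of these fields, a loader could interpret it roughly as follows. The helper names are assumptions for illustration, as is the rule that a non-empty `matryoshka_dimensions` list implies MRL support:

```python
import json
from typing import Optional


def supports_matryoshka(config: dict) -> bool:
    """Hypothetical: treat either proposed field as an MRL opt-in."""
    if config.get("is_matryoshka"):
        return True
    # Assumption: listing supported sizes also implies MRL support.
    return bool(config.get("matryoshka_dimensions"))


def listed_dimensions(config: dict) -> Optional[list[int]]:
    """Return the advertised sizes in ascending order, or None if absent."""
    dims = config.get("matryoshka_dimensions")
    return sorted(dims) if dims else None


example = json.loads(
    '{"is_matryoshka": true, "matryoshka_dimensions": [1024, 32, 256]}')
print(supports_matryoshka(example), listed_dimensions(example))
```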

Usage

Offline

python -m examples.offline_inference.embed_matryoshka_fy

Online

vllm serve jinaai/jina-embeddings-v3 --trust-remote-code

curl http://127.0.0.1:8000/v1/embeddings \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "The capital of Brazil is Brasilia.",
    "model": "jinaai/jina-embeddings-v3",
    "encoding_format": "float",
    "dimensions": 1
  }'
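The same request can also be built with the Python standard library, for clients that do not use curl. This sketch mirrors the curl payload above. `build_embedding_request` and `embed` are hypothetical helper names, and `embed` only works against a running `vllm serve` instance:

```python
import json
import urllib.request


def build_embedding_request(
    text: str,
    model: str = "jinaai/jina-embeddings-v3",
    dimensions: int = 1,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl command makes."""
    payload = {
        "input": text,
        "model": model,
        "encoding_format": "float",
        "dimensions": dimensions,
    }
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, **kwargs) -> list[float]:
    """Send the request and return the first embedding (needs a live server)."""
    with urllib.request.urlopen(build_embedding_request(text, **kwargs)) as resp:
        body = json.load(resp)
    return body["data"][0]["embedding"]
```

Against the server started above, `len(embed("The capital of Brazil is Brasilia."))` should be 1, matching `"dimensions": 1`.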

Expected output

{"id":"embd-0aab28c384d348c3b8f0eb783109dc5f","object":"list","created":1744195454,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-1.0]}],"usage":{"prompt_tokens":10,"total_tokens":10,"completion_tokens":0,"prompt_tokens_details":null}}
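To confirm that the response honors `dimensions`, parse it and inspect the vector. With a single dimension, the only unit-norm values are 1.0 and -1.0. The `[-1.0]` above is therefore consistent with the vector being L2-normalized after truncation. This is an inference from the output, not a documented guarantee:

```python
import json
import math

RESPONSE = '{"id":"embd-0aab28c384d348c3b8f0eb783109dc5f","object":"list","created":1744195454,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-1.0]}],"usage":{"prompt_tokens":10,"total_tokens":10,"completion_tokens":0,"prompt_tokens_details":null}}'


def check_embedding(body: dict, requested_dims: int) -> float:
    """Check the vector length against the request and return its L2 norm."""
    vec = body["data"][0]["embedding"]
    if len(vec) != requested_dims:
        raise ValueError(f"expected {requested_dims} dims, got {len(vec)}")
    return math.sqrt(sum(x * x for x in vec))


norm = check_embedding(json.loads(RESPONSE), requested_dims=1)
print(norm)
```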

FIX #15465


noooop · 2025-04-09T10:34:18Z

@DarkLight1337

Currently, the following commands and tests pass locally:

vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
pytest tests/models/embedding/language/test_jina.py
pytest tests/entrypoints/openai/test_embedding_dimensions.py
python examples/offline_inference/embed_matryoshka_fy.py

Potential problems:

Variable naming and docstrings
Should test_embedding_dimensions.py be merged into tests/entrypoints/openai/test_embedding.py?
Is online (OpenAI-compatible server) correctness testing for matryoshka_fy required?

tests/entrypoints/openai/test_embedding_dimensions.py

vllm/model_executor/layers/pooler.py


noooop · 2025-04-11T03:56:13Z

@DarkLight1337

Pass is_matryoshka to PoolerHead via pooler_config to make the logic clearer.

The logic is now:

Fully controlled by is_matryoshka: always normalize when is_matryoshka is set.

vllm/model_executor/layers/pooler.py

noooop · 2025-04-11T05:00:50Z

@DarkLight1337

Split normalization and the output-dimension change into separate steps.

How about now?
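Here is a minimal sketch of the split. It assumes truncation runs before normalization, which matches the `[-1.0]` result shown in the summary. The function names are hypothetical and operate on plain Python lists rather than vLLM's tensors:

```python
import math
from typing import Optional


def truncate(vec: list[float], dimensions: Optional[int]) -> list[float]:
    """Keep the first `dimensions` components (the MRL prefix); None = unchanged."""
    return vec if dimensions is None else vec[:dimensions]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return vec if norm == 0.0 else [x / norm for x in vec]


def pooler_head(vec: list[float], dimensions: Optional[int] = None,
                normalize: bool = True) -> list[float]:
    """Hypothetical head: two independent steps, truncate then normalize."""
    out = truncate(vec, dimensions)
    return l2_normalize(out) if normalize else out


print(pooler_head([-3.0, 4.0, 12.0], dimensions=2))
```

Keeping the steps separate means normalization can be toggled without touching the dimension logic, and vice versa.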

noooop · 2025-04-11T05:52:01Z

@DarkLight1337

I have resolved all conversations on previous versions.

Do you have any suggestions for the latest version?

noooop · 2025-04-11T08:18:02Z

@DarkLight1337

How about overwriting normalize in _init_pooler_config?

DarkLight1337 · 2025-04-11T08:27:41Z


I think we should make this a user-facing error instead of silently overwriting the user's configuration.

vllm/config.py

noooop · 2025-04-11T08:58:58Z

@DarkLight1337

Raise ValueError when is_matryoshka is set and normalize is disabled.

vllm/pooling_params.py

vllm/model_executor/layers/pooler.py
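The validation described here can be sketched as a config object that rejects the conflicting combination at construction time. `PoolerConfigSketch` and its fields are hypothetical stand-ins, not vLLM's actual pooler config class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolerConfigSketch:
    """Hypothetical stand-in for a pooler config; field names are assumptions."""
    is_matryoshka: bool = False
    normalize: Optional[bool] = None  # None = use the model's default

    def __post_init__(self) -> None:
        # Surface a user-facing error instead of silently overriding normalize.
        if self.is_matryoshka and self.normalize is False:
            raise ValueError(
                "is_matryoshka requires normalize to be enabled; "
                "got normalize=False.")


cfg = PoolerConfigSketch(is_matryoshka=True)  # OK: normalize left unset
```

Failing at construction time means a misconfigured server never starts. Silently flipping `normalize` would instead hide the conflict from the user.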

DarkLight1337

LGTM now, thanks for bearing with me!

noooop · 2025-04-11T09:22:34Z

Thanks for reviewing!

I think I should submit more code to the open source community to improve my coding skills.

DarkLight1337 · 2025-04-11T09:22:44Z

We should also update the docs for Pooling Models to tell users how to use is_matryoshka, but that can be done in another PR if you want.

noooop · 2025-04-11T09:28:18Z


I will try.

noooop · 2025-04-12T04:48:14Z

@DarkLight1337

CI is stuck.

Every time a PR reaches the CI stage, I don’t know what to do.

DarkLight1337 · 2025-04-12T06:23:11Z

Force merging

noooop · 2025-04-12T06:26:56Z


QvQ

Pass pooling_metadata to pooler head in gritlm. This was broken by PR vllm-project#16331. PR vllm-project#14516 broke gritlm tests due to changing xformers to flash_attn. Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>


support matryoshka

097cf7c

mergify bot added the documentation and frontend labels Apr 9, 2025

noooop marked this pull request as ready for review April 9, 2025 10:49

noooop requested review from DarkLight1337, robertgshaw2-redhat, simon-mo and ywang96 as code owners April 9, 2025 10:49

DarkLight1337 reviewed Apr 9, 2025

tests/entrypoints/openai/test_embedding_dimensions.py (outdated, resolved)

vllm/model_executor/layers/pooler.py (outdated, resolved)

noooop added 2 commits April 10, 2025 13:24

+ verify

6395806

Fully controlled by is_matryoshka, always do normalize when is_matryoshka

2145dc7

Remove duplicate code

ca7bc8d