
@noooop
Collaborator

@noooop noooop commented Apr 9, 2025

Summary

Matryoshka Embeddings, or Matryoshka Representation Learning (MRL), is a technique used in training embedding models. It lets users trade off performance against cost by keeping only the leading dimensions of an embedding.
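To make the trade-off concrete, here is a minimal sketch using plain Python lists. It assumes the model was trained with MRL, so that a prefix of the embedding is still meaningful once re-normalized; `matryoshka_truncate` and `float32_storage_bytes` are hypothetical helpers for illustration, not vLLM APIs.

```python
import math
from typing import List


def matryoshka_truncate(embedding: List[float], dims: int) -> List[float]:
    """Keep the first `dims` values of an MRL embedding and L2-normalize them.

    Hypothetical helper; assumes the model was trained with Matryoshka
    Representation Learning so that prefixes carry useful information.
    """
    if not 1 <= dims <= len(embedding):
        raise ValueError(f"dims must be in [1, {len(embedding)}], got {dims}")
    prefix = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]


def float32_storage_bytes(dims: int) -> int:
    """Bytes needed to store one float32 vector with `dims` dimensions."""
    return 4 * dims


full = [0.6, 0.0, 0.8, 0.0]  # unit norm: 0.36 + 0.64 = 1
short = matryoshka_truncate(full, 2)  # 2-dim unit vector built from the prefix
print(float32_storage_bytes(1024), float32_storage_bytes(64))  # 4096 256
```

Cutting 1024 dimensions down to 64 shrinks each stored float32 vector 16x; how much retrieval quality that costs depends on the model.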

Not all embedding models support MRL. Changing the output dimension of a model that does not support MRL will lead to poor results, so vLLM returns an error for requests that attempt to change the output dimension of such a model:

  raise ValueError(
      f'Model "{model_config.served_model_name}" does not '
      f'support matryoshka representation, '
      f'changing output dimensions will lead to poor results.')

We hope the open-source community will adopt the terms “is_matryoshka” or “matryoshka_dimensions” to denote whether a model is compatible with Matryoshka Representation Learning (MRL).
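A minimal sketch of the check described above, assuming the model config has been loaded into a plain dict and that either of the two proposed keys marks MRL support. The function names, the `hidden_size` argument, and the reading of `matryoshka_dimensions` as a list of allowed sizes are all assumptions for illustration; vLLM's actual check may differ. The first error message mirrors the one quoted above.

```python
from typing import Any, Dict, Optional


def supports_matryoshka(hf_config: Dict[str, Any]) -> bool:
    """True if the config advertises MRL support via either proposed key."""
    return bool(hf_config.get("is_matryoshka", False)) or (
        hf_config.get("matryoshka_dimensions") is not None
    )


def validate_dimensions(
    served_model_name: str,
    hf_config: Dict[str, Any],
    hidden_size: int,
    dimensions: Optional[int],
) -> None:
    """Reject a `dimensions` request the model cannot honour (hypothetical)."""
    if dimensions is None:
        return  # no truncation requested: keep the native output size
    if not supports_matryoshka(hf_config):
        raise ValueError(
            f'Model "{served_model_name}" does not '
            "support matryoshka representation, "
            "changing output dimensions will lead to poor results.")
    if not 1 <= dimensions <= hidden_size:
        raise ValueError(
            f"dimensions must be in [1, {hidden_size}], got {dimensions}")
    # Assumption: matryoshka_dimensions lists the sizes the model was trained on.
    allowed = hf_config.get("matryoshka_dimensions")
    if allowed is not None and dimensions not in allowed:
        raise ValueError(
            f"dimensions must be one of {allowed}, got {dimensions}")
```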

Usage

Offline:

python -m examples.offline_inference.embed_matryoshka_fy

Online:

vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
curl http://127.0.0.1:8000/v1/embeddings \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "The capital of Brazil is Brasilia.",
    "model": "jinaai/jina-embeddings-v3",
    "encoding_format": "float",
    "dimensions": 1
  }'

Expected output:

{"id":"embd-0aab28c384d348c3b8f0eb783109dc5f","object":"list","created":1744195454,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-1.0]}],"usage":{"prompt_tokens":10,"total_tokens":10,"completion_tokens":0,"prompt_tokens_details":null}}
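The single value in that response follows from normalization: a one-dimensional unit vector can only be [1.0] or [-1.0]. The sketch below sanity-checks a response shaped like the one above using only the standard library. It assumes the body has been captured as a string and that the server returns L2-normalized embeddings; `check_embedding_response` is a hypothetical helper.

```python
import json
import math
from typing import List

# Response body copied from the expected output above.
body = '{"id":"embd-0aab28c384d348c3b8f0eb783109dc5f","object":"list","created":1744195454,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-1.0]}],"usage":{"prompt_tokens":10,"total_tokens":10,"completion_tokens":0,"prompt_tokens_details":null}}'


def check_embedding_response(raw: str, requested_dims: int) -> List[List[float]]:
    """Parse a /v1/embeddings response and sanity-check every vector."""
    payload = json.loads(raw)
    vectors = [item["embedding"] for item in payload["data"]]
    for vec in vectors:
        # The server should honour the requested `dimensions`.
        if len(vec) != requested_dims:
            raise ValueError(f"expected {requested_dims} dims, got {len(vec)}")
        # Assumes L2-normalized output, so the norm should be ~1.
        norm = math.sqrt(sum(x * x for x in vec))
        if not math.isclose(norm, 1.0, rel_tol=1e-5):
            raise ValueError(f"expected a unit vector, got norm {norm}")
    return vectors


print(check_embedding_response(body, requested_dims=1))  # [[-1.0]]
```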

Fixes #15465

@github-actions

github-actions bot commented Apr 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Apr 9, 2025
@noooop
Collaborator Author

noooop commented Apr 9, 2025

@DarkLight1337

The following currently pass locally:

  • vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
  • pytest tests/models/embedding/language/test_jina.py
  • pytest tests/entrypoints/openai/test_embedding_dimensions.py
  • python examples/offline_inference/embed_matryoshka_fy.py

Potential problems:

  • Variable naming and docstrings
  • Should test_embedding_dimensions.py be merged into tests/entrypoints/openai/test_embedding.py?
  • Is a correctness test for matryoshka_fy through the online (OpenAI-compatible) server required?

@noooop noooop marked this pull request as ready for review April 9, 2025 10:49
@noooop
Collaborator Author

noooop commented Apr 11, 2025

@DarkLight1337

Pass is_matryoshka to PoolerHead via pooler_config to make the logic clearer.

The logic is now fully controlled by is_matryoshka: normalization is always applied when is_matryoshka is set.

@noooop
Collaborator Author

noooop commented Apr 11, 2025

@DarkLight1337

Split normalization and changing the output dimension into separate steps.

How about now?
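One way to picture the split is as a small pipeline of independent post-pooling steps. The sketch below is an assumption for illustration (the thread does not show the PoolerHead code), and every name in it is hypothetical. It also shows why the order matters: truncating first and then normalizing yields a unit vector, while normalizing the full vector and then truncating does not, because the dropped coordinates carried part of the norm.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]
Step = Callable[[Vector], Vector]


def truncate_step(dims: int) -> Step:
    """Step that keeps only the first `dims` coordinates."""
    return lambda v: v[:dims]


def normalize_step(v: Vector) -> Vector:
    """Step that rescales a (non-zero) vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_head(dimensions: Optional[int], normalize: bool) -> List[Step]:
    """Assemble the steps; truncation always runs before normalization."""
    steps: List[Step] = []
    if dimensions is not None:
        steps.append(truncate_step(dimensions))
    if normalize:
        steps.append(normalize_step)
    return steps


def run_head(steps: List[Step], v: Vector) -> Vector:
    """Apply each step in order to the pooled vector."""
    for step in steps:
        v = step(v)
    return v


pooled = [3.0, 4.0, 12.0]  # norm 13
good = run_head(build_head(dimensions=2, normalize=True), pooled)
bad = truncate_step(2)(normalize_step(pooled))  # wrong order
print(good)  # [0.6, 0.8]
print(math.sqrt(sum(x * x for x in bad)))  # 5/13, roughly 0.385: not a unit vector
```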

@noooop
Collaborator Author

noooop commented Apr 11, 2025

@DarkLight1337

I have resolved all the previous conversation threads.

Do you have any suggestions for the latest version?

@noooop
Collaborator Author

noooop commented Apr 11, 2025

@DarkLight1337

How about overwriting normalize in _init_pooler_config?

@DarkLight1337
Member

@DarkLight1337

how about “overwrite normalize in _init_pooler_config”

I think we should make this a user-facing error instead of silently overwriting the user's configuration.

@noooop
Collaborator Author

noooop commented Apr 11, 2025

@DarkLight1337

Raise a ValueError when the model is matryoshka and normalize is disabled.
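Following the review above, a conflicting configuration is rejected with a user-facing error instead of being silently overwritten. Here is a minimal sketch of that validation, assuming a simplified stand-in for the pooler config with an optional `normalize` field. `PoolerConfigSketch` and `resolve_normalize` are hypothetical names, the default for non-matryoshka models is an assumption, and vLLM's actual check may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolerConfigSketch:
    """Simplified, hypothetical stand-in for a pooler config."""
    normalize: Optional[bool] = None  # None means "not set by the user"


def resolve_normalize(cfg: PoolerConfigSketch, is_matryoshka: bool) -> bool:
    """Return the effective normalize flag, refusing invalid combinations."""
    if is_matryoshka:
        if cfg.normalize is False:
            # Surface the conflict to the user instead of overriding it.
            raise ValueError(
                "Matryoshka models must be normalized; "
                "normalize cannot be disabled for this model.")
        return True
    # Non-matryoshka models: honour an explicit choice, else default to True
    # (the default here is an assumption for illustration).
    return True if cfg.normalize is None else cfg.normalize
```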

Member

@DarkLight1337 DarkLight1337 left a comment


LGTM now, thanks for bearing with me!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 11, 2025 09:20
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 11, 2025
@noooop
Collaborator Author

noooop commented Apr 11, 2025

Thanks for reviewing!

I think I should submit more code to the open source community to improve my coding skills.

@DarkLight1337
Member

DarkLight1337 commented Apr 11, 2025

We should also update the docs for Pooling Models to tell users how to use is_matryoshka, but that can be done in another PR if you want.

auto-merge was automatically disabled April 11, 2025 09:25

Head branch was pushed to by a user without write access

@noooop
Copy link
Collaborator Author

noooop commented Apr 11, 2025

We should also update the docs for Pooling Models to tell users how to use is_matryoshka, but that can be done in another PR if you want.

I will try.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 11, 2025 13:41
@noooop
Collaborator Author

noooop commented Apr 12, 2025

@DarkLight1337

CI is stuck.

Every time a PR reaches the CI stage, I don't know what to do.

@vllm-bot vllm-bot merged commit fbf722c into vllm-project:main Apr 12, 2025
42 of 43 checks passed
@DarkLight1337
Member

Force merging

@noooop
Collaborator Author

noooop commented Apr 12, 2025

Force merging

QvQ

pooyadavoodi added a commit to pooyadavoodi/vllm that referenced this pull request Apr 15, 2025
Pass pooling_metadata to pooler head in gritlm. This was broken by PR vllm-project#16331.

PR vllm-project#14516 broke the gritlm tests by switching from xformers to flash_attn.

Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
…dimensions (vllm-project#16331)

Signed-off-by: Yang Wang <elainewy@meta.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…dimensions (vllm-project#16331)

Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
@noooop noooop deleted the matryoshka branch July 10, 2025 04:46


Development

Successfully merging this pull request may close these issues.

[Feature]: Embedding API dimensions is currently not supported.

3 participants