[Frontend] Chat-based Embeddings API #9759

DarkLight1337 · 2024-10-28T13:12:13Z

This PR extends the existing Embeddings API to accept chat conversations similar to Chat Completions API. This enables multi-modal conversations to be passed to the embedding model.

To reduce code duplication, I've also factored out the common code for handling completion and chat-based inputs into the base OpenAIServing class.

FIX #8967
FIX #9303 (comment)

github-actions · 2024-10-28T13:12:25Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

vllm/entrypoints/openai/serving_embedding.py

mergify · 2024-10-29T12:34:10Z

This pull request has merge conflicts that must be resolved before it can be
merged. @DarkLight1337 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

DarkLight1337 · 2024-10-29T17:49:14Z

Sorry for the commit spam, I'm done refining the docs now.

DarkLight1337 · 2024-10-30T04:29:37Z

Entrypoints tests pass now.

Isotr0py

Overall LGTM since entrypoint tests all passed. See if @ywang96 is OK with the frontend refactor as well.

DarkLight1337 · 2024-10-31T02:16:21Z

@simon-mo @njhill do you have time to take a look at this? @ywang96 is busy today.

maxdebayser

I left a few comments but overall this looks good to me.

maxdebayser · 2024-10-31T19:00:10Z

docs/source/models/vlm.rst

+    Since VLM2Vec has the same model architecture as Phi-3.5-Vision, we have to explicitly pass ``--task embedding``
+    to run this model in embedding mode instead of text generation mode.
+
+Since this schema is not defined by OpenAI client, we post a request to the server using the lower-level ``requests`` library:


Just leaving this as a thought here: should we perhaps have a fork of the openai client that support our extensions explicitly?

This sounds good, but not sure whether we have bandwidth to maintain it 😅

I suggest opening an issue for this.

tests/entrypoints/openai/test_embedding.py

maxdebayser · 2024-10-31T19:55:49Z

vllm/pooling_params.py

@@ -7,7 +7,7 @@ class PoolingParams(
        msgspec.Struct,
        omit_defaults=True,  # type: ignore[call-arg]
        array_like=True):  # type: ignore[call-arg]
-    """Pooling parameters for pooling.
+    """Pooling parameters for embeddings API.


I might be missing something, but the additional_data attribute doesn't seem to be used anywhere. Which is good, because it can by anything and is passed without validation from the request to the Pooler.forward() method as part of the PoolingMetadata object. If there is no use case for this, can we remove it in this PR?

@robertgshaw2-neuralmagic originally added this (#4800 (comment)). I am not sure whether this is still relevant since we can now set the pooling strategy via CLI (#9697).

@robertgshaw2-neuralmagic can you comment on this?

Meanwhile let's merge this PR first.

ywang96

Left a few comments - PTAL!

docs/source/models/vlm.rst

tests/entrypoints/openai/test_embedding.py

vllm/entrypoints/openai/protocol.py

vllm/entrypoints/openai/serving_embedding.py

ywang96

LGTM!

DarkLight1337 added 2 commits October 28, 2024 13:10

Initial implementation

1b91750

Update docs

61e0fcf

DarkLight1337 changed the title ~~Chat embeddings api~~ [Frontend] Chat-based Embeddings API Oct 28, 2024

DarkLight1337 mentioned this pull request Oct 28, 2024

[RFC]: Multi-modality Support Refactoring #4194

Open

69 tasks

DarkLight1337 added 5 commits October 28, 2024 14:04

Cleanup

c62be47

Consolidate and make code consistent

cc999b1

Remove useless statement

9ed87c1

Rename back

efa7c6f

Factor out common code

ab9297e

mergify bot added documentation Improvements or additions to documentation frontend labels Oct 28, 2024

maxdebayser reviewed Oct 28, 2024

View reviewed changes

vllm/entrypoints/openai/serving_embedding.py Outdated Show resolved Hide resolved

DarkLight1337 added 9 commits October 29, 2024 02:23

Reinstate truncate_prompt_tokens check

5a4f271

Rename

4a969b4

Fix

279b9ce

Remove unused code

7de803f

Migrate tokenization API

c1ef363

Some fixes

a10fa85

format

89e0710

remoev unused imports

81b94de

Migrate chat and completion APIs

a79d3b2

mergify bot added the needs-rebase label Oct 29, 2024

DarkLight1337 added 2 commits October 29, 2024 13:54

Factor out trace headers code

8b950dd

Merge branch 'main' into chat-embeddings-api

2c91855

mergify bot removed the needs-rebase label Oct 29, 2024

DarkLight1337 added 2 commits October 29, 2024 13:59

Clean

f5e72ff

More precise error handling

9cd1ac3

DarkLight1337 force-pushed the chat-embeddings-api branch from 3cc07d5 to 9cd1ac3 Compare October 29, 2024 14:04

DarkLight1337 added 8 commits October 29, 2024 17:23

Reword

50ad3aa

Fix

9c1df21

Update

8049030

Update

a387845

Update

d80ec7e

format

ea5fd96

Convert to tip

b05ede6

newline

dba9806

Fix missing client

557c9ef

Isotr0py approved these changes Oct 30, 2024

View reviewed changes

DarkLight1337 added 2 commits October 31, 2024 01:25

Merge branch 'main' into chat-embeddings-api

8c8ee96

Merge branch 'main' into chat-embeddings-api

c3ba030

DarkLight1337 requested a review from njhill October 31, 2024 02:16

maxdebayser approved these changes Oct 31, 2024

View reviewed changes

ywang96 reviewed Oct 31, 2024

View reviewed changes

docs/source/models/vlm.rst Outdated Show resolved Hide resolved

tests/entrypoints/openai/test_embedding.py Show resolved Hide resolved

vllm/entrypoints/openai/protocol.py Show resolved Hide resolved

vllm/entrypoints/openai/serving_embedding.py Outdated Show resolved Hide resolved

DarkLight1337 added 2 commits November 1, 2024 04:14

Optionally initialize request handlers

46f316f

Update tip

1179f66

ywang96 approved these changes Nov 1, 2024

View reviewed changes

DarkLight1337 added 3 commits November 1, 2024 05:54

Update tests

eb4b235

format

bf46a16

Rename

7f188f9

DarkLight1337 enabled auto-merge (squash) November 1, 2024 05:58

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 1, 2024

DarkLight1337 merged commit 06386a6 into main Nov 1, 2024
69 checks passed

DarkLight1337 deleted the chat-embeddings-api branch November 1, 2024 08:13

DarkLight1337 mentioned this pull request Nov 1, 2024

[Frontend] Use a proper chat template for VLM2Vec #9912

Merged

FurtherAI mentioned this pull request Nov 2, 2024

[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 #9944

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Frontend] Chat-based Embeddings API #9759

[Frontend] Chat-based Embeddings API #9759

DarkLight1337 commented Oct 28, 2024 •

edited

Loading

github-actions bot commented Oct 28, 2024

mergify bot commented Oct 29, 2024

DarkLight1337 commented Oct 29, 2024

DarkLight1337 commented Oct 30, 2024

Isotr0py left a comment

DarkLight1337 commented Oct 31, 2024 •

edited

Loading

maxdebayser left a comment

maxdebayser Oct 31, 2024

DarkLight1337 Nov 1, 2024

DarkLight1337 Nov 1, 2024

maxdebayser Oct 31, 2024

DarkLight1337 Nov 1, 2024 •

edited

Loading

DarkLight1337 Nov 1, 2024

DarkLight1337 Nov 1, 2024

ywang96 left a comment

ywang96 left a comment

[Frontend] Chat-based Embeddings API #9759

[Frontend] Chat-based Embeddings API #9759

Conversation

DarkLight1337 commented Oct 28, 2024 • edited Loading

github-actions bot commented Oct 28, 2024

mergify bot commented Oct 29, 2024

DarkLight1337 commented Oct 29, 2024

DarkLight1337 commented Oct 30, 2024

Isotr0py left a comment

Choose a reason for hiding this comment

DarkLight1337 commented Oct 31, 2024 • edited Loading

maxdebayser left a comment

Choose a reason for hiding this comment

maxdebayser Oct 31, 2024

Choose a reason for hiding this comment

DarkLight1337 Nov 1, 2024

Choose a reason for hiding this comment

DarkLight1337 Nov 1, 2024

Choose a reason for hiding this comment

maxdebayser Oct 31, 2024

Choose a reason for hiding this comment

DarkLight1337 Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

DarkLight1337 Nov 1, 2024

Choose a reason for hiding this comment

DarkLight1337 Nov 1, 2024

Choose a reason for hiding this comment

ywang96 left a comment

Choose a reason for hiding this comment

ywang96 left a comment

Choose a reason for hiding this comment

DarkLight1337 commented Oct 28, 2024 •

edited

Loading

DarkLight1337 commented Oct 31, 2024 •

edited

Loading

DarkLight1337 Nov 1, 2024 •

edited

Loading