
Conversation

@Potabk (Collaborator) commented May 21, 2025

What this PR does / why we need it?

  • Adds support for passing `prompt_embeds` to `LLM.generate`, as

    ```python
    llm.generate({"prompt_embeds": input_embeds}, sampling_params)
    ```

    or

    ```python
    llm.generate(
        [{"prompt_embeds": input_embeds} for input_embeds in inputs_embeds],
        sampling_params,
    )
    ```

  • Adds `prompt_embeds` to the examples (see the usage sketch after this list)
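
For anyone trying the feature end to end, here is a minimal sketch of the flow. It is not part of this PR: the model name, the `enable_prompt_embeds` engine arg from upstream vLLM, and all variable names are assumptions for illustration.

```python
# Minimal sketch, assuming upstream vLLM exposes enable_prompt_embeds;
# the model name and variable names are illustrative, not from this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name)

# Embed the prompt ourselves: token ids -> a (seq_len, hidden_size) tensor.
token_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    input_embeds = hf_model.get_input_embeddings()(token_ids).squeeze(0)

llm = LLM(model=model_name, enable_prompt_embeds=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=16)

# Pass the embedding tensor in place of a text prompt.
outputs = llm.generate({"prompt_embeds": input_embeds}, sampling_params)
print(outputs[0].outputs[0].text)
```

The batch form from the list above is the same call with a list of `{"prompt_embeds": ...}` dicts, one per request.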

Does this PR introduce any user-facing change?

How was this patch tested?

CI passed with newly added and existing tests.
I have also tested with the example script in this PR, and the output looks good:

```
[Single Inference Output]
------------------------------
The capital of France is Paris. Paris is the largest city in France and is
------------------------------
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3966.87it/s]
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  3.99it/s, est. speed input: 177.08 toks/s, output: 63.91 toks/s]

[Batch Inference Outputs]
------------------------------
Q1: Please tell me about the capital of France.
A1: The capital of France is Paris. It is located in the northern part of the

Q2: When is the day longest during the year?
A2: The day is longest during the year at the summer solstice. This typically occurs

Q3: Where is bigger, the moon or the sun?
A3: The sun is significantly bigger than the moon. 

The sun has a diameter of

------------------------------
```

@Potabk force-pushed the dev branch 3 times, most recently from 1532649 to 0bd6624 on May 22, 2025 at 07:19
@Potabk (Collaborator, Author) commented May 22, 2025

@wangxiyuan this is ready for review

@Potabk changed the title from [Worker][ModelRunner][WIP] Support embedding inputs to [ModelRunner] Support embedding inputs on May 22, 2025
@wangxiyuan (Collaborator) left a comment

it's good to add a test as well

@@ -0,0 +1,83 @@
import torch

Nice example

@github-actions bot commented Jun 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions bot added the merge-conflicts label and removed the ready (read for review) label Jun 4, 2025
@Yikun (Collaborator) commented Jun 4, 2025

Need to support V1

@Potabk (Collaborator, Author) commented Jun 4, 2025

> Need to support V1

For V1, it will fall back to V0 when this feature is used:
https://github.com/vllm-project/vllm/blob/8f4ffbd373cb19e8f8dcfa6dec1dbbe98fbeae96/vllm/engine/arg_utils.py#L1327
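
For context, here is a toy illustration of the V0-fallback decision the linked line implements. This is not vLLM code; the config type and function names are stand-ins.

```python
# Toy model of the check behind the linked arg_utils.py line (not the
# actual upstream code): enabling prompt embeddings makes the V1 engine
# report "unsupported", so requests take the V0 code path instead.
from dataclasses import dataclass

@dataclass
class ToyConfig:
    enable_prompt_embeds: bool = False

def v1_supported(cfg: ToyConfig) -> bool:
    if cfg.enable_prompt_embeds:
        return False  # fall back to the V0 engine
    return True

print(v1_supported(ToyConfig(enable_prompt_embeds=True)))   # False -> V0
print(v1_supported(ToyConfig(enable_prompt_embeds=False)))  # True  -> V1
```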

@github-actions bot commented Jun 5, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan (Collaborator) left a comment

Please fix the merge conflict.

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
@wangxiyuan added the ready (read for review) label Jun 6, 2025
@wangxiyuan merged commit 11a7df4 into vllm-project:main Jun 6, 2025
28 checks passed
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025