
Conversation

@Potabk (Collaborator) commented May 21, 2025

What this PR does / why we need it?

  • Adds support for passing `prompt_embeds` to `LLM.generate`, as

    ```python
    llm.generate({"prompt_embeds": input_embeds}, sampling_params)
    ```

    or

    ```python
    llm.generate(
        [{"prompt_embeds": input_embeds} for input_embeds in inputs_embeds],
        sampling_params,
    )
    ```

  • Adds `prompt_embeds` to the examples (see the usage sketch after this list)
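
For anyone trying the feature end to end, here is a minimal sketch of the flow. It is not part of this PR: the model name, the `enable_prompt_embeds` engine arg from upstream vLLM, and all variable names are assumptions for illustration.

```python
# Minimal sketch, assuming upstream vLLM exposes enable_prompt_embeds;
# the model name and variable names are illustrative, not from this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name)

# Embed the prompt ourselves: token ids -> a (seq_len, hidden_size) tensor.
token_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    input_embeds = hf_model.get_input_embeddings()(token_ids).squeeze(0)

llm = LLM(model=model_name, enable_prompt_embeds=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=16)

# Pass the embedding tensor in place of a text prompt.
outputs = llm.generate({"prompt_embeds": input_embeds}, sampling_params)
print(outputs[0].outputs[0].text)
```

The batch form from the list above is the same call with a list of `{"prompt_embeds": ...}` dicts, one per request.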

Does this PR introduce any user-facing change?

How was this patch tested?

CI passed with newly added and existing tests.
I have also tested with the example script in this PR, and the output looks good:

```
[Single Inference Output]
------------------------------
The capital of France is Paris. Paris is the largest city in France and is
------------------------------
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3966.87it/s]
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  3.99it/s, est. speed input: 177.08 toks/s, output: 63.91 toks/s]

[Batch Inference Outputs]
------------------------------
Q1: Please tell me about the capital of France.
A1: The capital of France is Paris. It is located in the northern part of the

Q2: When is the day longest during the year?
A2: The day is longest during the year at the summer solstice. This typically occurs

Q3: Where is bigger, the moon or the sun?
A3: The sun is significantly bigger than the moon. 

The sun has a diameter of

------------------------------
```

@Potabk force-pushed the dev branch 3 times, most recently from 1532649 to 0bd6624 on May 22, 2025 at 07:19
@Potabk (Collaborator, Author) commented May 22, 2025

@wangxiyuan this is ready for review

@Potabk changed the title from [Worker][ModelRunner][WIP] Support embedding inputs to [ModelRunner] Support embedding inputs on May 22, 2025
@wangxiyuan (Collaborator) left a comment

it's good to add a test as well

@@ -0,0 +1,83 @@
import torch

Nice example

@github-actions bot commented Jun 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions bot added the merge-conflicts label and removed the ready (read for review) label Jun 4, 2025
@Yikun (Collaborator) commented Jun 4, 2025

Need to support V1

@Potabk (Collaborator, Author) commented Jun 4, 2025

> Need to support V1

For V1, it will fall back to V0 when this feature is used:
https://github.com/vllm-project/vllm/blob/8f4ffbd373cb19e8f8dcfa6dec1dbbe98fbeae96/vllm/engine/arg_utils.py#L1327
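
For context, here is a toy illustration of the V0-fallback decision the linked line implements. This is not vLLM code; the config type and function names are stand-ins.

```python
# Toy model of the check behind the linked arg_utils.py line (not the
# actual upstream code): enabling prompt embeddings makes the V1 engine
# report "unsupported", so requests take the V0 code path instead.
from dataclasses import dataclass

@dataclass
class ToyConfig:
    enable_prompt_embeds: bool = False

def v1_supported(cfg: ToyConfig) -> bool:
    if cfg.enable_prompt_embeds:
        return False  # fall back to the V0 engine
    return True

print(v1_supported(ToyConfig(enable_prompt_embeds=True)))   # False -> V0
print(v1_supported(ToyConfig(enable_prompt_embeds=False)))  # True  -> V1
```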

@github-actions bot commented Jun 5, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan (Collaborator) left a comment

Please fix the merge conflict.

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
@wangxiyuan added the ready (read for review) label Jun 6, 2025
@wangxiyuan merged commit 11a7df4 into vllm-project:main Jun 6, 2025
28 checks passed
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025