
Conversation

@Bryce1010 (Contributor) commented Jan 2, 2025

Adds support for passing prompt_embeds to LLM.generate, as in

llm.generate({"prompt_embeds": input_embeds}, sampling_params)

or

llm.generate(
    [{"prompt_embeds": input_embeds} for input_embeds in inputs_embeds], sampling_params
)

This enables use cases where only the embedding layer is fine-tuned, so a single model backend can serve multiple custom-tuned embedding layers (see the sketch below).

FIX #416
FIX #8323
FIX #14621
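
For illustration, here is a hedged sketch of the multi-tenant use case described above: one shared vLLM backend serving requests whose embeddings come from differently fine-tuned embedding layers. The model name, checkpoint paths, and prompts are assumptions for illustration, not part of this PR.

import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Illustrative model choice; any causal LM supported by vLLM would do.
MODEL = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
llm = LLM(model=MODEL)
sampling_params = SamplingParams(max_tokens=64)

# Hypothetical per-tenant fine-tuned embedding weights, each a
# [vocab_size, hidden_size] tensor saved separately from the base model.
embed_a = torch.nn.Embedding.from_pretrained(torch.load("tenant_a_embedding.pt"))
embed_b = torch.nn.Embedding.from_pretrained(torch.load("tenant_b_embedding.pt"))

def embeds_for(prompt: str, embed_layer: torch.nn.Embedding) -> torch.Tensor:
    # Tokenize with the shared tokenizer, then embed the token IDs with the
    # tenant-specific layer; the result is [seq_len, hidden_size].
    token_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        return embed_layer(token_ids).squeeze(0)

inputs_embeds = [
    embeds_for("Summarize our Q3 report.", embed_a),
    embeds_for("Summarize our Q3 report.", embed_b),
]
outputs = llm.generate(
    [{"prompt_embeds": input_embeds} for input_embeds in inputs_embeds],
    sampling_params,
)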

@github-actions (bot) commented Jan 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to the Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Jan 2, 2025
@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch 5 times, most recently from 0429875 to 8c7751d Compare January 2, 2025 11:17
@DarkLight1337 (Member)

Thanks for opening this PR. Can you explain what it is about and how it relates to #11375?

@Bryce1010 (Contributor, Author) commented Jan 2, 2025

> Thanks for opening this PR. Can you explain what it is about and how it relates to #11375?

@DarkLight1337 Sorry for referencing the incorrect issue number. Please refer to the following issues instead:
#416
#8323

@DarkLight1337 (Member) commented Jan 2, 2025

There is a related PR #6869, but your solution can sidestep the problem of missing input token IDs (at the cost of being less efficient).

Can you update this PR with the unit tests in #6869 to see whether this solution works correctly?
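
For readers following along, a hedged sketch of the kind of consistency check such a test might perform (this is not the actual test code from #6869; the model name, prompt, and dtype cast are illustrative assumptions): with greedy sampling, generating from prompt_embeds produced by the model's own embedding layer should reproduce generation from the raw token IDs.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL)

llm = LLM(model=MODEL)
params = SamplingParams(temperature=0.0, max_tokens=16)  # greedy, deterministic

prompt = "Hello, my name is"
token_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    # Embed the same tokens with the model's own input embedding layer.
    embeds = hf_model.get_input_embeddings()(token_ids).squeeze(0)
embeds = embeds.to(torch.float16)  # match the engine's compute dtype (assumption)

text_from_ids = llm.generate(prompt, params)[0].outputs[0].text
text_from_embeds = llm.generate({"prompt_embeds": embeds}, params)[0].outputs[0].text
assert text_from_embeds == text_from_ids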

@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch 2 times, most recently from 5963444 to f70bbb3 Compare January 6, 2025 11:06
@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch 4 times, most recently from 5c82b7a to 868a730 Compare January 8, 2025 06:29
@mergify (bot) commented Feb 8, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Bryce1010.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 8, 2025
@OswaldoBornemann

@Bryce1010 Hi, when I used your branch, I ran the following case

import torch

test_prompts = [
    "Who are you",
]

# `llm` and `device` are assumed to be defined earlier (an LLM instance and a CUDA device).
test_embeds = torch.randn(len(test_prompts), 10, 892).to(dtype=torch.float32)  # [batch_size, seq_len, hidden_size]
llm.generate(
    {"prompt_embeds": test_embeds.to(device)}
)

But I got the following error: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20250211-170512.pkl): expected scalar type Float but found BFloat16

@OswaldoBornemann

@Bryce1010 I changed test_embeds to test_embeds = torch.randn(len(test_prompts), 10, 896).to(dtype=torch.bfloat16)  # [batch_size, seq_len, hidden_size].

Now the error is AssertionError: Error in model execution (input dumped to /tmp/err_execute_model_input_20250211-172100.pkl):
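
A hedged note on the two errors above: the embeddings must match both the model's hidden size and the engine's compute dtype, and, judging from the per-prompt dict form in the PR description, each prompt_embeds tensor is likely expected to be a 2-D [seq_len, hidden_size] slice rather than a batched 3-D tensor. A minimal sketch of reading both values from the model config instead of hard-coding them (the model name is an illustrative guess at the model used above):

import torch
from transformers import AutoConfig

MODEL = "Qwen/Qwen2-0.5B-Instruct"  # illustrative; a model with hidden_size 896
config = AutoConfig.from_pretrained(MODEL)
print(config.hidden_size, config.torch_dtype)  # 896 and bfloat16 for this model

seq_len = 10
# Per-prompt tensor: 2-D [seq_len, hidden_size], in the engine's compute dtype.
test_embeds = torch.randn(seq_len, config.hidden_size, dtype=torch.bfloat16)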

@qthequartermasterman (Contributor)

@Bryce1010 @DarkLight1337 Is there an expected time for when this will be updated/merged?

@sidhartha-roy

I would love to see this feature merged. Is there a timeline for when you expect this to be reviewed and merged, @njhill and @alexm-redhat?

@mergify mergify bot removed the needs-rebase label Feb 20, 2025
@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch from 209e8ae to 868a730 Compare February 20, 2025 12:15
@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch from 868a730 to cd8fcf8 Compare March 6, 2025 02:35
@mergify mergify bot removed the needs-rebase label Mar 6, 2025
@Bryce1010 (Contributor, Author)

> @Bryce1010 do tell me when it's ready for another round of review. It seems that many people are interested in this.

@DarkLight1337 Thanks for your response! The code has been rebased. Could you please review it and let me know if any further changes are needed before merging into the main branch?

@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch 3 times, most recently from 3fdc9b5 to 3451391 Compare March 6, 2025 03:24
@DarkLight1337 (Member)

Can you fix the pre-commit errors?

@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch 3 times, most recently from 6f480cb to c51e4a1 Compare March 6, 2025 07:09
@Bryce1010 Bryce1010 force-pushed the feature/vllm/add-input-embedding branch from c51e4a1 to fa8caec Compare March 6, 2025 07:14
@DarkLight1337 (Member)

There are still errors; please fix them.

@mergify (bot) commented Mar 11, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Bryce1010.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2025
@Reek12138

I would like to ask when this feature will be merged into the main branch. Once merged, will it support Python 3.8?

@DarkLight1337 (Member) commented Mar 14, 2025

Support for Python 3.8 was dropped a while ago, as it has reached end of life.

@liangwythu

Would love to see this one merged!

@qthequartermasterman (Contributor)

@Bryce1010 @DarkLight1337 Since it has been a few weeks without an update, and I would love to have this feature, I took the initiative to rebase this PR onto main and fix the pre-commit errors as previously asked. I can't commit to this branch, since it's owned by @Bryce1010, so these updates are in #15428.

@DarkLight1337 (Member)

> @Bryce1010 @DarkLight1337 Since it has been a few weeks without an update, and I would love to have this feature, I took the initiative to rebase this PR onto main and fix the pre-commit errors as previously asked. I can't commit to this branch, since it's owned by @Bryce1010, so these updates are in #15428.

That sounds good, thanks for picking this up!

@leonardtang

> @Bryce1010 @DarkLight1337 Since it has been a few weeks without an update, and I would love to have this feature, I took the initiative to rebase this PR onto main and fix the pre-commit errors as previously asked. I can't commit to this branch, since it's owned by @Bryce1010, so these updates are in #15428.

My goat

@DarkLight1337 (Member)

Closing in favor of #15428

