[Core] generate from input embeds #6869
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
@WoosukKwon @ywang96 @robertgshaw2-neuralmagic aside from the failed tests, this is ready for review.
This is due to a recent change in transformers that deprecates the default chat template, and it should have been fixed by #7238. Can you merge your branch with main again?
(force-pushed from 3f79009 to dfd9301)
Is the idea to call …?
Yes, correct.
I have moved the …
```python
assert input_ids is not None, msg
hidden_states = embeddings_module(input_ids)
hidden_states[inputs_embeds_masks] = inputs_embeds
```
This line is not compatible with compilation: the shape of `inputs_embeds` does not match `hidden_states`.
So basically, in-place operations aren't allowed?
In-place is fine, but we need all operations to have the same batch size. The requirement for compilation is that the computation graph does not change during the lifespan of the model. See #9946 for an example of how to make a vision-language model compatible with compilation: in vision-language models, this part of the logic is not in the compile region; we only compile the language tower.
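For context, here is a minimal sketch of the masked-scatter approach being discussed, assuming the `embeddings_module`, `inputs_embeds_masks`, and flattened token layout used in the diff above. This is not the PR's final implementation; as noted, the mask assignment would need to stay outside the compile region (or be replaced with a fixed-shape equivalent) to satisfy the static-graph constraint.

```python
import torch


def merge_prompt_embeds(
    input_ids: torch.Tensor,             # [num_tokens]; dummy IDs at positions covered by embeds
    inputs_embeds: torch.Tensor,         # [num_embed_tokens, hidden_size]
    inputs_embeds_masks: torch.Tensor,   # [num_tokens] bool; True where embeddings were supplied
    embeddings_module: torch.nn.Module,  # the model's token embedding layer
) -> torch.Tensor:
    # Embed every token ID first so the result spans the whole flattened batch.
    hidden_states = embeddings_module(input_ids)
    # Overwrite the positions whose embeddings were passed in directly.
    # This boolean-mask assignment is the step that conflicts with compilation:
    # inputs_embeds has a varying first dimension that differs from hidden_states.
    hidden_states[inputs_embeds_masks] = inputs_embeds
    return hidden_states
```

Keeping this merge in eager code and only compiling the language tower, as suggested above, sidesteps the shape mismatch.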
Excited to use this feature!
Hmm, I see. I'm quite against changing the semantics of …
@DarkLight1337 It seems that this feature is almost ready to be used?
It is incompatible with the recent changes to vLLM internals. I'm waiting for the dust to settle for the V2 engine before picking this back up.
Can I run it directly using your branch?
No, the current branch is broken because I have already merged in those changes from …
Umm.. Is there a working commit I can reset back to?
I think the latest working commit is 49fe3f7
Perhaps an option is to move the embedding logic to …
@DarkLight1337 Do you guys have an ETA for this feature?
No ETA yet; we'll see when the V2 re-arch is done.
I think that this PR will eventually be superseded by the follow-up work to #10374, which should make it easy to support embedding inputs.
Hi there, is there any update on this PR?
Please read the above comment.
This pull request has merge conflicts that must be resolved before it can be merged.
@DarkLight1337 since #10374 has landed, is this PR going to be updated so that this work can be finished, or is there a new PR you can reference for tracking?
Thanks for your interest. At the moment, V1 isn't stable enough to implement embedding inputs yet. I would point you to #8779 to check the progress, but that RFC is somewhat outdated. Instead, you can search for recent PRs with the [V1] tag.
Thanks for the reference, @DarkLight1337. Is it possible to also add this to the roadmap for additional visibility?
Hey @CandiedCode! Thanks for following up on this. IMO, supporting embeddings as input isn't technically difficult in itself, but we do want to be careful with the design so that it works with all the other features we want to natively support in vLLM, especially now that we're going through the re-architecture. I have discussed this briefly with @WoosukKwon in #11032 (comment). In particular, there are some issues we still need to design for and address: …
Adds support for passing `prompt_embeds` to `LLM.generate` (as … or …).
This enables use cases where only the embedding layer is fine-tuned, so the same model backend can support multiple custom-tuned embedding layers.
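A hypothetical usage sketch of what the description implies; the input key `prompt_embeds` and the dict-style prompt format are assumptions based on this PR and may not match the final API.

```python
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# Embeddings produced by a custom fine-tuned embedding layer
# (shape: [num_prompt_tokens, hidden_size]).
prompt_embeds = torch.randn(8, 768)

# Assumed input format: pass embeddings in place of prompt text / token IDs.
outputs = llm.generate(
    {"prompt_embeds": prompt_embeds},
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```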
FIX #416
FIX #8323
Inspired by #1265, which is very outdated.