-
-
Notifications
You must be signed in to change notification settings - Fork 11.4k
[Core] Set pooling params based on task and model #21128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Set pooling params based on task and model #21128
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
| # req_index -> bad_words_token_ids | ||
| self.bad_words_token_ids: dict[int, list[list[int]]] = {} | ||
|
|
||
| self.logits_processing_needs_token_ids = np.zeros(max_num_reqs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this can be set on a per-request basis instead of being fixed at startup time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors how pooling parameters are handled by moving the logic from the API layer to the model/worker layer, which is a great improvement for flexibility and encapsulation. The changes are well-implemented across the codebase.
I've found one critical issue in vllm/entrypoints/llm.py that will lead to a runtime ValueError due to a task mismatch during verification. Please see the specific comment for details on the issue and how to resolve it. Other than that, the changes look solid.
| def get_model(self) -> nn.Module: | ||
| raise NotImplementedError | ||
|
|
||
| def get_supported_pooling_tasks(self) -> list[PoolingTask]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not technically used in V0 but I put it here to avoid forgetting to include this method when we implement the base class for V1 model runners
maxdebayser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a suggestion, but otherwise it LGTM
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Isotr0py
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM!
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
| **kwargs, | ||
| ) -> None: | ||
| # Use full attention for pooling | ||
| # Use full attention for pooling (this is why V1 is not supported yet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this refactoring, the pooler implementation of GritLM becomes compatible with V1 model runner now (note that logits_processing_needs_token_ids=Truerequires_token_ids=True). However none of the V1 attention backends support bidirectional attention yet, so this model is still marked V0 only
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.cc @maxdebayser @noooop @22quinn
Purpose
Set
pooling_params.use_cross_encoderandpooling_params.logits_processing_needs_token_idsbased on the task and model, replacing the static logic at API level. This provides more flexibility for models to respond differently based on the task being used.Implements #21058 (comment)
Test Plan
The existing tests should pass
Test Result
(Optional) Documentation Update