EVS Support (Video tokens pruning) #22980
Code Review
This pull request adds support for Efficient Video Sampling (EVS) by introducing a new interface for models to return custom embeddings and positions, which enables video token pruning. While the overall direction is good, there are several critical issues that need to be addressed. The new interface signature in interfaces.py is inconsistent with its usage in gpu_model_runner.py. More importantly, the logic for updating request states with the pruned positions appears to be incorrect, as it applies the same update to all requests in a batch. Additionally, there are several leftover debugging statements and commented-out code that should be cleaned up before merging.
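The batching concern raised in the review can be illustrated with a minimal sketch. It is not code from this PR: `RequestState`, `apply_pruned_positions`, and the dictionary keyed by request id are hypothetical names chosen for illustration. The point is only that each request in a batch should receive its own pruned positions rather than one shared update.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:
    # Hypothetical stand-in for per-request state; not the vLLM class.
    req_id: str
    positions: list[int] = field(default_factory=list)


def apply_pruned_positions(requests, pruned_positions_by_request):
    """Update each request with the positions computed for that request only.

    Requests without an entry in the mapping (for example, text-only
    requests) keep their existing positions.
    """
    for req in requests:
        update = pruned_positions_by_request.get(req.req_id)
        if update is not None:
            # Copy so that requests never share one mutable list.
            req.positions = list(update)


batch = [
    RequestState("a", [0, 1, 2, 3]),
    RequestState("b", [0, 1, 2, 3]),
    RequestState("c", [0, 1]),
]
apply_pruned_positions(batch, {"a": [0, 2], "b": [0, 1, 3]})
```

Keying the update by request id makes it hard to apply one request's pruned positions to another by accident, which is the failure mode the review describes.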
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
DarkLight1337
left a comment
Sorry for the delay, let's get this merged
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Enable Efficient Video Sampling (EVS) to prune redundant video tokens.
EVS reduces time to first token (TTFT) and inter-token latency (ITL) by pruning less important vision tokens before they reach the LLM.
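As a rough illustration of what pruning redundant video tokens can look like, the sketch below scores each patch token by how much it differs from the same patch in the previous frame and keeps only the most dissimilar ones. This is an assumption-laden toy, not the implementation in this PR: the function names, the `pruning_rate` parameter, the cosine-dissimilarity score, and the rule of always keeping the first frame are all choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_video_tokens(frames, pruning_rate):
    """Return sorted flat indices of the tokens to keep.

    frames: list of frames, each a list of per-patch embedding vectors.
    pruning_rate: fraction of tokens to drop (hypothetical parameter).
    """
    tokens_per_frame = len(frames[0])
    total = len(frames) * tokens_per_frame
    # Never keep fewer tokens than one full frame.
    keep_count = max(tokens_per_frame, int(total * (1.0 - pruning_rate)))

    scored = []
    for t, frame in enumerate(frames):
        for p, vec in enumerate(frame):
            if t == 0:
                score = math.inf  # the first frame has no predecessor; keep it
            else:
                score = 1.0 - cosine_similarity(vec, frames[t - 1][p])
            scored.append((score, t * tokens_per_frame + p))

    # Highest dissimilarity first; ties broken by original position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return sorted(idx for _, idx in scored[:keep_count])


# Three frames of two patches; only patch 1 changes, between frames 0 and 1.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
kept = prune_video_tokens(frames, pruning_rate=0.5)
```

Because the kept indices are returned in their original order, the surviving tokens can be gathered without reshuffling, and their original positions remain available to the model.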