[Model][VLM] Support Bee-8B Model #27012
Conversation
Documentation preview: https://vllm--27012.org.readthedocs.build/en/27012/
Code Review
This pull request adds support for the Bee-8B model. The implementation correctly integrates the model into vLLM by inheriting from existing LLaVA-like model classes and providing a model-specific multimodal projector and processing logic. The changes also include documentation updates, examples, and tests. I've found one critical compatibility issue in the implementation that needs to be addressed.
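For readers unfamiliar with the pattern the review describes, below is a minimal sketch of a LLaVA-style multimodal projector. The class name, layer sizes, and activation are illustrative assumptions, not the actual Bee-8B implementation in this PR:

```python
# Hypothetical sketch of a LLaVA-style multimodal projector (the general
# pattern the review refers to). BeeMultiModalProjector and the dimensions
# below are illustrative, not the code in this PR.
import torch
import torch.nn as nn


class BeeMultiModalProjector(nn.Module):
    """Maps vision-encoder features into the language model's hidden space."""

    def __init__(self, vision_hidden_size: int, text_hidden_size: int):
        super().__init__()
        self.linear_1 = nn.Linear(vision_hidden_size, text_hidden_size)
        self.act = nn.GELU()
        self.linear_2 = nn.Linear(text_hidden_size, text_hidden_size)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        return self.linear_2(self.act(self.linear_1(image_features)))


# Example: project 576 patch embeddings from a 1024-d vision tower
# into a 4096-d language model.
proj = BeeMultiModalProjector(1024, 4096)
out = proj(torch.randn(1, 576, 1024))
assert out.shape == (1, 576, 4096)
```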
ywang96 left a comment:
Thanks for your contribution! LGTM
Merged 8 commits from origin/main including:
- PR vllm-project#26586: Eagle rejection sampler fix (previously cherry-picked)
- LoRA CUDA graph specialization (vllm-project#25914)
- Bee-8B VLM model support (vllm-project#27012)
- Utilities reorganization (network_utils, async_utils, etc.)
- Multiple bug fixes and improvements

In-Tree Modifications:
- Removed Eagle rejection sampler cherry-pick (now in upstream)
- Kept Qwen3 tool parser fix (still needed, line 523)
- Only 1 active in-tree modification remaining

Plugin Compatibility:
- All 10 plugin patches load successfully
- No target class changes required
- Clean merge with no conflicts

Documentation Updates:
- Updated IN_TREE_MODIFICATIONS.md (moved Eagle fix to Removed/Obsolete)
- Updated CLAUDE.md merge history
- Verified clean diff with origin/main (3 files, all documented)

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Purpose
Support Bee-8B-RL and Bee-8B-SFT
Example Serving Command
```bash
vllm serve \
  Open-Bee/Bee-8B-RL \
  --served-model-name bee-8b-rl \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.8 \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code
```

Example Offline Inference
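The offline path mirrors the serving command above. Below is a minimal sketch using vLLM's `LLM` API; the prompt placeholder and image path are assumptions, so check the Bee-8B model card for the actual chat template:

```python
# Minimal offline-inference sketch. The prompt template and image path are
# illustrative; consult the Open-Bee/Bee-8B-RL model card for the exact
# chat format.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="Open-Bee/Bee-8B-RL",
    trust_remote_code=True,  # required, as in the serving command above
)

# Assumed "<image>" placeholder; the real template may differ.
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"
image = Image.open("example.jpg")  # any local RGB image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.2, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```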