[Model] Add Ovis2.5 PP support #23405
Code Review
This pull request adds Pipeline Parallelism (PP) support for the Ovis2.5 model and introduces Tensor Parallelism (TP) for its Siglip2Navit vision backbone. The changes are extensive, replacing standard `nn.Linear` layers with vLLM's parallel equivalents and updating the model architecture to be compatible with distributed execution. Overall, the implementation looks solid, but I've found a critical issue in the weight loading logic that appears to be a copy-paste error and could lead to incorrect behavior.
Thanks, can you run the example script with PP=1 and PP=2 to check correctness?
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
not is_flash_attn_2_available(),
reason="HF model needs `flash_attn` installed"
)],
hf_model_kwargs={"revision": "refs/pr/5"},
With this revision, we can now run the model test without flash-attn installed:
tests/models/multimodal/generation/test_common.py::test_video_models[ovis2_5-test_case3]
/kaggle/working/vllm/tests/models/multimodal/generation/vlm_utils/core.py:154: UserWarning: Test1:
Matched tokens: [151667, 198, 20002, 99601, 85106, 101042, 100678, 99487, 87140, 103027, 1773, 101140, 50930, 102650, 5122, 102833, 100469, 103645, 100811, 3837, 108391, 105666, 104433, 104972, 3837, 102196, 33108, 102936, 99165, 100243, 116434, 1773, 101889]
hf: '<think>\n用户现在需要分析为什么这个视频有趣。首先看画面:婴儿戴着眼镜,模仿大人读书的样子,动作和表情很滑稽。然后分解元素:\n\n1. 婴儿的“阅读”行为:婴儿模仿大人读书,动作笨拙但可爱,比如翻页、专注的样子,和成人读书的场景形成反差,很幽默。\n2. 眼镜的拟人化:婴儿戴眼镜,像是在认真阅读,这种拟人化的表现很有趣,因为婴儿戴眼镜是现实中不太常见的,加上模仿阅读,强化了喜剧效果。\n3. �' {107799: -1.5044023990631104, 104449: -2.0981523990631104, 50930: -2.2387773990631104, 100062: -2.8325273990631104, 30534: -2.9419023990631104, 99172: -3.0044023990631104, 20412: -3.1762773990631104, 104107: -3.6137773990631104, 3837: -3.7856523990631104, 101348: -3.7856523990631104}
vllm: '<think>\n用户现在需要分析为什么这个视频有趣。首先看画面:婴儿戴着眼镜,模仿大人读书的样子,动作和表情很滑稽。然后细节:婴儿的动作(翻书、抬手)像在认真阅读,眼镜的拟人化,还有环境(床上、背景的家具)营造的居家氛围,加上婴儿的天真可爱,模仿成人行为的反差萌,这些元素结合起来让视频有幽默感。\n\n首先,**拟人化与模仿**:婴儿戴着眼镜,模仿大人读书,这种“成人化”的行为在婴儿身上显得滑稽,因为婴儿本' {104449: Logprob(logprob=-1.8062855005264282, rank=1, decoded_token='细节'), 107799: Logprob(logprob=-2.4156603813171387, rank=2, decoded_token='分解'), 30534: Logprob(logprob=-2.6187853813171387, rank=3, decoded_token='要'), 100374: Logprob(logprob=-2.6187853813171387, rank=4, decoded_token='结合'), 50930: Logprob(logprob=-2.7281603813171387, rank=5, decoded_token='看'), 20412: Logprob(logprob=-2.8531603813171387, rank=6, decoded_token='是'), 102122: Logprob(logprob=-2.9156603813171387, rank=7, decoded_token='场景'), 99719: Logprob(logprob=-3.0562853813171387, rank=8, decoded_token='环境'), 99172: Logprob(logprob=-3.4156603813171387, rank=9, decoded_token='想'), 3837: Logprob(logprob=-3.5250353813171387, rank=10, decoded_token=',')}
comparator(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================== 12 passed, 302 deselected, 29 warnings in 1048.85s (0:17:28) ================================================
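The warning in the log above reports a point where the HF and vLLM outputs diverge, yet the test still passes. One plausible reading, sketched below under stated assumptions, is that the comparison tolerates a divergence as long as each side's chosen token appears in the other side's top-k logprobs at that position. The helper `logprobs_close` is hypothetical and is not the comparator used in the vLLM test suite; the token IDs and rounded logprobs are copied from the log above.

```python
# Hypothetical helper, not the vLLM test comparator: it only illustrates
# a "top-k overlap at first divergence" style of check.

def logprobs_close(tokens_a, tops_a, tokens_b, tops_b):
    """Compare two generations.

    tokens_*: list of generated token IDs.
    tops_*:   list of dicts (one per position) mapping token ID -> logprob
              for the top-k candidates at that position.
    Returns (ok, index_of_first_divergence_or_None).
    """
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a == b:
            continue
        # First divergence: accept it only if each side ranked the other
        # side's token among its own top-k candidates.
        return (a in tops_b[i] and b in tops_a[i]), i
    return True, None

# Last matched token from the log (101889), then the diverging step.
hf_tokens = [101889, 107799]
vllm_tokens = [101889, 104449]
hf_tops = [{101889: 0.0}, {107799: -1.504, 104449: -2.098}]
vllm_tops = [{101889: 0.0}, {104449: -1.806, 107799: -2.416}]

ok, first_diff = logprobs_close(hf_tokens, hf_tops, vllm_tokens, vllm_tops)
```

Here both implementations rank the other's token among their top candidates, so a check of this shape would accept the pair and report position 1 as the first divergence. The `0.0` logprobs at the matched position are placeholders, not values from the log.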
"openbmb/MiniCPM-Llama3-V-2_5": PPTestSettings.fast(),
"allenai/Molmo-7B-D-0924": PPTestSettings.fast(),
"AIDC-AI/Ovis2-1B": PPTestSettings.fast(),
"AIDC-AI/Ovis2.5-2B": PPTestSettings.fast(),
I have confirmed that this test set passes after increasing max_model_len to 8192.
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Purpose
`use_data_parallel` for ViT to support data parallel in the future after [Core] Allow disabling TP sharding for parallel Linear layer #23024
Test Plan
Test Result
(Optional) Documentation Update