-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
Migrate LlavaImageInputs to TensorSchema #21770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR migrates LlavaImageInputs to TensorSchema for better input validation. An edge case where an empty list of pixel_values could cause a server crash was identified and should be addressed.
vllm/model_executor/models/llava.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When pixel_values is an empty list, flatten_bn(pixel_values, concat=True) on line 597 will raise an exception because it calls torch.cat([]). This can crash the server if a request has an empty list for pixel_values. Add a check for an empty pixel_values list before this block to prevent this.
if pixel_values:
expected_h = expected_w = self.config.vision_config.image_size
return LlavaImagePixelInputs(
type="pixel_values",
pixel_values=flatten_bn(pixel_values, concat=True),
resolve_bindings={
"h": expected_h,
"w": expected_w
},
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid concern, but I'd like to avoid introducing new enforcement with the migration. Happy to update if others feel it'd be helpful.
vllm/model_executor/models/llava.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite this comment about varying height/width, I'm adding enforcement to match existing behaviors in _validate_pixel_values. Please feel free to correct. @DarkLight1337 @Isotr0py
def _validate_pixel_values(self, data: torch.Tensor) -> torch.Tensor:
h = w = self.config.vision_config.image_size
expected_dims = (3, h, w)
actual_dims = tuple(data.shape[1:])
if actual_dims != expected_dims:
expected_expr = ("batch_size", *map(str, expected_dims))
raise ValueError(
f"The expected shape of pixel values is {expected_expr}. "
f"You supplied {tuple(data.shape)}.")
return data
vllm/model_executor/models/llava.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No enforcement was previously applied, so I skipped adding validations against (num_channels, height, width). Feel free to let me know if there's other preferences. cc @DarkLight1337 @Isotr0py
Head branch was pushed to by a user without write access
|
Can you merge from main? It should fix the CI |
Head branch was pushed to by a user without write access
Able to reproduce the main branch CI failures locally. These appear unrelated to this PR. Will rebase once the upstream issue is fixed. |
|
Retrying MM test |
Sorry I missed that. Will take a closer look. |
|
Took a closer look, but the MM test failure seems to happen with latest on main. It seems related to downloading image for Pixtral, so unrelated to these changes? |
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
Retrying |
Signed-off-by: Benji Beck <benjibeck@meta.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: Benji Beck <benjibeck@meta.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: Benji Beck <benjibeck@meta.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: Benji Beck <benjibeck@meta.com>
Purpose
This PR migrates LlavaImageInputs from a TypedDict-based definition to a structured TensorSchema model with runtime shape validation. This brings it in line with recent changes to Phi3VImagePixelInputs, and is part of a broader effort to improve input contract enforcement and debug-ability across multi-modal models.
Test Plan
Confirm validation works via standalone tests in tests/standalone_test/test_tensor_schema.py and rely on CI to check integration.
Test Result