[Bugfix] fix bf16 multimodal model hash #23623
Conversation
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of CI tests runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request addresses a bug where hashing torch.bfloat16 tensors would fail. The proposed fix is to convert these tensors to a different data type before converting to a NumPy array. However, the chosen torch.float16 data type has a more limited range than bfloat16, which can lead to overflow and result in hash collisions for large numerical values. I've provided a critical comment with a suggestion to convert to torch.float32 instead, which avoids this issue. It would also be beneficial to add a new test case to verify the hashing of bfloat16 tensors and prevent future regressions.
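To make the range concern concrete, here is a small illustrative snippet (mine, not part of the PR) showing that a value well within bfloat16's range overflows when narrowed to float16, while a float32 upcast preserves it:

```python
# Illustrative only: bfloat16 and float32 share an 8-bit exponent, but
# float16 tops out around 65504, so large bf16 values overflow to inf.
import torch

x = torch.tensor([262144.0], dtype=torch.bfloat16)  # exactly representable in bf16

print(x.to(torch.float16))  # tensor([inf], dtype=torch.float16) -> information lost
print(x.to(torch.float32))  # tensor([262144.], dtype=torch.float32) -> value preserved
```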
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Isotr0py left a comment
Thanks for fixing!
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
DarkLight1337 left a comment
Thanks for fixing
ywang96 left a comment
Thanks for the fix!
@DarkLight1337 Would you mind helping skip the failed pre-commit mypy CI tasks?

You should fix the

I think the CI/CD error existed before this PR. I could help fix it if you could guide me.
Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Head branch was pushed to by a user without write access
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Head branch was pushed to by a user without write access
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Should be good to go now, thanks
Head branch was pushed to by a user without write access
DarkLight1337 left a comment
I think this should work now
| "original_dtype": str(tensor_dtype), | ||
| "original_shape": tuple(tensor_obj.shape), |
Sorry for being late to the party. This is still the 1D shape of the uint8 tensor, since `tensor_obj` is overwritten above; this should be `obj.shape` instead of `tensor_obj.shape`.
Could you extend the tests to verify the bfloat16 code path as well:
vllm/tests/multimodal/test_hasher.py, lines 57 to 63 in 7ea22e4:
```python
def test_hash_collision_array_shape():
    # The hash should be different though the data is the same when flattened
    arr1 = np.zeros((5, 10, 20, 3))
    arr2 = np.zeros((10, 20, 5, 3))
    hasher = MultiModalHasher
    assert hasher.hash_kwargs(data=arr1) != hasher.hash_kwargs(data=arr2)
```
Otherwise it would cause another CVE similar to #17378.
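A hedged sketch of how such a test could be extended to cover the bfloat16 path, mirroring the array-shape test above (the import path and test name are my assumptions, not the code that was actually merged):

```python
# Hypothetical test extension for the bfloat16 code path.
import torch

from vllm.multimodal.hasher import MultiModalHasher  # assumed import path


def test_hash_collision_tensor_shape_bf16():
    # Same data when flattened, but different shapes must hash differently.
    t1 = torch.zeros((5, 10, 20, 3), dtype=torch.bfloat16)
    t2 = torch.zeros((10, 20, 5, 3), dtype=torch.bfloat16)
    hasher = MultiModalHasher
    assert hasher.hash_kwargs(data=t1) != hasher.hash_kwargs(data=t2)
```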
Oops, let me open another PR to fix it.
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
This PR fixes a bug that occurs when a multimodal model's embedding input is a torch.bfloat16 tensor: bfloat16 tensors can't be converted to NumPy arrays via .numpy() directly.
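For context, a minimal reproduction of the limitation and two possible workarounds; this is an illustrative sketch, not necessarily the exact change made in the PR:

```python
# Illustrative sketch of the bf16 -> NumPy limitation and two workarounds.
import torch

t = torch.randn(4, 8, dtype=torch.bfloat16)

# Direct conversion fails: NumPy has no bfloat16 dtype, so PyTorch raises a
# TypeError about the unsupported ScalarType.
try:
    t.numpy()
except TypeError as e:
    print(e)

# Workaround 1: reinterpret the raw bytes as uint8. Recording the original
# dtype and shape alongside the bytes keeps tensors with different shapes or
# dtypes from colliding in the hash.
as_bytes = t.contiguous().view(torch.uint8).numpy()

# Workaround 2: upcast to float32, which represents every bfloat16 value exactly.
as_f32 = t.to(torch.float32).numpy()
```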