-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[Bugfix] Fix incorrect original shape in hashing #23672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix] Fix incorrect original shape in hashing #23672
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly fixes a bug where the original shape of a bfloat16 tensor was being incorrectly determined during the hashing process, which could lead to hash collisions. While this fix is necessary, I've identified a pre-existing critical issue in the same code block that handles bfloat16 tensors. The method used to convert the tensor to its byte representation is incorrect and will result in a runtime error. I have provided a review comment with a suggested fix for this critical bug.
| tensor_obj = tensor_obj.contiguous() | ||
| tensor_obj = tensor_obj.view( | ||
| (tensor_obj.numel(), )).view(torch.uint8) | ||
|
|
||
| return cls.item_to_bytes( | ||
| "tensor", { | ||
| "original_dtype": str(tensor_dtype), | ||
| "original_shape": tuple(tensor_obj.shape), | ||
| "data": tensor_obj.numpy() | ||
| "original_shape": tuple(tensor_shape), | ||
| "data": tensor_obj.numpy(), | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method used to get the byte representation of a bfloat16 tensor is incorrect and will cause a TypeError at runtime. The torch.Tensor.view() method expects a shape as input, not a dtype. Additionally, a bfloat16 tensor (2 bytes per element) cannot be directly viewed as a uint8 tensor (1 byte per element) by reinterpreting memory this way.
A more robust approach is to first view the bfloat16 tensor as a type of the same size (e.g., uint16), then convert it to a NumPy array, and finally view that array as uint8 to get the raw bytes for hashing.
| tensor_obj = tensor_obj.contiguous() | |
| tensor_obj = tensor_obj.view( | |
| (tensor_obj.numel(), )).view(torch.uint8) | |
| return cls.item_to_bytes( | |
| "tensor", { | |
| "original_dtype": str(tensor_dtype), | |
| "original_shape": tuple(tensor_obj.shape), | |
| "data": tensor_obj.numpy() | |
| "original_shape": tuple(tensor_shape), | |
| "data": tensor_obj.numpy(), | |
| }) | |
| tensor_obj = tensor_obj.contiguous() | |
| # To correctly get the byte representation of a bfloat16 tensor, | |
| # it should be viewed as a type of the same size (e.g., uint16), | |
| # then converted to a numpy array, and finally viewed as uint8. | |
| data_np = tensor_obj.view(torch.uint16).numpy().view(np.uint8) | |
| return cls.item_to_bytes( | |
| "tensor", { | |
| "original_dtype": str(tensor_dtype), | |
| "original_shape": tuple(tensor_shape), | |
| "data": data_np, | |
| }) |
Add test for bfloat16 hash collision
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Purpose
FIX https://github.com/vllm-project/vllm/pull/23623/files#r2301482573
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.