- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10.9k
[Frontend][3/N] Improve all pooling task | Support binary embedding response #27066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
32e102c    to
    92a6d2e      
    Compare
  
    223eb21    to
    e54fe8b      
    Compare
  
    Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
| This pull request has merge conflicts that must be resolved before it can be | 
Signed-off-by: wang.yuqi <noooop@126.com>
| Documentation preview: https://vllm--27066.org.readthedocs.build/en/27066/ | 
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
| /gemini review | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces an excellent feature for improving performance by supporting binary embedding responses, including a new bytes encoding format and endianness customization. The refactoring of serialization logic into vllm/utils/serial_utils.py is a positive structural change.
However, my review has identified a critical data corruption bug in the new serialization logic specifically for bfloat16 tensors. The current implementation incorrectly reinterprets bfloat16 bit patterns as float16, which leads to corrupted data. This issue is not caught by the existing tests because they only verify round-trip consistency, where the symmetric corruption cancels itself out. I've provided a detailed comment with a code suggestion to fix this critical bug.
Signed-off-by: wang.yuqi <noooop@126.com>
| Are there any more modifications needed for this PR? | 
…esponse (vllm-project#27066) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…esponse (vllm-project#27066) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…esponse (vllm-project#27066) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…esponse (vllm-project#27066) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
…esponse (vllm-project#27066) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
TL;DR
Improve all pooling task
These PRs are mostly conflicting with each other, so combining them into a series would better inform reviewers about what happened. And what else needs to be done after that?
Purpose
Thanks @uasan for this cool idea
Fix #27063
cc @christian-pinto @maxdebayser @DarkLight1337
Test Plan
tests/utils_/test_serial_utils.py
tests/entrypoints/pooling/openai/test_embedding.py
tests/entrypoints/pooling/openai/test_pooling.py
Test Result
pass
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.