# [Bugfix] Token type and position embeddings fail to be applied to `inputs_embeds` (#25922)
## Conversation
**Code Review**

This pull request correctly fixes an issue where `inputs_embeds` was not being handled properly in `BertModel` and `RobertaModel`, causing position and token type embeddings to be ignored. The changes align the behavior with the Hugging Face Transformers library.

However, I've identified a critical side effect introduced by these changes. The `_decode_token_type_ids` function, which is now called even when `inputs_embeds` are provided, modifies the `input_ids` tensor in place. This can lead to silent data corruption and incorrect behavior if the `input_ids` tensor is reused. I've provided suggestions in both `bert.py` and `roberta.py` to fix this by avoiding the in-place modification when `inputs_embeds` are used. Addressing this is crucial for ensuring the correctness and predictability of the model's behavior.
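The hazard the review describes can be reproduced in a few lines. This is a minimal sketch, not the actual vLLM code: the packing scheme (token type id stored as a multiple of the vocab size inside `input_ids`) and the function names are assumptions for illustration only.

```python
import torch

VOCAB_SIZE = 50_000  # assumed size; real models read this from the config


def decode_token_type_ids_inplace(input_ids: torch.Tensor) -> torch.Tensor:
    """Problematic pattern: strips the packed type id in place."""
    token_type_ids = input_ids // VOCAB_SIZE
    input_ids %= VOCAB_SIZE  # silently mutates the caller's tensor
    return token_type_ids


def decode_token_type_ids_safe(
    input_ids: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Shape of the suggested fix: return a fresh tensor instead."""
    token_type_ids = input_ids // VOCAB_SIZE
    clean_input_ids = input_ids % VOCAB_SIZE  # new tensor; caller's is intact
    return token_type_ids, clean_input_ids


ids = torch.tensor([3, VOCAB_SIZE + 7])  # second token packed with type id 1
token_types, clean = decode_token_type_ids_safe(ids)
assert ids[1].item() == VOCAB_SIZE + 7  # original tensor unchanged
assert token_types.tolist() == [0, 1] and clean.tolist() == [3, 7]
```

If the in-place variant were used on an `input_ids` tensor that is later reused (e.g. for logging or a second forward pass), the caller would see the stripped values, which is the silent corruption the review warns about.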
> cc @qthequartermasterman can you weigh in on whether this is the intended definition of

> I'll take a look later today. I'll need to review the transformers implementation.
LGTM
This looks right to me! Thanks.
### What this PR does / why we need it?

This is step 1 of refactoring code to adapt to vLLM main; this PR is aligned with vllm-project/vllm@17c540a.

1. Refactor DeepSeek to the latest code arch as of vllm-project/vllm@17c540a
2. A bunch of fixes due to vLLM changes:
   - Fix `AscendScheduler` `__post_init__`, caused by vllm-project/vllm#25075
   - Fix `AscendScheduler` init got an unexpected arg `block_size`, caused by vllm-project/vllm#26296
   - Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by vllm-project/vllm#23485
   - Fix `MLAAttention` import, caused by vllm-project/vllm#25103
   - Fix `SharedFusedMoE` import, caused by vllm-project/vllm#26145
   - Fix `LazyLoader` import, caused by vllm-project/vllm#27022
   - Fix `vllm.utils.swap_dict_values` import, caused by vllm-project/vllm#26990
   - Fix `Backend` enum import, caused by vllm-project/vllm#25893
   - Fix `CompilationLevel` renaming to `CompilationMode`, introduced by vllm-project/vllm#26355
   - Fix fused_moe ops, caused by vllm-project/vllm#24097
   - Fix bert model because of `inputs_embeds`, caused by vllm-project/vllm#25922
   - Fix MRope because of `get_input_positions_tensor` to `get_mrope_input_positions`, caused by vllm-project/vllm#24172
   - Fix `splitting_ops` changes introduced by vllm-project/vllm#25845
   - Fix multi-modality changes introduced by vllm-project/vllm#16229
   - Fix LoRA bias dropping issue introduced by vllm-project/vllm#25807
   - Fix structured output break introduced by vllm-project/vllm#26737

   (A compatibility-shim sketch for this kind of import churn appears after this description.)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

CI passed with existing tests.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
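Downstream plugins typically absorb renames like the `CompilationLevel` → `CompilationMode` one with a guarded import. A minimal sketch, assuming the symbol lives under `vllm.config` (the exact module path is an assumption for illustration, not a confirmed location):

```python
# Hedged compatibility shim for the CompilationLevel -> CompilationMode
# rename noted above; `vllm.config` as the module path is an assumption.
try:
    from vllm.config import CompilationMode  # newer vLLM
except ImportError:
    from vllm.config import CompilationLevel as CompilationMode  # older vLLM
```

The same try/except pattern covers the other moved imports in the list (`SharedFusedMoE`, `LazyLoader`, `swap_dict_values`, and so on), keeping a single plugin codebase working across vLLM versions.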
## Purpose
This PR makes sure that the semantics of `inputs_embeds` for `BertEmbedding` and `RobertaEmbedding` match the definition in the Transformers library.

Note: This by itself is not sufficient to enable prompt embedding inputs (i.e. the user passes `inputs_embeds` directly), because we still need the information of `token_type_ids`.

cc @maxdebayser @noooop
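For reference, a toy sketch of the intended semantics, mirroring the shape of Hugging Face's `BertEmbeddings.forward` rather than vLLM's actual code: position and token type embeddings must be added on top of `inputs_embeds` instead of being skipped. The class and argument names here are illustrative assumptions.

```python
import torch
from torch import nn


class ToyBertEmbedding(nn.Module):
    """Toy illustration; layer names follow the usual BERT layout."""

    def __init__(self, vocab_size=100, hidden=16, max_pos=32, num_types=2):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden)
        self.position_embeddings = nn.Embedding(max_pos, hidden)
        self.token_type_embeddings = nn.Embedding(num_types, hidden)
        self.layer_norm = nn.LayerNorm(hidden)

    def forward(self, input_ids, position_ids, token_type_ids, inputs_embeds=None):
        # The bug: the old path returned `inputs_embeds` as-is, so the two
        # additions below were skipped whenever embeddings were passed in.
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        embeddings = (
            inputs_embeds
            + self.position_embeddings(position_ids)
            + self.token_type_embeddings(token_type_ids)
        )
        return self.layer_norm(embeddings)
```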
## Test Plan

## Test Result