Support embedding models in V1 #16188
Merged

98 commits
- f36c4f9 Remove guardrails that prevent V1 from trying to run embedding models (maxdebayser)
- acf4638 hack v1 flash_attn to support encoder_only (maxdebayser)
- b13bbc0 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 8debea0 Revert changes to disable kv caching for encoder-only models (maxdebayser)
- 8d97b9c Add pooling support in v1 (maxdebayser)
- d60b22b First end-to-end working version of Bert embeddings in V1 (maxdebayser)
- 6bebbb8 Support warmup for pooling models in V1 (maxdebayser)
- 6dafd71 address review comments (maxdebayser)
- e2724a2 address review comments (maxdebayser)
- 56ff6cd remove debug prints (maxdebayser)
- fc57edd address review comments (maxdebayser)
- 64a0e62 Fix cross encoder models in V1 and enable tests for pooling models (maxdebayser)
- 4014d41 address review comments (maxdebayser)
- 87a95a8 Merge branch 'main' into v1_embeddings (maxdebayser)
- 902c129 address review comments (maxdebayser)
- 2c68855 re-enable large embedding models (maxdebayser)
- 8afd8f5 address review comments (maxdebayser)
- 7762976 Merge branch 'main' into v1_embeddings (maxdebayser)
- d7537ae Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- a9e7747 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 17520bd Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 90c611a Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- dec2441 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- a5e83f4 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 187f69b Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 69a0332 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- a9f1721 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 4b066a3 fix merge problems (maxdebayser)
- 43a26dc Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- ca34513 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- bf3033d Fix missing qwen embedding model param (maxdebayser)
- 67bf727 Make pooling params reach the pooling in V1 (maxdebayser)
- 93b6361 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- d916b88 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- bad4211 fix merge problems (maxdebayser)
- 35d9bd9 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- dcc6100 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- a4f85b5 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- a5f328a Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 7c5be88 fix merge problem (maxdebayser)
- 29b75c9 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 6aa204c backport changes from the other PR (maxdebayser)
- e81470c fix merge errors (maxdebayser)
- 20e7140 address review comments (maxdebayser)
- 6bc1e3d address review comments (maxdebayser)
- 22825bd simplify PR (maxdebayser)
- c889b2e fix mistake (maxdebayser)
- 24462e4 workaround qwen model test issue (maxdebayser)
- b5f21f2 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 79d1b95 revert unecessary change (maxdebayser)
- b3a0491 remove duplicated code (maxdebayser)
- b4ab556 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 1a82e56 remove encoder model support to simplify PR (maxdebayser)
- a66801b Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 660dd9c fix several tests (maxdebayser)
- 808c996 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- cdd70c9 Fix test (maxdebayser)
- 0832115 disable bert test (maxdebayser)
- 10bbf74 fix tests (maxdebayser)
- ee892aa limit context length to fit test GPU (maxdebayser)
- 2e12eba limit context length to fit test GPU (maxdebayser)
- 14fcf24 fix test (maxdebayser)
- 0624435 fix test (maxdebayser)
- 706fdb2 Merge branch 'main' into v1_embeddings (22quinn)
- 051f6d4 Fix _construct_cached_request_state (22quinn)
- 214cf06 Fix v1 tests (22quinn)
- 8193bd0 Merge pull request #1 from 22quinn/v1_embeddings (maxdebayser)
- 65b8377 fix test (maxdebayser)
- 33d7f74 Merge branch 'v1_embeddings' of github.com:maxdebayser/vllm into v1_e… (maxdebayser)
- 4ee822a reduce max_model_len to fit in test gpu (maxdebayser)
- 7242731 fix test (maxdebayser)
- a4f460b fix test (maxdebayser)
- 35ca640 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 17f6177 fix test (maxdebayser)
- 3f0d42e Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 74d73cc use torch.split (maxdebayser)
- e6a66dc enable cuda graphs (maxdebayser)
- 4cca774 fix unecessary config.py changes (maxdebayser)
- 8ef1982 fix error message (maxdebayser)
- 28d00d1 remove unused import (maxdebayser)
- e634f60 fix docstring (maxdebayser)
- 053475c revert unnecessary code changes (maxdebayser)
- 6228f64 remove debug prints (maxdebayser)
- 42c802a fix refactoring bug (maxdebayser)
- f771a19 fix refactoring bug (maxdebayser)
- 02c47ad Fix default chunked prefill for pooling models (maxdebayser)
- 1fd252c Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- c5c0d97 Revert handling of case that can never happen (maxdebayser)
- acfc9cc fix small bug (maxdebayser)
- 225b808 fix small bugs (maxdebayser)
- 2b86c13 fix silly mistake (maxdebayser)
- 2983252 reduce memory usage for small ci gpus (maxdebayser)
- 58c556d Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 878d56a enable chunked prefill by default for models that support it (maxdebayser)
- 2db273f Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- 114af27 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
- bc0219d address review comments (maxdebayser)
- 221f013 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -68,6 +68,7 @@ def _run_incremental_decode(tokenizer,` |
| | | `None,` |
| | | `params,` |
| | | `None,` |
| | | `None,` |
| | | `0.0,` |
| | | `None,` |
| | | `cache_salt=None,` |
  
    