 }
 </style>
 
-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `AquilaForCausalLM` | Aquila, Aquila2 | `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `ArceeForCausalLM` | Arcee (AFM) | `arcee-ai/AFM-4.5B-Base`, etc. | ✅︎ | ✅︎ | ✅︎ |
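
For context, the [PP] column in these tables marks pipeline-parallelism support, the feature whose doc link this commit renames. Below is a minimal offline sketch, assuming a multi-GPU host and a vLLM build that supports offline pipeline parallelism; the model name is just an example row from the table above.

```python
from vllm import LLM, SamplingParams

# Split the model across 2 pipeline stages (one per GPU).
llm = LLM(model="BAAI/Aquila-7B", pipeline_parallel_size=2)

params = SamplingParams(temperature=0.8, max_tokens=64)
out = llm.generate(["The capital of France is"], params)
print(out[0].outputs[0].text)
```
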
@@ -426,7 +426,7 @@ See [this page](./pooling_models.md) for more information on how to use pooling
 
 These models primarily support the [`LLM.embed`](./pooling_models.md#llmembed) API.
 
-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `BertModel` <sup>C</sup> | BERT-based | `BAAI/bge-base-en-v1.5`, `Snowflake/snowflake-arctic-embed-xs`, etc. | | | |
 | `Gemma2Model` <sup>C</sup> | Gemma 2-based | `BAAI/bge-multilingual-gemma2`, etc. | ✅︎ | | ✅︎ |
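
For reference, a minimal sketch of the `LLM.embed` entrypoint named in this hunk, using a checkpoint from the table; the `task="embed"` argument follows vLLM's pooling docs, but treat exact argument names as version-dependent.

```python
from vllm import LLM

# Load a listed embedding checkpoint in pooling mode.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")

outputs = llm.embed(["vLLM also supports pooling models."])
vector = outputs[0].outputs.embedding  # one float per hidden dimension
print(len(vector))
```
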
@@ -466,7 +466,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
 
 These models primarily support the [`LLM.classify`](./pooling_models.md#llmclassify) API.
 
-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `JambaForSequenceClassification` | Jamba | `ai21labs/Jamba-tiny-reward-dev`, etc. | ✅︎ | ✅︎ | |
 | `GPT2ForSequenceClassification` | GPT2 | `nie3e/sentiment-polish-gpt2-small` | | | ✅︎ |
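
Similarly, a hedged sketch of the `LLM.classify` call with a row from this table (same caveat about the `task` argument):

```python
from vllm import LLM

# Load a listed sequence-classification checkpoint.
llm = LLM(model="nie3e/sentiment-polish-gpt2-small", task="classify")

outputs = llm.classify(["Ten film był świetny!"])
print(outputs[0].outputs.probs)  # per-class probabilities
```
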
@@ -483,7 +483,7 @@ If your model is not in the above list, we will try to automatically convert the
 Cross-encoder and reranker models are a subset of classification models that accept two prompts as input.
 These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) API.
 
-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `BertForSequenceClassification` | BERT-based | `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc. | | | |
 | `GemmaForSequenceClassification` | Gemma-based | `BAAI/bge-reranker-v2-gemma` (see note), etc. | ✅︎ | ✅︎ | ✅︎ |
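
Because cross-encoders score a query against candidate texts, `LLM.score` takes two inputs. A minimal sketch with a listed checkpoint, under the same version caveats as above:

```python
from vllm import LLM

llm = LLM(model="cross-encoder/ms-marco-MiniLM-L-6-v2", task="score")

outputs = llm.score(
    "What is the capital of France?",
    ["Paris is the capital of France.", "The Nile flows through Africa."],
)
for out in outputs:
    print(out.outputs.score)  # higher means more relevant
```
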
@@ -521,7 +521,7 @@ These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) A
 
 These models primarily support the [`LLM.reward`](./pooling_models.md#llmreward) API.
 
-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `InternLM2ForRewardModel` | InternLM2-based | `internlm/internlm2-1_8b-reward`, `internlm/internlm2-7b-reward`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `LlamaForCausalLM` <sup>C</sup> | Llama-based | `peiyi9979/math-shepherd-mistral-7b-prm`, etc. | ✅︎ | ✅︎ | ✅︎ |
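
`LLM.reward` is the least commonly used of these entrypoints; the sketch below assumes it mirrors the other pooling calls and that `task="reward"` is accepted by your installed version.

```python
from vllm import LLM

# Assumption: reward models load like the other pooling models above.
llm = LLM(model="internlm/internlm2-1_8b-reward", task="reward")

outputs = llm.reward(["A prompt and response pair to be scored."])
print(outputs[0].outputs.data)  # reward signal for the input
```
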
@@ -594,7 +594,7 @@ See [this page](generative_models.md) for more information on how to use generat
 
 These models primarily accept the [`LLM.generate`](./generative_models.md#llmgenerate) API. Chat/Instruct models additionally support the [`LLM.chat`](./generative_models.md#llmchat) API.
 
-| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `AriaForConditionalGeneration` | Aria | T + I<sup>+</sup> | `rhymes-ai/Aria` | | | ✅︎ |
 | `AyaVisionForConditionalGeneration` | Aya Vision | T + I<sup>+</sup> | `CohereForAI/aya-vision-8b`, `CohereForAI/aya-vision-32b`, etc. | | ✅︎ | ✅︎ |
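
A short sketch of the two generative entrypoints named in this hunk; the model is a row from the table, and text-only chat is used for brevity (multimodal inputs follow the formats described in the multimodal docs):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="rhymes-ai/Aria")  # any generative row from the table

# LLM.generate: raw completion of a prompt.
params = SamplingParams(temperature=0.7, max_tokens=32)
print(llm.generate(["The most useful property of VLMs is"], params)[0].outputs[0].text)

# LLM.chat: applies the model's chat template to a message list.
messages = [{"role": "user", "content": "Name one thing vision-language models can do."}]
print(llm.chat(messages)[0].outputs[0].text)
```
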
@@ -647,7 +647,7 @@ These models primarily accept the [`LLM.generate`](./generative_models.md#llmgen
 
 Some models are supported only via the [Transformers backend](#transformers). The purpose of the table below is to acknowledge models which we officially support in this way. The logs will say that the Transformers backend is being used, and you will see no warning that this is fallback behaviour. This means that, if you have issues with any of the models listed below, please [make an issue](https://github.com/vllm-project/vllm/issues/new/choose) and we'll do our best to fix it!
 
-| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|--------|-------------------|-----------------------------|-----------------------------------------|---------------------|
 | `Emu3ForConditionalGeneration` | Emu3 | T + I | `BAAI/Emu3-Chat-hf` | ✅︎ | ✅︎ | ✅︎ |
 
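
To confirm a model is running through the Transformers backend rather than waiting for fallback behaviour, vLLM can be pinned to it explicitly. The `model_impl` engine argument shown below exists in recent releases, but treat its availability as an assumption of your installed version.

```python
from vllm import LLM

# Force the Transformers backend ("auto" and "vllm" are the other values).
llm = LLM(model="BAAI/Emu3-Chat-hf", model_impl="transformers")
```
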
@@ -726,7 +726,7 @@ Some models are supported only via the [Transformers backend](#transformers). Th
 
 Speech2Text models trained specifically for Automatic Speech Recognition.
 
-| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `WhisperForConditionalGeneration` | Whisper | `openai/whisper-small`, `openai/whisper-large-v3-turbo`, etc. | | | |
 | `VoxtralForConditionalGeneration` | Voxtral (Mistral format) | `mistralai/Voxtral-Mini-3B-2507`, `mistralai/Voxtral-Small-24B-2507`, etc. | | ✅︎ | ✅︎ |
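
As a hedged illustration of serving one of these ASR checkpoints: vLLM's OpenAI-compatible server exposes a transcriptions route that the stock `openai` client can call. Endpoint availability depends on your vLLM version, and `sample.wav` is a placeholder file.

```python
from openai import OpenAI

# Assumes a server started with: vllm serve openai/whisper-small
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="openai/whisper-small",
        file=audio,
    )
print(result.text)
```
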
@@ -744,7 +744,7 @@ These models primarily support the [`LLM.embed`](./pooling_models.md#llmembed) A
 
 The following table lists those that are tested in vLLM.
 
-| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/distributed_serving.md) | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA](../features/lora.md) | [PP](../serving/parallelism_scaling.md) | [V1](gh-issue:8779) |
 |--------------|--------|--------|-------------------|----------------------|---------------------------|---------------------|
 | `LlavaNextForConditionalGeneration` <sup>C</sup> | LLaVA-NeXT-based | T / I | `royokong/e5-v` | | | |
 | `Phi3VForCausalLM` <sup>C</sup> | Phi-3-Vision-based | T + I | `TIGER-Lab/VLM2Vec-Full` | 🚧 | ✅︎ | |
@@ -760,7 +760,7 @@ The following table lists those that are tested in vLLM.
 Cross-encoder and reranker models are a subset of classification models that accept two prompts as input.
 These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) API.
 
-| Architecture | Models | Inputs | Example HF Models | [LoRA][lora-adapter] | [PP][distributed-serving] | [V1](gh-issue:8779) |
+| Architecture | Models | Inputs | Example HF Models | [LoRA][lora-adapter] | [PP][parallelism-scaling] | [V1](gh-issue:8779) |
 |-------------------------------------|--------------------|----------|--------------------------|------------------------|-----------------------------|-----------------------|
 | `JinaVLForSequenceClassification` | JinaVL-based | T + I<sup>E+</sup> | `jinaai/jina-reranker-m0`, etc. | | | ✅︎ |
 