Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
docs/source/models/supported_models.md (39 additions, 2 deletions)
@@ -263,10 +263,15 @@ See [this page](#generative-models) for more information on how to use generative models.
   * ✅︎
   * ✅︎
 - * `Gemma2ForCausalLM`
-  * Gemma2
+  * Gemma 2
   * `google/gemma-2-9b`, `google/gemma-2-27b`, etc.
   * ✅︎
   * ✅︎
+- * `Gemma3ForCausalLM`
+  * Gemma 3
+  * `google/gemma-3-1b-it`, etc.
+  * ✅︎
+  * ✅︎
 - * `GlmForCausalLM`
   * GLM-4
   * `THUDM/glm-4-9b-chat-hf`, etc.
@@ -504,7 +509,7 @@ you should explicitly specify the task type to ensure that the model is used in
   *
   *
 - * `Gemma2Model`
-  * Gemma2-based
+  * Gemma 2-based
   * `BAAI/bge-multilingual-gemma2`, etc.
   *
   * ✅︎
@@ -752,6 +757,13 @@ See [this page](#generative-models) for more information on how to use generative models.
   *
   * ✅︎
   * ✅︎
+- * `Gemma3ForConditionalGeneration`
+  * Gemma 3
+  * T + I<sup>+</sup>
+  * `google/gemma-3-4b-it`, `google/gemma-3-27b-it`, etc.
+  * ✅︎
+  * ✅︎
+  * ✅︎\*
 - * `GLM4VForCausalLM`<sup>^</sup>
   * GLM-4V
   * T + I
@@ -937,6 +949,31 @@ For more details, please see: <gh-pr:4087#issuecomment-2250397630>
 To use Qwen2.5-VL series models, you have to install Hugging Face Transformers library from source via `pip install git+https://github.com/huggingface/transformers`.
 :::
 
+:::{note}
+To use Gemma3 series models, you have to install Hugging Face Transformers library from source via `pip install git+https://github.com/huggingface/transformers`.
+The earliest commit that supports this is [`50d3530aa04e7a7d003e6b255a98f79fd0447357`](https://github.com/huggingface/transformers/commit/50d3530aa04e7a7d003e6b255a98f79fd0447357).
+
+Both V0 and V1 support `Gemma3ForConditionalGeneration` for text-only inputs.
+However, there are differences in how they handle text + image inputs:
+
+V0 correctly implements the model's attention pattern:
+- Uses bidirectional attention between the image tokens corresponding to the same image
+- Uses causal attention for other tokens
+- Implemented via (naive) PyTorch SDPA with masking tensors
+- Note: May use significant memory for long prompts with images
+
+V1 currently uses a simplified attention pattern:
+- Uses causal attention for all tokens, including image tokens
+- Generates reasonable outputs but does not match the original model's attention for text + image inputs
+- Will be updated in the future to support the correct behavior
+
+This limitation exists because the model's mixed attention pattern (bidirectional for images, causal otherwise) is not yet supported by vLLM's attention backends.
+
+Additionally, vLLM's current Gemma 3 implementation does not support the pan-and-scan image pre-processing algorithm, which helps handle images with skewed aspect ratios by intelligently cropping them into multiple views.
+Without this feature, model performance may degrade when processing images that deviate significantly from square dimensions.
+:::
+
 ### Pooling Models
 
 See [this page](pooling-models) for more information on how to use pooling models.
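The note in the last hunk describes Gemma 3's mixed attention pattern: bidirectional attention among tokens belonging to the same image, causal attention everywhere else. As a rough illustration of what such a mask looks like (this is a hypothetical sketch, not vLLM's actual implementation; the function name and the token-ID convention are invented for the example), it can be built as a boolean tensor and passed to PyTorch SDPA:

```python
import torch


def build_mixed_attention_mask(image_ids: torch.Tensor) -> torch.Tensor:
    """Build a (seq_len, seq_len) boolean attention mask.

    image_ids[i] is -1 for text tokens, or an image index for image tokens
    (an invented convention for this sketch). True means "may attend".
    """
    n = image_ids.size(0)
    # Standard causal mask: each position attends to itself and the past.
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    # Tokens of the same image additionally attend to each other in both
    # directions (bidirectional within one image).
    same_image = (image_ids.unsqueeze(0) == image_ids.unsqueeze(1)) & (
        image_ids.unsqueeze(0) >= 0
    )
    return causal | same_image


# Example: one text token, three tokens of image 0, one more text token.
ids = torch.tensor([-1, 0, 0, 0, -1])
mask = build_mixed_attention_mask(ids)

# A boolean mask like this can be given to SDPA as `attn_mask`
# (True = attend), which is the kind of masked SDPA the note refers to.
q = k = v = torch.randn(1, 1, 5, 8)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Note that materializing a full `(seq_len, seq_len)` mask is exactly why the note warns about memory use for long prompts with images.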