[Model] Support TP/PP/mamba2 kernel for PLaMo2 (#19674)

Alnusjaponica · nopperl · Cecilwang · web-flow · commit c7ffe93d9c13 · 2025-07-28T05:00:47.000Z
Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp>
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
Co-authored-by: Sixue Wang <cecilwang@preferred.jp>
diff --git a/docs/models/supported_models.md b/docs/models/supported_models.md
@@ -389,7 +389,7 @@ th {
 | `PhiMoEForCausalLM` | Phi-3.5-MoE | `microsoft/Phi-3.5-MoE-instruct`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Phi4FlashForCausalLM` | Phi-4-mini-flash-reasoning | `microsoft/microsoft/Phi-4-mini-instruct`, etc. | | | |
 | `PersimmonForCausalLM` | Persimmon | `adept/persimmon-8b-base`, `adept/persimmon-8b-chat`, etc. | | ✅︎ | ✅︎ |
-| `Plamo2ForCausalLM` | PLaMo2 | `pfnet/plamo-2-1b`, `pfnet/plamo-2-8b`, etc. | | | |
+| `Plamo2ForCausalLM` | PLaMo2 | `pfnet/plamo-2-1b`, `pfnet/plamo-2-8b`, etc. | | ✅︎ | |
 | `QWenLMHeadModel` | Qwen | `Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Qwen2MoeForCausalLM` | Qwen2MoE | `Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc. | ✅︎ | ✅︎ | ✅︎ |
diff --git a/tests/distributed/test_pipeline_parallel.py b/tests/distributed/test_pipeline_parallel.py
@@ -175,6 +175,7 @@ def iter_params(self, model_id: str):
     "internlm/internlm2-chat-7b": PPTestSettings.fast(),
     "inceptionai/jais-13b-chat": PPTestSettings.fast(),
     "ai21labs/Jamba-tiny-dev": PPTestSettings.fast(),
+    "pfnet/plamo-2-1b": PPTestSettings.fast(),
     "meta-llama/Llama-3.2-1B-Instruct": PPTestSettings.detailed(),
     # Tests TransformersForCausalLM
     "hmellor/Ilama-3.2-1B": PPTestSettings.fast(),
diff --git a/tests/quantization/test_experts_int8.py b/tests/quantization/test_experts_int8.py
@@ -9,7 +9,7 @@
 
 from tests.quantization.utils import is_quant_method_supported
 
-MODELS = ["ai21labs/Jamba-tiny-random"]
+MODELS = ["ai21labs/Jamba-tiny-random", "pfnet/plamo-2-1b"]
 
 
 @pytest.mark.skipif(not is_quant_method_supported("experts_int8"),
diff --git a/vllm/model_executor/models/plamo2.py b/vllm/model_executor/models/plamo2.py