Revert "Fix non-contiguous input passed to Marlin kernel (vllm-project#15319)" (vllm-project#15398)

tlrmchlsmth · Mu Huai · commit aa2fc04768fe · 2025-05-12T19:19:08.000+08:00
Signed-off-by: Mu Huai &lt;tianbowen.tbw@antgroup.com&gt;
diff --git a/vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py b/vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py
@@ -115,10 +115,6 @@ def apply_weights(self,
                       layer: torch.nn.Module,
                       x: torch.Tensor,
                       bias: Optional[torch.Tensor] = None) -> torch.Tensor:
-        # marlin requires contiguous memory layout
-        # prefix caching may cause x to be non-contiguous
-        x = x.contiguous()  # no-op if already contiguous
-
         c = self.config
         w_q, w_s, w_zp, w_gidx = self._get_weight_params(layer)