fix: Restore critical USE_CUTLASS_MOE environment variable support

Pradyun Ramadorai · Pradyun Ramadorai · commit b03ac1d2ec2d · 2025-09-28T06:13:15.000Z
ISSUE: The USE_CUTLASS_MOE environment variable support (CLAUDE.md entry vllm-project#14) was lost during a previous merge, removing a critical debugging and compatibility control.

ROOT CAUSE: Upstream changes overwrote the Mantle modification that added environment variable control for CUTLASS MoE implementations.

SOLUTION: Restored the missing environment variable logic:
- Added `import os` to the imports
- Restored the `default_use_cutlass` calculation with its original conditions (no block quantization, and an FP8 W8A8 scheme on SM90 or SM100)
- Restored the `USE_CUTLASS_MOE` environment variable with smart defaults:
  * USE_CUTLASS_MOE=1 forces CUTLASS MoE on (the default when the conditions are met)
  * USE_CUTLASS_MOE=0 disables CUTLASS MoE and falls back to other implementations
- Preserves backward compatibility: when the variable is unset, the original automatic detection applies

CODE CHANGES:
- File: `vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py`
- Lines: 5 (import), 548-557 (environment variable logic)
- Annotation: Added Mantle modification comments to guide future merges

TESTING: Verified import functionality and environment variable integration.

This fix restores debugging and compatibility control for CUTLASS MoE implementations, as documented in CLAUDE.md registry entry vllm-project#14.

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
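The override logic in the diff can be isolated into a standalone function to make its behavior easy to check. This is a minimal sketch, not the patch itself: `resolve_use_cutlass` is a hypothetical helper name, and its `default_use_cutlass` argument stands in for the block-quant / SM90 / SM100 check computed in `__init__`. Parsing mirrors the patch's `bool(int(...))`.

```python
import os
from typing import Mapping, Optional


def resolve_use_cutlass(default_use_cutlass: bool,
                        env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper mirroring the patch's USE_CUTLASS_MOE handling."""
    if env is None:
        env = os.environ
    # Unset variable: fall back to the automatic hardware/quantization check.
    raw = env.get("USE_CUTLASS_MOE", "1" if default_use_cutlass else "0")
    # Same parsing as the patch: "0" -> False, any other integer -> True.
    # Non-integer values such as "true" raise ValueError.
    return bool(int(raw))


# Usage: an explicit mapping stands in for the process environment.
print(resolve_use_cutlass(True, {}))                        # True
print(resolve_use_cutlass(True, {"USE_CUTLASS_MOE": "0"}))   # False
print(resolve_use_cutlass(False, {"USE_CUTLASS_MOE": "1"}))  # True
```

As written, USE_CUTLASS_MOE=1 enables the CUTLASS path even when the automatic check fails (for example, with block-quantized weights), so "forces on" is literal rather than "on when supported".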
diff --git a/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py b/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
@@ -2,6 +2,7 @@
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
 
 import enum
+import os
 from enum import Enum
 from typing import Callable, Optional, Union
 
@@ -544,9 +545,16 @@ def __init__(
         # cutlass path
         self.is_fp8_w8a8_sm100 = quant_config._is_fp8_w8a8_sm100(
             self.weight_quant, self.input_quant)
-        self.use_cutlass = not self.block_quant and (
+        # ===== MANTLE CUSTOM MODIFICATION: USE_CUTLASS_MOE Environment Variable Support =====
+        # ISSUE: Need ability to disable CUTLASS MoE via environment variable for debugging/compatibility
+        # SOLUTION: Add USE_CUTLASS_MOE environment variable support with smart defaults
+        # MERGE CONFLICT: Preserve environment variable logic when merging CUTLASS changes
+        # CONTEXT: Enables debugging and compatibility control for CUTLASS MoE implementations
+        # =======================================
+        default_use_cutlass = not self.block_quant and (
             quant_config._is_fp8_w8a8_sm90(self.weight_quant, self.input_quant)
             or self.is_fp8_w8a8_sm100)
+        self.use_cutlass = bool(int(os.environ.get("USE_CUTLASS_MOE", 1 if default_use_cutlass else 0)))
         self.disable_expert_map = False
 
     def create_weights(self, layer: torch.nn.Module, num_experts: int,