
Commit 160e85a

Author: Pradyun Ramadorai (committed)
docs: Add Eagle quantization inheritance fix to CLAUDE.md registry
Added comprehensive documentation for the critical Eagle draft model quantization inheritance fix to the MANTLE modifications registry:

REGISTRY ENTRY vllm-project#16: Eagle Draft Model Quantization Inheritance Fix
- Category: CRITICAL - Speculative Decoding Fix
- Issue: Eagle draft model inheriting target model's MxFP4 quantization
- Root Cause: Stricter upstream quantization validation caught config inheritance bug
- Solution: Clone + override pattern with robust attribute checking
- Impact: GPT-OSS (MxFP4) + Llama Eagle head (bf16) combinations

KEY DOCUMENTATION:
- Technical details of the copy.deepcopy() clone + override approach
- Robust attribute checking with hasattr() for missing quant_config attributes
- Merge conflict resolution guidance for quantization config separation
- Testing requirements for mixed quantization speculative decoding scenarios

MERGE CONFLICT GUIDANCE:
- Preserve copy.deepcopy() pattern for config separation
- Maintain robust attribute checking for version compatibility
- Verify Eagle speculative decoding with mixed quantization setups
- Test GPT-OSS + Eagle head combinations after quantization merges

This ensures future merges preserve the critical fix that enables mixed quantization speculative decoding scenarios.

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
1 parent ac39ddb commit 160e85a

1 file changed (+17 -0 lines changed)

CLAUDE.md

Lines changed: 17 additions & 0 deletions
@@ -441,6 +441,23 @@ else: # ✅ Now reachable
- **MAINTAIN** CLI argument parser entry with deprecation notice
- **VERIFY** existing scripts with `--max-seq-len-to-capture` don't break after merges

### 16. Eagle Draft Model Quantization Inheritance Fix
**Files**: `vllm/model_executor/models/llama_eagle.py` (Lines 150-175)
**Category**: CRITICAL - Speculative Decoding Fix
**Issue**: Eagle draft model inheriting the target model's MxFP4 quantization, causing "Mxfp4 linear layer is not implemented" errors
**Root Cause**: After an upstream merge, stricter quantization validation caught a bug where the Eagle draft model incorrectly inherited the target model's quantization config instead of using its own
**Solution**: Create a separate quantization config for the draft model using a clone + override pattern (see the sketch after this list):
- Use `copy.deepcopy(vllm_config)` to clone the target config
- Override `quant_config` with the draft model's quantization settings
- Add robust attribute checking with graceful fallbacks for missing quant_config
- Handles cases where the draft model or VllmConfig lacks quantization attributes
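
The clone + override pattern can be sketched roughly as follows. This is an illustrative reconstruction, not the actual code in `llama_eagle.py`: the helper name `make_draft_config` and the `quant_config` attribute layout are assumptions made for the sketch.

```python
import copy


def make_draft_config(vllm_config, draft_quant_config=None):
    """Sketch of the clone + override pattern for the Eagle draft model.

    Hypothetical helper: the `quant_config` attribute name is an assumption
    about the config object, not a guaranteed vLLM API.
    """
    # Clone: deepcopy fully separates the draft config from the target
    # model's config, so the override below cannot leak back into it.
    draft_config = copy.deepcopy(vllm_config)

    # Override: replace the inherited target quantization (e.g. MxFP4) with
    # the draft model's own setting (None / bf16 for the Llama Eagle head).
    # hasattr() keeps this robust when a config version lacks the field.
    if hasattr(draft_config, "quant_config"):
        draft_config.quant_config = draft_quant_config

    return draft_config
```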
**Impact**: GPT-OSS (MxFP4) + Llama Eagle head (bf16) speculative decoding combinations
**Merge Conflict Resolution**:
- **PRESERVE** the `copy.deepcopy()` clone + override pattern for quantization config separation
- **MAINTAIN** robust attribute checking with `hasattr()` for quant_config availability
- **VERIFY** Eagle speculative decoding works with mixed quantization (target MxFP4, draft bf16)
- **TEST** GPT-OSS + Eagle head combinations after any quantization-related merges (a minimal test sketch follows)
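
One minimal way to exercise the VERIFY/TEST guidance above, written against the hypothetical `make_draft_config` sketch rather than the real loading path; the actual verification is loading GPT-OSS (MxFP4) with a bf16 Eagle head and running speculative decoding end to end.

```python
from types import SimpleNamespace


def test_draft_does_not_inherit_target_quantization():
    # Stand-in for a target VllmConfig quantized with MxFP4 (illustrative only).
    target = SimpleNamespace(quant_config="mxfp4")

    draft = make_draft_config(target, draft_quant_config=None)

    # The Eagle head must not inherit the target's MxFP4 config...
    assert draft.quant_config is None
    # ...and the target config must be left untouched by the override.
    assert target.quant_config == "mxfp4"
```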

---

## Merge Conflict Resolution Methodology

0 commit comments
