
Commit 160e85a

Author: Pradyun Ramadorai (committed)
docs: Add Eagle quantization inheritance fix to CLAUDE.md registry
Added comprehensive documentation for the critical Eagle draft model quantization inheritance fix to the MANTLE modifications registry:

REGISTRY ENTRY vllm-project#16: Eagle Draft Model Quantization Inheritance Fix
- Category: CRITICAL - Speculative Decoding Fix
- Issue: Eagle draft model inheriting target model's MxFP4 quantization
- Root Cause: Stricter upstream quantization validation caught config inheritance bug
- Solution: Clone + override pattern with robust attribute checking
- Impact: GPT-OSS (MxFP4) + Llama Eagle head (bf16) combinations

KEY DOCUMENTATION:
- Technical details of the copy.deepcopy() clone + override approach
- Robust attribute checking with hasattr() for missing quant_config attributes
- Merge conflict resolution guidance for quantization config separation
- Testing requirements for mixed quantization speculative decoding scenarios

MERGE CONFLICT GUIDANCE:
- Preserve copy.deepcopy() pattern for config separation
- Maintain robust attribute checking for version compatibility
- Verify Eagle speculative decoding with mixed quantization setups
- Test GPT-OSS + Eagle head combinations after quantization merges

This ensures future merges preserve the critical fix that enables mixed quantization speculative decoding scenarios.

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
1 parent ac39ddb commit 160e85a

1 file changed (+17 -0 lines changed)

CLAUDE.md

Lines changed: 17 additions & 0 deletions
@@ -441,6 +441,23 @@ else: # ✅ Now reachable
- **MAINTAIN** CLI argument parser entry with deprecation notice
- **VERIFY** existing scripts with `--max-seq-len-to-capture` don't break after merges

### 16. Eagle Draft Model Quantization Inheritance Fix
**Files**: `vllm/model_executor/models/llama_eagle.py` (Lines 150-175)
**Category**: CRITICAL - Speculative Decoding Fix
**Issue**: Eagle draft model inheriting the target model's MxFP4 quantization, causing "Mxfp4 linear layer is not implemented" errors
**Root Cause**: After an upstream merge, stricter quantization validation caught a bug where the Eagle draft model incorrectly inherited the target model's quantization config instead of using its own
**Solution**: Create a separate quantization config for the draft model using a clone + override pattern (see the sketch after this list):
- Use `copy.deepcopy(vllm_config)` to clone the target config
- Override `quant_config` with the draft model's quantization settings
- Add robust attribute checking with graceful fallbacks for missing quant_config
- Handles cases where the draft model or VllmConfig lacks quantization attributes
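
The clone + override pattern can be sketched roughly as follows. This is an illustrative reconstruction, not the actual code in `llama_eagle.py`: the helper name `make_draft_config` and the `quant_config` attribute layout are assumptions made for the sketch.

```python
import copy


def make_draft_config(vllm_config, draft_quant_config=None):
    """Sketch of the clone + override pattern for the Eagle draft model.

    Hypothetical helper: the `quant_config` attribute name is an assumption
    about the config object, not a guaranteed vLLM API.
    """
    # Clone: deepcopy fully separates the draft config from the target
    # model's config, so the override below cannot leak back into it.
    draft_config = copy.deepcopy(vllm_config)

    # Override: replace the inherited target quantization (e.g. MxFP4) with
    # the draft model's own setting (None / bf16 for the Llama Eagle head).
    # hasattr() keeps this robust when a config version lacks the field.
    if hasattr(draft_config, "quant_config"):
        draft_config.quant_config = draft_quant_config

    return draft_config
```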
**Impact**: GPT-OSS (MxFP4) + Llama Eagle head (bf16) speculative decoding combinations
**Merge Conflict Resolution**:
- **PRESERVE** the `copy.deepcopy()` clone + override pattern for quantization config separation
- **MAINTAIN** robust attribute checking with `hasattr()` for quant_config availability
- **VERIFY** Eagle speculative decoding works with mixed quantization (target MxFP4, draft bf16)
- **TEST** GPT-OSS + Eagle head combinations after any quantization-related merges (a minimal test sketch follows)
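
One minimal way to exercise the VERIFY/TEST guidance above, written against the hypothetical `make_draft_config` sketch rather than the real loading path; the actual verification is loading GPT-OSS (MxFP4) with a bf16 Eagle head and running speculative decoding end to end.

```python
from types import SimpleNamespace


def test_draft_does_not_inherit_target_quantization():
    # Stand-in for a target VllmConfig quantized with MxFP4 (illustrative only).
    target = SimpleNamespace(quant_config="mxfp4")

    draft = make_draft_config(target, draft_quant_config=None)

    # The Eagle head must not inherit the target's MxFP4 config...
    assert draft.quant_config is None
    # ...and the target config must be left untouched by the override.
    assert target.quant_config == "mxfp4"
```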

---

## Merge Conflict Resolution Methodology

0 commit comments
