### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

Your output of `python collect_env.py` here

</details>
### 🐛 Describe the bug
When using an EAGLE head with a compressed-tensors quantized model, the acceptance rate silently falls to zero and performance degrades completely.
**Root cause:** The EAGLE layers are registered as additional layers of the target model, but in a compressed-tensors checkpoint they are not part of the ignore list. vLLM therefore thinks the EAGLE layers are quantized, even though they are not. Surprisingly, this does not raise an error; instead the acceptance rate drops essentially to zero because the EAGLE head output is now rubbish.
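To illustrate, here is a minimal sketch of that decision (simplified to prefix matching; the real compressed-tensors matching supports more pattern styles, and `is_layer_quantized` is a made-up helper, not vLLM code):

```python
def is_layer_quantized(layer_name: str, ignore_list: list[str]) -> bool:
    """A layer counts as quantized unless it matches the checkpoint's ignore list."""
    return not any(layer_name.startswith(prefix) for prefix in ignore_list)

# Typical compressed-tensors ignore list: only lm_head is excluded.
ignore = ["lm_head"]
print(is_layer_quantized("model.layers.10.self_attn.q_proj", ignore))  # True (correct)
print(is_layer_quantized("model.layers.32.fc", ignore))                # True (wrong: EAGLE layer)
```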
**The structural problem:** The bigger problem, in my opinion, is that the `EagleProposer` simply takes the `VllmConfig` from the target model. This is not robust whenever the draft model has a different configuration. In principle this is also the root cause of why we needed the hacky fix in #25667 to use a non-multimodal drafter with a multimodal target model.
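A more robust direction (purely a sketch; the function and field names here are hypothetical, not existing vLLM API) would be for the proposer to build a dedicated config for the drafter rather than reusing the target's:

```python
from copy import deepcopy

def make_draft_vllm_config(target_vllm_config, draft_model_config):
    """Hypothetical sketch: inherit runtime settings (parallelism, scheduler,
    etc.) from the target, but keep the drafter's own model config so an
    unquantized EAGLE head is not pushed through the target's quantization."""
    draft_vllm_config = deepcopy(target_vllm_config)
    draft_vllm_config.model_config = draft_model_config
    return draft_vllm_config
```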
**When did this happen?**

[Update]: The PR that introduced the bug is #24982.

**Steps to reproduce:**

I am serving Llama 3.1 8B with the same EAGLE head, switching between the BF16 model and the compressed-tensors model.
```bash
# MODEL_PATH=meta-llama/Llama-3.1-8B-Instruct
MODEL_PATH=RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic

python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --tensor-parallel-size 1 \
    --max-num-seqs 8 \
    --port 8088 \
    --served-model-name llama-3.1-EAGLE \
    --enforce-eager \
    --speculative_config '{"method": "eagle", "model": "yuhuili/EAGLE-LLaMA3.1-Instruct-8B", "num_speculative_tokens": 5}' \
    --no-enable-prefix-caching
```
I then send some lines of the Sonnet dataset and look at the logs (a minimal client example is sketched below).
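For reference, a minimal client call along these lines (the prompt is just an example Sonnet line; endpoint and model name match the server command above):

```python
# Minimal request against the OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8088/v1", api_key="EMPTY")
resp = client.completions.create(
    model="llama-3.1-EAGLE",
    prompt="Shall I compare thee to a summer's day?",
    max_tokens=128,
)
print(resp.choices[0].text)
```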
**Desired behavior (BF16)**

```text
SpecDecoding metrics:
Mean acceptance length: 1.99,
Accepted throughput: 0.17 tokens/s,
Drafted throughput: 0.88 tokens/s,
Accepted: 162 tokens,
Drafted: 820 tokens,
Per-position acceptance rate: 0.616, 0.244, 0.104, 0.018, 0.006,
Avg Draft acceptance rate: 19.8%
```
**Bug (using the compressed-tensors model)**

```text
SpecDecoding metrics:
Mean acceptance length: 1.00,
Accepted throughput: 0.00 tokens/s,
Drafted throughput: 13.23 tokens/s,
Accepted: 0 tokens,
Drafted: 1990 tokens,
Per-position acceptance rate: 0.000, 0.000, 0.000, 0.000, 0.000,
Avg Draft acceptance rate: 0.0%
```
**Hacky fix (using the compressed-tensors model)**

We can fix this in a hacky way (just to illustrate the root cause) by adding:

```python
# Extract the layer index from the weight prefix, e.g. "model.layers.32.fc".
layer_idx = int(layer_name.split("layers.")[-1].split(".")[0])
# Llama-3.1-8B has 32 decoder layers (0-31), so higher indices belong to the EAGLE head.
if layer_idx >= 32:
    logger.warning_once(
        f"Skipping quantization for {layer_name} because it seems to be part of the EAGLE head"
    )
    return None
```

Then we obtain good acceptance rates again:
```text
SpecDecoding metrics:
Mean acceptance length: 2.52,
Accepted throughput: 22.40 tokens/s,
Drafted throughput: 73.50 tokens/s,
Accepted: 224 tokens,
Drafted: 735 tokens,
Per-position acceptance rate: 0.714, 0.401, 0.224, 0.116, 0.068,
Avg Draft acceptance rate: 30.5%
```
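A slightly less brittle variant of the same hack (still only a sketch; it assumes the standard `hf_config.num_hidden_layers` field is reachable at this point) would derive the cutoff from the target model's config instead of hardcoding 32:

```python
def is_eagle_layer(layer_name: str, model_config) -> bool:
    """Sketch: layers indexed past the target's decoder stack (e.g. index 32
    for the 32-layer Llama-3.1-8B) belong to the EAGLE head."""
    num_target_layers = model_config.hf_config.num_hidden_layers
    layer_idx = int(layer_name.split("layers.")[-1].split(".")[0])
    return layer_idx >= num_target_layers
```

Ultimately, though, the clean fix is probably to give the drafter its own (quantization) config, as argued above.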
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.