Enable engine-level arguments with speculators models
This commit implements enhanced engine-layer detection for speculators models,
allowing users to apply engine arguments directly with a simplified syntax:
```bash
vllm serve --seed 42 --tensor-parallel-size 4 "speculators-model"
```
instead of the verbose explicit configuration:
```bash
vllm serve --seed 42 --tensor-parallel-size 4 "target-model" \
--speculative-config '{"model": "speculators-model", "method": "eagle3", ...}'
```
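The detection step behind this shorthand can be sketched roughly as follows. The `speculators_config` key and its contents are illustrative stand-ins for whatever marker the checkpoint's parsed `config.json` actually carries, not vLLM's real schema:

```python
def is_speculators_model(hf_config: dict) -> bool:
    """Heuristic sketch: treat a checkpoint as a speculators model when
    its parsed config carries a speculators section.

    The key name here is an assumption for illustration, not the exact
    field vLLM inspects.
    """
    return "speculators_config" in hf_config


# An illustrative speculators checkpoint config vs. a regular one.
spec_cfg = {"speculators_config": {"target_model": "target-model", "method": "eagle3"}}
plain_cfg = {"architectures": ["LlamaForCausalLM"]}

print(is_speculators_model(spec_cfg))   # True
print(is_speculators_model(plain_cfg))  # False
```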
## Key Changes
### Enhanced Engine Layer (`vllm/engine/arg_utils.py`)
- Modified `create_speculative_config()` to return a `(ModelConfig, SpeculativeConfig)` tuple
- Added automatic speculators model detection at model creation time
- Implemented proper model resolution: speculators model → target model
- Engine arguments are now applied to the target model rather than the speculators model
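The bullets above can be sketched as a minimal, self-contained flow. The dataclasses below are stand-ins for vLLM's real `ModelConfig` and `SpeculativeConfig`, and the config keys are assumed names for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:          # stand-in for vllm.config.ModelConfig
    model: str
    seed: int = 0
    tensor_parallel_size: int = 1


@dataclass
class SpeculativeConfig:    # stand-in for vllm.config.SpeculativeConfig
    model: str              # the draft (speculators) model
    method: str


def create_speculative_config(
    model: str, hf_config: dict, seed: int, tp_size: int
) -> tuple[ModelConfig, Optional[SpeculativeConfig]]:
    """Sketch: detect a speculators checkpoint, resolve its target model,
    and apply engine arguments (seed, TP size) to the *target* model."""
    spec = hf_config.get("speculators_config")
    if spec is None:
        # Regular model: serve it directly, no speculative decoding.
        return ModelConfig(model, seed, tp_size), None
    # Speculators model: engine args go to the resolved target model.
    target = ModelConfig(spec["target_model"], seed, tp_size)
    draft = SpeculativeConfig(model=model, method=spec.get("method", "eagle3"))
    return target, draft


model_cfg, spec_cfg = create_speculative_config(
    "speculators-model",
    {"speculators_config": {"target_model": "target-model", "method": "eagle3"}},
    seed=42, tp_size=4,
)
print(model_cfg.model, spec_cfg.model)  # target-model speculators-model
```

Returning both configs from one call keeps the resolution logic in a single place, so every downstream consumer sees the already-resolved target model.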
### Complete Algorithm Processing (`vllm/transformers_utils/configs/speculators/base.py`)
- Added `get_vllm_config()` method with full algorithm-specific processing
- Includes Eagle3 fields such as `draft_vocab_size` and `target_hidden_size`
- Leverages existing validation and transformation infrastructure
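The algorithm-specific processing might look like this sketch. The field names `draft_vocab_size` and `target_hidden_size` come from the bullet above; the class shape, method dispatch, and raw-config keys are assumptions for illustration:

```python
class SpeculatorsConfig:    # stand-in for the class in speculators/base.py
    def __init__(self, raw: dict):
        self.raw = raw      # parsed checkpoint config (illustrative keys)

    def get_vllm_config(self) -> dict:
        """Sketch: translate a speculators checkpoint config into the dict
        vLLM's speculative-config machinery expects, including
        algorithm-specific fields."""
        method = self.raw.get("method", "eagle3")
        cfg = {"method": method, "target_model": self.raw["target_model"]}
        if method == "eagle3":
            # Eagle3-specific fields named in the commit message; values
            # here are placeholders, not real defaults.
            cfg["draft_vocab_size"] = self.raw.get("draft_vocab_size")
            cfg["target_hidden_size"] = self.raw.get("target_hidden_size")
        return cfg


cfg = SpeculatorsConfig({
    "method": "eagle3",
    "target_model": "target-model",
    "draft_vocab_size": 32000,
    "target_hidden_size": 4096,
}).get_vllm_config()
print(cfg["draft_vocab_size"], cfg["target_hidden_size"])  # 32000 4096
```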
## Benefits
- ✅ Proper architectural layering (engine layer handles model configuration)
- ✅ Complete algorithm-specific field processing
- ✅ Backward compatibility (existing workflows unchanged)
- ✅ Simplified user experience
- ✅ Single source of truth for speculative model logic
## Testing
- ✅ Speculators model: Auto-detection and target model resolution
- ✅ Regular model: No regression, normal serving unaffected
- ✅ Engine arguments correctly applied in both cases
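The two scenarios above can be exercised with a small test sketch against a stand-in resolver (hypothetical names, not vLLM's real test suite):

```python
from typing import Optional


def resolve(model: str, hf_config: dict) -> tuple[str, Optional[str]]:
    """Stand-in resolver: returns (served model, draft model or None)."""
    spec = hf_config.get("speculators_config")
    if spec is None:
        return model, None              # regular model: unchanged
    return spec["target_model"], model  # speculators model: serve the target


def test_speculators_model_resolved():
    served, draft = resolve(
        "speculators-model",
        {"speculators_config": {"target_model": "target-model"}},
    )
    assert served == "target-model" and draft == "speculators-model"


def test_regular_model_unaffected():
    served, draft = resolve("target-model", {})
    assert served == "target-model" and draft is None


test_speculators_model_resolved()
test_regular_model_unaffected()
print("both cases pass")  # prints "both cases pass"
```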
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>