Closed
Labels
bug (Something isn't working)
Description
Your current environment
- Hardware: sg2044
- OS: LEulixOS 3.0
- Python: 3.11
- PyTorch: 2.8.0
- GCC: 15.1
- vLLM: commit 13dd93c
🐛 Describe the bug
vLLM fails to run on RISC-V because the CPU backend contains multiple unconditional imports of Intel Extension for PyTorch (IPEX). IPEX is specific to Intel x86 and is not available on RISC-V. To reproduce:
export VLLM_ENABLE_V1_MULTIPROCESSING=0 VLLM_COMPILATION_LEVEL=0
vllm bench throughput \
--model Qwen/Qwen1.5-0.5B \
--input-len 128 \
--output-len 128 \
--enforce-eager --dtype float16 --max_model_len 4096 --max_num_batched_tokens 4096
Error message:
[rank0]: File "/AI/hebo/vllm/vllm/v1/attention/backends/cpu_attn.py", line 595, in forward
[rank0]: import intel_extension_for_pytorch.llm.modules as ipex_modules
[rank0]: ModuleNotFoundError: No module named 'intel_extension_for_pytorch'
Root cause analysis:
1. Architecture detection works correctly: RISC-V is now properly detected following the recent addition of CpuArchEnum.RISCV.
2. IPEX availability check exists but is inconsistently applied: cpu_attn.py correctly checks IPEX availability at module level and sets _use_ipex = False when IPEX is not available, but the import inside forward (line 595 in the traceback above) still runs.
3. Platform configuration assumes IPEX: The CPU platform configuration checks IPEX availability but doesn't prevent IPEX-dependent code paths from executing.
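The pattern described in items 2 and 3 can be reproduced in isolation. The following is a minimal sketch, not vLLM's actual code: the function names `is_available`, `forward_unguarded`, and `forward_guarded` are hypothetical and only illustrate the difference between an import that ignores the availability flag and one that respects it.

```python
import importlib

# On the reporter's machine this module is not installed.
OPTIONAL_MODULE = "intel_extension_for_pytorch"


def is_available(name: str) -> bool:
    """Availability check of the kind the issue says is done at module level."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def forward_unguarded(name: str):
    # Mirrors the failing pattern: the import runs whether or not the
    # availability flag is set, so it raises ModuleNotFoundError when
    # the module is absent.
    return importlib.import_module(name)


def forward_guarded(name: str, use_flag: bool):
    # Respects the flag: returning None tells the caller to take a
    # non-IPEX code path instead of importing.
    if not use_flag:
        return None
    return importlib.import_module(name)
```

With a missing module, `forward_unguarded` raises the same `ModuleNotFoundError` seen in the traceback, while `forward_guarded` returns `None`.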
Expected behavior:
The CPU backend should use IPEX features only when both of the following hold:
- IPEX is available (_use_ipex = True)
- The CPU architecture supports it (primarily x86)
On other architectures it should gracefully fall back to non-IPEX implementations.
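The expected behavior amounts to a single predicate over IPEX availability and CPU architecture. A rough sketch follows, assuming architecture is taken from `platform.machine()`; the helper `should_use_ipex` and the `X86_MACHINES` set are hypothetical and do not correspond to existing vLLM names.

```python
import platform
from typing import Optional

# Assumption: machine strings commonly reported for x86 hosts.
X86_MACHINES = {"x86_64", "amd64", "i386", "i686"}


def should_use_ipex(ipex_available: bool, machine: Optional[str] = None) -> bool:
    """Return True only if IPEX is importable and the host is x86."""
    machine = (machine or platform.machine()).lower()
    return ipex_available and machine in X86_MACHINES
```

For example, `should_use_ipex(True, "riscv64")` is `False`, so a caller would take the non-IPEX path on the hardware from this report even if an IPEX package were somehow importable.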
Workarounds attempted (all failed):
Each of the following configurations still triggers IPEX imports:
--enable-chunked-prefill=False
--quantization=None
--enforce-eager
Impact:
This affects all non-Intel CPU architectures where IPEX is not available, including:
- RISC-V architectures
- Some ARM implementations without IPEX support
- Other emerging CPU architectures
Suggested fixes:
1. Conditional IPEX imports: Wrap all IPEX imports with availability checks
2. Architecture-aware fallbacks: Implement non-IPEX code paths for non-x86 architectures
3. Platform-specific configuration: Disable IPEX-dependent features automatically on unsupported architectures
4. Consistent availability checking: Ensure _use_ipex flag is respected throughout the codebase
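Fixes 1 and 4 could be combined by resolving the optional dependency once and routing every call through that result. The sketch below is one possible shape under stated assumptions: `optional_import` and `BackendSelector` are hypothetical names, and the "fallback" label stands in for whatever non-IPEX implementation the CPU backend would provide.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module if present; return None instead of raising."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class BackendSelector:
    """Resolves the optional module once so callers never import it directly."""

    def __init__(self, module_name: str = "intel_extension_for_pytorch.llm.modules"):
        # Single point of truth for availability.
        self.module = optional_import(module_name)

    @property
    def name(self) -> str:
        # "fallback" is a placeholder label for a non-IPEX implementation.
        return "ipex" if self.module is not None else "fallback"
```

Because the import happens in one place, code paths such as quantization or MoE layers would consult the selector rather than importing IPEX themselves, which keeps the availability check consistent.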
Additional context:
The issue is more widespread than initially thought - even with chunked prefill disabled, IPEX dependencies are triggered through quantization modules, MoE layers, and other CPU backend components. This suggests a systemic issue where the CPU backend assumes Intel architecture and IPEX availability.