# Official Issue Index
This issue tracks all the useful information from the vllm-ascend project; you can find more below. Note that this information is mainly for developers — click through to the related issue for details. For end users, the official documentation may be more useful.
Model support: #1608
## CI Index
| Name | Results | Owner |
|---|---|---|
| e2e test / basic | | @Potabk |
| e2e test / doctest | | @Yikun |
| Benchmarks / Performance | | @Potabk |
| Accuracy test | | @zhangxinyuehfad |
## Useful Information
Q3 Roadmap Feedback
### RFC
- [PD Disaggregation] [RFC]: P/D Disaggregation Support #841
- [LoRA and Multi-LoRA] [RFC]: Join the MultiLora and MultiLora Dynamic Serving feature development #396
- [Custom Ops] [RFC]: Add support for custom ops #156
- [W8A8 Quantization] [RFC]: Add w8a8 Quantization #453
- [DP] [RFC]: Support Multi-node Server for Data Parallel Attention on V1 #649
- [Multi Step Scheduler] [RFC]: Custom Ascendc Kernel Of 'Prepare Input' in Multi-Step Feature. #807
- [Governance] [RFC]: vLLM Ascend Governance | Mechanics #828
- [E2E Test] [RFC]: E2E CI test for key features #413
- [Unit Test] [RFC]: Unit test coverage improvement #1298
- [Doc Improvement] [RFC]: Doc enhancement #1248
### Feature Guide
- [Prefix Cache and Chunked Prefill] [Feature]: prefix cache and chunk prefill #323 [Guide]: Usage on auto prefix caching #732
- [Guided Decode] [Feature]: Add Support for Guided Decoding (Structured Output) #177
- [V1 Engine] [Guide] V1 Engine #414
- [sleep mode] [Guide]: Sleep mode feature guide #733
- [spec decode and MTP] [Guide]: Usage on Speculative Decoding and MTP #734
- [graph mode] [Guide]: Usage on Graph mode #767
- [PD Disaggregation] [Guide]: How to use disaggregated_prefill #857
### Performance Guide
- [v0.9.0rc2 performance] [Guide]: Benchmark on v0.9.0rc2 #1167
- [v0.7.3.post1 performance] [Guide][Performance]: vLLM Ascend v0.7.3.post1 benchmark for Qwen3 #1025
- [0.7.3 performance] [Guide][Performance]: vllm-ascend v0.7.3 release performance benchmark #776
- [0.7.3 accuracy] [Accuracy]: vllm-ascend v0.7.3 release accuracy report #790
- Benchmark results (vllm-ascend / mindie-turbo / optimized torch): [Doc][0.7.3] Add performance tuning docs #878 (comment)
- vLLM Perf Guide: [Guide]: How to quickly run a perf benchmark to determine if performance has improved #864
- vLLM Ascend Benchmark: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/evaluation/performance_benchmark.html
### Workflow
- [2025 Q2 Roadmap] vLLM Ascend Roadmap Q2 2025 #448
### Release Feedback
- [v0.7.3] [v0.7.3] FAQ / Feedback #848
- [v0.8.5rc1] [v0.8.5rc1] FAQ / Feedback #754
## Help Wanted
### Ask for Model
- [Pooling Model] [New Model]: BAAI/bge-m3 #235
- [rhymes-ai/Aria] [New Model]: Add support for rhymes-ai/Aria #337
- [GLM4] [Bug]: Start failed for using glm-4-9b-chat #309 [Feature]: GLM-4-32B-0414 #686
- [mistralai/Mistral-Small-3.1] [New Model]: mistralai/Mistral-Small-3.1-24B-Instruct-2503 #358
- [google/gemma-3] [New Model]: google/gemma-3-27b-it #359
- [MiniMax/MiniMax-Text-01] [New Model]: MiniMax/MiniMax-Text-01 #360
- [LLaVA 1.6] [Model]: LLaVA 1.6 #553
- [ChatGLM] [Model]: ChatGLM #554
- [Florence2] [New Model]: Florence2 #742
- [Baichuan for 0.7.3 v1 engine] [Bug]: vllm 0.7.3 v1 engine do not support Baichuan model #866
### Ask for Feature
- [Quantization] [Feature]: vllm-ascend does not support quantized weights generated by llmcompressor #547 [Feature]: Supporting W8A16 and W4A16 weight-only quantization #524
- [hccl] [Feature]: Co-locating NPU support for GRPO training with trl #467 [Bug]: HCCL runtime error while GRPO training with co-locating vllm inference across multiple NPUs. #486
- [yarn rope] [Feature]: supported yarn rope-scaling #760
### Ask for Fix
- [MultiModal] [Deprecation]: Remove legacy input mapper/processor from V0 #673
- [eval doc] [Doc]: developer_guide/evaluation/using_lm_eval.html #607
- [log improve] [Bug]: vllm wrong raised the ERROR : Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C' #703
- [v0 scheduler with v1 engine] [Bug]: V0 Scheduler is incapable of the newest KVCacheManager interface in vllm main branch code #861