
vLLM Ascend Roadmap Q2 2025 #448

@Yikun

This is a living document!


Our vision is to enable vLLM to run seamlessly on Ascend NPU. We are fully committed to making vLLM one of the best engines for Ascend NPU. In Q1 2025, we provided initial support for vLLM on Ascend NPU.

In Q2 2025, we will focus on four themes: vLLM Ascend for Production, Performance Optimization, Key Features, and Ecosystem Connect.

1. Performance Optimization

We will focus on performance optimization for dense models (Qwen/Llama/Qwen-VL) and MoE models (DeepSeek V3/R1), so that users of vLLM Ascend can trust its performance to be competitive on Ascend NPU.

2. vLLM Ascend for Production

In alignment with vLLM, vLLM Ascend is designed for production. The first official version, for vLLM 0.7.3, will be published, and we will also actively promote the key features of vLLM v0.8.x/v1 to production availability.

3. Key Features

We will focus on integrating and supporting the key lifecycle workflows of model training (SFT / RL) and inference (single node / distributed).

3.1 Workflows

Cluster Scale Serving

Core feature support

RLHF

3.2 Models support

3.3 User / Developer Experience

Distributions

Docs

Dashboard

3.4 Hardware support

4. Ecosystem Connect

Seamless integration of key lifecycle components with vLLM Ascend is essential, so we are also actively connecting with the ecosystem.


If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

Historical Roadmap: #71
