
Commit 51eadc6

wangxiyuan authored
[Docs] Add official doc index (#29)
Add official doc index. Move the release content to the right place.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1 parent 7006835 commit 51eadc6

Showing 10 changed files with 111 additions and 200 deletions.


README.md

Lines changed: 6 additions & 79 deletions
@@ -31,20 +31,11 @@ This plugin is the recommended approach for supporting the Ascend backend within
 By using vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, Embedding, Multi-modal LLMs can run seamlessly on the Ascend NPU.

 ## Prerequisites
-### Support Devices
-- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 Inference series (Atlas 800I A2)
-
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------|---------------------|------|
-| vLLM | main | main | Required for vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0

-Find more about how to setup your environment in [here](docs/environment.md).
+Find more about how to set up your environment step by step [here](docs/installation.md).

 ## Getting Started

@@ -73,78 +64,14 @@ Run the following command to start the vLLM server with the [Qwen/Qwen2.5-0.5B-I
 vllm serve Qwen/Qwen2.5-0.5B-Instruct
 curl http://localhost:8000/v1/models
 ```
-
-Please refer to [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
-
-## Building
-
-#### Build Python package from source
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### Build container image from source
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-See [Building and Testing](./CONTRIBUTING.md) for more details, which is a step-by-step guide to help you set up development environment, build and test.
-
-## Feature Support Matrix
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill || Plan in 2025 Q1 |
-| Automatic Prefix Caching || Imporve performance in 2025 Q1 |
-| LoRA || Plan in 2025 Q1 |
-| Prompt adapter |||
-| Speculative decoding || Impore accuracy in 2025 Q1|
-| Pooling || Plan in 2025 Q1 |
-| Enc-dec || Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs |||
-| Prompt logProbs |||
-| Async output |||
-| Multi step scheduler |||
-| Best of |||
-| Beam search |||
-| Guided Decoding || Plan in 2025 Q1 |
-
-## Model Support Matrix
-
-The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 |||
-| Mistral | | Need test |
-| DeepSeek v2.5 | | Need test |
-| LLama3.1/3.2 |||
-| Gemma-2 | | Need test |
-| baichuan | | Need test |
-| minicpm | | Need test |
-| internlm |||
-| ChatGLM |||
-| InternVL 2.5 |||
-| Qwen2-VL |||
-| GLM-4v | | Need test |
-| Molomo |||
-| LLaVA 1.5 |||
-| Mllama | | Need test |
-| LLaVA-Next | | Need test |
-| LLaVA-Next-Video | | Need test |
-| Phi-3-Vison/Phi-3.5-Vison | | Need test |
-| Ultravox | | Need test |
-| Qwen2-Audio |||
+**Please refer to [Official Docs](./docs/index.md) for more details.**

 ## Contributing
+See [CONTRIBUTING](./CONTRIBUTING.md) for more details, which is a step-by-step guide to help you set up the development environment, build, and test.
+
 We welcome and value any contributions and collaborations:
 - Please feel free to comment [here](https://github.com/vllm-project/vllm-ascend/issues/19) about your usage of vLLM Ascend Plugin.
 - Please let us know if you encounter a bug by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
-- Please see the guidance on how to contribute in [CONTRIBUTING.md](./CONTRIBUTING.md).

 ## License

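The Getting Started snippet retained above starts the server and lists models with `curl`. As a companion, here is a minimal sketch of querying that server from Python through its OpenAI-compatible REST API, assuming the `vllm serve Qwen/Qwen2.5-0.5B-Instruct` command from the snippet is already running on localhost:8000 and the `requests` package is installed:

```python
import requests

# Query the OpenAI-compatible /v1/completions endpoint exposed by `vllm serve`.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 32,
        "temperature": 0.8,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```
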
README.zh.md

Lines changed: 9 additions & 82 deletions
@@ -30,21 +30,12 @@ The vLLM Ascend plugin (`vllm-ascend`) is a plugin that lets vLLM run seamlessly on the Ascend NPU

 Using the vLLM Ascend plugin, popular large language models, including Transformer-like, Mixture-of-Experts (MoE), embedding, and multi-modal models, can run seamlessly on the Ascend NPU.

-## Prerequisites
-### Supported Devices
-- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
-- Atlas 800I A2 Inference series (Atlas 800I A2)
-
-### Dependencies
-| Requirement | Supported version | Recommended version | Note |
-|-------------|-------------------|---------------------|------|
-| vLLM | main | main | Required for vllm-ascend |
-| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
-| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
-| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
-| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
-
-See [here](docs/environment.zh.md) for more on how to configure your environment.
+## Preparation
+
+- Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series
+- Software: vLLM (the same version as vllm-ascend), Python >= 3.9, CANN >= 8.0.RC2, PyTorch >= 2.4.0, torch-npu >= 2.4.0
+
+See [here](docs/installation.md) for more on how to set up your environment step by step.

 ## Getting Started

@@ -74,78 +65,14 @@ vllm serve Qwen/Qwen2.5-0.5B-Instruct
 curl http://localhost:8000/v1/models
 ```

-Please refer to the [vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for more details.
-
-## Building
-
-#### Build the Python package from source
-
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-pip install -e .
-```
-
-#### Build the container image from source
-```bash
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-docker build -t vllm-ascend-dev-image -f ./Dockerfile .
-```
-
-See [Building and Testing](./CONTRIBUTING.zh.md) for more details; it is a step-by-step guide to help you set up the development environment, build, and test.
-
-## Feature Support Matrix
-| Feature | Supported | Note |
-|---------|-----------|------|
-| Chunked Prefill || Plan in 2025 Q1 |
-| Automatic Prefix Caching || Imporve performance in 2025 Q1 |
-| LoRA || Plan in 2025 Q1 |
-| Prompt adapter |||
-| Speculative decoding || Impore accuracy in 2025 Q1|
-| Pooling || Plan in 2025 Q1 |
-| Enc-dec || Plan in 2025 Q1 |
-| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
-| LogProbs |||
-| Prompt logProbs |||
-| Async output |||
-| Multi step scheduler |||
-| Best of |||
-| Beam search |||
-| Guided Decoding || Plan in 2025 Q1 |
-
-## Model Support Matrix
-
-The list here is a subset of the supported models. See [supported_models](docs/supported_models.md) for more details:
-| Model | Supported | Note |
-|---------|-----------|------|
-| Qwen 2.5 |||
-| Mistral | | Need test |
-| DeepSeek v2.5 | | Need test |
-| LLama3.1/3.2 |||
-| Gemma-2 | | Need test |
-| baichuan | | Need test |
-| minicpm | | Need test |
-| internlm |||
-| ChatGLM |||
-| InternVL 2.5 |||
-| Qwen2-VL |||
-| GLM-4v | | Need test |
-| Molomo |||
-| LLaVA 1.5 |||
-| Mllama | | Need test |
-| LLaVA-Next | | Need test |
-| LLaVA-Next-Video | | Need test |
-| Phi-3-Vison/Phi-3.5-Vison | | Need test |
-| Ultravox | | Need test |
-| Qwen2-Audio |||
-
+**Please refer to the [Official Docs](./docs/index.md) for more details.**

 ## Contributing
+See [CONTRIBUTING](./CONTRIBUTING.md) for more details; it helps you set up the development environment, build, and test.
+
 We welcome and value contributions and collaboration of any kind:
 - You can give feedback on your experience [here](https://github.com/vllm-project/vllm-ascend/issues/19).
 - Please let us know about any bugs you encounter by [filing an issue](https://github.com/vllm-project/vllm-ascend/issues).
-- Please see the contribution guide in [CONTRIBUTING.zh.md](./CONTRIBUTING.zh.md).

 ## License

docs/environment.zh.md

Lines changed: 0 additions & 38 deletions
This file was deleted.

docs/index.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Ascend plugin for vLLM
+vLLM Ascend plugin (vllm-ascend) is a community maintained hardware plugin for running vLLM on the Ascend NPU.
+
+This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162), providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM.
+
+By using vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Expert, Embedding, Multi-modal LLMs can run seamlessly on the Ascend NPU.
+
+## Contents
+
+- [Quick Start](./quick_start.md)
+- [Installation](./installation.md)
+- Usage
+  - [Running vLLM with Ascend](./usage/running_vllm_with_ascend.md)
+  - [Feature Support](./usage/feature_support.md)
+  - [Supported Models](./usage/supported_models.md)

docs/environment.md renamed to docs/installation.md

Lines changed: 20 additions & 0 deletions
@@ -1,3 +1,23 @@
+# Installation
+
+
+## Building
+
+#### Build Python package from source
+
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+pip install -e .
+```
+
+#### Build container image from source
+```bash
+git clone https://github.com/vllm-project/vllm-ascend.git
+cd vllm-ascend
+docker build -t vllm-ascend-dev-image -f ./Dockerfile .
+```
+
 ### Prepare Ascend NPU environment

 ### Dependencies

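After a source build like the one moved into this page (`pip install -e .`), a quick sanity check is to confirm the relevant packages import cleanly. A minimal sketch; the `vllm_ascend` and `torch_npu` module names are assumptions based on this repository's layout and the Ascend PyTorch adapter:

```python
import importlib

# Try importing each package the installation implies should be present.
for mod in ("vllm", "vllm_ascend", "torch", "torch_npu"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: OK")
    except ImportError as exc:
        print(f"{mod}: missing ({exc})")
```
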
docs/quick_start.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Quick Start
+
+## Prerequisites
+### Supported Devices
+- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
+- Atlas 800I A2 Inference series (Atlas 800I A2)
+
+### Dependencies
+| Requirement | Supported version | Recommended version | Note |
+|-------------|-------------------|---------------------|------|
+| vLLM | main | main | Required for vllm-ascend |
+| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
+| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
+| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
+| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required for torch-npu and vllm |
+
+Find more about how to set up your environment [here](docs/environment.md).

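The dependency table in this new quick-start page can be checked on a target machine with a short script. A sketch, assuming the Ascend PyTorch adapter exposes the usual `torch.npu` interface after `import torch_npu`:

```python
import sys

assert sys.version_info >= (3, 9), "Python >= 3.9 is required for vllm"

import torch
print("torch:", torch.__version__)  # the table asks for >= 2.4.0

import torch_npu  # registers the NPU device backend with torch (assumed interface)
print("NPU available:", torch.npu.is_available())
```
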
docs/supported_models.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/usage/feature_support.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# Feature Support
+
+| Feature | Supported | Note |
+|---------|-----------|------|
+| Chunked Prefill || Plan in 2025 Q1 |
+| Automatic Prefix Caching || Improve performance in 2025 Q1 |
+| LoRA || Plan in 2025 Q1 |
+| Prompt adapter |||
+| Speculative decoding || Improve accuracy in 2025 Q1 |
+| Pooling || Plan in 2025 Q1 |
+| Enc-dec || Plan in 2025 Q1 |
+| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL) | Add more model support in 2025 Q1 |
+| LogProbs |||
+| Prompt logProbs |||
+| Async output |||
+| Multi step scheduler |||
+| Best of |||
+| Beam search |||
+| Guided Decoding || Plan in 2025 Q1 |
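Several rows of this matrix (LogProbs, Prompt logProbs, Best of) correspond to vLLM's standard `SamplingParams` fields, so they can be exercised without plugin-specific code. A minimal sketch, assuming those rows are enabled on the NPU backend and the model is available locally:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(
    max_tokens=16,
    logprobs=5,         # per-token logprobs ("LogProbs" row)
    prompt_logprobs=5,  # logprobs over the prompt ("Prompt logProbs" row)
    best_of=4,          # sample four candidates, keep the best ("Best of" row)
)
outputs = llm.generate(["The Ascend NPU is"], params)
print(outputs[0].outputs[0].text)
```
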
docs/usage/running_vllm_with_ascend.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# Running vLLM with Ascend

docs/usage/supported_models.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# Supported Models
+
+| Model | Supported | Note |
+|---------|-----------|------|
+| Qwen 2.5 |||
+| Mistral | | Need test |
+| DeepSeek v2.5 | | Need test |
+| Llama 3.1/3.2 |||
+| Gemma-2 | | Need test |
+| Baichuan | | Need test |
+| MiniCPM | | Need test |
+| InternLM |||
+| ChatGLM |||
+| InternVL 2.5 |||
+| Qwen2-VL |||
+| GLM-4v | | Need test |
+| Molmo |||
+| LLaVA 1.5 |||
+| Mllama | | Need test |
+| LLaVA-Next | | Need test |
+| LLaVA-Next-Video | | Need test |
+| Phi-3-Vision/Phi-3.5-Vision | | Need test |
+| Ultravox | | Need test |
+| Qwen2-Audio |||
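For the multi-modal entries in this matrix (for example Qwen2-VL), vLLM accepts a prompt dictionary with `multi_modal_data`. A hypothetical sketch; the model id, chat-template tokens, and dummy image are illustrative assumptions rather than plugin specifics:

```python
from PIL import Image
from vllm import LLM

llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")
image = Image.new("RGB", (224, 224), color="gray")  # stand-in for a real photo

outputs = llm.generate({
    "prompt": "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
              "Describe the image.<|im_end|>\n<|im_start|>assistant\n",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```
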
