Commit 6435154

rebase, improve doc, set CANN 8.1.rc1

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

1 parent d199c22 commit 6435154

File tree: 10 files changed, +78 -48 lines


.github/workflows/vllm_ascend_test.yaml

Lines changed: 2 additions & 1 deletion

````diff
@@ -45,7 +45,8 @@ jobs:
     name: vLLM Ascend test v0.7.3-dev
     runs-on: linux-arm64-npu-1
     container:
-      image: quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
+      # TODO(yikun): Remove daocloud prefix after infra ready
+      image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
     env:
       HF_ENDPOINT: https://hf-mirror.com
       HF_TOKEN: ${{ secrets.HF_TOKEN }}
````
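The daocloud change above is a plain prefix on the upstream image reference, so the original reference can always be recovered by stripping it. A minimal sketch (plain Python; the helper name is illustrative, not part of any tooling in this repo):

```python
# The m.daocloud.io mirror acts as a pull-through proxy: it prefixes the
# upstream image reference, so stripping the prefix recovers the original.
MIRROR_PREFIX = "m.daocloud.io/"

def upstream_ref(mirrored: str) -> str:
    # Drop the mirror prefix if present; otherwise return the ref unchanged.
    return mirrored.removeprefix(MIRROR_PREFIX)

print(upstream_ref(
    "m.daocloud.io/quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10"
))
# quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
```

This is why the TODO can simply "remove" the prefix once the infra is ready: the rest of the reference is unchanged.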

Dockerfile

Lines changed: 2 additions & 1 deletion

````diff
@@ -15,7 +15,8 @@
 # limitations under the License.
 #

-FROM quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
+# TODO(yikun): Remove daocloud prefix after infra ready
+FROM m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
````

Dockerfile.openEuler

Lines changed: 2 additions & 1 deletion

````diff
@@ -15,7 +15,8 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
+# TODO(yikun): Remove daocloud prefix after infra ready
+FROM m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-openeuler22.03-py3.10

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
````

docs/source/conf.py

Lines changed: 4 additions & 0 deletions

````diff
@@ -72,6 +72,10 @@
     # This value should be updated when cut down release.
     'pip_vllm_ascend_version': "0.7.3rc2",
     'pip_vllm_version': "0.7.3",
+    # The matching MindIE Turbo for vLLM Ascend
+    # TODO(yikun): confirm the version; the release version is 2.0.rc1,
+    # but I recommend the PyPI version follow PEP 440
+    'pip_mindie_turbo_version': "2.0rc1",
     # CANN image tag
     'cann_image_tag': "8.0.0-910b-ubuntu22.04-py3.10",
 }
````
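The TODO above hinges on PEP 440: pip normalizes away a separator placed before a pre-release tag, so `2.0.rc1` and `2.0rc1` name the same version, with `2.0rc1` being the normalized spelling. A minimal sketch of just that normalization rule (the regex is illustrative and covers only this case, not the full PEP 440 grammar):

```python
import re

def normalize_pre(version: str) -> str:
    # PEP 440 normalization: an optional '.', '-' or '_' separator before a
    # pre-release tag (a/b/rc) is dropped, e.g. "2.0.rc1" -> "2.0rc1".
    return re.sub(r"[._-](a|b|rc)(\d+)$", r"\1\2", version)

print(normalize_pre("2.0.rc1"))  # -> 2.0rc1
print(normalize_pre("2.0rc1"))   # already normalized -> 2.0rc1
```

Because both spellings resolve to the same normalized version, publishing `2.0rc1` on PyPI (as the TODO recommends) stays compatible with the `2.0.rc1` release tag.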

docs/source/installation.md

Lines changed: 9 additions & 29 deletions

````diff
@@ -68,10 +68,6 @@ docker run --rm \
 :animate: fade-in-slide-down
 You can also install CANN manually:

-```{note}
-This guide takes aarch64 as an example. If you run on x86, you need to replace `aarch64` with `x86_64` for the package name shown below.
-```
-
 ```bash
 # Create a virtual environment
 python -m venv vllm-ascend-env
@@ -138,15 +134,6 @@ pip install vllm==|pip_vllm_version|
 pip install vllm-ascend==|pip_vllm_ascend_version| --extra-index https://download.pytorch.org/whl/cpu/
 ```

-**Optional**
-Install MindIE Turbo for Performance acceleration:
-
-```{code-block} bash
-:substitutions:
-# Install MindIE Turbo
-pip install vllm-ascend[mindie_turbo]==|pip_vllm_ascend_version| --extra-index https://download.pytorch.org/whl/cpu/
-```
-
 :::{dropdown} Click here to see "Build from source code"
 or build from **source code**:

@@ -163,15 +150,6 @@ git clone --depth 1 --branch |vllm_ascend_version| https://github.com/vllm-proj
 cd vllm-ascend
 pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
 ```
-
-**Optional**
-Install MindIE Turbo for Performance acceleration:
-
-```{code-block} bash
-:substitutions:
-# Install MindIE Turbo
-pip install mindie_turbo
-```
 :::

 ::::
@@ -212,18 +190,20 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-1. vLLM and vLLM Ascend code are placed in `/vllm-workspace` in the docker image. And they are installed in develop mode so that the developer could easily modify the code.
+The default workdir is `/workspace`; vLLM and vLLM Ascend code are placed in `/vllm-workspace` and installed in [development mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html) (`pip install -e`), so developers can pick up code changes immediately without reinstalling.
+::::

-2. The entrypath of the docker container is `/workspace`.
+:::::

-3. **Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+## (Optional) Install MindIE Turbo

-```
+Install MindIE Turbo for performance acceleration:

-::::
+```{code-block} bash
+:substitutions:

-:::::
+pip install mindie_turbo==|pip_vllm_ascend_version|
+```

 ## Extra information
````
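The `|pip_..._version|` placeholders in the doc hunks above are MyST substitutions resolved at build time from the mapping in `docs/source/conf.py`. A rough illustration of that resolution (the `render` helper is hypothetical, not the actual MyST implementation):

```python
# Hypothetical sketch: at build time, MyST replaces |key| placeholders with
# values from the myst_substitutions mapping defined in docs/source/conf.py.
myst_substitutions = {
    "pip_vllm_version": "0.7.3",
    "pip_vllm_ascend_version": "0.7.3rc2",
    "pip_mindie_turbo_version": "2.0rc1",
}

def render(line: str, subs: dict) -> str:
    # Replace each |key| placeholder with its configured value.
    for key, value in subs.items():
        line = line.replace(f"|{key}|", value)
    return line

print(render("pip install mindie_turbo==|pip_mindie_turbo_version|",
             myst_substitutions))
# pip install mindie_turbo==2.0rc1
```

This is why adding `pip_mindie_turbo_version` to conf.py lets every tutorial pin the MindIE Turbo version from one place.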

docs/source/quick_start.md

Lines changed: 7 additions & 4 deletions

````diff
@@ -31,13 +31,16 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-1. vLLM and vLLM Ascend code are placed in `/vllm-workspace` in the docker image. And they are installed in develop mode so that the developer could easily modify the code.
+The default workdir is `/workspace`; vLLM and vLLM Ascend code are placed in `/vllm-workspace` and installed in [development mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html) (`pip install -e`), so developers can pick up code changes immediately without reinstalling.

-2. The entrypath of the docker container is `/workspace`.
+## (Optional) Install MindIE Turbo

-3. **Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+Install MindIE Turbo for performance acceleration:

+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_vllm_ascend_version|
 ```

 ## Usage
````

docs/source/tutorials/multi_node.md

Lines changed: 6 additions & 3 deletions

````diff
@@ -30,10 +30,13 @@ docker run --rm \
   -it quay.io/ascend/vllm-ascend:|vllm_ascend_version| bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
-```
+(Optional) Install MindIE Turbo for performance acceleration:

+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```

 Choose one machine as head node, the other are worker nodes, then start ray on each machine:
````
docs/source/tutorials/multi_npu.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -27,8 +27,12 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
 ```

 Setup environment variables:
````

docs/source/tutorials/single_npu.md

Lines changed: 20 additions & 3 deletions

````diff
@@ -26,8 +26,12 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
 ```

 Setup environment variables:
@@ -94,7 +98,20 @@ docker run --rm \
   -p 8000:8000 \
   -e VLLM_USE_MODELSCOPE=True \
   -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-  -it $IMAGE \
+  -it $IMAGE bash
+```
+
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
+Run the following script to start the vLLM server:
+
+```
 vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
 ```
````

docs/source/tutorials/single_npu_multimodal.md

Lines changed: 20 additions & 4 deletions

````diff
@@ -26,10 +26,13 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
-```
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:

+pip install mindie_turbo==|pip_mindie_turbo_version|
+```

 Setup environment variables:

@@ -148,7 +151,20 @@ docker run --rm \
   -p 8000:8000 \
   -e VLLM_USE_MODELSCOPE=True \
   -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-  -it $IMAGE \
+  -it $IMAGE bash
+```
+
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
+Run the following script to start the vLLM server:
+
+```
 vllm serve Qwen/Qwen2.5-VL-7B-Instruct --dtype bfloat16 --max_model_len 16384 --max-num-batched-tokens 16384
 ```
````