Commit 6435154

rebase, improve doc, set CANN 8.1.rc1

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

1 parent d199c22 commit 6435154

File tree: 10 files changed, +78 -48 lines


.github/workflows/vllm_ascend_test.yaml

Lines changed: 2 additions & 1 deletion

````diff
@@ -45,7 +45,8 @@ jobs:
     name: vLLM Ascend test v0.7.3-dev
     runs-on: linux-arm64-npu-1
     container:
-      image: quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
+      # TODO(yikun): Remove daocloud prefix after infra ready
+      image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
     env:
       HF_ENDPOINT: https://hf-mirror.com
       HF_TOKEN: ${{ secrets.HF_TOKEN }}
````
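The daocloud change above is a plain prefix on the upstream image reference, so the original reference can always be recovered by stripping it. A minimal sketch (plain Python; the helper name is illustrative, not part of any tooling in this repo):

```python
# The m.daocloud.io mirror acts as a pull-through proxy: it prefixes the
# upstream image reference, so stripping the prefix recovers the original.
MIRROR_PREFIX = "m.daocloud.io/"

def upstream_ref(mirrored: str) -> str:
    # Drop the mirror prefix if present; otherwise return the ref unchanged.
    return mirrored.removeprefix(MIRROR_PREFIX)

print(upstream_ref(
    "m.daocloud.io/quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10"
))
# quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
```

This is why the TODO can simply "remove" the prefix once the infra is ready: the rest of the reference is unchanged.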

Dockerfile

Lines changed: 2 additions & 1 deletion

````diff
@@ -15,7 +15,8 @@
 # limitations under the License.
 #

-FROM quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
+# TODO(yikun): Remove daocloud prefix after infra ready
+FROM m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
````

Dockerfile.openEuler

Lines changed: 2 additions & 1 deletion

````diff
@@ -15,7 +15,8 @@
 # This file is a part of the vllm-ascend project.
 #

-FROM quay.io/ascend/cann:8.1.rc1.beta1-910b-ubuntu22.04-py3.10
+# TODO(yikun): Remove daocloud prefix after infra ready
+FROM m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-openeuler22.03-py3.10

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
````

docs/source/conf.py

Lines changed: 4 additions & 0 deletions

````diff
@@ -72,6 +72,10 @@
     # This value should be updated when cut down release.
     'pip_vllm_ascend_version': "0.7.3rc2",
     'pip_vllm_version': "0.7.3",
+    # The matching MindIE Turbo for vLLM Ascend
+    # TODO(yikun): confirm the version; the release version is 2.0.rc1,
+    # but I recommend the PyPI version follow PEP 440
+    'pip_mindie_turbo_version': "2.0rc1",
     # CANN image tag
     'cann_image_tag': "8.0.0-910b-ubuntu22.04-py3.10",
 }
````
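The TODO above hinges on PEP 440: pip normalizes away a separator placed before a pre-release tag, so `2.0.rc1` and `2.0rc1` name the same version, with `2.0rc1` being the normalized spelling. A minimal sketch of just that normalization rule (the regex is illustrative and covers only this case, not the full PEP 440 grammar):

```python
import re

def normalize_pre(version: str) -> str:
    # PEP 440 normalization: an optional '.', '-' or '_' separator before a
    # pre-release tag (a/b/rc) is dropped, e.g. "2.0.rc1" -> "2.0rc1".
    return re.sub(r"[._-](a|b|rc)(\d+)$", r"\1\2", version)

print(normalize_pre("2.0.rc1"))  # -> 2.0rc1
print(normalize_pre("2.0rc1"))   # already normalized -> 2.0rc1
```

Because both spellings resolve to the same normalized version, publishing `2.0rc1` on PyPI (as the TODO recommends) stays compatible with the `2.0.rc1` release tag.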

docs/source/installation.md

Lines changed: 9 additions & 29 deletions

````diff
@@ -68,10 +68,6 @@ docker run --rm \
 :animate: fade-in-slide-down
 You can also install CANN manually:

-```{note}
-This guide takes aarch64 as an example. If you run on x86, you need to replace `aarch64` with `x86_64` for the package name shown below.
-```
-
 ```bash
 # Create a virtual environment
 python -m venv vllm-ascend-env
@@ -138,15 +134,6 @@ pip install vllm==|pip_vllm_version|
 pip install vllm-ascend==|pip_vllm_ascend_version| --extra-index https://download.pytorch.org/whl/cpu/
 ```

-**Optional**
-Install MindIE Turbo for Performance acceleration:
-
-```{code-block} bash
-:substitutions:
-# Install MindIE Turbo
-pip install vllm-ascend[mindie_turbo]==|pip_vllm_ascend_version| --extra-index https://download.pytorch.org/whl/cpu/
-```
-
 :::{dropdown} Click here to see "Build from source code"
 or build from **source code**:

@@ -163,15 +150,6 @@ git clone --depth 1 --branch |vllm_ascend_version| https://github.com/vllm-proj
 cd vllm-ascend
 pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
 ```
-
-**Optional**
-Install MindIE Turbo for Performance acceleration:
-
-```{code-block} bash
-:substitutions:
-# Install MindIE Turbo
-pip install mindie_turbo
-```
 :::

 ::::
@@ -212,18 +190,20 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-1. vLLM and vLLM Ascend code are placed in `/vllm-workspace` in the docker image. And they are installed in develop mode so that the developer could easily modify the code.
+The default workdir is `/workspace`; vLLM and vLLM Ascend code are placed in `/vllm-workspace` and installed in [development mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html) (`pip install -e`), so developers can pick up code changes immediately without reinstalling.
+::::

-2. The entrypath of the docker container is `/workspace`.
+:::::

-3. **Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+## (Optional) Install MindIE Turbo

-```
+Install MindIE Turbo for performance acceleration:

-::::
+```{code-block} bash
+:substitutions:

-:::::
+pip install mindie_turbo==|pip_vllm_ascend_version|
+```

 ## Extra information
````
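The `|pip_..._version|` placeholders in the doc hunks above are MyST substitutions resolved at build time from the mapping in `docs/source/conf.py`. A rough illustration of that resolution (the `render` helper is hypothetical, not the actual MyST implementation):

```python
# Hypothetical sketch: at build time, MyST replaces |key| placeholders with
# values from the myst_substitutions mapping defined in docs/source/conf.py.
myst_substitutions = {
    "pip_vllm_version": "0.7.3",
    "pip_vllm_ascend_version": "0.7.3rc2",
    "pip_mindie_turbo_version": "2.0rc1",
}

def render(line: str, subs: dict) -> str:
    # Replace each |key| placeholder with its configured value.
    for key, value in subs.items():
        line = line.replace(f"|{key}|", value)
    return line

print(render("pip install mindie_turbo==|pip_mindie_turbo_version|",
             myst_substitutions))
# pip install mindie_turbo==2.0rc1
```

This is why adding `pip_mindie_turbo_version` to conf.py lets every tutorial pin the MindIE Turbo version from one place.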

docs/source/quick_start.md

Lines changed: 7 additions & 4 deletions

````diff
@@ -31,13 +31,16 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-1. vLLM and vLLM Ascend code are placed in `/vllm-workspace` in the docker image. And they are installed in develop mode so that the developer could easily modify the code.
+The default workdir is `/workspace`; vLLM and vLLM Ascend code are placed in `/vllm-workspace` and installed in [development mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html) (`pip install -e`), so developers can pick up code changes immediately without reinstalling.

-2. The entrypath of the docker container is `/workspace`.
+## (Optional) Install MindIE Turbo

-3. **Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+Install MindIE Turbo for performance acceleration:

+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_vllm_ascend_version|
 ```

 ## Usage
````

docs/source/tutorials/multi_node.md

Lines changed: 6 additions & 3 deletions

````diff
@@ -30,10 +30,13 @@ docker run --rm \
   -it quay.io/ascend/vllm-ascend:|vllm_ascend_version| bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
-```
+(Optional) Install MindIE Turbo for performance acceleration:

+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```

 Choose one machine as head node, the other are worker nodes, then start ray on each machine:
````
docs/source/tutorials/multi_npu.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -27,8 +27,12 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
 ```

 Setup environment variables:
````

docs/source/tutorials/single_npu.md

Lines changed: 20 additions & 3 deletions

````diff
@@ -26,8 +26,12 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
 ```

 Setup environment variables:
@@ -94,7 +98,20 @@ docker run --rm \
   -p 8000:8000 \
   -e VLLM_USE_MODELSCOPE=True \
   -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-  -it $IMAGE \
+  -it $IMAGE bash
+```
+
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
+Run the following script to start the vLLM server:
+
+```
 vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
 ```
````

docs/source/tutorials/single_npu_multimodal.md

Lines changed: 20 additions & 4 deletions

````diff
@@ -26,10 +26,13 @@ docker run --rm \
   -it $IMAGE bash
 ```

-```{note}
-**Optional**: Install MindIE Turbo for Performance acceleration: `pip install mindie_turbo==|pip_vllm_ascend_version|`
-```
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:

+pip install mindie_turbo==|pip_mindie_turbo_version|
+```

 Setup environment variables:

@@ -148,7 +151,20 @@ docker run --rm \
   -p 8000:8000 \
   -e VLLM_USE_MODELSCOPE=True \
   -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-  -it $IMAGE \
+  -it $IMAGE bash
+```
+
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
+Run the following script to start the vLLM server:
+
+```
 vllm serve Qwen/Qwen2.5-VL-7B-Instruct --dtype bfloat16 --max_model_len 16384 --max-num-batched-tokens 16384
 ```
````