Commit 20876ca

Update installation and tutorial doc
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1 parent be9e3e8 · commit 20876ca

2 files changed: 14 additions & 8 deletions

docs/source/installation.md

Lines changed: 9 additions & 3 deletions
````diff
@@ -124,7 +124,7 @@ First install system dependencies:
 
 ```bash
 apt update -y
-apt install -y gcc g++ cmake libnuma-dev wget
+apt install -y gcc g++ cmake libnuma-dev wget git
 ```
 
 **[Optinal]** Config the extra-index of `pip` if you are working on a **x86** machine, so that the torch with cpu could be found:
@@ -138,8 +138,14 @@ Then you can install `vllm` and `vllm-ascend` from **pre-built wheel**:
 ```{code-block} bash
 :substitutions:
 
-# Install vllm-project/vllm from pypi (v0.8.4 aarch64 is unsupported see detail in below note)
-pip install vllm==|pip_vllm_version|
+# Install vllm-project/vllm from pypi
+# (v0.8.4 aarch64 is unsupported see detail in below note)
+# pip install vllm==|pip_vllm_version|
+# Install vLLM
+git clone --depth 1 --branch |vllm_version| https://github.com/vllm-project/vllm
+cd vllm
+VLLM_TARGET_DEVICE=empty pip install -v -e .
+cd ..
 
 # Install vllm-project/vllm-ascend from pypi.
 pip install vllm-ascend==|pip_vllm_ascend_version|
````
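
With this change the docs build vLLM from a source checkout (using `VLLM_TARGET_DEVICE=empty`) and install `vllm-ascend` from PyPI. As a quick, illustrative sanity check that is not part of the commit, both distributions can be probed from Python afterwards:

```python
# Illustrative post-install check (not part of the commit): confirm that the
# source-built vLLM and the vllm-ascend wheel are both visible to Python.
import importlib.metadata as md

for dist in ("vllm", "vllm-ascend"):
    try:
        print(dist, md.version(dist))
    except md.PackageNotFoundError:
        print(dist, "is not installed")
```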

docs/source/tutorials/single_npu.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5 7B)
+# Single NPU (Qwen3 8B)
 
 ## Run vllm-ascend on Single NPU
 
@@ -50,7 +50,7 @@ prompts = [
 "The future of AI is",
 ]
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", max_model_len=26240)
+llm = LLM(model="Qwen/Qwen3-8B", max_model_len=26240)
 
 outputs = llm.generate(prompts, sampling_params)
 for output in outputs:
````
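
The hunk above shows only part of the offline-inference example, so a self-contained sketch of how the updated snippet fits together is given below. Only the lines visible in the diff are taken from the doc; the import and the body of the print loop are assumptions based on the usual vLLM quickstart, and the full tutorial defines more prompts than the single one shown here.

```python
# Sketch of the updated offline-inference example (lines not visible in the
# diff are assumptions, not quotes from the tutorial).
from vllm import LLM, SamplingParams

prompts = [
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="Qwen/Qwen3-8B", max_model_len=26240)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```

The remaining hunks make the same model swap in the online-serving example:
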
````diff
@@ -91,7 +91,7 @@ docker run --rm \
 -e VLLM_USE_MODELSCOPE=True \
 -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
 -it $IMAGE \
-vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
+vllm serve Qwen/Qwen3-8B --max_model_len 26240
 ```
 
 :::{note}
@@ -112,7 +112,7 @@ Once your server is started, you can query the model with input prompts:
 curl http://localhost:8000/v1/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "Qwen/Qwen2.5-7B-Instruct",
+"model": "Qwen/Qwen3-8B",
 "prompt": "The future of AI is",
 "max_tokens": 7,
 "temperature": 0
````
````diff
@@ -122,7 +122,7 @@ curl http://localhost:8000/v1/completions \
 If you query the server successfully, you can see the info shown below (client):
 
 ```bash
-{"id":"cmpl-b25a59a2f985459781ce7098aeddfda7","object":"text_completion","created":1739523925,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"text":" here. It’s not just a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7,"prompt_tokens_details":null}}
+{"id":"cmpl-b25a59a2f985459781ce7098aeddfda7","object":"text_completion","created":1739523925,"model":"Qwen/Qwen3-8B","choices":[{"index":0,"text":" here. It’s not just a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7,"prompt_tokens_details":null}}
 ```
 
 Logs of the vllm server:
````
