Commit 20876ca

Update installation and tutorial doc
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1 parent be9e3e8 · commit 20876ca

2 files changed: 14 additions & 8 deletions

docs/source/installation.md

Lines changed: 9 additions & 3 deletions
````diff
@@ -124,7 +124,7 @@ First install system dependencies:
 
 ```bash
 apt update -y
-apt install -y gcc g++ cmake libnuma-dev wget
+apt install -y gcc g++ cmake libnuma-dev wget git
 ```
 
 **[Optinal]** Config the extra-index of `pip` if you are working on a **x86** machine, so that the torch with cpu could be found:
@@ -138,8 +138,14 @@ Then you can install `vllm` and `vllm-ascend` from **pre-built wheel**:
 ```{code-block} bash
 :substitutions:
 
-# Install vllm-project/vllm from pypi (v0.8.4 aarch64 is unsupported see detail in below note)
-pip install vllm==|pip_vllm_version|
+# Install vllm-project/vllm from pypi
+# (v0.8.4 aarch64 is unsupported see detail in below note)
+# pip install vllm==|pip_vllm_version|
+# Install vLLM
+git clone --depth 1 --branch |vllm_version| https://github.com/vllm-project/vllm
+cd vllm
+VLLM_TARGET_DEVICE=empty pip install -v -e .
+cd ..
 
 # Install vllm-project/vllm-ascend from pypi.
 pip install vllm-ascend==|pip_vllm_ascend_version|
````
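
With this change the docs build vLLM from a source checkout (using `VLLM_TARGET_DEVICE=empty`) and install `vllm-ascend` from PyPI. As a quick, illustrative sanity check that is not part of the commit, both distributions can be probed from Python afterwards:

```python
# Illustrative post-install check (not part of the commit): confirm that the
# source-built vLLM and the vllm-ascend wheel are both visible to Python.
import importlib.metadata as md

for dist in ("vllm", "vllm-ascend"):
    try:
        print(dist, md.version(dist))
    except md.PackageNotFoundError:
        print(dist, "is not installed")
```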

docs/source/tutorials/single_npu.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,4 +1,4 @@
-# Single NPU (Qwen2.5 7B)
+# Single NPU (Qwen3 8B)
 
 ## Run vllm-ascend on Single NPU
 
@@ -50,7 +50,7 @@ prompts = [
 "The future of AI is",
 ]
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", max_model_len=26240)
+llm = LLM(model="Qwen/Qwen3-8B", max_model_len=26240)
 
 outputs = llm.generate(prompts, sampling_params)
 for output in outputs:
````
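
The hunk above shows only part of the offline-inference example, so a self-contained sketch of how the updated snippet fits together is given below. Only the lines visible in the diff are taken from the doc; the import and the body of the print loop are assumptions based on the usual vLLM quickstart, and the full tutorial defines more prompts than the single one shown here.

```python
# Sketch of the updated offline-inference example (lines not visible in the
# diff are assumptions, not quotes from the tutorial).
from vllm import LLM, SamplingParams

prompts = [
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="Qwen/Qwen3-8B", max_model_len=26240)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```

The remaining hunks make the same model swap in the online-serving example:
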
````diff
@@ -91,7 +91,7 @@ docker run --rm \
 -e VLLM_USE_MODELSCOPE=True \
 -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
 -it $IMAGE \
-vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
+vllm serve Qwen/Qwen3-8B --max_model_len 26240
 ```
 
 :::{note}
@@ -112,7 +112,7 @@ Once your server is started, you can query the model with input prompts:
 curl http://localhost:8000/v1/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "Qwen/Qwen2.5-7B-Instruct",
+"model": "Qwen/Qwen3-8B",
 "prompt": "The future of AI is",
 "max_tokens": 7,
 "temperature": 0
````
````diff
@@ -122,7 +122,7 @@ curl http://localhost:8000/v1/completions \
 If you query the server successfully, you can see the info shown below (client):
 
 ```bash
-{"id":"cmpl-b25a59a2f985459781ce7098aeddfda7","object":"text_completion","created":1739523925,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"text":" here. It’s not just a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7,"prompt_tokens_details":null}}
+{"id":"cmpl-b25a59a2f985459781ce7098aeddfda7","object":"text_completion","created":1739523925,"model":"Qwen/Qwen3-8B","choices":[{"index":0,"text":" here. It’s not just a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7,"prompt_tokens_details":null}}
 ```
 
 Logs of the vllm server:
````
