### What this PR does / why we need it?
Update tutorials.
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
no.
---------
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
> Add `--max_model_len` option to avoid ValueError that the Qwen2.5-7B model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (26240).
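> The cap above would be applied when launching the server; a minimal sketch, assuming the `vllm serve` entry point and a locally available Qwen2.5-7B-Instruct model (the value 26240 is the KV-cache token capacity reported in the ValueError):
>
> ```bash
> # Cap the context length so it fits in the available KV cache.
> # Any value at or below the reported capacity (26240) avoids the error.
> vllm serve Qwen/Qwen2.5-7B-Instruct --max-model-len 26240
> ```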
If your service starts successfully, you can see the info shown below:

```bash
INFO: Started server process [6873]
INFO: Waiting for application startup.
INFO: Application startup complete.
```

Once your server is started, you can query the model with input prompts.
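For example, a completion request can be sent with `curl`; a sketch assuming the server listens on the default port 8000 and the model is registered under the name `Qwen/Qwen2.5-7B-Instruct`:

```bash
# Query the OpenAI-compatible completions endpoint.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 64
    }'
```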
Setup environment variables:
```bash
# Use Modelscope mirror to speed up model download
export VLLM_USE_MODELSCOPE=True

# To avoid NPU out of memory, set `max_split_size_mb` to any value lower than you need to allocate for Qwen2.5-7B-Instruct
```