Commit 395426e

Commit message: model-impl
1 parent 7b5a367 commit 395426e

File tree: 1 file changed (+11 −12 lines)

docs/source/en/serving.md

Lines changed: 11 additions & 12 deletions
@@ -46,20 +46,19 @@ Many features like quantization, LoRA adapters, and distributed inference and serving
 > [!TIP]
 > Refer to the [Transformers fallback](https://docs.vllm.ai/en/latest/models/supported_models.html#transformers-fallback) section for more details.
 
-Run the code below, and if it prints `TransformersModel`, then it means the model is a Transformers implementation.
+By default, vLLM serves the native implementation, and if one doesn't exist, it falls back on the Transformers implementation. You can also set `--model-impl transformers` to explicitly use the Transformers model implementation.
 
-```py
-from vllm import LLM
-
-llm = LLM(model="...", task="generate")
-llm.apply_model(lambda model: print(type(model)))
+```shell
+vllm serve Qwen/Qwen2.5-1.5B-Instruct \
+  --task generate \
+  --model-impl transformers
 ```
 
-Add the `trust_remote_code` parameter to enable loading a remote code model.
-
-```py
-from vllm import LLM
+Add the `--trust-remote-code` parameter to enable loading a remote code model.
 
-llm = LLM(model="...", task="generate", trust_remote_code=True)
-llm.apply_model(lambda model: print(type(model)))
+```shell
+vllm serve Qwen/Qwen2.5-1.5B-Instruct \
+  --task generate \
+  --model-impl transformers \
+  --trust-remote-code
 ```
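For reference, the removed Python snippet can still be adapted to the offline API. A minimal sketch, assuming the `LLM` constructor accepts a `model_impl` argument mirroring the `--model-impl` CLI flag introduced above:

```py
from vllm import LLM

# Assumed offline-API counterpart of `vllm serve ... --model-impl transformers`;
# `model_impl="transformers"` is taken to mirror the CLI flag.
llm = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    task="generate",
    model_impl="transformers",
)

# As in the removed snippet: printing the model class reveals which
# implementation was loaded (`TransformersModel` for the fallback).
llm.apply_model(lambda model: print(type(model)))
```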

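Once the server is up, it can be exercised through vLLM's OpenAI-compatible endpoint. A quick usage sketch, assuming the default `http://localhost:8000/v1` address, no authentication, and the `openai` Python client:

```py
from openai import OpenAI

# The base URL and placeholder API key assume a default local `vllm serve`
# deployment with no API key configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The model name must match the one passed to `vllm serve`.
completion = client.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```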