## Installation

=== "NVIDIA CUDA"

    If you are using NVIDIA GPUs, you can install vLLM using [pip](https://pypi.org/project/vllm/) directly.

    It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install `uv`. After installing `uv`, you can create a new Python environment and install vLLM using the following commands:

    ```bash
    uv venv --python 3.12 --seed
    source .venv/bin/activate
    uv pip install vllm --torch-backend=auto
    ```
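
    As a quick sanity check (a minimal sketch, assuming the environment created above is still activated), you can import the package and print its version:

    ```bash
    # quick check only; an error here means the install did not succeed
    python -c "import vllm; print(vllm.__version__)"
    ```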

    `uv` can [automatically select the appropriate PyTorch index at runtime](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection) by inspecting the installed CUDA driver version via `--torch-backend=auto` (or `UV_TORCH_BACKEND=auto`). To select a specific backend (e.g., `cu126`), set `--torch-backend=cu126` (or `UV_TORCH_BACKEND=cu126`).
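
    For example, a sketch of pinning the CUDA 12.6 wheels explicitly, using either the flag or the environment variable:

    ```bash
    uv pip install vllm --torch-backend=cu126
    # equivalently, via the environment variable:
    UV_TORCH_BACKEND=cu126 uv pip install vllm
    ```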

    Another delightful way is to use `uv run` with the `--with [dependency]` option, which allows you to run commands such as `vllm serve` without creating any permanent environment:

    ```bash
    uv run --with vllm vllm --help
    ```
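
    The same pattern also works for one-off serving; as a sketch (the model ID below is only an illustration, substitute any model you want to serve):

    ```bash
    # example model ID; replace it with the model of your choice
    uv run --with vllm vllm serve Qwen/Qwen2.5-1.5B-Instruct
    ```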

    You can also use [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) to create and manage Python environments. You can install `uv` into the conda environment through `pip` if you want to manage it within the environment.

    ```bash
    conda create -n myenv python=3.12 -y
    conda activate myenv
    pip install --upgrade uv
    uv pip install vllm --torch-backend=auto
    ```

=== "AMD ROCm"

    Use a pre-built Docker image from Docker Hub. The public stable image is [rocm/vllm:latest](https://hub.docker.com/r/rocm/vllm). There is also a development image at [rocm/vllm-dev](https://hub.docker.com/r/rocm/vllm-dev).

    The `-v` flag in the `docker run` command below mounts a local directory into the container. Replace `<path/to/your/models>` with the path on your host machine to the directory containing your models. The models will then be accessible inside the container at `/app/models`.

    ???+ console "Commands"
        ```bash
        docker pull rocm/vllm-dev:nightly # to get the latest image
        docker run -it --rm \
            --network=host \
            --group-add=video \
            --ipc=host \
            --cap-add=SYS_PTRACE \
            --security-opt seccomp=unconfined \
            --device /dev/kfd \
            --device /dev/dri \
            -v <path/to/your/models>:/app/models \
            -e HF_HOME="/app/models" \
            rocm/vllm-dev:nightly
        ```
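
    Once inside the container, vLLM is used the same way as on other platforms; a hypothetical serving command might look like this (because `HF_HOME` is set to `/app/models`, downloaded weights are cached in the mounted directory):

    ```bash
    # <model-id> is a placeholder for any Hugging Face model you want to serve
    vllm serve <model-id>
    ```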

!!! note
    For more detail, and for non-CUDA platforms, please refer to the [installation guide](installation/README.md) for specific instructions on how to install vLLM.

Currently, vLLM supports multiple backends for efficient Attention computation across different platforms and accelerator architectures. It automatically selects the most performant backend compatible with your system and model specifications.

If desired, you can also manually set the backend of your choice by configuring the environment variable `VLLM_ATTENTION_BACKEND` to one of the following options (see the example after the list):

- On NVIDIA CUDA: `FLASH_ATTN`, `FLASHINFER` or `XFORMERS`.
- On AMD ROCm: `TRITON_ATTN`, `ROCM_ATTN`, `ROCM_AITER_FA` or `ROCM_AITER_UNIFIED_ATTN`.
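
As a sketch, forcing a particular backend is just a matter of setting this variable before launching vLLM; for instance, to request FlashInfer on a CUDA system (`<model-id>` is a placeholder):

```bash
# assumes the chosen backend is available on your system (FlashInfer must be installed separately, as noted below)
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve <model-id>
```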

For AMD ROCm, you can further control the specific Attention implementation using the following variables:

There are no pre-built vLLM wheels containing FlashInfer, so you must install it in your environment first. Refer to the [FlashInfer official docs](https://docs.flashinfer.ai/) or see [docker/Dockerfile](../../docker/Dockerfile) for instructions on how to install it.