Commit 58d22c3

vllmellm authored and rtourgeman committed
[DOC] [ROCm] Add ROCm quickstart guide (vllm-project#26505)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
1 parent e2f755c commit 58d22c3

File tree

1 file changed (+54 lines, -20 lines)


docs/getting_started/quickstart.md

Lines changed: 54 additions & 20 deletions
@@ -12,32 +12,56 @@ This guide will help you quickly get started with vLLM to perform:

## Installation

=== "NVIDIA CUDA"

    If you are using NVIDIA GPUs, you can install vLLM using [pip](https://pypi.org/project/vllm/) directly.

    It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install `uv`. After installing `uv`, you can create a new Python environment and install vLLM using the following commands:

    ```bash
    uv venv --python 3.12 --seed
    source .venv/bin/activate
    uv pip install vllm --torch-backend=auto
    ```

    `uv` can [automatically select the appropriate PyTorch index at runtime](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection) by inspecting the installed CUDA driver version via `--torch-backend=auto` (or `UV_TORCH_BACKEND=auto`). To select a specific backend (e.g., `cu126`), set `--torch-backend=cu126` (or `UV_TORCH_BACKEND=cu126`).
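
    For example, on a machine whose driver supports CUDA 12.6, the index can be pinned in either of the two equivalent forms mentioned above (a minimal illustration; substitute the backend that matches your driver):

    ```bash
    # pin the PyTorch index via the CLI flag
    uv pip install vllm --torch-backend=cu126

    # or via the equivalent environment variable
    UV_TORCH_BACKEND=cu126 uv pip install vllm
    ```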

    Another delightful way is to use `uv run` with the `--with [dependency]` option, which allows you to run commands such as `vllm serve` without creating any permanent environment:

    ```bash
    uv run --with vllm vllm --help
    ```
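
    Because no permanent environment is created, the same pattern also works for serving a model directly (a sketch; the model name is only an example):

    ```bash
    uv run --with vllm vllm serve Qwen/Qwen2.5-1.5B-Instruct
    ```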

    You can also use [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) to create and manage Python environments. You can install `uv` into the conda environment through `pip` if you want to manage it within the environment.

    ```bash
    conda create -n myenv python=3.12 -y
    conda activate myenv
    pip install --upgrade uv
    uv pip install vllm --torch-backend=auto
    ```

=== "AMD ROCm"

    Use a pre-built Docker image from Docker Hub. The public stable image is [rocm/vllm:latest](https://hub.docker.com/r/rocm/vllm). There is also a development image at [rocm/vllm-dev](https://hub.docker.com/r/rocm/vllm-dev).

    The `-v` flag in the `docker run` command below mounts a local directory into the container. Replace `<path/to/your/models>` with the path on your host machine to the directory containing your models. The models will then be accessible inside the container at `/app/models`.

    ???+ console "Commands"
        ```bash
        docker pull rocm/vllm-dev:nightly # to get the latest image
        docker run -it --rm \
            --network=host \
            --group-add=video \
            --ipc=host \
            --cap-add=SYS_PTRACE \
            --security-opt seccomp=unconfined \
            --device /dev/kfd \
            --device /dev/dri \
            -v <path/to/your/models>:/app/models \
            -e HF_HOME="/app/models" \
            rocm/vllm-dev:nightly
        ```
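
    The pre-built images ship with vLLM installed, so once inside the container you can start serving immediately. A minimal sketch (the model name is only an example; it is cached under `/app/models` because `HF_HOME` points there):

    ```bash
    # run inside the container started above
    vllm serve Qwen/Qwen2.5-1.5B-Instruct
    ```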

!!! note
    For more detail and non-CUDA platforms, please refer [here](installation/README.md) for specific instructions on how to install vLLM.

@@ -246,7 +270,17 @@ Alternatively, you can use the `openai` Python package:

Currently, vLLM supports multiple backends for efficient Attention computation across different platforms and accelerator architectures. It automatically selects the most performant backend compatible with your system and model specifications.

If desired, you can also manually set the backend of your choice by configuring the environment variable `VLLM_ATTENTION_BACKEND` to one of the following options:

- On NVIDIA CUDA: `FLASH_ATTN`, `FLASHINFER` or `XFORMERS`.
- On AMD ROCm: `TRITON_ATTN`, `ROCM_ATTN`, `ROCM_AITER_FA` or `ROCM_AITER_UNIFIED_ATTN`.
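
For example, to pin the attention backend when launching the server, prefix the command with the variable (a sketch; pick a value from the list that matches your platform, and note that the model name is illustrative):

```bash
# NVIDIA example: force the FlashInfer backend
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve Qwen/Qwen2.5-1.5B-Instruct
```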

For AMD ROCm, you can further control the specific Attention implementation using the following variables:

- Triton Unified Attention: `VLLM_ROCM_USE_AITER=0 VLLM_V1_USE_PREFILL_DECODE_ATTENTION=0 VLLM_ROCM_USE_AITER_MHA=0`
- AITER Unified Attention: `VLLM_ROCM_USE_AITER=1 VLLM_USE_AITER_UNIFIED_ATTENTION=1 VLLM_V1_USE_PREFILL_DECODE_ATTENTION=0 VLLM_ROCM_USE_AITER_MHA=0`
- Triton Prefill-Decode Attention: `VLLM_ROCM_USE_AITER=1 VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 VLLM_ROCM_USE_AITER_MHA=0`
- AITER Multi-head Attention: `VLLM_ROCM_USE_AITER=1 VLLM_V1_USE_PREFILL_DECODE_ATTENTION=0 VLLM_ROCM_USE_AITER_MHA=1`
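
Putting this together, a ROCm launch that selects the AITER multi-head attention path might look like the following sketch (the variable settings are taken from the list above; the model name is illustrative):

```bash
VLLM_ROCM_USE_AITER=1 \
VLLM_V1_USE_PREFILL_DECODE_ATTENTION=0 \
VLLM_ROCM_USE_AITER_MHA=1 \
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```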

!!! warning
    There are no pre-built vllm wheels containing Flash Infer, so you must install it in your environment first. Refer to the [Flash Infer official docs](https://docs.flashinfer.ai/) or see [docker/Dockerfile](../../docker/Dockerfile) for instructions on how to install it.
