Commit ee607bd

update readme

1 parent c5b03b9 commit ee607bd

1 file changed: multimodal/vl2l/README.md (+50 -8 lines)
@@ -42,11 +42,11 @@ Install `mlperf-inf-mm-vl2l` and the development tools with:
 
 - On Bash
   ```bash
-  pip install multimodal/vl2l/[dev]
+  pip install -e multimodal/vl2l/[dev]
   ```
 - On Zsh
   ```zsh
-  pip install multimodal/vl2l/"[dev]"
+  pip install -e multimodal/vl2l/"[dev]"
   ```
 
 ### Post VL2L benchmarking CLI installation
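
The hunk above switches both install commands to pip's editable mode (`-e`), so local edits to the source tree take effect without reinstalling. A minimal sketch to confirm the editable install, assuming the distribution shares the CLI's name `mlperf-inf-mm-vl2l`:

```bash
# Hedged sketch: with an editable install, "pip show" should report a
# Location inside the repo checkout rather than site-packages
# (assumes the distribution name matches the CLI name).
pip show mlperf-inf-mm-vl2l

# The CLI entry point should resolve immediately after install.
mlperf-inf-mm-vl2l --help
```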
@@ -63,7 +63,8 @@ You can enable shell autocompletion for `mlperf-inf-mm-vl2l` with:
 mlperf-inf-mm-vl2l --install-completion
 ```
 
-> NOTE: Shell auto-completion will take effect once you restart the terminal.
+> [!NOTE]
+> Shell auto-completion will take effect once you restart the terminal.
 
 ### Start an inference endpoint on your local host machine with vLLM
 
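
This hunk only reflows the restart note into GitHub's `[!NOTE]` alert syntax. If you would rather not restart the terminal, replacing the current shell process usually sources the new completion script; a hedged one-liner, assuming `$SHELL` points at your interactive shell:

```bash
# Replace the current shell with a fresh instance of the same shell so the
# completion script installed above gets sourced (depends on your shell's
# startup files; treat this as an assumption, not guaranteed behavior).
exec $SHELL
```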

@@ -108,6 +109,12 @@ Accuracy only mode:
 mlperf-inf-mm-vl2l benchmark endpoint --settings.test.scenario server --settings.test.mode accuracy_only
 ```
 
+### Evaluate the response quality
+
+```bash
+mlperf-inf-mm-vl2l evaluate --filename output/mlperf_log_accuracy.json
+```
+
 ## Docker
 
 [docker/](docker/) provides examples of Dockerfiles that install the VL2L benchmarking
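
The added `evaluate` subcommand consumes the accuracy log that an accuracy-only run writes. A hedged end-to-end sketch assembled only from the two commands shown in this hunk; the `output/` path is taken from the README's own example:

```bash
# 1. Produce the accuracy log (Server scenario, accuracy-only mode).
mlperf-inf-mm-vl2l benchmark endpoint \
  --settings.test.scenario server \
  --settings.test.mode accuracy_only

# 2. Score the responses recorded in the MLPerf accuracy log.
mlperf-inf-mm-vl2l evaluate --filename output/mlperf_log_accuracy.json
```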
@@ -117,6 +124,30 @@ for example, in a situation where you must use a GPU cluster managed by
 [Slurm](https://slurm.schedmd.com/) with [enroot](https://github.com/nvidia/enroot) and
 [pyxis](https://github.com/NVIDIA/pyxis).
 
+As an illustrative example, assuming that you are at the root directory of the MLPerf
+Inference repo:
+
+1. You can build a container image against vLLM's
+   `vllm/vllm-openai:v0.12.0` release with:
+
+   ```bash
+   docker build \
+     --build-arg BASE_IMAGE_URL=vllm/vllm-openai:v0.12.0 \
+     --build-arg MLPERF_INF_MM_VL2L_INSTALL_URL=multimodal/vl2l \
+     -f multimodal/vl2l/docker/vllm-cuda.Dockerfile \
+     -t mlperf-inf-mm-vl2l:vllm-openai-v0.12.0 \
+     .
+   ```
+   > [!NOTE]
+   > `MLPERF_INF_MM_VL2L_INSTALL_URL` can also take in a remote GitHub location, such as
+   > `git+https://github.com/mlcommons/inference.git#subdirectory=multimodal/vl2l/`.
+
+2. Afterwards, you can start the container in interactive mode with:
+
+   ```bash
+   docker run --rm -it --gpus all -v ~/.cache:/root/.cache --ipc=host mlperf-inf-mm-vl2l:vllm-openai-v0.12.0
+   ```
+
 ### Benchmark against vLLM inside the container
 
 If you are running `mlperf-inf-mm-vl2l` inside a local environment that has access to
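
Before the final hunk, a hedged sanity check for the image built in step 1 above: list it, then confirm the container can see the GPUs. Overriding the entrypoint is an assumption here, since the vLLM base image ships its own entrypoint:

```bash
# List the freshly built image by repository name.
docker images mlperf-inf-mm-vl2l

# Run nvidia-smi inside the container to confirm GPU visibility.
# (--entrypoint override assumed necessary because the base image
# defines its own entrypoint; nvidia-smi availability is assumed
# from the CUDA base image.)
docker run --rm --gpus all --entrypoint nvidia-smi \
  mlperf-inf-mm-vl2l:vllm-openai-v0.12.0
```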
@@ -128,16 +159,27 @@ vLLM (such as inside a container that was created using the
 2. Wait for the endpoint to be healthy.
 3. Run the benchmark against that endpoint.
 
-For example, inside the container, you can run the Offline scenario Performance only
+For example, inside the container, you can run the Offline scenario Accuracy only
 mode with:
 
 ```bash
 mlperf-inf-mm-vl2l benchmark vllm \
-  --vllm.model.repo_id Qwen/Qwen3-VL-235B-A22B-Instruct \
-  --vllm.arg=--tensor-parallel-size=8 \
-  --vllm.arg=--limit-mm-per-prompt.video=0 \
   --settings.test.scenario offline \
-  --settings.test.mode performance_only
+  --settings.test.mode accuracy_only \
+  --dataset.token ... \
+  --vllm.cli=--async-scheduling \
+  --vllm.cli=--max-model-len=32768 \
+  --vllm.cli=--max-num-seqs=1024 \
+  --vllm.cli=--compilation-config='{
+    "cudagraph_capture_sizes": [
+      1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128,
+      136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248,
+      256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480,
+      496, 512, 1024, 1536, 2048, 3072, 4096, 6144, 8192, 12288, 16384, 24576, 32768
+    ]
+  }' \
+  --vllm.cli=--limit-mm-per-prompt.video=0 \
+  --vllm.cli=--tensor-parallel-size=8
 ```
 
 ## Developer Guide
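
The new `--vllm.cli=--compilation-config` value is a raw JSON document passed through to vLLM, so a quoting mistake only surfaces once the server starts. An illustrative pre-check using Python's standard `json.tool` (the payload shown is a shortened stand-in for the full list above):

```bash
# Illustrative only: pipe the JSON payload through json.tool to catch
# syntax errors before launching a long benchmark run.
echo '{"cudagraph_capture_sizes": [1, 2, 4, 8, 16]}' | python3 -m json.tool
```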
