Commit 403923a

Author: guanbao

make format happy

Signed-off-by: guanbao <gyu@amd.com>

1 parent: aca5b41

1 file changed: evaluation/README.md (51 additions, 26 deletions)

# Guideline

## Set Environment

1. Docker image:

```shell
rocm/ali-private:ubuntu22.04_rocm7.0.1.42_vllm_5b842c2_aiter_6b586ae_torch2.8.0_20250917
```
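
One way to start a container from this image is sketched below; the device flags and mounts are typical ROCm docker settings and an assumption here, not part of this README.

```shell
# Hypothetical container launch for the image above; adjust mounts,
# shared-memory size, and device access to your host setup.
docker run -it --rm \
  --network host \
  --ipc host --shm-size 16G \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  --security-opt seccomp=unconfined \
  -v /path/to/models:/models \
  rocm/ali-private:ubuntu22.04_rocm7.0.1.42_vllm_5b842c2_aiter_6b586ae_torch2.8.0_20250917
```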

2. Upgrade PyBind:

```shell
pip install --upgrade pybind11
```
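
To confirm the upgrade took effect, a quick check (an assumed step, not from the original list) is:

```shell
# Print the installed pybind11 version.
python3 -c "import pybind11; print(pybind11.__version__)"
```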

3. Install Aiter dev/perf branch:

```shell
pip uninstall aiter
git clone -b dev/perf git@github.com:ROCm/aiter.git
cd aiter
git submodule sync && git submodule update --init --recursive
python3 setup.py install
```

4. Install Rocm/vLLM dev/perf branch:

```shell
pip uninstall vllm
git clone -b dev/perf git@github.com:ROCm/vllm.git
cd vllm
# ... (two intermediate lines not shown in this diff)
python3 setup.py develop
```
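
A similar sanity check (also an assumed step) confirms the editable vLLM install is the one being imported:

```shell
# Report the vLLM version and the path it is imported from.
python3 -c "import vllm; print(vllm.__version__, vllm.__file__)"
```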

## Launch server

1. deepseek-r1 PTPC FP8

- download weight: <https://huggingface.co/EmbeddedLLM/deepseek-r1-FP8-Dynamic>

```shell
huggingface-cli download EmbeddedLLM/deepseek-r1-FP8-Dynamic --local-dir EmbeddedLLM/deepseek-r1-FP8-Dynamic
```
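
Optionally, large weight downloads can be sped up with the hf_transfer backend (an optional extra, not part of this README):

```shell
# Optional: enable the Rust-based transfer backend for faster downloads.
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download EmbeddedLLM/deepseek-r1-FP8-Dynamic --local-dir EmbeddedLLM/deepseek-r1-FP8-Dynamic
```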

- launch server:

```shell
bash launch_deepseekr1_ptpc_fp8.sh
```

We currently use pure TP8 since it gives better performance than TP8 + EP8; this choice may change as optimization continues. The two setups differ only in serve flags, as sketched below.
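
A minimal sketch of that difference, assuming current vLLM flag spellings (verify against this build's `vllm serve --help`):

```shell
# Pure TP8 (current choice): one tensor-parallel group across 8 GPUs.
vllm serve ${model} --tensor-parallel-size 8

# TP8 + EP8: same TP width, with expert parallelism enabled for MoE layers.
vllm serve ${model} --tensor-parallel-size 8 --enable-expert-parallel
```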

The example command:

```shell
export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
# ... (remaining environment variables and serve flags not shown in this diff)
--block-size 1
```

## Curl request

1. curl a single request to quickly check the functionality

```shell
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The capital of China", "temperature": 0, "top_p": 1, "top_k": 0, "repetition_penalty": 1.0, "presence_penalty": 0, "frequency_penalty": 0, "stream": false, "ignore_eos": false, "n": 1, "seed": 123
  }'
```

The result should be:

```shell
{"id":"cmpl-026a60769119489587e46d571b6ebb6a","object":"text_completion","created":1760272161,"model":"/mnt/raid0/zhangguopeng/deepseek-r1-FP8-Dynamic/","choices":[{"index":0,
"text":" is Beijing, and Shanghai is its most populous city by urban area population. China","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":5,"total_tokens":21,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}
```
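
To check just the generated text rather than the full JSON, one option (assuming `jq` is available in the container, which this README does not state) is:

```shell
# Extract only the completion text from the response.
curl -s -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "The capital of China", "temperature": 0, "max_tokens": 16}' \
  | jq -r '.choices[0].text'
```

The `max_tokens` value here mirrors the 16 completion tokens in the sample response above; the other sampling parameters are trimmed for brevity.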

## Benchmark

1. Taking deepseek as an example, you can use the following command to benchmark serving.

```shell
model="/path-to-model/deepseek-r1-FP8-Dynamic/"
vllm bench serve \
  --host localhost \
# ... (remaining flags not shown in this diff)
  2>&1 | tee log.client.log
```
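
For reference, a complete invocation might look like the sketch below; the dataset and length flags are assumptions, not the exact values elided from this diff.

```shell
# Hypothetical full benchmark command; tune lengths, prompt count, and
# concurrency to the workload you care about.
model="/path-to-model/deepseek-r1-FP8-Dynamic/"
vllm bench serve \
  --host localhost \
  --port 8000 \
  --model ${model} \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --num-prompts 256 \
  --max-concurrency 64 \
  2>&1 | tee log.client.log
```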

## Evaluation

### Text Model Evaluation

Text models are evaluated using lm-eval (<https://github.com/EleutherAI/lm-evaluation-harness.git>).

1. Install dependencies. `python3 -m pip install lm_eval tenacity`.
2. Start lm-eval. Example:

```shell
#!/bin/bash
model="/path-to-model/deepseek-r1-FP8-Dynamic/"
lm_eval \
# ... (two intermediate lines not shown in this diff)
  --model_args model=${model},base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size 100
```
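
The elided lines above select the evaluation backend and task; a plausible complete command, with the backend and task names inferred from the gsm8k results below (so treat them as assumptions), is:

```shell
# Hypothetical complete lm-eval call against the local OpenAI-compatible server.
model="/path-to-model/deepseek-r1-FP8-Dynamic/"
lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --num_fewshot 5 \
  --model_args model=${model},base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size 100
```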

The eager-mode result should be:

```shell
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match||0.9522|± |0.0059|
| | |strict-match | 5|exact_match||0.9530|± |0.0058|
```

The FULL_AND_PIECEWISE graph-mode result should be:

```shell
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match||0.9500|± |0.0060|
| | |strict-match | 5|exact_match||0.9477|± |0.0061|
```
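
FULL_AND_PIECEWISE refers to vLLM's graph-capture (cudagraph) mode. If the launch script does not already select it, recent vLLM builds expose it through the compilation config; the flag below is an assumption about this build, so check `vllm serve --help` first.

```shell
# Assumed way to request FULL_AND_PIECEWISE graph capture at serve time.
vllm serve ${model} --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}'
```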

**Take notes:**

- It is required to set `--batch_size` to a larger value, as the default is 1. Running with `--batch_size` > 1 also checks whether the batching logic is implemented correctly.
- Extra details: lm-eval sends seeded requests, so the vLLM sampling class will use per-request sampling.

### Visual Model Evaluation

Vision Language Model accuracy evaluation is done using the tool from
<https://github.com/EmbeddedLLM/mistral-evals.git> (it is modified from
[...]).

1. Install dependency. `python3 -m pip install fire`
2. Launch vLLM server. Example:

```shell
#!/bin/bash
rm -rf /root/.cache/vllm
export GPU_ARCHS=gfx942
# ... (remaining launch flags not shown in this diff)
```

3. Start evaluation. (The chartqa dataset is recommended, as its score variance is smaller.) Example:

```shell
#!/bin/bash
pushd ./mistral-evals
python3 -m eval.run eval_vllm \
# ... (remaining arguments not shown in this diff)
```

**Take notes:** The batch size is hard-coded to 32 in the repository.
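
The eval command above is truncated in this diff; a plausible complete invocation, following the upstream mistral-evals interface (the model name, URL, and output directory here are assumptions), is:

```shell
# Hypothetical complete evaluation run against the local vLLM server.
python3 -m eval.run eval_vllm \
  --model_name ${model} \
  --url http://localhost:8000 \
  --output_dir ./eval_output \
  --eval_name "chartqa"
```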

### Helper script

The launch scripts are attached to give an idea of the configurations that were validated
as working at some point in time.
