# Guideline

## Set Environment

1. Docker image:

```shell
rocm/ali-private:ubuntu22.04_rocm7.0.1.42_vllm_5b842c2_aiter_6b586ae_torch2.8.0_20250917
```

2. Upgrade PyBind:

```shell
pip install --upgrade pybind11
```

3. Install the Aiter `dev/perf` branch:

```shell
pip uninstall aiter
git clone -b dev/perf git@github.com:ROCm/aiter.git
cd aiter
git submodule sync && git submodule update --init --recursive
python3 setup.py install
```

4. Install the ROCm/vLLM `dev/perf` branch:

```shell
pip uninstall vllm
git clone -b dev/perf git@github.com:ROCm/vllm.git
cd vllm
python3 setup.py develop
```

## Launch server

1. deepseek-r1 PTPC FP8

- download weight: <https://huggingface.co/EmbeddedLLM/deepseek-r1-FP8-Dynamic>

```shell
huggingface-cli download EmbeddedLLM/deepseek-r1-FP8-Dynamic --local-dir EmbeddedLLM/deepseek-r1-FP8-Dynamic
```

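Before launching, it can save time to verify that the download actually completed. A minimal sketch, assuming a safetensors checkpoint with a `config.json` (the `check_weights` helper below is ours, not part of any tool):

```shell
# Hypothetical helper: succeeds if the directory looks like a complete
# Hugging Face checkpoint (config.json plus at least one safetensors shard).
check_weights() {
  [ -f "$1/config.json" ] && ls "$1"/*.safetensors >/dev/null 2>&1
}

if check_weights EmbeddedLLM/deepseek-r1-FP8-Dynamic; then
  echo "weights look complete"
else
  echo "weights missing or incomplete"
fi
```

This is only a coarse sanity check; a partially downloaded shard would still pass it.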
- launch server:

```shell
bash launch_deepseekr1_ptpc_fp8.sh
```

We currently use pure TP8 since it gives better performance than TP8 + EP8; this is subject to change as optimization continues.
4155
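For reference, the difference between the two layouts comes down to the parallelism flags on the serve command. A hedged sketch (the flag names are vLLM's; the surrounding command is abridged, and the full validated invocation lives in the launch script):

```shell
# Pure TP8 (current choice): shard all layers across 8 GPUs
#   vllm serve <model> --tensor-parallel-size 8
# TP8 + EP8: additionally distribute MoE experts with expert parallelism
#   vllm serve <model> --tensor-parallel-size 8 --enable-expert-parallel
```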
The example command:

```shell
export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
# ... (the serve command and remaining flags are elided here) ...
  --block-size 1
```
7186
## Curl request

1. curl a single request to quickly check the functionality

```shell
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The capital of China", "temperature": 0, "top_p": 1, "top_k": 0, "repetition_penalty": 1.0, "presence_penalty": 0, "frequency_penalty": 0, "stream": false, "ignore_eos": false, "n": 1, "seed": 123
  }'
```

The result should be:

```shell
{"id":"cmpl-026a60769119489587e46d571b6ebb6a","object":"text_completion","created":1760272161,"model":"/mnt/raid0/zhangguopeng/deepseek-r1-FP8-Dynamic/","choices":[{"index":0,"text":"is Beijing, and Shanghai is its most populous city by urban area population. China","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":5,"total_tokens":21,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}
```

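To pull just the generated text out of a response like the one above, standard text tools suffice. A minimal sketch (the sample `response` below is abridged from the output above; it assumes the `text` field contains no escaped quotes):

```shell
# Abridged sample response; in practice capture it with: response=$(curl -s ...)
response='{"choices":[{"index":0,"text":"is Beijing, and Shanghai is its most populous city by urban area population. China","finish_reason":"length"}]}'
# Extract the "text" field and strip the key and surrounding quotes
echo "$response" | grep -o '"text":"[^"]*"' | sed 's/^"text":"//; s/"$//'
```

This prints the completion text only; for anything beyond a quick check, a proper JSON parser such as `jq` is more robust.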
## Benchmark

1. Taking deepseek as an example, you can use the following command to benchmark serving.

```shell
model="/path-to-model/deepseek-r1-FP8-Dynamic/"
vllm bench serve \
  --host localhost \
  2>&1 | tee log.client.log
```
108128
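When reading the benchmark log, throughput is just tokens over wall-clock time, so reported numbers can be cross-checked by hand. A quick back-of-the-envelope sketch (the figures below are illustrative, not from a real run):

```shell
# Illustrative figures: 16000 generated tokens over a 25 s benchmark run
tokens=16000
seconds=25
awk -v t="$tokens" -v s="$seconds" 'BEGIN { printf "%.1f tokens/s\n", t / s }'
# prints: 640.0 tokens/s
```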
## Evaluation

### Text Model Evaluation

The text model is evaluated using lm-eval (<https://github.com/EleutherAI/lm-evaluation-harness.git>).

1. Install dependencies: `python3 -m pip install lm_eval tenacity`.
2. Start lm-eval. Example:

```shell
#!/bin/bash
model="/path-to-model/deepseek-r1-FP8-Dynamic/"
lm_eval \
  --model_args model=${model},base_url=http://127.0.0.1:8000/v1/completions \
  --batch_size 100
```

The eager-mode result should be:

```shell
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9522|±  |0.0059|
|     |       |strict-match    |     5|exact_match|↑  |0.9530|±  |0.0058|
```

The FULL_AND_PIECEWISE graph-mode result should be:

```shell
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9500|±  |0.0060|
|     |       |strict-match    |     5|exact_match|↑  |0.9477|±  |0.0061|
```

**Take notes:**

- It is required to set `--batch_size` to a larger value, as the default is 1. Setting `--batch_size` > 1 also checks whether the batching logic is implemented correctly.
- Extra details: lm-eval sends seeded requests, so vLLM's sampling class uses per-request sampling.
146171
### Visual Model Evaluation

Vision Language Model accuracy evaluation is done using the tool from
<https://github.com/EmbeddedLLM/mistral-evals.git> (it is modified from
the upstream mistral-evals repository).

1. Install dependency: `python3 -m pip install fire`
2. Launch vLLM server. Example:
155180
```shell
#!/bin/bash
rm -rf /root/.cache/vllm
export GPU_ARCHS=gfx942
# ... (the vllm serve command is elided here) ...
```

3. Start evaluation. (The chartqa dataset is recommended, as the variance of the score is smaller.) Example:
172197
```shell
#!/bin/bash
pushd ./mistral-evals
python3 -m eval.run eval_vllm \
  # ... (eval_vllm arguments elided here) ...
```

**Take notes:** The batch size is hard-coded to 32 in the repository.
186211
### Helper script

The launch scripts are attached to give an idea of the configurations that were
validated to work at some point in time.