Commit fb31d2c

[Doc] Modify review comments.
Signed-off-by: menogrey <1299267905@qq.com>
1 parent: 4026eea

File tree

3 files changed: +218 / -150 lines

docs/source/developer_guide/evaluation/using_ais_bench.md

Lines changed: 102 additions & 6 deletions
@@ -29,9 +29,13 @@ docker run --rm \
   -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
   -it $IMAGE \
   /bin/bash
-vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 4096 &
+vllm serve Qwen/Qwen2.5-0.5B-Instruct --max_model_len 35000 &
 ```
 
+:::{note}
+`--max_model_len` should be at least `35000`, which is suitable for most datasets; otherwise, the accuracy evaluation may be affected.
+:::
+
 The vLLM server has started successfully if you see logs as below:
 
 ```
@@ -40,7 +44,7 @@ INFO: Waiting for application startup.
 INFO: Application startup complete.
 ```
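
Once the log shows `Application startup complete.`, you can optionally confirm the endpoint is reachable before moving on; a minimal probe of vLLM's OpenAI-compatible API (assuming the server runs on the default port `8000`; adjust if you passed `--port`):

```shell
# List the served models; a JSON response confirms the server is up.
curl -s http://localhost:8000/v1/models
```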
 
-### 2. Run C-Eval dataset using AISBench
+### 2. Run different datasets using AISBench
 
 #### Install AISBench
 
@@ -64,6 +68,10 @@ Run `ais_bench -h` to check the installation.
 
 #### Download Dataset
 
+You can choose one or more datasets to run accuracy evaluation on.
+
+1. `C-Eval` dataset.
+
 Take the `C-Eval` dataset as an example; you can refer to [Datasets](https://gitee.com/aisbench/benchmark/tree/master/ais_bench/benchmark/configs/datasets) for more datasets. Every dataset has a `README.md` describing the detailed download and installation process.
 
 Download the dataset and install it to a specific path.
@@ -78,15 +86,70 @@ unzip ceval-exam.zip
 rm ceval-exam.zip
 ```
 
-#### Update Model Config Python File
+2. `MMLU` dataset.
+
+```shell
+cd ais_bench/datasets
+wget http://opencompass.oss-cn-shanghai.aliyuncs.com/datasets/data/mmlu.zip
+unzip mmlu.zip
+rm mmlu.zip
+```
+
+3. `GPQA` dataset.
+
+```shell
+cd ais_bench/datasets
+wget http://opencompass.oss-cn-shanghai.aliyuncs.com/datasets/data/gpqa.zip
+unzip gpqa.zip
+rm gpqa.zip
+```
+
+4. `MATH` dataset.
+
+```shell
+cd ais_bench/datasets
+wget http://opencompass.oss-cn-shanghai.aliyuncs.com/datasets/data/math.zip
+unzip math.zip
+rm math.zip
+```
+
+5. `LiveCodeBench` dataset.
+
+```shell
+cd ais_bench/datasets
+git lfs install
+git clone https://huggingface.co/datasets/livecodebench/code_generation_lite
+```
+
+6. `AIME 2024` dataset.
+
+```shell
+cd ais_bench/datasets
+mkdir aime/
+cd aime/
+wget http://opencompass.oss-cn-shanghai.aliyuncs.com/datasets/data/aime.zip
+unzip aime.zip
+rm aime.zip
+```
+
+7. `GSM8K` dataset.
+
+```shell
+cd ais_bench/datasets
+wget http://opencompass.oss-cn-shanghai.aliyuncs.com/datasets/data/gsm8k.zip
+unzip gsm8k.zip
+rm gsm8k.zip
+```
+
+#### Configuration
 
 Update the file `benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py`.
 There are several arguments that you should update according to your environment.
 
 - `path`: Update to your model weight path.
 - `model`: Update to your model name in vLLM.
 - `host_ip` and `host_port`: Update to your vLLM server IP and port.
-- `max_out_len`: Note `max_out_len` + LLM input length should be less than `max-model-len`(config in your vllm server).
+- `max_out_len`: Note that `max_out_len` + LLM input length should be less than `max-model-len` (configured in your vLLM server); `32768` is suitable for most datasets (for example, with `max-model-len` set to `35000`, about 2200 tokens remain for the input).
 - `batch_size`: Update according to your dataset.
 - `temperature`: Update this inference argument as needed.
 
@@ -123,13 +186,30 @@ models = [
 
 #### Execute Accuracy Evaluation
 
-Run the following code to execute the accuracy evaluation.
+Run the following commands to execute the accuracy evaluation for each dataset.
 
 ```shell
+# run C-Eval dataset
 ais_bench --models vllm_api_general_chat --datasets ceval_gen_0_shot_cot_chat_prompt.py --mode all --dump-eval-details --merge-ds
+
+# run MMLU dataset
+ais_bench --models vllm_api_general_chat --datasets mmlu_gen_0_shot_cot_chat_prompt.py --mode all --dump-eval-details --merge-ds
+
+# run GPQA dataset
+ais_bench --models vllm_api_general_chat --datasets gpqa_gen_0_shot_str.py --mode all --dump-eval-details --merge-ds
+
+# run MATH-500 dataset
+ais_bench --models vllm_api_general_chat --datasets math500_gen_0_shot_cot_chat_prompt.py --mode all --dump-eval-details --merge-ds
+
+# run LiveCodeBench dataset
+ais_bench --models vllm_api_general_chat --datasets livecodebench_code_generate_lite_gen_0_shot_chat.py --mode all --dump-eval-details --merge-ds
+
+# run AIME 2024 dataset
+ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_prompt.py --mode all --dump-eval-details --merge-ds
 ```
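
To run all six accuracy evaluations in one pass, a small shell loop over the dataset configs listed above works as well (a sketch; it assumes `ais_bench` is on your `PATH` and reuses exactly the config names from the commands above):

```shell
# Sketch: evaluate each dataset config from the list above in turn.
for ds in ceval_gen_0_shot_cot_chat_prompt.py \
          mmlu_gen_0_shot_cot_chat_prompt.py \
          gpqa_gen_0_shot_str.py \
          math500_gen_0_shot_cot_chat_prompt.py \
          livecodebench_code_generate_lite_gen_0_shot_chat.py \
          aime2024_gen_0_shot_chat_prompt.py; do
    ais_bench --models vllm_api_general_chat --datasets "$ds" --mode all --dump-eval-details --merge-ds
done
```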
 
-After execution, you can get the result from saved files such as `outputs/default/20250628_151326`, there is an example as follows:
+After each dataset execution, you can get the results from saved files such as `outputs/default/20250628_151326`. An example follows:
 
 ```
 20250628_151326/
@@ -157,7 +237,23 @@ After execution, you can get the result from saved files such as `outputs/defaul
 #### Execute Performance Evaluation
 
 ```shell
+# run C-Eval dataset
 ais_bench --models vllm_api_general_chat --datasets ceval_gen_0_shot_cot_chat_prompt.py --summarizer default_perf --mode perf
+
+# run MMLU dataset
+ais_bench --models vllm_api_general_chat --datasets mmlu_gen_0_shot_cot_chat_prompt.py --summarizer default_perf --mode perf
+
+# run GPQA dataset
+ais_bench --models vllm_api_general_chat --datasets gpqa_gen_0_shot_str.py --summarizer default_perf --mode perf
+
+# run MATH-500 dataset
+ais_bench --models vllm_api_general_chat --datasets math500_gen_0_shot_cot_chat_prompt.py --summarizer default_perf --mode perf
+
+# run LiveCodeBench dataset
+ais_bench --models vllm_api_general_chat --datasets livecodebench_code_generate_lite_gen_0_shot_chat.py --summarizer default_perf --mode perf
+
+# run AIME 2024 dataset
+ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_prompt.py --summarizer default_perf --mode perf
 ```
 
 After execution, you can get the results from the saved files. An example follows:

docs/source/installation.md

Lines changed: 4 additions & 2 deletions
@@ -297,12 +297,14 @@ Prompt: 'The future of AI is', Generated text: ' not bright\n\nThere is no doubt
 ## Multi-node Deployment
 ### Verify Multi-Node Communication
 
+First, check physical layer connectivity, then verify each node, and finally verify the inter-node connectivity.
+
 #### Physical Layer Requirements:
 
 - The physical machines must be located on the same WLAN, with network connectivity.
 - All NPUs are connected with optical modules, and the connection status must be normal.
 
-#### Verification Process:
+#### Each Node Verification:
 
 Execute the following commands on each node in sequence. The results must all be `success` and the status must be `UP`:
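
The command list itself is unchanged by this commit and sits outside the hunk; for orientation, a representative pair of per-node probes using the Ascend `hccn_tool` utility (a sketch for NPU device 0; flag spellings follow common Ascend documentation and may differ across CANN versions):

```shell
# Link status of the NIC on NPU device 0 (expect: UP).
hccn_tool -i 0 -link -g

# Network health of NPU device 0 (expect: success).
hccn_tool -i 0 -net_health -g
```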
 
@@ -345,7 +347,7 @@ Execute the following commands on each node in sequence. The results must all be
 ::::
 :::::
 
-#### NPU Interconnect Verification:
+#### Interconnect Verification:
 ##### 1. Get NPU IP Addresses
 :::::{tab-set}
 ::::{tab-item} A2 series
