Longbench: NV code to ipex-llm #11662
Conversation
Please add the license header (and change the adapted-from file link) for all Python files:
@@ -0,0 +1,93 @@
# LongBench Benchmark Test

LongBench Benchmark allows users to test LongBench benchmark and record them in some json files. Users can provide models and related information in `config.yaml` and `config` directory.
LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models. This benchmark implementation is adapted from [THUDM/LongBench](https://github.com/THUDM/LongBench) and [SnapKV/experiments/LongBench](https://github.com/FasterDecoding/SnapKV/tree/main/experiments/LongBench).
Before running, make sure to have [ipex-llm](../../../../../README.md) installed.

## Dependencies
Rename this section to "Environment Preparation".
LongBench Benchmark allows users to test LongBench benchmark and record them in some json files. Users can provide models and related information in `config.yaml` and `config` directory.

Before running, make sure to have [ipex-llm](../../../../../README.md) installed.
Move this to the Environment Preparation section.
# - "chatglm4-9b" | ||
# - "qwen2-7b-instruct" | ||
|
||
# whether or not to test the full-kv score |
Whether to test the full-kv.
# whether or not to test the full-kv score
full_kv: True
# whether or not to open optimize_model
Whether to apply model optimization.
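For orientation, here is a minimal sketch of how these two flags might be consumed, assuming the config field names shown in the hunk above and ipex-llm's standard loading API; the flag handling below is illustrative, not this PR's actual code:

```
import yaml
from ipex_llm.transformers import AutoModelForCausalLM

# Read the benchmark configuration sketched in the hunk above.
with open("config.yaml") as f:
    conf = yaml.safe_load(f)

# optimize_model toggles ipex-llm's load-time model optimizations.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    load_in_4bit=True,
    optimize_model=conf.get("optimize_model", True),
    trust_remote_code=True,
)

if conf.get("full_kv", True):
    # Run the baseline pass with the full (uncompressed) KV cache
    # before evaluating any compress-kv configurations.
    ...
```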
#### About compress-kv

The rest json files are about compress-kv.
The remaining JSON files are compress-kv test configurations.
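For a concrete picture, here is an illustrative compress-kv configuration in the spirit of SnapKV; the keys below are hypothetical, and the actual JSON keys in this PR's `config` directory may differ:

```
# Illustrative SnapKV-style compression parameters (hypothetical keys).
compress_kv_config = {
    "window_size": 32,            # recent tokens always kept in the KV cache
    "max_capacity_prompt": 1024,  # total KV-cache budget for the prompt
    "kernel_size": 5,             # pooling kernel over attention scores
    "pooling": "avgpool",         # aggregation used to pick important tokens
}
```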
pip install fuzzywuzzy
pip install rouge
```
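For context, these two packages back LongBench's fuzzy-match and ROUGE metrics; a minimal sketch of their standard APIs (not code from this PR):

```
from fuzzywuzzy import fuzz
from rouge import Rouge

# Fuzzy string similarity in [0, 100], used for classification-style tasks.
print(fuzz.ratio("deep learning", "deep-learning"))

# ROUGE-L F1, used for summarization-style tasks.
rouge = Rouge()
scores = rouge.get_scores("the model answered well", "the model answered")
print(scores[0]["rouge-l"]["f"])
```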
Add Load Data section
You can download and load the LongBench data through the Hugging Face datasets (🤗 HF Repo):
```
from datasets import load_dataset

datasets = ["narrativeqa", "qasper", "multifieldqa_en", "multifieldqa_zh", "hotpotqa", "2wikimqa", "musique",
            "dureader", "gov_report", "qmsum", "multi_news", "vcsum", "trec", "triviaqa", "samsum", "lsht",
            "passage_count", "passage_retrieval_en", "passage_retrieval_zh", "lcc", "repobench-p"]
for dataset in datasets:
    # standard split
    data = load_dataset('THUDM/LongBench', dataset, split='test')
    # extended-length ("_e") split; keep it in its own variable rather than
    # overwriting the standard split loaded above
    data_e = load_dataset('THUDM/LongBench', f"{dataset}_e", split='test')
```
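(Note: the `_e` split is LongBench-E, the variant with a uniform length distribution; only a subset of the 21 datasets has an `_e` counterpart, so the second `load_dataset` call may fail for some names.)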
## Run

There are two Python files for users to call.
- Configure `config.yaml` and run `pred.py`; you can obtain the output of the model under the `pred/` folder corresponding to the model name.
- Run the evaluation code `eval.py` to get the evaluation results on all datasets in `result.json`.
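A quick way to inspect the output, assuming `result.json` maps dataset names to scores (a sketch, not part of the PR):

```
import json

# Print per-dataset scores produced by eval.py.
with open("result.json") as f:
    results = json.load(f)
for dataset, score in sorted(results.items()):
    print(f"{dataset}: {score}")
```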
> [!Note]
>
> To test the models and get the scores in one go, please run `test_and_eval.sh`
Add citation section
```
@article{bai2023longbench,
    title={LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding},
    author={Bai, Yushi and Lv, Xin and Zhang, Jiajie and Lyu, Hongchang and Tang, Jiankai and Huang, Zhidian and Du, Zhengxiao and Liu, Xiao and Zeng, Aohan and Hou, Lei and Dong, Yuxiao and Tang, Jie and Li, Juanzi},
    journal={arXiv preprint arXiv:2308.14508},
    year={2023}
}
```
@@ -1,10 +1,13 @@
# LongBench Benchmark Test

LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models. This benchmark implementation is adapted from [THUDM/LongBench](https://github.com/THUDM/LongBench) and [experiments/LongBench](https://github.com/FasterDecoding/SnapKV/tree/main/experiments/LongBench)
Use "SnapKV/experiments/LongBench" as the link text.
Add a period at the end of the sentence.
# limitations under the License.
#
# This file is adapted from
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
Please modify the corresponding file path.
# limitations under the License.
#
# This file is adapted from
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
Please modify the corresponding file path.
# limitations under the License.
#
# This file is adapted from
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
Please modify the corresponding file path.
@@ -1,10 +1,13 @@
# LongBench Benchmark Test

LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models. This benchmark implementation is adapted from [THUDM/LongBench](https://github.com/THUDM/LongBench) and [experiments/LongBench](https://github.com/FasterDecoding/SnapKV/tree/main/experiments/LongBench)

LongBench Benchmark allows users to test LongBench benchmark and record them in some json files. Users can provide models and related information in `config.yaml` and `config` directory.
Please remove this line.
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
#

# This code is for evaluating the results of LongBench.
Please add a link to the original file this was adapted from.