
Longbench: NV code to ipex-llm #11662

Merged 14 commits into intel-analytics:main on Sep 18, 2024
Conversation

ATMxsp01 (Contributor):

Longbench: NV code to ipex-llm

@cyita (Contributor) commented Sep 11, 2024

Please add the license header (and update the "adapted from" file link) for all Python files:
https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/perplexity/ppl.py#L1-L18
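For reference, a header of the kind being requested might look like the sketch below. It assumes the project's standard Apache-2.0 header as used in the linked ppl.py; the copyright line and the "adapted from" URL are illustrative placeholders and should be replaced with the actual values for each file.

```python
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This file is adapted from
# https://github.com/THUDM/LongBench/blob/main/pred.py  (placeholder: point this at the real source file)
#
```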

@@ -0,0 +1,93 @@
# LongBench Benchmark Test

The LongBench benchmark harness allows users to run the LongBench benchmark and record the results in JSON files. Users can provide models and related information in `config.yaml` and the `config` directory.
Review comment:

LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models. This benchmark implementation is adapted from xxxxx(https://github.com/THUDM/LongBench) and xxxx(https://github.com/FasterDecoding/SnapKV/tree/main/experiments/LongBench)


Before running, make sure to have [ipex-llm](../../../../../README.md) installed.

## Dependencies
Review comment: Rename this section "Environment Preparation".


The LongBench benchmark harness allows users to run the LongBench benchmark and record the results in JSON files. Users can provide models and related information in `config.yaml` and the `config` directory.

Before running, make sure to have [ipex-llm](../../../../../README.md) installed.
Review comment: Move this to the environment preparation section.

# - "chatglm4-9b"
# - "qwen2-7b-instruct"

# whether or not to test the full-kv score
Review comment: whether to test the full-kv


# whether or not to test the full-kv score
full_kv: True
# whether or not to open optimize_model
Review comment: Whether to apply model optimization
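These flags sit in `config.yaml` next to the model list. As a rough sketch (not the actual pred.py; the `model_name` key and the file path are assumptions for illustration), the configuration might be consumed like this:

```python
import yaml  # requires PyYAML

# Load the benchmark configuration; only full_kv and optimize_model appear in the
# excerpt above, the other key names here are assumptions for illustration.
with open("config.yaml") as f:
    conf = yaml.safe_load(f)

model_names = conf.get("model_name", [])            # e.g. ["chatglm4-9b", "qwen2-7b-instruct"]
full_kv = conf.get("full_kv", True)                 # whether to also measure the full-kv baseline
optimize_model = conf.get("optimize_model", True)   # whether to apply ipex-llm model optimization

print(model_names, full_kv, optimize_model)
```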


#### About compress-kv

The rest json files are about compress-kv.
Review comment: The rest of the JSON files are compress-kv test configurations.

pip install fuzzywuzzy
pip install rouge
```
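Both packages are used by LongBench's scoring (ROUGE for summarization-style tasks, fuzzy string matching for similarity-based scores). A quick smoke test of the two APIs, with made-up strings:

```python
from fuzzywuzzy import fuzz
from rouge import Rouge

# Fuzzy string similarity (0-100), used for similarity-based metrics.
print(fuzz.ratio("predicted answer", "reference answer"))

# ROUGE-L F1 between a hypothesis and a reference summary.
rouge = Rouge()
scores = rouge.get_scores("the model generated summary", "the reference summary")
print(scores[0]["rouge-l"]["f"])
```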

Review comment:

Add Load Data section
You can download and load the LongBench data through the Hugging Face datasets (🤗 HF Repo):

```python
from datasets import load_dataset

datasets = ["narrativeqa", "qasper", "multifieldqa_en", "multifieldqa_zh", "hotpotqa", "2wikimqa", "musique", \
            "dureader", "gov_report", "qmsum", "multi_news", "vcsum", "trec", "triviaqa", "samsum", "lsht", \
            "passage_count", "passage_retrieval_en", "passage_retrieval_zh", "lcc", "repobench-p"]

for dataset in datasets:
    # standard LongBench test split
    data = load_dataset('THUDM/LongBench', dataset, split='test')
    # alternatively, the LongBench-E ("_e") variant with a more uniform length distribution
    data = load_dataset('THUDM/LongBench', f"{dataset}_e", split='test')
```


## Run

There are two Python files for users to run.
Review comment:

  1. Configure `config.yaml` and run `pred.py`; the model's output is written under the `pred/` folder, in a subfolder named after the model.
  2. Run the evaluation code `eval.py` to get the evaluation results on all datasets in `result.json` (a rough sketch of this step follows the note below).


> [!Note]
>
> To test the models and get the scores in one go, please run `test_and_eval.sh`.
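To make the data flow of step 2 concrete, here is a rough, non-authoritative sketch of what the evaluation stage does: read the per-dataset prediction files produced under `pred/<model_name>/`, score each dataset, and write the aggregate to `result.json`. The file layout, field names, and the `score()` placeholder are assumptions; the actual eval.py follows LongBench's per-dataset metrics.

```python
import json
import os

def score(dataset, predictions):
    # Placeholder: LongBench computes a dataset-specific metric (F1, ROUGE, accuracy, ...).
    return 0.0

pred_dir = "pred/chatglm4-9b"   # assumed layout: one JSONL file per dataset
results = {}
for fname in sorted(os.listdir(pred_dir)):
    if not fname.endswith(".jsonl"):
        continue
    dataset = fname[:-len(".jsonl")]
    with open(os.path.join(pred_dir, fname)) as f:
        predictions = [json.loads(line) for line in f if line.strip()]
    results[dataset] = score(dataset, predictions)

with open("result.json", "w") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```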
Review comment:

Add citation section

@article{bai2023longbench,
  title={LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding},
  author={Bai, Yushi and Lv, Xin and Zhang, Jiajie and Lyu, Hongchang and Tang, Jiankai and Huang, Zhidian and Du, Zhengxiao and Liu, Xiao and Zeng, Aohan and Hou, Lei and Dong, Yuxiao and Tang, Jie and Li, Juanzi},
  journal={arXiv preprint arXiv:2308.14508},
  year={2023}
}

@@ -1,10 +1,13 @@
# LongBench Benchmark Test

LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models. This benchmark implementation is adapted from [THUDM/LongBench](https://github.com/THUDM/LongBench) and [experiments/LongBench](https://github.com/FasterDecoding/SnapKV/tree/main/experiments/LongBench)
Review comment: Use "SnapKV/experiments/LongBench" as the link text.

Review comment: Add a period at the end of the sentence.

# limitations under the License.
#
# This file is adapted from
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
Review comment: Please modify the corresponding file path.

# limitations under the License.
#
# This file is adapted from
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
Review comment: Please modify the corresponding file path.

# limitations under the License.
#
# This file is adapted from
# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
Review comment: Please modify the corresponding file path.

@@ -1,10 +1,13 @@
# LongBench Benchmark Test

LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of long context understanding capabilities of large language models. This benchmark implementation is adapted from [THUDM/LongBench](https://github.com/THUDM/LongBench) and [experiments/LongBench](https://github.com/FasterDecoding/SnapKV/tree/main/experiments/LongBench)

LongBench Benchmark allows users to test LongBench benchmark and record them in some json files. Users can provide models and related information in `config.yaml` and `config` directory.
Review comment: Please remove this line.

# https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py
#

# This code is for evaluating the results of LongBench.
Review comment: Please add the original file this was adapted from.

@cyita (Contributor) commented Sep 18, 2024

cyita merged commit ee33b93 into intel-analytics:main on Sep 18, 2024
1 check passed