Inconsistent LongBench-Chat score when evaluating Qwen2.5-72B-Instruct #1148
ZayIsAllYouNeed asked this question in Q&A
cc: @hzhwcmhf
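Dear authors and colleagues, I used the official LongBench-Chat code (https://github.com/THUDM/LongAlign/tree/main/LongBench_Chat) to evaluate Qwen2.5-72B-Instruct at 128K context. I switched the inference in that code from transformers to vLLM and added the YaRN configuration for 128K extrapolation to Qwen's config.json; all other parameters were left unchanged.
Two evaluation runs scored 8.42 and 8.44, short of the 8.72 reported in the technical report. Is there anything in the evaluation process that needs special attention? What do I need to do to reproduce the 8.72 result?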
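The post does not show the exact YaRN change made to config.json. For reference, here is a minimal sketch, assuming the `rope_scaling` block documented on the Qwen2.5 model cards (`factor` 4.0 over an original 32768-token window, i.e. 131072 tokens). The helper name `add_yarn_to_config` is hypothetical and the local path is up to you; this is not the poster's actual setup.

```python
import json
from pathlib import Path

# YaRN block as documented on the Qwen2.5 model cards (assumption: the poster
# used these values). 4.0 * 32768 = 131072 tokens, i.e. a 128K context.
YARN_ROPE_SCALING = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}


def add_yarn_to_config(config_path: str) -> dict:
    """Hypothetical helper: write the YaRN rope_scaling block into a
    Hugging Face config.json in place and return the updated config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Overwrite any existing rope_scaling entry; all other keys are kept as-is.
    config["rope_scaling"] = dict(YARN_ROPE_SCALING)
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

Call it once on the local model directory's config.json before starting vLLM, for example `add_yarn_to_config("Qwen2.5-72B-Instruct/config.json")`. The Qwen2.5 model cards also note that vLLM applies YaRN statically (a constant scaling factor regardless of input length), which may affect shorter inputs; whether that accounts for any of the gap reported here has not been established.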