
Can I get the MT-bench evaluation code to reproduce the acceleration results? #36

je1lee opened this issue Dec 19, 2023 · 11 comments

@je1lee

je1lee commented Dec 19, 2023

Can I get the MT-bench evaluation code to reproduce the acceleration results?

@zhisbug
Contributor

zhisbug commented Dec 27, 2023

@Viol2000

@Viol2000
Collaborator

MT-bench scripts are uploaded; see applications/run_mtbench.sh for examples.
Note that we currently only support greedy search, so I set the temperature to 0.
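
For context, temperature 0 here simply means plain greedy decoding. Below is a minimal sketch of that generation setting using Hugging Face transformers; the model name, prompt, and generation length are illustrative, and the lookahead-decoding machinery itself is enabled by the repo's scripts rather than shown here.

```python
# Minimal greedy-decoding sketch with Hugging Face transformers.
# This only illustrates the "greedy search / temperature 0" setting mentioned
# above; lookahead decoding itself is enabled by the repo's own scripts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # model used elsewhere in this thread
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Compose an engaging travel blog post about a recent trip to Hawaii."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy search
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```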

@qspang

qspang commented Jan 11, 2024

llama2-7b-10-24.json
I encountered some problems when trying to evaluate MT-bench. The model used is llama2-7b-chat, running on an RTX 3090, and the command I used is shown in the screenshot below. The problem is that most of the choices recorded in the answer file are identical; beyond that fixed text, each record only repeats its own question, and there is no actual answer to the question.
[screenshot: evaluation command]
The question file used is the MT-bench question file; part of it is shown in the screenshot below.
[screenshot: question file excerpt]
Below is a partial screenshot of the answer file generated by the code. Every choice has the same content, as shown here:
[screenshot: answer file excerpt]
[screenshot: answer file excerpt]
At the end there is no answer, but the question itself is recorded, as shown below.
[screenshot: end of answer file]

@Viol2000
Collaborator

Hi @peoplekillerS, I just ran the script to try to reproduce your problem, but I did not observe the behavior you describe. Here is an answer file I just obtained:
llama2-7b-10-24.json

I was wondering which version of the code you are using and whether you have modified it, as this is an uncommon situation.
The question file should be obtained via https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L2, and the chatbot model you should use is meta-llama/Llama-2-7b-chat-hf. Hope this helps.
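
For quick debugging of answer files like the one above, here is a minimal sketch that flags records with empty answers, assuming the question and answer files follow FastChat's llm_judge JSONL layout (question_id / choices / turns); the file paths are placeholders.

```python
import json

# Placeholders: point these at the downloaded MT-bench question file and the
# generated answer file. Both are JSONL (one JSON object per line).
QUESTION_FILE = "mt_bench/question.jsonl"
ANSWER_FILE = "llama2-7b-10-24.json"

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

questions = {q["question_id"]: q for q in load_jsonl(QUESTION_FILE)}

for ans in load_jsonl(ANSWER_FILE):
    turns = ans["choices"][0]["turns"]      # generated answers for both turns
    if not all(t.strip() for t in turns):   # flag empty or missing answers
        q = questions.get(ans["question_id"], {})
        print(f"question {ans['question_id']} ({q.get('category', '?')}) has an empty answer")
```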

@qspang

qspang commented Jan 11, 2024

Thanks to the author for the reply! I am testing with llama2-7b-chat-hf, which is why I was a little late in replying, sorry. In fact, I got the same results as you, but take a closer look at the JSON file you posted above: if you search for 'Provide a variety of craft', you will find that basically every record contains that text. Is this a normal phenomenon?

Separately, I am using the code from the main branch, the MT-bench questions were downloaded from the link in your run_mtbench.sh file, no code has been changed, and the command is the one in the screenshot above. The only differences are: 1. I used llama2-7b-chat instead of llama2-7b-chat-hf. 2. The --use-pp parameter is set to 1, because every time I set it to 0 an error is reported: NotImplementedError: Cannot copy out of meta tensor; no data! (I am using an RTX 3090.) Would the author consider creating an eval_mtbench version for llama2-7b-chat? And do you know how to fix the error that occurs when --use-pp is set to 0?

@Viol2000
Collaborator

It is a normal phenomenon to have the same line in every answer. We use FastChat to generate a conversation template, and this line https://github.com/lm-sys/FastChat/blob/6ff8505ec80fc4b04d668f65d229f4f58bc449e0/fastchat/conversation.py#L365 is included in every prompt.
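
For illustration, here is a minimal sketch of how the prompt is assembled, assuming FastChat's get_conversation_template API; the exact template picked depends on the model path, and any fixed text baked into that template appears in every prompt.

```python
# Sketch: FastChat wraps each MT-bench question in a conversation template, so
# fixed template text (such as the example line linked above) is prepended to
# every prompt the model sees, and can therefore echo into every answer record.
from fastchat.model import get_conversation_template

conv = get_conversation_template("meta-llama/Llama-2-7b-chat-hf")
conv.append_message(conv.roles[0], "Compose an engaging travel blog post about Hawaii.")
conv.append_message(conv.roles[1], None)  # placeholder for the model's reply
print(conv.get_prompt())                  # template text precedes the question
```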

@Viol2000
Collaborator

Could you provide a more detailed error report for the failure you hit when setting --use-pp to 0? It may be a version mismatch; please try the latest code and install the latest dependencies. I also suspect the llama2-7b-chat weights are not compatible with the Hugging Face format: because we use the transformers library, model weights in a transformers-compatible format (i.e., llama2-7b-chat-hf) are needed.

@qspang

qspang commented Jan 11, 2024

Thank you very much again for your reply! Sorry, I got confused earlier. I just retested llama2-7b-chat-hf with --use-pp set to 0, and it works normally. The problem is that when using the llama2-7b-chat model with --use-pp set to 0, the error shown in the screenshot below appears.
[screenshot: error message]

@Viol2000
Collaborator

Yeah, using llama2-7b-chat-hf should be correct. The llama2-7b-chat weights are not compatible with transformers, and I guess that is the problem.

@qspang

qspang commented Jan 11, 2024

Will you consider a version of eval_mtbench for llama2-7b-chat?

@Viol2000
Collaborator

Currently, I am not considering supporting llama2-7b-chat. From its website, we can see that https://github.com/facebookresearch/llama is needed to run those model weights. I plan to minimize the maintenance effort required to support most models, and supporting Hugging Face's transformers is the simplest way to do that.
