
Can I get the MT-bench evaluation code to reproduce the acceleration results? #36

je1lee opened this issue Dec 19, 2023 · 11 comments

@je1lee

je1lee commented Dec 19, 2023

Can I get the MT-bench evaluation code to reproduce the acceleration results?

@zhisbug
Contributor

zhisbug commented Dec 27, 2023

@Viol2000

@Viol2000
Collaborator

MT-bench scripts are uploaded; see applications/run_mtbench.sh for examples.
Note that we currently only support greedy search, so I set the temperature to 0.
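
For context, temperature 0 here simply means plain greedy decoding. Below is a minimal sketch of that generation setting using Hugging Face transformers; the model name, prompt, and generation length are illustrative, and the lookahead-decoding machinery itself is enabled by the repo's scripts rather than shown here.

```python
# Minimal greedy-decoding sketch with Hugging Face transformers.
# This only illustrates the "greedy search / temperature 0" setting mentioned
# above; lookahead decoding itself is enabled by the repo's own scripts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # model used elsewhere in this thread
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Compose an engaging travel blog post about a recent trip to Hawaii."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy search
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```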

@qspang

qspang commented Jan 11, 2024

llama2-7b-10-24.json
I encountered some problems when trying to evaluate MT-bench. The model used is llama2-7b-chat, running on an RTX 3090, and the command I used is shown in the screenshot below. The problem is that most of the choices recorded in the answer file are identical; beyond that fixed text, each record only repeats its own question, and there is no actual answer to the question.
[screenshot: evaluation command]
The question file used is the MT-bench question file; part of it is shown in the screenshot below.
[screenshot: question file excerpt]
Below is a partial screenshot of the answer file generated by the code. Every choice has the same content, as shown here:
[screenshot: answer file excerpt]
[screenshot: answer file excerpt]
At the end there is no answer, but the question itself is recorded, as shown below.
[screenshot: end of answer file]

@Viol2000
Collaborator

Hi @peoplekillerS, I just ran the script to try to reproduce your problem, but I did not observe the behavior you describe. Here is an answer file I just obtained:
llama2-7b-10-24.json

I was wondering which version of the code you are using and whether you have modified it, as this is an uncommon situation.
The question file should be obtained via https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L2, and the chatbot model you should use is meta-llama/Llama-2-7b-chat-hf. Hope this helps.
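
For quick debugging of answer files like the one above, here is a minimal sketch that flags records with empty answers, assuming the question and answer files follow FastChat's llm_judge JSONL layout (question_id / choices / turns); the file paths are placeholders.

```python
import json

# Placeholders: point these at the downloaded MT-bench question file and the
# generated answer file. Both are JSONL (one JSON object per line).
QUESTION_FILE = "mt_bench/question.jsonl"
ANSWER_FILE = "llama2-7b-10-24.json"

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

questions = {q["question_id"]: q for q in load_jsonl(QUESTION_FILE)}

for ans in load_jsonl(ANSWER_FILE):
    turns = ans["choices"][0]["turns"]      # generated answers for both turns
    if not all(t.strip() for t in turns):   # flag empty or missing answers
        q = questions.get(ans["question_id"], {})
        print(f"question {ans['question_id']} ({q.get('category', '?')}) has an empty answer")
```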

@qspang

qspang commented Jan 11, 2024

Thanks to the author for the reply! I am testing with llama2-7b-chat-hf, which is why I was a little late in replying, sorry. In fact, I got the same results as you, but take a closer look at the JSON file you posted above: if you search for 'Provide a variety of craft', you will find that basically every record contains that text. Is this a normal phenomenon?

Separately, I am using the code from the main branch, the MT-bench questions were downloaded from the link in your run_mtbench.sh file, no code has been changed, and the command is the one in the screenshot above. The only differences are: 1. I used llama2-7b-chat instead of llama2-7b-chat-hf. 2. The --use-pp parameter is set to 1, because every time I set it to 0 an error is reported: NotImplementedError: Cannot copy out of meta tensor; no data! (I am using an RTX 3090.) Would the author consider creating an eval_mtbench version for llama2-7b-chat? And do you know how to fix the error that occurs when --use-pp is set to 0?

@Viol2000
Collaborator

It is a normal phenomenon to have the same line in every answer. We use FastChat to generate a conversation template, and this line https://github.com/lm-sys/FastChat/blob/6ff8505ec80fc4b04d668f65d229f4f58bc449e0/fastchat/conversation.py#L365 is included in every prompt.
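
For illustration, here is a minimal sketch of how the prompt is assembled, assuming FastChat's get_conversation_template API; the exact template picked depends on the model path, and any fixed text baked into that template appears in every prompt.

```python
# Sketch: FastChat wraps each MT-bench question in a conversation template, so
# fixed template text (such as the example line linked above) is prepended to
# every prompt the model sees, and can therefore echo into every answer record.
from fastchat.model import get_conversation_template

conv = get_conversation_template("meta-llama/Llama-2-7b-chat-hf")
conv.append_message(conv.roles[0], "Compose an engaging travel blog post about Hawaii.")
conv.append_message(conv.roles[1], None)  # placeholder for the model's reply
print(conv.get_prompt())                  # template text precedes the question
```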

@Viol2000
Collaborator

Could you provide a more detailed error report for the failure you hit when setting --use-pp to 0? It may be a version mismatch; please try the latest code and install the latest dependencies. I also suspect the llama2-7b-chat weights are not compatible with the Hugging Face format: because we use the transformers library, model weights in a transformers-compatible format (i.e., llama2-7b-chat-hf) are needed.

@qspang

qspang commented Jan 11, 2024

Thank you very much again for your reply! Sorry, I got confused earlier. I just retested llama2-7b-chat-hf with --use-pp set to 0, and it works normally. The problem is that when using the llama2-7b-chat model with --use-pp set to 0, the error shown in the screenshot below appears.
[screenshot: error message]

@Viol2000
Collaborator

Yeah, using llama2-7b-chat-hf should be correct. The llama2-7b-chat weights are not compatible with transformers, and I guess that is the problem.

@qspang

qspang commented Jan 11, 2024

Will you consider a version of eval_mtbench for llama2-7b-chat?

@Viol2000
Collaborator

Currently, I am not considering supporting llama2-7b-chat. From its website, we can see that https://github.com/facebookresearch/llama is needed to run those model weights. I plan to minimize the maintenance effort required to support most models, and supporting Hugging Face's transformers is the simplest way to do that.
