Code for evaluation of Eurus Models #2
Can you add the code for reproducing the main results in the paper for various math and coding datasets, along with their prompts and the data splits used?

Comments
Thanks for your interest! We are working on it right now and will release the evaluation code soon.
Thanks @cgq15, in the meantime could you let me know the generation configs used for the different task types? From what I have been able to reproduce, the numbers are not consistent with Table 3 of your paper (see the reproduction study below), and I think the discrepancy can be attributed to the generation configs.

Reproduction study:

Note: I have loaded the model with 8-bit quantization.
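For context on the quantized setup mentioned in the note above, here is a minimal sketch of one way to load a checkpoint in 8-bit and decode with an explicit generation config, assuming the Hugging Face transformers, bitsandbytes, and accelerate stack; the model id, prompt, and decoding settings are illustrative placeholders, not the authors' official evaluation configuration.

```python
# Minimal reproduction sketch: load a checkpoint in 8-bit and decode with an
# explicit generation config. Assumes transformers + bitsandbytes + accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "openbmb/Eurus-7b-sft"  # placeholder checkpoint; substitute the model under study

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit loading, as in the note above
    device_map="auto",
)

prompt = "Question: What is 17 * 24?\nAnswer:"  # illustrative prompt, not the paper's template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,  # greedy decoding; the decoding used for Table 3 may differ per task type
    )
# Print only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Note that 8-bit loading itself can shift benchmark scores somewhat, independently of the decoding settings, so both are worth pinning down when comparing against Table 3.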
Hi, we are releasing the eval code today, so please stay tuned.
Hi @archiki, we have released the eval code. Enjoy!
Thanks a lot! @lifan-yuan, can you clarify whether the performance reported on MBPP and HumanEval is on the regular test sets or on the EvalPlus suites? TIA!
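To make the question concrete, here is a hedged sketch of scoring generations with the EvalPlus suite (which augments the HumanEval/MBPP test sets with additional test cases) rather than the base tests; it assumes the `evalplus` package, and `generate_solution` together with the output file name are placeholders.

```python
# Sketch of scoring generations with EvalPlus rather than the base HumanEval tests.
# Assumes the `evalplus` package; the generation function is a stub to be replaced.
from evalplus.data import get_human_eval_plus, write_jsonl


def generate_solution(prompt: str) -> str:
    """Placeholder: wrap the model's generate() call from the sketch above."""
    return ""  # returning an empty completion keeps the sketch runnable


problems = get_human_eval_plus()  # HumanEval problems with EvalPlus's extended test cases
samples = [
    {"task_id": task_id, "solution": generate_solution(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)

# Scoring is then done with the EvalPlus CLI, which reports both the base
# pass@1 (original tests) and the "plus" pass@1 (extended tests):
#   evalplus.evaluate --dataset humaneval --samples samples.jsonl
```

Because EvalPlus adds stricter test cases, its pass@1 scores are typically lower than those on the base test sets, which is why the clarification matters when comparing against the paper's reported numbers.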