
Code for evaluation of Eurus Models #2

Open
archiki opened this issue Apr 9, 2024 · 5 comments

Comments


archiki commented Apr 9, 2024

Can you add the code for reproducing the main results in the paper for various math and coding datasets, along with their prompts and the data splits used?

cgq15 (Collaborator) commented Apr 10, 2024

Thanks for your interest! We are working on it right now and we will release the evaluation code soon.

archiki (Author) commented Apr 16, 2024

Thanks @cgq15! In the meantime, could you let me know the generation configs used for the different task types? The numbers I have been able to reproduce are not consistent with Table 3 of your paper (see below). Since I am using the same prompts as listed in the paper, I suspect the gap is due to generation settings such as temperature, top_p, top_k, and do_sample.

Reproduction Study:
Dataset: HumanEval, Reported Performance: 55.5, Performance Obtained: 47.56 (temp=0.2, top_p=0.9, top_k=50)
Dataset: MATH, Reported Performance: 32.6, Performance Obtained: 29.6 (PoT with above parameters)

Note: I have loaded the model with 8-bit quantization.
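
For reference, a minimal sketch of the setup described above, assuming the Hugging Face transformers API; the checkpoint path and prompt are placeholders, not the exact evaluation script:

```python
# Minimal sketch of the reproduction settings above (8-bit weights, sampling).
# The checkpoint path and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/eurus-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # 8-bit quantization, as noted above
    device_map="auto",
)

prompt = "..."  # prompt from the paper for the given task
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    top_k=50,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```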

cgq15 (Collaborator) commented Apr 17, 2024

Hi, we are releasing the eval code today, so please stay tuned.
For hyperparameters, we set temperature=0, i.e. greedy decoding, for all coding and math tasks. We also evaluate the models with float16 weights.
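
For anyone trying to match these settings, a minimal sketch under the same assumptions as the snippet above (transformers API, placeholder paths), with the two differences being float16 weights and greedy decoding:

```python
# Minimal sketch of the settings described above: float16 weights and greedy decoding.
# The checkpoint path and prompt are placeholders, not the released eval code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/eurus-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # float16 weights instead of 8-bit quantization
    device_map="auto",
)

inputs = tokenizer("...", return_tensors="pt").to(model.device)  # task prompt goes here
# do_sample=False gives greedy decoding (temperature=0); top_p/top_k are not used.
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```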

lifan-yuan (Collaborator) commented

Hi @archiki,

we have released the eval code. Enjoy!

archiki (Author) commented May 14, 2024

Thanks a lot! @lifan-yuan, can you clarify whether the performance reported on MBPP and HumanEval is on the regular test sets or on the EvalPlus suites? TIA!
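
For context, the base HumanEval/MBPP sets and the EvalPlus variants (HumanEval+/MBPP+) differ in the number of test cases per problem, so scores on the two are generally not comparable. A minimal sketch of generating samples against the EvalPlus version of HumanEval, assuming the evalplus package's data helpers (names taken from its README and possibly different across versions; generate_solution is a placeholder):

```python
# Hypothetical sketch using the evalplus data helpers (API may differ by version).
# HumanEval+ augments each base problem with additional test inputs, so a model's
# score on the base tests is usually higher than its EvalPlus score.
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_solution(prompt: str) -> str:
    """Placeholder for model inference on a single HumanEval prompt."""
    raise NotImplementedError

samples = [
    dict(task_id=task_id, solution=generate_solution(problem["prompt"]))
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)
# samples.jsonl is then scored with EvalPlus's evaluation tooling, which reports
# pass@k on both the base tests and the extended (plus) tests.
```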
