Reproducing Llama results from the paper #2
Hi, thanks for reaching out. We provide an example template for the Llama 3 series in the latest push, so please try again with that template.
I see.
True, but I guess the real question I am asking is whether the numbers in the preprint (https://arxiv.org/abs/2406.07835) were produced with no template.
@e-tornike Thanks for pointing this out. Here are our new scores on Llama2-7B and Llama3-8B with the corresponding templates, for reference (the Tulu models in the preprint are fine):
This EOS issue won't happen in the recent push. Thanks again for pointing it out. Let me know if you have further questions!
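For readers following the template discussion: the practical difference between the two settings is whether the prompt is wrapped in the model's chat template before generation. Below is a minimal sketch using the Hugging Face `apply_chat_template` API; the model name and prompt are placeholders, and the repository's own evaluation code may handle templating differently.

```python
# Illustrative only: contrasts a chat-templated prompt with a raw prompt.
# The model name and instruction are placeholders, not the repo's settings.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "Extract the named entities from the abstract below."}]

# With the template: Llama 3 header tokens and a generation prompt are added,
# so the instruction-tuned model terminates cleanly at its end-of-turn token.
templated_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Without a template: the raw instruction is passed as plain text,
# which is the setting being asked about for the preprint numbers.
raw_prompt = messages[0]["content"]

print(templated_prompt)
print(raw_prompt)
```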
Hey there,
First of all, thanks for the nice work!
I am attempting to reproduce the results from the paper. I re-ran the experiments with 10 seeds and averaged the results. However, I am only able to reproduce the numbers for 5 of the 7 tasks that do not require an LLM judge.
My results are the following:
I am not sure why the reproduced results for SciERC and SciFact differ from the original ones. Do you know what could be causing this?
There is a slight change in the `--model_args` due to memory issues: I added `gpu_memory_utilization` or `max_model_len` and removed `tensor_parallel_size`. I am running the following command:

And I am using the following seeds: 42, 1337, 9876, 12345, 999999, 98765, 5555, 2024, 267, 10.
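Since the command itself was not captured above, here is a minimal sketch of how the engine arguments mentioned in `--model_args` map onto vLLM, assuming the evaluation runs on a vLLM backend; the model name and the specific values are placeholders rather than the settings from the actual run.

```python
# Hypothetical illustration of the vLLM engine arguments discussed above;
# the model and values are placeholders, not the actual experiment settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    gpu_memory_utilization=0.85,  # added so the engine fits on one 48 GB A6000
    max_model_len=4096,           # added to cap the KV-cache footprint
    seed=42,                      # one of the ten seeds listed above
    # tensor_parallel_size=1,     # removed: only a single GPU is available
)

outputs = llm.generate(
    ["Example prompt"],
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```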
I am running the experiments on a single RTX A6000 (48 GB) with CUDA 12.4, driver version 550.90.07, and Python 3.10.13, with the following package versions: