Set Model Temperature to 0 for Consistent Leaderboard Results #500
Comments
What do folks think about this? I'm mostly OK as long as we are consistent across all models.
I fully agree with a much lower temperature; 0.7 is extremely high. However, if I remember correctly, the temperature cannot be exactly zero for some models I have tried, only strictly positive. Therefore, I think a small positive value would be better, for example temperature=0.01.
Thank you @aastroza and @hexists for weighing in. OK @HuanzhiMao, let's go with a lower temperature then, maybe something like 0.1? But this would fundamentally change all the numbers in the leaderboard, so once we land all the existing PRs we can do this. I'll keep this issue open.
Yeah, agreed. Let's wait till all PRs are merged and then we'll update this.
The current model response generation script uses a default temperature of 0.7 for inference. This introduces some degree of randomness into the model output generation, leading to potential variability in the evaluation scores from run to run. For benchmarking purposes, we set it to 0.001 for consistency and reliability of the evaluation results.

Resolves #500, resolves #562.

This will affect the leaderboard score. We will update it shortly.

Co-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>
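In practice the change amounts to passing a near-zero temperature through to the underlying completion call. The sketch below assumes an OpenAI-style chat completions client; the client setup, model name, and `generate` function are illustrative, not the repository's actual handler code.

```python
# Minimal sketch, assuming an OpenAI-style chat completions API.
# The client, model name, and function are illustrative, not the
# repository's actual model_handler code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Near-zero rather than exactly 0, since some providers require a
# strictly positive temperature; in either case decoding is effectively greedy.
TEMPERATURE = 0.001

def generate(prompt: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=TEMPERATURE,
    )
    return response.choices[0].message.content
```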
This issue was moved to a discussion. You can continue the conversation there.
The current model generation script (model_handlers) uses a default temperature of 0.7 for inference. This introduces some degree of randomness into the model output generation, leading to potential variability in the evaluation scores from run to run.
For benchmarking purposes, we should set it to 0 for consistency and reliability of the evaluation results.
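To make the variability argument concrete, here is a small standalone illustration (not code from this repository) of how the sampling distribution sharpens as the temperature drops toward zero; the logit values are made up.

```python
# Sketch of why a low temperature makes generation (nearly) deterministic:
# sampling probabilities come from softmax(logits / T), so as T shrinks the
# distribution collapses onto the highest-scoring token. Logits are hypothetical.
import numpy as np

def temperature_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([2.0, 1.5, 0.5])  # made-up token scores
for t in (0.7, 0.1, 0.001):
    print(t, np.round(temperature_softmax(logits, t), 4))
# At T=0.7 several tokens keep non-trivial probability (run-to-run variance);
# at T=0.001 essentially all mass sits on the argmax token.
```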