Set Model Temperature to 0 for Consistent Leaderboard Results #500

Closed
HuanzhiMao opened this issue Jul 5, 2024 · 5 comments · Fixed by #574
Comments

@HuanzhiMao
Collaborator

The current model generation script (model_handlers) uses a default temperature of 0.7 for inference. This introduces some degree of randomness into the model output generation, leading to potential variability in the evaluation scores from run to run.
For benchmarking purposes, we should set it to 0 for consistency and reliability of the evaluation results.
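
For illustration, a minimal sketch of what pinning the temperature in a handler's inference call could look like (the `query_model` function and the OpenAI client usage here are illustrative, not the actual BFCL `model_handler` code):

```python
# Illustrative sketch only -- not the actual BFCL model_handler code.
# The idea: pass an explicit near-zero temperature instead of relying on a
# default of 0.7, so repeated runs produce (near-)identical outputs.
from openai import OpenAI

client = OpenAI()

def query_model(messages, model="gpt-4o", temperature=0.0):
    """Hypothetical inference call with a pinned temperature for reproducible evals."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,  # 0.0 (or a small positive value) for benchmarking
    )
    return response.choices[0].message.content
```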

@ShishirPatil
Owner

What do folks think about this? I'm mostly OK with it as long as we are consistent across all models.

@alonsosilvaallende
Contributor

I fully agree with a much lower temperature; 0.7 is extremely high. However, if I remember correctly, some models I have tried require the temperature to be strictly positive rather than exactly zero. Therefore, I think a small positive value would be better, for example temperature=0.01.
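
A small guard along these lines could absorb that backend difference (purely a sketch; `MIN_TEMPERATURE` and `effective_temperature` are made-up names for illustration):

```python
# Hypothetical helper, not from the BFCL codebase.
# Some inference backends reject temperature == 0, so clamp the requested
# value to a tiny positive floor while keeping decoding effectively greedy.
MIN_TEMPERATURE = 0.01  # illustrative floor; the exact value is a judgment call

def effective_temperature(requested: float) -> float:
    """Return a temperature value the backend will accept."""
    return max(requested, MIN_TEMPERATURE)
```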

@hexists

hexists commented Jul 25, 2024

Hello. I think lowering the temperature is a good idea. Setting it to 0, or at least a low value close to 0, would help reproducibility.
Thank you for providing the BFCL.

@ShishirPatil
Owner

Thank you @aastroza and @hexists for weighing in. OK @HuanzhiMao, let's go with a lower temperature then, maybe something like 0.1? But this would fundamentally change all the numbers on the leaderboard. So, once we land all the existing PRs, we can do this? I'll keep this issue open.

@HuanzhiMao
Collaborator Author

> Thank you @aastroza and @hexists for weighing in. OK @HuanzhiMao, let's go with a lower temperature then, maybe something like 0.1? But this would fundamentally change all the numbers on the leaderboard. So, once we land all the existing PRs, we can do this? I'll keep this issue open.

Yeah, agree. Let's wait till all PRs are merged and then update this.

Repository owner locked and limited conversation to collaborators Jul 29, 2024
@ShishirPatil ShishirPatil converted this issue into discussion #562 Jul 29, 2024
ShishirPatil added a commit that referenced this issue Aug 10, 2024
The current model response generation script uses a default temperature
of 0.7 for inference. This introduces some degree of randomness into the
model output generation, leading to potential variability in the
evaluation scores from run to run.
For benchmarking purposes, we set it to 0.001 for consistency and
reliability of the evaluation results.

resolves #500 , resolves #562 

This will affect the leaderboard score. We will update it shortly.

---------

Co-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>
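
For context, the landed change amounts to making a near-zero temperature the handlers' default; a minimal sketch of the idea (the constant name and handler shape are illustrative, not the actual #574 diff):

```python
# Illustrative sketch of the idea behind the fix -- not the actual #574 diff.
# A default of 0.001 keeps decoding effectively deterministic while remaining
# acceptable to backends that reject temperature == 0.
DEFAULT_TEMPERATURE = 0.001

class ModelHandler:  # hypothetical handler shape
    def __init__(self, model_name: str, temperature: float = DEFAULT_TEMPERATURE):
        self.model_name = model_name
        self.temperature = temperature
```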
aw632 pushed a commit to vinaybagade/gorilla that referenced this issue Aug 22, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
