Sorry, that was a typo; the task is boolq. We ran our evaluation in JAX, so there may be slight differences due to numerical precision. Also, please note that to evaluate our model correctly in lm-eval-harness, you need to change the lm-eval-harness code so that it does not use the Hugging Face auto-converted fast tokenizer, as that tokenizer sometimes produces incorrect tokens. See this issue for more details.
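For anyone who wants to check the tokenizer point locally, here is a minimal sketch, not the harness's own code. It assumes the Hugging Face `transformers` library, where `AutoTokenizer.from_pretrained(..., use_fast=False)` selects the slow tokenizer. The helper names `first_mismatch` and `compare_tokenizers` are hypothetical, and where exactly lm-eval-harness loads its tokenizer depends on the harness version, so that change is not shown.

```python
from typing import Optional, Sequence


def first_mismatch(ids_a: Sequence[int], ids_b: Sequence[int]) -> Optional[int]:
    """Return the first index where two token-ID sequences differ, or None if they match."""
    for i, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return i
    # One sequence is a strict prefix of the other: they diverge where the shorter ends.
    if len(ids_a) != len(ids_b):
        return min(len(ids_a), len(ids_b))
    return None


def compare_tokenizers(model_id: str, text: str) -> Optional[int]:
    """Encode `text` with the slow and the fast tokenizer and report the first disagreement."""
    # Imported lazily so first_mismatch works without transformers installed.
    from transformers import AutoTokenizer

    slow = AutoTokenizer.from_pretrained(model_id, use_fast=False)  # slow tokenizer
    fast = AutoTokenizer.from_pretrained(model_id, use_fast=True)   # auto-converted fast tokenizer
    return first_mismatch(slow.encode(text), fast.encode(text))
```

Calling `compare_tokenizers` with the model's Hub ID (for example `"openlm-research/open_llama_3b"`, which is an assumption about the checkpoint name) and a boolq-style prompt returns `None` if both tokenizers agree on that text, or the index of the first differing token otherwise. It needs network access to download the tokenizer files.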
We cannot find "ddboolq" in lm-evaluation-harness.
We can only find the boolq task in the task list. We ran boolq for open-llama-3b, and the result is different.
So we would like to know: what is ddboolq in the evaluation?