What is ddboolq in the evaluation? We cannot find the "ddboolq" task in lm-evaluation-harness. #47

chi2liu · 2023-06-13T06:56:16Z

We cannot find the "ddboolq" task in lm-evaluation-harness.

We can only find the boolq task in the task list. We ran boolq on open-llama-3b, and the result is different from the reported one.

So we would like to know: what is ddboolq in the evaluation?

young-geng · 2023-06-13T09:32:37Z

Sorry, that was a typo; the task is boolq. We did our evaluation in JAX, so there could be slight differences due to numerical precision. Also, please note that to correctly evaluate our model in lm-eval-harness, you need to change the lm-eval-harness code to avoid using the Hugging Face auto-converted fast tokenizer, as that tokenizer sometimes produces incorrect tokens. See this issue for more details.
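
To illustrate the tokenizer point, here is a minimal sketch (not the harness's own code) of how one might check whether the slow and fast tokenizers disagree on a given input. It assumes the Hugging Face `transformers` package and its `AutoTokenizer.from_pretrained(..., use_fast=False)` option; the model id `openlm-research/open_llama_3b` and the helper names `find_tokenizer_mismatches` and `load_encoders` are assumptions made for this example.

```python
from typing import Callable, Iterable, List, Tuple

# An encoder maps a string to a list of token ids.
Encoder = Callable[[str], List[int]]


def find_tokenizer_mismatches(
    slow_encode: Encoder,
    fast_encode: Encoder,
    texts: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, slow_ids, fast_ids) for every text the two encoders tokenize differently."""
    mismatches = []
    for text in texts:
        slow_ids = list(slow_encode(text))
        fast_ids = list(fast_encode(text))
        if slow_ids != fast_ids:
            mismatches.append((text, slow_ids, fast_ids))
    return mismatches


def load_encoders(model_id: str = "openlm-research/open_llama_3b"):
    """Load slow and fast tokenizers for a model (hypothetical helper).

    The model id is an assumption; substitute the checkpoint you are evaluating.
    Requires `transformers` and downloads tokenizer files on first use.
    """
    from transformers import AutoTokenizer

    slow = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    fast = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    return slow.encode, fast.encode
```

Calling `find_tokenizer_mismatches(*load_encoders(), texts)` on a handful of boolq prompts would list any inputs where the two tokenizers diverge. If any turn up, loading the tokenizer with `use_fast=False` wherever the harness constructs it is one way to apply the change described above.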

young-geng closed this as completed Jun 13, 2023
