After running the code with the config for LLaMA 7B on ASQA (https://github.com/princeton-nlp/ALCE/blob/main/configs/asqa_llama-7b_shot1_ndoc3_gtr_default.yaml), I get the results below.
{
"length": 88.25,
"str_em": 24.326652601969055,
"str_hit": 7.59493670886076,
"QA-EM": 10.161744022503516,
"QA-F1": 14.520007320055194,
"QA-Hit": 1.160337552742616,
"mauve": 59.28902389177575
}
The "mauve" score here (59.3) differs from the one reported in Table 19 of the paper (69.8). Is there a problem?
Also, does Correct (EM Rec.) correspond to "str_em" here rather than "QA-EM"?
str_em corresponds to EM Rec. I'm not sure what is going on with Mauve, but it is known to be highly unstable (a slight change in environment can lead to drastically different results). We mostly use Mauve as a sanity check, so I wouldn't worry too much about it.
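For readers unsure what a string exact-match recall like "str_em" measures, here is a minimal sketch. It is not the code from the ALCE repository: the function names (`normalize`, `em_recall`), the normalization steps, and the `answer_groups` input shape are all assumptions made for illustration. The idea is that each question has several groups of acceptable short answers, and the score is the percentage of groups with at least one alias appearing in the generated text. The "hit" value in this sketch is 1 only when every group is covered, which is one plausible reading of "str_hit".

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def em_recall(output: str, answer_groups: list[list[str]]) -> tuple[float, int]:
    """Return (recall in percent, hit) for one generated answer.

    answer_groups is a hypothetical input shape: one list of acceptable
    aliases per sub-question of the ambiguous question.
    """
    if not answer_groups:
        # Nothing to match against; report zero rather than divide by zero.
        return 0.0, 0
    norm_output = normalize(output)
    # A group counts as found if any of its aliases is a substring of the output.
    found = [
        any(normalize(alias) in norm_output for alias in aliases)
        for aliases in answer_groups
    ]
    recall = 100.0 * sum(found) / len(found)
    hit = int(all(found))
    return recall, hit


if __name__ == "__main__":
    generated = "The film was released in 1994 by Disney."
    groups = [["1994"], ["Pixar", "Pixar Studios"]]
    print(em_recall(generated, groups))
```

Averaging the per-example recall over the dataset would give a number comparable in spirit to "str_em", and averaging the hit flag would give something like "str_hit". Substring matching is deliberately lenient, which may be why such a score can sit well above a QA-model-based score like "QA-EM" on the same outputs.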