Llama results in ASQA #25

Open
Guanyu-Lin opened this issue May 13, 2024 · 1 comment

@Guanyu-Lin

Hi,

After running the code with the LLaMA-7B config for ASQA (https://github.com/princeton-nlp/ALCE/blob/main/configs/asqa_llama-7b_shot1_ndoc3_gtr_default.yaml), I get the results below.
{
"length": 88.25,
"str_em": 24.326652601969055,
"str_hit": 7.59493670886076,
"QA-EM": 10.161744022503516,
"QA-F1": 14.520007320055194,
"QA-Hit": 1.160337552742616,
"mauve": 59.28902389177575
}
The MAUVE score here (about 59.3) differs from the 69.8 reported in Table 19 of the paper. Is something wrong?

Also, does Correct (EM Rec.) in the paper correspond to "str_em" here rather than "QA-EM"?

@gaotianyu1350
Member

Hi,

str_em corresponds to EM Rec. I'm not sure what is going on with MAUVE, but it is known to be highly unstable (a slight change of environment can lead to drastically different results). We mostly use MAUVE as a sanity check, so I wouldn't worry too much about it.
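
For anyone comparing their own run against the paper, here is a minimal sketch that reads the result JSON from this issue, reports `str_em` as EM Rec. (the mapping confirmed above), and computes the MAUVE gap against the 69.8 quoted from Table 19. The `summarize` helper and its output keys are hypothetical names for illustration only and are not part of the ALCE codebase.

```python
import json

# Result JSON copied from the issue above.
RESULT_JSON = """
{
  "length": 88.25,
  "str_em": 24.326652601969055,
  "str_hit": 7.59493670886076,
  "QA-EM": 10.161744022503516,
  "QA-F1": 14.520007320055194,
  "QA-Hit": 1.160337552742616,
  "mauve": 59.28902389177575
}
"""

# MAUVE value quoted in the issue as reported in Table 19 of the paper.
PAPER_MAUVE = 69.8


def summarize(result: dict, paper_mauve: float) -> dict:
    """Hypothetical helper: map result keys to paper terms and compute the MAUVE gap."""
    return {
        # Per the maintainer's reply, "str_em" is the paper's EM Rec.
        "em_rec": round(result["str_em"], 1),
        "mauve": round(result["mauve"], 1),
        # Positive gap means the local run scored below the quoted paper value.
        "mauve_gap": round(paper_mauve - result["mauve"], 1),
    }


result = json.loads(RESULT_JSON)
summary = summarize(result, PAPER_MAUVE)
print(summary)  # {'em_rec': 24.3, 'mauve': 59.3, 'mauve_gap': 10.5}
```

Given the instability described above, a gap of this size in MAUVE alone does not necessarily indicate a problem with the run; EM Rec. is the more meaningful number to compare.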
