
Human baseline #3

Open
vlievin opened this issue Jun 1, 2022 · 4 comments
vlievin commented Jun 1, 2022

Hi! Is there a known human baseline for this dataset (open and closed book)? Or maybe the required score to pass the exam?

jind11 (Owner) commented Jun 3, 2022

Nope, but you can use a score of 60 out of 100 as a reference, which is the passing score for the medical exam.

vlievin commented Jun 10, 2022

Thank you for the info! So, just to make sure we are aligned: does that mean 60% answering accuracy for the US, TW, and MC datasets?

jind11 (Owner) commented Jun 10, 2022

Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.
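
For reference, here is a minimal sketch of how this 60% threshold translates into an evaluation check. The JSONL path, the `answer_idx` field name, and the placeholder always-"A" predictions are illustrative assumptions, not the repository's actual layout or API:

```python
import json

PASSING_SCORE = 0.60  # the 60/100 passing threshold mentioned above

def accuracy(predictions, gold):
    """Fraction of questions answered correctly."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Assumed JSONL layout: one question per line, with the correct option
# letter stored under "answer_idx" (field name is an assumption).
with open("US/test.jsonl") as f:
    examples = [json.loads(line) for line in f]

gold = [ex["answer_idx"] for ex in examples]
# Placeholder baseline that always answers "A"; swap in real model output.
predictions = ["A"] * len(examples)

acc = accuracy(predictions, gold)
verdict = "pass" if acc >= PASSING_SCORE else "fail"
print(f"accuracy = {acc:.1%} -> {verdict} at the 60% human passing score")
```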

@miraculixx

> Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.

Can you elaborate on your reasoning here, please? It seems a somewhat flawed heuristic to assume that human performance on this particular QA dataset [which, to my understanding, is generated?] would be approximately equivalent to the outcome of humans being tested in actual exams.
