Human baseline #3
Hi! Is there a known human baseline for this dataset (open and closed book)? Or maybe the required score to pass the exam?

Nope, but you can use 60 out of 100 as a reference score, which is the passing score for the medical exam.

Thank you for the info! So, just to make sure we are aligned here: does that mean 60% answering accuracy for the US, TW, and MC datasets?

Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.

Can you elaborate on your reasoning here, please? It seems a somewhat flawed heuristic to assume that human performance on this particular QA dataset [which, to my understanding, is generated?] would be approximately equivalent to the outcome of humans being tested in actual exams.