
Errors in answers #11

Closed
chrisc36 opened this issue Jul 1, 2019 · 2 comments

Comments


chrisc36 commented Jul 1, 2019

I have encountered a few cases where the "answers" field appears to include erroneous whitespace or is missing a hyphen, and as a result some questions are impossible to answer.

For example:
RACE.json.gz, for question f69d72de082a4fe6bcebab8301ca52d1 the answers are:

"The positive effects of early- life exercise."

However the passage text only contains the phrase:

".... the positive effects of early-life exercise lasted for only one week"

Or for MrqaBioASQ, question 78f9bca0ee664b74b0be699e63138b9b, the answers are:

["Interferon signature", "IFN signature"]

but the only related passage phrase is:

"...for the IFN-signature as a..."

As a result, it looks like it would be impossible to get an EM score of 1 on these questions using a purely extractive approach. You can still retrieve a valid answer using the character spans, but the evaluation script uses the "answers" field, so it will fail models on those questions.
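For illustration, here is a minimal sketch of recovering the span text from the character offsets instead of the "answers" field. The field names ("context", "qas", "detected_answers", "char_spans"), the per-file header line, and the inclusive-span convention are my assumptions about the released MRQA-format files:

```python
import gzip
import json

def span_answers(path, target_qid):
    """Return answer strings sliced directly from the passage via char spans."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            if "header" in example:  # assumed per-file header line; skip it
                continue
            context = example["context"]
            for qa in example["qas"]:
                if qa["qid"] != target_qid:
                    continue
                # Slicing the passage keeps the text verbatim,
                # e.g. "early-life" rather than "early- life".
                return [
                    context[start:end + 1]
                    for ans in qa["detected_answers"]
                    for start, end in ans["char_spans"]
                ]
    return []

# e.g. span_answers("RACE.json.gz", "f69d72de082a4fe6bcebab8301ca52d1")
```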

So far all the errors of this sort that I have seen are related to hyphens.

ajfisch (Collaborator) commented Jul 1, 2019

Hi Chris, thanks for bringing this to our attention! You're right, it's due to a mismatch between the tokenizer used for the SQuAD-style eval and spaCy's tokenizer. I'll take a look and see how best to fix that discrepancy so the evaluation is fully fair.
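To illustrate the mismatch, here is a sketch of the usual SQuAD-style answer normalization (lowercasing, dropping punctuation and articles, collapsing whitespace). The exact code in this repo's eval script may differ, so treat this as an approximation:

```python
import re
import string

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, strip punctuation/articles, fix whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

# The spaCy-tokenized gold answer and the extractive passage span normalize
# to different strings, so exact match is impossible:
print(normalize_answer("The positive effects of early- life exercise."))
# -> "positive effects of early life exercise"
print(normalize_answer("the positive effects of early-life exercise"))
# -> "positive effects of earlylife exercise"
```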

ajfisch (Collaborator) commented Jul 10, 2019

Hi Chris, sorry for the delay! I took a closer look. Across the released dev sets, the effect is present but quite small. Here are the ceilings:

BioASQ: 98.2% EM
DuoRC: 99.2% EM
DROP: 100% EM
RelationExtraction: 100% EM
RACE: 99.9% EM
TextbookQA: 99.4% EM

I manually went through all of the discrepancies and verified that the "detected_answer" text should indeed be considered an exact match for the original answer. To reflect this in the scoring, I will add the detected answer span directly into the answers list. Scores shouldn't change much, though.
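As a rough sketch of what that patch could look like (field names and file layout are my assumptions about the MRQA-format files, not the actual fix):

```python
import gzip
import json

def patch_answers(in_path, out_path):
    """Fold detected-answer span text back into the "answers" list."""
    with gzip.open(in_path, "rt", encoding="utf-8") as fin, \
         gzip.open(out_path, "wt", encoding="utf-8") as fout:
        for line in fin:
            example = json.loads(line)
            if "header" not in example:  # assumed per-file header line
                for qa in example["qas"]:
                    extra = [ans["text"] for ans in qa["detected_answers"]]
                    # Keep the original answers and append the span text,
                    # deduplicating while preserving order.
                    qa["answers"] = list(dict.fromkeys(qa["answers"] + extra))
            fout.write(json.dumps(example) + "\n")
```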

Thanks for reporting!
