Open-Domain Question Answering

Open-domain question answering focuses on using a large-scale corpus D to answer arbitrary questions via search combined with reading comprehension. We use the open-domain setting of the Natural Questions dataset (Kwiatkowski et al., 2019). Following Chen et al. (2017), we first retrieve relevant passages from Wikipedia using a document retriever, and then select an answer span from the retrieved passages using a document reader. We use a Dense Passage Retriever (DPR) model (Karpukhin et al., 2020) as the retriever and a BERT model (Devlin et al., 2019) as the reader. The BERT reader yields several score variants, and we use multiple of them in our cascade: the relevance CLS logit, the span start logit, and the span end logit. Any span from any retrieved passage that matches any of the annotated answer strings, after lower-casing and stripping articles and punctuation, is considered correct.
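For concreteness, the sketch below shows one common way to implement this answer normalization, following the usual SQuAD-style evaluation convention (lower-case, strip punctuation, drop articles, collapse whitespace). The function names normalize_answer and is_correct are our own illustration, not code taken from this repository.

    import re
    import string

    def normalize_answer(s):
        # Lower-case, strip punctuation, drop articles, and collapse whitespace,
        # following the usual SQuAD-style evaluation convention.
        s = s.lower()
        s = "".join(ch for ch in s if ch not in set(string.punctuation))
        s = re.sub(r"\b(a|an|the)\b", " ", s)
        return " ".join(s.split())

    def is_correct(span, gold_answers):
        # A predicted span counts as correct if it matches any annotated
        # answer string after normalization.
        normalized = normalize_answer(span)
        return any(normalize_answer(g) == normalized for g in gold_answers)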

For our predictions, we used results from the DPR repository.

For every split, we provide a JSON file of our predictions, in which each entry is structured as follows:

[
    {
        "question": "who is the killer in season 1 of broadchurch",
        "gold_answers": [
            "Joe"
        ],
        "predictions": [
            {
                "text": "joe miller",
                "start_score": 13.041522979736328,
                "end_score": 13.02441692352295,
                "score": 26.065939903259277,
                "relevance_score": 10.888565063476562,
                "passage_idx": 9,
                "passage_id": "18417956",
                "passage_score": 80.00108
            },
            ...
        ]
    },
    ...
]
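As a small illustration (not code from this repository), the snippet below reads one of these files and flattens it into per-prediction rows with a correctness label, reusing is_correct from the sketch above. The file name dev_predictions.json is a placeholder.

    import json

    # Placeholder file name; substitute the predictions file for the desired split.
    with open("dev_predictions.json") as f:
        data = json.load(f)

    rows = []
    for entry in data:
        for pred in entry["predictions"]:
            rows.append({
                "score": pred["score"],                      # start_score + end_score
                "relevance_score": pred["relevance_score"],  # reader CLS relevance logit
                "passage_score": pred["passage_score"],      # DPR retriever score
                "correct": is_correct(pred["text"], entry["gold_answers"]),
            })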

The conformal_open_qa.py script then transforms these prediction files into the input format expected by the conformal prediction code and runs the conformal experiments.
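For intuition, here is a minimal sketch of the kind of split conformal calibration such experiments involve: choosing a score threshold on a calibration split so that the resulting answer sets contain a correct answer with a target probability. This is a generic sketch under our own assumptions about the data layout, not the actual logic of conformal_open_qa.py.

    import math

    def calibrate_threshold(cal_questions, epsilon=0.2):
        # cal_questions: one list per question of (score, correct) pairs.
        # Nonconformity of a question = negative best score among its correct
        # spans (infinite if no retrieved span is correct).
        v = []
        for preds in cal_questions:
            correct = [s for s, ok in preds if ok]
            v.append(-max(correct) if correct else math.inf)
        v.sort()
        n = len(v)
        k = math.ceil((n + 1) * (1 - epsilon))  # finite-sample corrected quantile index
        q = v[k - 1] if k <= n else math.inf
        return -q  # keep every span scoring at least this threshold

    def prediction_set(preds, tau):
        # All candidate spans scoring at least the calibrated threshold.
        return [text for text, score in preds if score >= tau]

With the threshold chosen this way, the returned answer set contains a correct span with probability at least 1 - epsilon on exchangeable test questions, up to questions whose retrieved passages contain no correct span at all.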