This project is a collaboration between Aggregate Intellect, McGill University, and Ryerson University on explainable information retrieval. Information retrieval and search systems typically use a range of techniques to generate candidates and then rank them. Users' trust in this shortlisting-and-ranking process has a significant impact on their willingness to use the system.
This project aims to explore various post-hoc and embedded methods for introducing explainability into such systems. The group will then implement a few promising solutions and package them as open-source libraries. The goal is to release open-source libraries and publish papers on post-hoc and embedded explainability in information retrieval, search, and related tasks.
https://ai.science/l/236a6202-3495-4a8e-bbad-aedeee4bd21d@/assets
- More than 3,000 search queries, each starting with a WH-word
- All candidate sentences come from the summary paragraph of a Wikipedia article
- The sentence that answers the question is labeled manually; more information is available at https://aclanthology.org/attachments/D15-1237.Presentation.pptx
- Due to copyright restrictions, please email yujing.yang2@mail.mcgill.ca to obtain the dataset (a minimal loading sketch follows this list)
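The exact file layout of the labeled data is not specified here; the sketch below only illustrates the structure described above (a question, its candidate sentences, and a manual 0/1 answer label), assuming a tab-separated file for the sake of example.

```python
# Hypothetical loader for the labeled question/candidate-sentence data described above.
# The real file format may differ; this assumes one tab-separated row per
# (question, candidate sentence, label) triple, with label 1 marking the answer sentence.
import csv
from collections import defaultdict

def load_candidates(path):
    """Group candidate sentences and their 0/1 answer labels by question."""
    by_question = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for question, sentence, label in csv.reader(f, delimiter="\t"):
            by_question[question].append((sentence, int(label)))
    return by_question

if __name__ == "__main__":
    data = load_candidates("wikiqa_candidates.tsv")  # placeholder filename
    print(f"{len(data)} questions loaded")
```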
- Exploring the robust04 dataset through the ir_datasets API: robust04_ir_dataset.ipynb
- Building IR over the robust04 dataset with vector search: robust04_Reranking_Document.ipynb
- Building IR over the BEIR (SciFact) dataset with BEIR's vector search: BEIR_dataset.ipynb
- Building QA over the BEIR (SciFact) dataset with Haystack's DensePassageRetriever: Haystack_scifact_DensePassageRetriever.ipynb
- Building QA over the Wikipedia snippets dataset with Haystack's EmbeddingRetriever, Seq2SeqGenerator, and FARMReader: haystack_wiki.ipynb
- Evaluating information retrieval with Haystack: haystack_evaluation.ipynb
- Passage-level vs. sentence-level retrieval on the WikiQA dataset (passage-level accuracy: 0.98, sentence-level accuracy: 0.61): wikiQA_sentence_level_retriever.ipynb

Illustrative sketches of the main steps behind each of these notebooks follow below.
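Accessing robust04 via ir_datasets (robust04_ir_dataset.ipynb): a minimal sketch of the API. The dataset identifier below is the one listed in the ir_datasets catalog; the notebook may use an alias, and the robust04 documents themselves (TREC disks 4 & 5) must be obtained separately under license.

```python
# Sketch: iterating robust04 queries, qrels, and documents with ir_datasets.
import ir_datasets

# Catalog id assumed here; check the notebook for the exact identifier used.
dataset = ir_datasets.load("disks45/nocr/trec-robust-2004")

# Queries (TREC topics) and relevance judgments are downloadable directly.
for query in dataset.queries_iter():
    print(query.query_id, query.title)
    break

for qrel in dataset.qrels_iter():
    print(qrel.query_id, qrel.doc_id, qrel.relevance)
    break

# Documents require the licensed TREC disks 4 & 5 corpus to be configured locally.
for doc in dataset.docs_iter()[:3]:
    print(doc.doc_id)
```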
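Vector-search re-ranking over robust04 (robust04_Reranking_Document.ipynb): the general idea is to embed the query and the candidate documents with a bi-encoder and rank by similarity. The sketch below uses sentence-transformers with an example model name, which is an assumption rather than the model used in the notebook.

```python
# Sketch: dense re-ranking of candidate documents for a query.
from sentence_transformers import SentenceTransformer, util

# Example bi-encoder; any sentence-transformers model works the same way.
model = SentenceTransformer("msmarco-distilbert-base-tas-b")

query = "international organized crime"
candidates = [
    "Interpol coordinates police cooperation against transnational crime.",
    "The local bakery introduced a new sourdough recipe this week.",
]

query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity between the query and each candidate, highest first.
scores = util.cos_sim(query_emb, doc_embs)[0]
for doc, score in sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```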
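Dense retrieval on BEIR's SciFact dataset (BEIR_dataset.ipynb): the sketch below follows the standard BEIR quickstart. The download URL pattern and the embedding model name are taken from the BEIR documentation and are assumptions; the notebook may differ in both.

```python
# Sketch: download SciFact, run exact dense search, and evaluate with BEIR.
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Brute-force dense search with a sentence-transformers bi-encoder (example model).
retriever = EvaluateRetrieval(
    DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16),
    score_function="dot",
)
results = retriever.retrieve(corpus, queries)

ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)
```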
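DensePassageRetriever over SciFact with Haystack (Haystack_scifact_DensePassageRetriever.ipynb): a minimal sketch using the Haystack 1.x API. The document store, the toy document, and the DPR checkpoint names are illustrative assumptions, not necessarily what the notebook uses.

```python
# Sketch: index a few documents and retrieve with DPR in Haystack (1.x-style API).
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import DensePassageRetriever

document_store = InMemoryDocumentStore(embedding_dim=768)
document_store.write_documents([
    {"content": "Microstructural development of human newborn cerebral white matter ...",
     "meta": {"doc_id": "scifact-0"}},  # toy stand-in for SciFact abstracts
])

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
)
document_store.update_embeddings(retriever)  # precompute passage embeddings

for doc in retriever.retrieve("What is known about newborn white matter development?", top_k=3):
    print(doc.score, doc.content[:80])
```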
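QA over Wikipedia snippets with EmbeddingRetriever, Seq2SeqGenerator, and FARMReader (haystack_wiki.ipynb): the sketch below shows how these three Haystack 1.x components are typically wired into extractive and generative pipelines. All model names are examples, and the single indexed document is a toy stand-in for the Wikipedia snippets corpus.

```python
# Sketch: extractive and generative QA pipelines in Haystack (1.x-style API).
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader, Seq2SeqGenerator
from haystack.pipelines import ExtractiveQAPipeline, GenerativeQAPipeline

document_store = InMemoryDocumentStore(embedding_dim=768)
document_store.write_documents([{"content": "The Eiffel Tower was completed in 1889."}])

# Dense retriever over the snippets (example embedding model).
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
document_store.update_embeddings(retriever)

# Extractive answers: the reader selects an answer span from retrieved snippets.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
extractive = ExtractiveQAPipeline(reader=reader, retriever=retriever)
print(extractive.run(query="When was the Eiffel Tower completed?"))

# Generative answers: the seq2seq model writes a free-form answer from the snippets.
generator = Seq2SeqGenerator(model_name_or_path="vblagoje/bart_lfqa")
generative = GenerativeQAPipeline(generator=generator, retriever=retriever)
print(generative.run(query="When was the Eiffel Tower completed?"))
```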
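Evaluating retrieval and QA with Haystack (haystack_evaluation.ipynb): a sketch of the label-based evaluation flow in Haystack 1.x. The gold label here is made up for illustration, and the commented-out metric names assume an extractive pipeline like the one sketched above; the notebook may evaluate different nodes or metrics.

```python
# Sketch: building gold labels and evaluating a pipeline in Haystack (1.x-style API).
from haystack.schema import Answer, Document, Label, MultiLabel

# One gold label: for this query, the given document and answer are correct.
label = Label(
    query="When was the Eiffel Tower completed?",
    document=Document(content="The Eiffel Tower was completed in 1889."),
    answer=Answer(answer="1889"),
    is_correct_answer=True,
    is_correct_document=True,
    origin="gold-label",
)
eval_labels = [MultiLabel(labels=[label])]

# `pipeline` would be e.g. the ExtractiveQAPipeline from the previous sketch.
# eval_result = pipeline.eval(labels=eval_labels, params={"Retriever": {"top_k": 5}})
# metrics = eval_result.calculate_metrics()
# print(metrics["Retriever"]["recall_single_hit"], metrics["Reader"]["f1"])
```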
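Passage-level vs. sentence-level retrieval on WikiQA (wikiQA_sentence_level_retriever.ipynb): the reported accuracies (0.98 vs. 0.61) come from the notebook. The sketch below only illustrates the comparison itself, scoring candidates with a bi-encoder and counting a query as correct when the top-ranked candidate is labeled relevant; the embedding model and data-building step are assumptions.

```python
# Sketch: top-1 retrieval accuracy at passage vs. sentence granularity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example bi-encoder

def top1_accuracy(examples):
    """examples: list of (question, [(candidate_text, is_relevant), ...]) pairs."""
    hits = 0
    for question, candidates in examples:
        texts = [text for text, _ in candidates]
        scores = util.cos_sim(
            model.encode(question, convert_to_tensor=True),
            model.encode(texts, convert_to_tensor=True),
        )[0]
        best = int(scores.argmax())
        hits += int(candidates[best][1])  # 1 if the top candidate is labeled relevant
    return hits / len(examples)

# passage_examples / sentence_examples would be built from the labeled data described above,
# with candidates being whole summary passages or individual sentences respectively.
# print("passage-level:", top1_accuracy(passage_examples))
# print("sentence-level:", top1_accuracy(sentence_examples))
```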