Question Answering Interface created with Flask and HuggingFace's transformers
To construct the dataset
cd data
wget -O train-v2.0.json
wget -O dev-v2.0.json
Warning: The file is very large (1.5 GB), and loading the word2vec model consumes a lot of memory.
Outside the qanda directory, download and extract the word2vec model using:
cd qanda/../
wget -c ""
Go back into the qanda repo and start Jupyter Notebook
cd qanda/data/
DataExploration.pdf shows that a similarity-score approach is not very useful: it identified the correct line with only ~57% accuracy. Hence we moved to deep learning approaches.
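For illustration, a similarity-score baseline of this kind can be sketched as follows. This is a minimal sketch, not the code behind DataExploration.pdf: it swaps the word2vec embeddings for plain bag-of-words counts so that it runs with the standard library only, and the names `tokenize`, `cosine` and `best_sentence` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_sentence(question, context):
    """Return (score, sentence) for the context sentence most similar to the question."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not sentences:
        return 0.0, ""
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(s))), s) for s in sentences]
    return max(scored, key=lambda pair: pair[0])
```

A score like this only rewards word overlap between the question and a line, which may be one reason such a baseline struggles when the answer is phrased differently from the question.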
On visiting the website, we noticed that a majority of the top-ranked models used transformer-based architectures. We also briefly looked into BiDAF models (the state of the art circa 2018) and decided that transformer-based models were the better choice.
Comparing the Python libraries "transformers" (by HuggingFace) and "allennlp" (by AllenAI), we found the pretrained models from transformers easier to use and integrate into the application.
Of the multitude of pretrained models available in the transformers library, we chose "distilbert-base-uncased-distilled-squad" because it was one of the smaller models among the top 5 most downloaded models (keeping the speed constraint in mind).
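As a rough sketch of how this model can be queried through the transformers `pipeline` API: the helper names `load_qa_pipeline` and `answer_question` are hypothetical and are not taken from this repo. The import is done lazily so that the helpers can be defined without loading the library up front.

```python
def load_qa_pipeline(model_name="distilbert-base-uncased-distilled-squad"):
    """Build a question-answering pipeline (the model is downloaded on first use)."""
    from transformers import pipeline  # lazy import: only needed when loading the model

    return pipeline("question-answering", model=model_name)


def answer_question(qa, question, context):
    """Return (answer_text, confidence) for one question/context pair.

    `qa` is any callable with the pipeline's interface: it takes `question`
    and `context` keyword arguments and returns a dict with "answer" and "score".
    """
    result = qa(question=question, context=context)
    return result["answer"], float(result["score"])
```

Typical usage would be `qa = load_qa_pipeline()` once at startup, followed by `answer_question(qa, question, context)` per request, so the model is loaded only once.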
To start the server
cd qanda
flask run