BertQA - Attention on Steroids

Developers: Ankit Chadha (ankitrc@stanford.edu), Rewa Sood (rrsood@stanford.edu)

This repository is based on Hugging Face's PyTorch BERT implementation.

This work was done as a class project for CS224n: Natural Language Processing with Deep Learning (Stanford, Winter 2019). At the time of submission, we were #1 on the class's SQuAD leaderboard.

Abstract

In this work, we extend Bidirectional Encoder Representations from Transformers (BERT) with an emphasis on directed coattention to improve F1 performance on the SQuAD 2.0 dataset. The Transformer architecture on which BERT is based places hierarchical global attention on the concatenation of the context and query. Our additions to the BERT architecture augment this attention with more focused context-to-query and query-to-context attention via a set of modified Transformer encoder units. In addition, we explore adding convolution-based feature extraction within the coattention architecture to add localized information to self-attention. The base BERT architecture with no SQuAD 2.0-specific fine-tuning produces an F1 of 74. We found that coattention significantly improves the no-answer F1, by 4 points, while lowering the has-answer F1 by the same amount. Adding skip connections improved the no-answer F1 further without an additional loss in has-answer F1. Adding localized feature extraction to the attention produced the best results, with an overall dev F1 of 77.03, due to a marked improvement in the has-answer F1. We applied our findings to the large BERT model, which contains twice as many layers, and further used our own augmented version of the SQuAD 2.0 dataset created by back translation. Finally, we performed hyperparameter tuning and ensembled our best models for a final F1/EM of 82.148/79.239 (Attention on Steroids, PCE Test Leaderboard).
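The results above are reported as overall, has-answer and no-answer F1 (plus EM). As a rough sketch of how SQuAD 2.0 answers are commonly scored, modeled on the logic of the official SQuAD evaluation script rather than taken from this repository, an empty prediction means "no answer" and scores 1 only when the gold answer is also empty. The helper names below are illustrative.

```python
import re
import string
from collections import Counter
from typing import List, Tuple

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(gold: str, pred: str) -> int:
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    # No-answer case: an empty gold or prediction only matches another empty one.
    if not gold_toks or not pred_toks:
        return float(gold_toks == pred_toks)
    common = Counter(gold_toks) & Counter(pred_toks)  # token overlap with counts
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score_example(gold_answers: List[str], pred: str) -> Tuple[int, float]:
    """Best EM and F1 over all gold answers; an empty list means unanswerable."""
    golds = gold_answers or [""]
    return (max(exact_match(g, pred) for g in golds),
            max(f1_score(g, pred) for g in golds))
```

For example, predicting "Beyonce" against the gold answer "Beyonce Knowles" gives EM 0 and F1 2/3, while abstaining (an empty string) on an unanswerable question gives EM 1 and F1 1.0. Averaging these per-example scores separately over answerable and unanswerable questions yields the has-answer and no-answer F1.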

Neural Architecture

Here is an overview of our network architecture:

[Figure: network architecture overview]
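As an illustration only, and not the repository's implementation (which uses modified Transformer encoder units in PyTorch, with learned projections, multiple heads and feed-forward layers), the sketch below shows the two directed attention flows from the abstract on toy lists of vectors: context-to-query, where each context token attends over the question tokens, and query-to-context, where each question token attends over the context. It also adds skip connections and a convolution over neighbouring tokens for localized features. The function names, the single unprojected attention head and the shared convolution kernel are all assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row per token vector


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention without learned projections."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv1d_same(x: Matrix, kernel: List[float]) -> Matrix:
    """1D convolution (cross-correlation, as in deep-learning libraries) over the
    token axis, zero-padded to keep the sequence length; use an odd-length kernel.
    One kernel is shared by every feature dimension (a simplifying assumption)."""
    pad = len(kernel) // 2
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        row = []
        for j in range(d):
            acc = 0.0
            for t, w in enumerate(kernel):
                src = i + t - pad
                if 0 <= src < n:
                    acc += w * x[src][j]
            row.append(acc)
        out.append(row)
    return out


def directed_coattention(context: Matrix, query: Matrix,
                         kernel: List[float]) -> Tuple[Matrix, Matrix]:
    # Context-to-query: each context token gathers information from the question.
    c2q = attend(context, query, query)
    # Query-to-context: each question token gathers information from the context.
    q2c = attend(query, context, context)
    # Skip connections keep the incoming BERT representations alongside the attention.
    c_out = add(context, c2q)
    q_out = add(query, q2c)
    # Localized features: add a convolution over neighbouring context tokens.
    c_out = add(c_out, conv1d_same(c_out, kernel))
    return c_out, q_out
```

With a single question token, every context token's context-to-query output is just that token's vector, which makes the arithmetic easy to check by hand, e.g. `directed_coattention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0, 0.0]], [0.25, 0.5, 0.25])`.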

Dataset (SQuAD 2.Q)

We use SQuAD 2.Q, an augmented version of the SQuAD 2.0 dataset created by back translation. You can download the dataset here.

To read more about back translation, refer to this resource.
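Back translation paraphrases text by translating it into a pivot language and then back into English. A minimal sketch of the idea follows, assuming that only the question is paraphrased so the context and answer spans (or the no-answer label) carry over unchanged; this is an assumption, not a description of how SQuAD 2.Q was actually built. `translate` is a hypothetical callable standing in for a real machine-translation system, and `toy_translate` is a dictionary-based stand-in used only for demonstration.

```python
from typing import Callable, Dict, List

# Hypothetical signature: translate(text, src_lang, tgt_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator, pivot: str = "fr") -> str:
    """Paraphrase English text by translating to a pivot language and back."""
    return translate(translate(text, "en", pivot), pivot, "en")


def augment_questions(examples: List[Dict], translate: Translator,
                      pivots: List[str]) -> List[Dict]:
    """Return the original examples plus one paraphrased copy per pivot language.

    Assumption: only the question is rewritten, so the context and any answer
    spans (or the no-answer label) remain valid for the paraphrased copy.
    """
    augmented = list(examples)
    for ex in examples:
        seen = {ex["question"]}
        for pivot in pivots:
            para = back_translate(ex["question"], translate, pivot)
            if para not in seen:  # skip round trips that return an existing question
                seen.add(para)
                augmented.append({**ex, "question": para})
    return augmented


# Toy stand-in for a real MT system: unknown pairs or texts pass through unchanged.
TOY_TABLE = {
    ("en", "fr"): {"When did Beyonce start becoming popular?":
                   "Quand Beyonce est-elle devenue populaire ?"},
    ("fr", "en"): {"Quand Beyonce est-elle devenue populaire ?":
                   "When did Beyonce become popular?"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    return TOY_TABLE.get((src, tgt), {}).get(text, text)
```

Running `augment_questions` on one example with pivots `["fr", "de"]` keeps the original and adds a single paraphrase ("When did Beyonce become popular?"); the German round trip returns the question unchanged in this toy setup and is skipped as a duplicate.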

Command Lines

This repository includes bash scripts that launch runs with the hyperparameters our network was tuned to.

1. Sanity check
# Launch a debug run on 1 example from the SQuAD 2.0 training set (the Beyonce paragraph)
examples/rundbg.sh

2. Train on SQuAD 2.Q
# Fine-tunes BERT layers on SQuAD 2.Q and trains additional directed coattention layers.
run_bertqa_expt.sh

3. Train on SQuAD 2.0
# Fine-tunes BERT embedding layers on SQuAD 2.0 and trains additional directed coattention layers.
examples/run_bertqa.sh

BibTeX

@misc{Stanford-CS224n,
  author = {Chadha, Ankit and Sood, Rewa},
  title = {BertQA - Attention on Steroids},
  year = {2019},
  publisher = {Stanford-CS224n},
  howpublished = {\url{https://github.com/ankit-ai/BertQA-Attention-on-Steroids}}
}

Refer to the paper for more details on the hyperparameters we chose.
