Work plan:

An extensible, clean implementation of DocumentQA, and a basis for developing RCQA models

Prepare harness for tokenization, batch building and evaluation
Build a basic LSTM -> Dense model that outputs answer spans plus a no-answer option, to get the whole training/testing process running
Think about data cleanup, tokenization and all the other shenanigans of working with SQuAD
- Lowercasing
- Dealing with abbreviations
- Dealing with numbers, dates, etc.
Add encoding of character-level info as well as word-level info
Add unit testing for core components
Make GPU compatible
Add option to read in a single answer span per question for training
Make a distinction between train and non-train datasets for proper handling of char/word -> idx mappings
Run validation on the dev set during training
Implement BiDAF on top
Implement self attention as described in DocQA
Implement memory and runtime profiling
Add max context size
Do proper dropout
Test implementation with self attention
Use better-structured config objects to pass around instead of the bajillion parameters used now
Implement char CNN for char embeddings
Reproduce DocQA performance
Add an option to output no-answer probabilities alongside the predicted spans
Add encoding of sentence-level info
Integrate ELMo vectors
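
The train / non-train distinction for char/word -> idx mappings mentioned in the plan can be illustrated with a small sketch. This is not the repository's code: the `Vocab` class, its method names, and the `<pad>` / `<unk>` symbols are hypothetical, and the sketch only assumes that the mapping is built from training data and then frozen, so that unseen dev/test tokens fall back to a single unknown index.

```python
from typing import Dict, Iterable, List

# Hypothetical reserved symbols; the real project may use different ones.
PAD, UNK = "<pad>", "<unk>"


class Vocab:
    """Token -> index mapping that only grows while reading training data."""

    def __init__(self) -> None:
        self.token_to_idx: Dict[str, int] = {PAD: 0, UNK: 1}
        self.frozen = False

    def add_all(self, tokens: Iterable[str]) -> None:
        """Register training tokens; refused once the vocabulary is frozen."""
        if self.frozen:
            raise RuntimeError("vocabulary is frozen")
        for tok in tokens:
            # setdefault leaves existing entries alone and assigns the next
            # free index to tokens seen for the first time.
            self.token_to_idx.setdefault(tok, len(self.token_to_idx))

    def freeze(self) -> None:
        """Call after the training set has been read."""
        self.frozen = True

    def encode(self, tokens: Iterable[str]) -> List[int]:
        """Map tokens to indices; unknown tokens map to the UNK index."""
        unk = self.token_to_idx[UNK]
        return [self.token_to_idx.get(tok, unk) for tok in tokens]
```

With something like this, the training reader would call `add_all` and then `freeze`, while dev/test readers would only call `encode`. The same pattern would apply to characters by passing character sequences instead of word tokens.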
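
For the plan item about structured config objects, one possible shape is a pair of frozen dataclasses. This is a sketch under assumptions, not the project's actual configuration: the class names, field names, and default values below are placeholders chosen for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class EncoderConfig:
    # Placeholder fields and defaults, not values taken from the project.
    hidden_size: int = 100
    dropout: float = 0.2
    use_char_cnn: bool = False


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 32
    max_context_size: int = 400
    use_cuda: bool = False
    # default_factory gives each TrainConfig its own EncoderConfig instance.
    encoder: EncoderConfig = field(default_factory=EncoderConfig)


def describe(cfg: TrainConfig) -> str:
    """Functions take one config object instead of many loose parameters."""
    return (
        f"batch={cfg.batch_size} "
        f"hidden={cfg.encoder.hidden_size} "
        f"dropout={cfg.encoder.dropout}"
    )


base = TrainConfig()
# replace() returns a modified copy, so the base config is never mutated.
bigger = replace(
    base, batch_size=64, encoder=replace(base.encoder, hidden_size=200)
)
print(describe(bigger))
```

Frozen dataclasses keep an experiment's settings immutable once created, and they give mypy field types to check, which fits a repository that already ships a `mypy.ini`.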
