Based on the implementation in https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_classifier.py, but stripped down to the minimum and working only with the SST-2 dataset. It has also been refactored so that the individual parts are easier to understand.
Many thanks to @huggingface for the good work!