Transformer

Implement the base Transformer from scratch for German-to-English translation.

TODO:
- Word-piece embedding and shared weights
- Parallel processing
- Implement BERT from scratch and compare with the GPT-2 model
- Implement Transformer-XL

Reference:
- https://arxiv.org/abs/1706.03762
- http://nlp.seas.harvard.edu/2018/04/03/attention.html#training
- https://mlexplained.com/
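
For orientation, here is a minimal sketch of scaled dot-product attention, the core operation of the base Transformer in the first reference: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. It is not this repository's code. It assumes a single head, no masking, and plain Python lists instead of tensors, and the function names `softmax` and `scaled_dot_product_attention` are chosen here purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention without masking (illustrative assumption).

    queries: list of n_q vectors, each of length d_k
    keys:    list of n_k vectors, each of length d_k
    values:  list of n_k vectors, each of length d_v
    Returns (outputs, weights): n_q output vectors of length d_v and
    the n_q x n_k attention weights.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w_j * v[c] for w_j, v in zip(w, values)) for c in range(d_v)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    outputs, weights = scaled_dot_product_attention(q, k, v)
    print("weights:", weights)
    print("outputs:", outputs)
```

A real implementation would likely batch this with matrix operations, add the decoder's look-ahead mask, and split the model dimension across several heads.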