This tutorial delves into the paper "Attention is All You Need" by Vaswani et al. (https://arxiv.org/abs/1706.03762). Introduced in 2017, this paper revolutionized natural language processing by presenting the transformer architecture. Our focus is primarily on the decoder component of the transformer, with a detailed look at its internal mechanics. The tutorial is inspired by Andrej Karpathy's practical implementation, which is thoroughly explained in this Colab Notebook. We walk step by step through a single forward pass in this notebook. Note that this tutorial does not cover training on the Beatles text dataset; it focuses solely on explaining how the model works.
To set up your environment for this tutorial, install torch with pip, or install all dependencies from requirements.txt:
pip install torch
pip install -r requirements.txt
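As a quick, optional sanity check (a minimal sketch; the exact version printed depends on your installation), you can verify that torch imports correctly before continuing:

import torch

# Print the installed PyTorch version to confirm the install succeeded.
print(torch.__version__)
# Reports whether a CUDA-capable GPU is visible; False is fine for this tutorial,
# since a single forward pass runs comfortably on CPU.
print(torch.cuda.is_available())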