This project implements the Transformer architecture from scratch in Python. It includes the following components:
- Self-Attention
- Causal Attention
- Cross Attention
- GPT (Generative Pre-trained Transformer) Model
- BERT (Bidirectional Encoder Representations from Transformers) Model
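The self-attention mechanism computes, for each position in the input sequence, attention scores against every position in the same sequence, and uses those scores to form a weighted combination of the value vectors.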
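As an illustration of the idea, here is a minimal sketch of scaled dot-product self-attention for a single sequence. It uses plain Python lists rather than PyTorch tensors so that it runs without dependencies, and it is not the code in this repository. The names `self_attention`, `w_q`, `w_k`, and `w_v` are assumptions made for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[dot(row, col) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Hypothetical helper: x is seq_len x d_model, each w_* is d_model x d_k."""
    # Queries, keys, and values are all projections of the same input.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # weights[i][j] says how strongly position i attends to position j.
    weights = [softmax([dot(q_i, k_j) / scale for k_j in k]) for q_i in q]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy input: 3 positions, model width 2, identity projections (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
print(len(output), len(output[0]))
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors.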
Causal attention is a variant of self-attention in which each position can attend only to itself and to earlier positions, so no information flows from future tokens.
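The masking step can be sketched as follows. This is a dependency-free illustration in plain Python, not the repository's PyTorch code. The learned query, key, and value projections are omitted for brevity, and the function name `causal_attention` is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries of -inf receive weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Hypothetical helper: q, k, v are seq_len x d matrices (lists of rows)."""
    scale = math.sqrt(len(k[0]))
    output, weights = [], []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            if j > i:
                # Future position: mask it out before the softmax.
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(q_i, k_j)) / scale)
        row = softmax(scores)
        weights.append(row)
        # Weighted average of the value vectors visible to position i.
        output.append(
            [sum(w * v_j[c] for w, v_j in zip(row, v)) for c in range(len(v[0]))]
        )
    return output, weights


# Toy input: 3 positions, width 2 (assumed values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = causal_attention(x, x, x)
```

With this mask the first position can see only itself, so its attention row is `[1.0, 0.0, 0.0]` and its output equals its own value vector.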
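Cross attention computes attention scores between two sequences: the queries come from one sequence, and the keys and values come from the other. It is typically used in encoder-decoder tasks such as machine translation.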
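A minimal sketch of cross attention is shown below, assuming the queries come from a decoder-side sequence and the keys and values from an encoder-side sequence. It is written in plain Python for illustration and is not the repository's PyTorch code. Projections are omitted, and the names `cross_attention`, `decoder_states`, and `encoder_states` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Hypothetical helper: queries is tgt_len x d, keys and values are src_len x d."""
    scale = math.sqrt(len(keys[0]))
    # weights[i][j]: how strongly target position i attends to source position j.
    weights = [
        softmax([sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in keys])
        for q_i in queries
    ]
    # Each output row mixes the source-side value vectors.
    output = [
        [sum(w * v_j[c] for w, v_j in zip(row, values)) for c in range(len(values[0]))]
        for row in weights
    ]
    return output, weights


# Toy input: 2 target positions attend over 4 source positions (assumed values).
decoder_states = [[1.0, 0.0], [0.0, 1.0]]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
output, weights = cross_attention(decoder_states, encoder_states, encoder_states)
```

Unlike self-attention, the weight matrix here is not square: it has one row per target position and one column per source position.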
The Generative Pre-trained Transformer (GPT) model is a decoder-only variant of the Transformer architecture. It uses causal attention to generate text one token at a time and is applied to a variety of natural language processing tasks.
BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only Transformer model for natural language processing tasks. Its attention is bidirectional, so each token can attend to context on both its left and its right.
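One way to see the difference between the two models is through the attention pattern each one permits: a GPT-style model restricts every position to itself and earlier positions, while a BERT-style model lets every position see the whole sequence. The sketch below builds both patterns as boolean masks. The function name `attention_mask` is an assumption made for this example, and padding masks are ignored.

```python
def attention_mask(seq_len, causal):
    """Hypothetical helper: mask[i][j] is True if position i may attend to j."""
    return [
        [(j <= i) or not causal for j in range(seq_len)]
        for i in range(seq_len)
    ]


gpt_mask = attention_mask(3, causal=True)    # lower-triangular pattern
bert_mask = attention_mask(3, causal=False)  # every position sees every other

for row in gpt_mask:
    print(["x" if allowed else "." for allowed in row])
```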
The project is implemented in Python, using PyTorch for numerical computations.
- attention.py: Implementation of the self-attention, causal attention, and cross attention mechanisms.
- GPT.py: Implementation of the GPT model.
- bert.py: Implementation of the BERT model.
- Python 3.10.12
- PyTorch
This project is inspired by the Transformer architecture introduced in the paper "Attention Is All You Need" by Vaswani et al. (2017).