- Adding an attention mechanism to RNNs
  - Attention helps RNNs with accessing information
  - The original attention mechanism for RNNs
    - Processing the inputs using a bidirectional RNN
    - Generating outputs from context vectors
    - Computing the attention weights (see sketch below)
- Introducing the self-attention mechanism
  - Starting with a basic form of self-attention
  - Parameterizing the self-attention mechanism: scaled dot-product attention (see sketch below)
- Attention is all we need: introducing the original transformer architecture
  - Encoding context embeddings via multi-head attention (see sketch below)
  - Learning a language model: decoder and masked multi-head attention
  - Implementation details: positional encodings and layer normalization (see sketch below)
- Building large-scale language models by leveraging unlabeled data
  - Pre-training and fine-tuning transformer models
  - Leveraging unlabeled data with GPT
  - Using GPT-2 to generate new text (see sketch below)
  - Bidirectional pre-training with BERT
  - The best of both worlds: BART
- Fine-tuning a BERT model in PyTorch
  - Loading the IMDb movie review dataset
  - Tokenizing the dataset
  - Loading and fine-tuning a pre-trained BERT model
  - Fine-tuning a transformer more conveniently using the Trainer API (see sketch below)
- Summary
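
The sketches below are optional illustrations of a few techniques from the outline; none of them reproduce the chapter's exact code. First, for the RNN attention sections, a minimal PyTorch sketch of computing attention weights and a context vector for a single decoder step. The dot-product scoring function, the tensor sizes, and all variable names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)

# Stand-ins for the hidden states of a bidirectional RNN encoder:
# 5 input positions, 8-dimensional states (illustrative sizes)
hidden_states = torch.randn(5, 8)

# Previous decoder state; the alignment scores here are plain dot
# products, one of several possible scoring functions
s_prev = torch.randn(8)
scores = hidden_states @ s_prev      # shape: (5,)

# Attention weights: scores normalized over the input positions
alpha = F.softmax(scores, dim=0)     # sums to 1

# Context vector: attention-weighted sum of the encoder states
context = alpha @ hidden_states      # shape: (8,)
print(alpha, context.shape)
```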
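
For the scaled dot-product attention section, a minimal self-attention sketch with randomly initialized (untrained) projection matrices; the dimensions and names are again illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)

# Toy input: a sequence of 5 token embeddings, each 16-dimensional
embedded = torch.randn(5, 16)
d = embedded.shape[1]

# Randomly initialized (untrained) query/key/value projections
U_q, U_k, U_v = (torch.randn(d, d) for _ in range(3))
queries, keys, values = embedded @ U_q, embedded @ U_k, embedded @ U_v

# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
weights = F.softmax(queries @ keys.T / d**0.5, dim=-1)  # rows sum to 1
context = weights @ values  # one context vector per input token

print(weights.shape)  # torch.Size([5, 5])
print(context.shape)  # torch.Size([5, 16])
```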
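
For the multi-head attention sections, a sketch using PyTorch's built-in `torch.nn.MultiheadAttention`, including a causal mask for decoder-style masked self-attention. The chapter walks through these components in more detail; this module-based version is only a compact stand-in:

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Batch of 2 sequences, 5 tokens each, 16-dimensional embeddings
x = torch.randn(2, 5, 16)

# 4 heads, each attending over a 16 / 4 = 4-dimensional subspace
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

# Encoder-style self-attention: queries, keys, values all come from x
context, weights = mha(x, x, x)
print(context.shape)  # torch.Size([2, 5, 16])
print(weights.shape)  # torch.Size([2, 5, 5]), averaged over heads

# Decoder-style masked self-attention: a causal mask (True = blocked)
# prevents each position from attending to later positions
causal_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)
masked_context, _ = mha(x, x, x, attn_mask=causal_mask)
```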
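
For the positional encodings and layer normalization section, a sketch of the fixed sinusoidal encodings from the original transformer paper combined with `torch.nn.LayerNorm`; the helper function below is an illustrative assumption, not code from the chapter:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sine/cosine positional encodings (transformer paper)."""
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Add position information to token embeddings, then normalize each
# token's features with layer normalization
x = torch.randn(5, 16) + sinusoidal_positional_encoding(5, 16)
layer_norm = torch.nn.LayerNorm(16)
print(layer_norm(x).shape)  # torch.Size([5, 16])
```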
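
For GPT-2 text generation, a sketch using the Hugging Face `transformers` text-generation pipeline (the pre-trained weights are downloaded on first use); the prompt, seed, and generation settings are arbitrary choices:

```python
from transformers import pipeline, set_seed

# Load pre-trained GPT-2 as a ready-made text-generation pipeline
generator = pipeline('text-generation', model='gpt2')
set_seed(123)

# Generate three continuations of an arbitrary prompt
for output in generator("Hey readers, today is",
                        max_length=20, num_return_sequences=3):
    print(output['generated_text'])
```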
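
Finally, for the BERT fine-tuning sections, a compressed sketch of tokenizing text and fine-tuning with the `Trainer` API. The two hard-coded reviews stand in for the IMDb dataset, and the model name and hyperparameters are illustrative choices, not the chapter's settings:

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Toy stand-ins for the IMDb reviews; the chapter loads the full dataset
texts = ["An instant classic.", "A waste of two hours."]
labels = [1, 0]  # 1 = positive, 0 = negative

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
encodings = tokenizer(texts, truncation=True, padding=True)

class ReviewDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output so the Trainer can iterate over it."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx])
                for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

model = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=2)

args = TrainingArguments(output_dir='results', num_train_epochs=1,
                         per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args,
                  train_dataset=ReviewDataset(encodings, labels))
trainer.train()
```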
Please refer to the README.md file in ../ch01
for more information about running the code examples.