This is a learning repo where I take notes on and implement LLMs from scratch. It collates notes, resources, and code implementations for learning about LLMs.
./apps
:- Examples using the package
./notes
:- Unstructured markdown notes on various topics
./min_llm
:- Contains the packaged code
./nbs
:- Contains sandbox notebooks for learning concepts
./playground
:- Contains sandbox code for learning concepts
Implement Llama 3 in JAX and train a mini version of it. Implement all the fancy techniques discussed in the linked paper. Also, build a package of useful LLM utilities like here.
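For a taste of what that involves, here is a minimal sketch of one Llama-style building block (RMSNorm) in JAX. It is illustrative only, not the repo's implementation:

```python
import jax.numpy as jnp

def rms_norm(x, weight, eps=1e-5):
    # Llama-style RMSNorm: normalize by the root-mean-square of the features
    # (no mean subtraction), then apply a learned per-feature scale.
    rms = jnp.sqrt(jnp.mean(jnp.square(x), axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = jnp.ones((2, 16, 64))    # (batch, seq, hidden)
w = jnp.ones((64,))          # learned scale, initialized to ones
print(rms_norm(x, w).shape)  # (2, 16, 64)
```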
- The Illustrated Transformer
- The Annotated Transformer
- Note 10: Self-Attention & Transformers from CS 224N
- Extra:
Papers, chronologically ordered:
- Generating Sequences With Recurrent Neural Networks
- Neural Machine Translation by Jointly Learning to Align and Translate
- Attention is All You Need
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- GPT-2: Language Models are Unsupervised Multitask Learners
- GPT-3: Language Models are Few-Shot Learners
More:
- Build a Large Language Model (From Scratch) by Sebastian Raschka
- Speech and Language Processing by Dan Jurafsky and James H. Martin
- RLHF book
- GPU MODE YouTube channel
JAX (eventually):
- JAX has better parallelization primitives, which are useful for training large models
- JAX is lower-level and more similar to NumPy, which forces you to dive deeper into the concepts (see the sketch after this list)
- Developing at a lower level will make it easier to implement custom extensions, like CUDA kernels to speed up inference or porting the inference module to C/Rust
- In the meantime, we will use some fancy PyTorch distributed stuff
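To make the JAX points above concrete, here is a tiny sketch of its NumPy-like, function-transform style: `jit` compiles, `grad` differentiates, and `vmap` vectorizes (the multi-device parallelization primitives such as `pmap`/`shard_map` build on the same idea). A sketch only, not code from this repo:

```python
import jax
import jax.numpy as jnp

# A plain, pure function written with the NumPy-like API.
def loss(w, x, y):
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))                   # compiled gradient w.r.t. w
per_example = jax.vmap(loss, in_axes=(None, 0, 0))  # map over the batch axis

w = jnp.zeros((8,))
x = jnp.ones((32, 8))
y = jnp.ones((32,))
print(grad_fn(w, x, y).shape)      # (8,)
print(per_example(w, x, y).shape)  # (32,)
```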
- Architectures:
- MLSys:
- CUDA kernels
- Triton kernels (see the sketch after this list)
- ThunderKittens kernels
- Quantization (see the sketch after this list)
- pybind to integrate custom-written kernels into a PyTorch framework
- Model and data parallelism across GPUs: tensor parallelism (e.g., column/row parallelism), pipeline parallelism, and data parallelism (including fully-sharded data parallelism)
- General learning:
- Optimizing for both memory-bound and compute-bound operations
- Understanding GPU memory hierarchy and computation capabilities
- Efficient attention algorithms
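For a flavor of the kernel work, here is the classic element-wise add written as a Triton kernel (the standard tutorial pattern; it needs a CUDA GPU to run and is only a sketch):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# x = torch.rand(4096, device="cuda"); y = torch.rand(4096, device="cuda")
# assert torch.allclose(add(x, y), x + y)
```

And a toy version of the idea behind quantization: symmetric, per-tensor absmax int8 (real schemes are per-channel or per-group and more careful, so treat this as a sketch):

```python
import torch

def absmax_quantize_int8(x: torch.Tensor):
    # Map values into [-127, 127] by scaling with the max magnitude, then round.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)
q, s = absmax_quantize_int8(w)
print((w - dequantize(q, s)).abs().max())  # worst-case rounding error
```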
Parts of this repo are packaged so you can install and use them directly. This is inspired by Meta's Lingua.
$ pip install git+https://github.com/rosikand/min-llm.git
Usage:
import min_llm
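Since the package is still evolving, the import above is the only call taken from the repo itself; a safe way to see what it currently exposes is to introspect it (the snippet below assumes nothing about min_llm's API):

```python
import min_llm

# List the public attributes (submodules, classes, functions) the package exposes.
print([name for name in dir(min_llm) if not name.startswith("_")])
```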