This is a repo of my experiments and notes while learning about LLMs. I'm starting with a decent theoretical understanding of neural networks and hands-on experience training large models on distributed systems. I'm very comfortable with data and ML engineering.
I've completed:
- Andrej Karpathy's Neural Networks: Zero to Hero guide.
- Thoroughly read The Annotated Transformer and ran the code side-by-side.
I've read papers on:
- Efficiency
- Synthetic Data
- Modelling
Here are all the things I'd like to do:
- Implement FlashAttention myself (in CUDA maybe?)
- Implement FSDP myself (no idea how!?)
- Model efficiency experiments. Try out the following and benchmark the performance changes (a quantization starter sketch is at the end of this list):
- Speculative decoding
- Knowledge distillation
- Quantization
- Pruning
    - Sparsity / low-rank compression
- etc
- Play around with Llama models locally (see the loading sketch after this list)
- Depthwise Separable Convolutions for Neural Machine Translation
- One Model To Learn Them All
- Self-Attention with Relative Position Representations
- GANs
- Stable Diffusion
- KANs
- Explore LLM evaluation
- Explore LLM interpretability
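
To make the efficiency experiments concrete, here's a rough starting point for the quantization benchmark. It's only a minimal sketch: the toy `nn.Sequential` model and the timing loop are placeholders I made up, not the final harness.

```python
# Minimal sketch: compare CPU latency of a toy model before and after
# dynamic int8 quantization of its Linear layers.
import time
import torch
import torch.nn as nn

# Placeholder stand-in for a real model, purely illustrative.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).eval()

# Dynamic quantization: weights stored in int8, activations quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def bench(m, iters=100):
    """Average forward-pass latency in seconds over `iters` runs."""
    x = torch.randn(8, 1024)
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
    return (time.perf_counter() - start) / iters

print(f"fp32: {bench(model) * 1e3:.2f} ms/iter")
print(f"int8: {bench(quantized) * 1e3:.2f} ms/iter")
```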
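
For playing with Llama models locally, a minimal sketch of loading one through Hugging Face `transformers`. The `model_id` below is an assumption (gated checkpoints also need the license accepted and a `huggingface-cli login` first); I'll swap in whichever checkpoint I actually end up using.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # placeholder checkpoint, swap as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Simple sampling-based generation from a short prompt.
prompt = "The transformer architecture is"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```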