Triton-Accelerated NanoGPT

The WHY behind this ordeal

After practicing Triton for about two weeks, I challenged myself to implement custom Triton kernels for Karpathy's nanoGPT. It was quite an ordeal, but I got something working. It is not perfect, but it is getting there :). Contributions are welcome.

Kernels

Supports lightweight custom Triton kernels for softmax, layer normalization, cross-entropy loss, and GELU activation.
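For anyone checking a kernel's output, it can help to have the math of the four operations written out in plain Python. The sketch below is only a reference for what such kernels compute on a single row; it is not the Triton code from this repo, and the function names, the `eps` default, and the choice of the exact (erf-based) GELU are assumptions made for illustration.

```python
import math

def softmax(row):
    # Subtract the row max before exponentiating so exp() cannot overflow.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(row, weight, bias, eps=1e-5):
    # Normalize to zero mean and unit (biased) variance, then scale and shift.
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * w + b for x, w, b in zip(row, weight, bias)]

def cross_entropy(logits, target):
    # -log(softmax(logits)[target]), computed with log-sum-exp for stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

def gelu(x):
    # Exact GELU (assumed variant); a tanh approximation is a common alternative.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Comparing a kernel's result against functions like these on a few small rows, within a floating-point tolerance, is a quick sanity check before timing anything.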

Training

GPU-aware training loop with gradient accumulation, learning rate scheduling, and gradient clipping, plus validation loss tracking.
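The three pieces of the loop can be sketched in plain Python to show the arithmetic involved. This is an illustration under assumptions, not the code in `triton_nanoGPT.py`: the warmup-plus-cosine schedule, the clipping by global norm, and every name and default value here (`lr_at`, `accumulate`, `clip_by_global_norm`, `max_lr`, `warmup_steps`, and so on) are hypothetical.

```python
import math

def lr_at(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, total_steps=1000):
    # Linear warmup up to max_lr over the first warmup_steps steps.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Once the schedule is over, hold the learning rate at min_lr.
    if step >= total_steps:
        return min_lr
    # Cosine decay from max_lr down to min_lr in between.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)

def accumulate(micro_grads):
    # Average the gradients of k micro-batches, which matches the gradient
    # of one batch k times larger when the loss is a mean over examples.
    k = len(micro_grads)
    size = len(micro_grads[0])
    return [sum(g[i] for g in micro_grads) / k for i in range(size)]

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale the whole gradient vector if its L2 norm exceeds max_norm.
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [g * scale for g in grads]
```

In a PyTorch loop the same ideas usually appear as dividing each micro-batch loss by the number of accumulation steps before `backward()`, calling `torch.nn.utils.clip_grad_norm_` before the optimizer step, and writing the scheduled value into each optimizer parameter group.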

Setup: Requires a GPU! Ensure you have PyTorch and Triton installed. GPU poor? So am I; I used a single free T4 on Google Colab.
Data: Uses the Tiny Shakespeare dataset by default. It is downloaded automatically if not present.
Training:
```
python triton_nanoGPT.py
```

This trains for 100 epochs, saves a checkpoint as nanoGPT_cpkt.pth, and then samples from it.
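Before launching the script, the setup requirements above can be checked with a few lines of Python. This is a minimal sketch, not part of the repo: the helper names `missing_packages` and `cuda_available` are hypothetical, and it assumes the packages are importable as `torch` and `triton`.

```python
import importlib.util

def missing_packages(names=("torch", "triton")):
    # find_spec returns None when a top-level package cannot be located.
    return [n for n in names if importlib.util.find_spec(n) is None]

def cuda_available():
    # Only import torch if it is installed, so this never raises ImportError.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()
```

Calling `missing_packages()` should return an empty list and `cuda_available()` should return `True` on a machine that is ready to run the training script, for example a Colab runtime with a T4 attached.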

License

MIT


Jaykef/Triton-nanoGPT
