Inspired by and learned from Andrej Karpathy.
The GPT-2 this project reproduces is the 124M-parameter model: 12 Transformer layers with a d_model of 768.
The model is written with PyTorch and the Hugging Face Transformers library.
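The model size described above can be captured in a small configuration object. Below is a minimal sketch, assuming the standard GPT-2 small hyperparameters (the 12 layers and 768 d_model match the text; the context length, head count, and vocabulary size are the standard GPT-2 values and are not stated in this project description):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Assumed standard GPT-2 values (not stated in the text above):
    block_size: int = 1024   # maximum context length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_head: int = 12         # attention heads per Transformer block
    # Values from the text: the 124M model
    n_layer: int = 12        # number of Transformer layers
    n_embd: int = 768        # d_model (embedding dimension)

config = GPTConfig()
print(config.n_layer, config.n_embd)
```

A plain dataclass like this keeps the hyperparameters in one place and makes it easy to swap in the larger GPT-2 variants (350M, 774M, 1.5B) by changing `n_layer`, `n_head`, and `n_embd`.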