GPT-3 Model Implementation in PyTorch

This repository contains an implementation of the GPT-3 language model in PyTorch. The model architecture follows the original GPT-3 paper [1]: a stack of decoder-only Transformer blocks followed by a linear output layer. The model is trained with a standard autoregressive language modeling objective, i.e. predicting the next token in a sequence given the previous tokens.

Requirements

  • Python 3.x
  • PyTorch
  • Transformers
  • tqdm

Model Architecture

The GPT-3 model architecture is a stack of Transformer decoder blocks, where each block consists of a masked (causal) multi-head self-attention mechanism followed by a position-wise feedforward network. Each sub-layer is wrapped in a residual connection and a layer normalization step before its output is passed on to the next block. A final linear output layer maps the hidden states of the last Transformer block to a probability distribution over the vocabulary.

The model architecture is implemented in the GPT3 class in model.py.
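As a rough illustration of this structure, a decoder block and model stack might look like the sketch below. The class and parameter names here are illustrative, not necessarily the ones used in model.py, and the layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One GPT-style block: masked self-attention + feedforward,
    each followed by a residual connection and layer normalization."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position may only attend to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask, need_weights=False)
        x = self.ln1(x + self.dropout(attn_out))
        ff_out = self.ff(x)
        x = self.ln2(x + self.dropout(ff_out))
        return x

class GPT3(nn.Module):
    """Token + position embeddings, a stack of decoder blocks, and a linear head (illustrative)."""

    def __init__(self, vocab_size: int, d_model: int = 768, n_heads: int = 12,
                 d_ff: int = 3072, n_layers: int = 12, max_len: int = 1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(DecoderBlock(d_model, n_heads, d_ff) for _ in range(n_layers))
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)[None, :, :]
        for block in self.blocks:
            x = block(x)
        return self.lm_head(x)  # logits over the vocabulary
```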

Training

To train the model, you can use the train.py script. The script takes as input a text file containing the training data and a directory in which to save model checkpoints. The training data should be preprocessed so that each line contains a single sequence of tokens, with tokens separated by whitespace.

The script reads in the training data, tokenizes it using the GPT-2 tokenizer, and converts it to PyTorch Dataset and DataLoader objects. The model is then initialized, and training is performed using the standard autoregressive language modeling objective. The script saves checkpoints of the model at specified intervals, and prints out the training loss at the end of each epoch.
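The core of such a training step reduces to a shifted cross-entropy loss: predict token t+1 from tokens 1..t. The sketch below is illustrative only and assumes the GPT3 model from model.py; the actual data pipeline and hyperparameters live in train.py.

```python
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def lm_loss(model, batch: torch.Tensor) -> torch.Tensor:
    """Standard autoregressive LM loss: predict token t+1 from tokens <= t."""
    input_ids = batch[:, :-1]   # tokens 1..T-1 are the inputs
    targets = batch[:, 1:]      # tokens 2..T are the (shifted) targets
    logits = model(input_ids)   # (batch, seq_len, vocab_size)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# Example: tokenize one line of training text and compute the loss.
ids = tokenizer("an example training sequence", return_tensors="pt")["input_ids"]
# loss = lm_loss(model, ids)  # `model` would be a GPT3 instance from model.py
```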

You can modify the hyperparameters of the training process by changing the values of the corresponding arguments in the train.py script. You can also modify the model architecture and hyperparameters by editing the GPT3 class in model.py.

Inference

To generate text using the trained model, you can use the generate.py script. The script takes as input a checkpoint file containing the trained model weights, a prompt string to condition the generation on, and a length parameter specifying the number of tokens to generate. It loads the weights from the checkpoint, initializes the model, and generates text autoregressively, sampling the next token from the model's output distribution at each step and appending it to the context.

You can modify the sampling temperature and other generation parameters by changing the values of the corresponding arguments in the generate.py script.
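A minimal sketch of this kind of temperature-based sampling loop is shown below; the function and parameter names are assumptions for illustration and may differ from what generate.py actually uses.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, tokenizer, prompt: str, length: int = 50, temperature: float = 1.0) -> str:
    """Autoregressively sample `length` tokens, conditioned on `prompt`."""
    model.eval()
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    for _ in range(length):
        logits = model(ids)[:, -1, :]                     # logits for the next token only
        probs = F.softmax(logits / temperature, dim=-1)   # higher temperature => more random
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0])
```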

References

[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
