This project demonstrates the use of GPT-Neo models for code generation. GPT-Neo is a variant of the GPT (Generative Pre-trained Transformer) architecture, a large-scale autoregressive language model. In this project, we use GPT-Neo to generate code in the Python programming language.
Requirements:
- Python 3.8 or higher
- PyTorch 1.13 or higher
- Hugging Face's transformers library
- An NVIDIA GPU with CUDA support
Clone the repository:

```bash
git clone https://github.com/0xsuid/code-generation-gpt-models.git
```
Install the NVIDIA driver, CUDA Toolkit, and Python dependencies. For more info, check INSTALLATION.md.

```bash
chmod +x install.sh
./install.sh
```
Fine-tune the model on the APPS dataset.
Note: the "-u" argument is required to disable Python output buffering.
Without DeepSpeed:

```bash
nohup python3 -u tune_gpt.py --limit 10 --local-rank 0 --model "EleutherAI/gpt-neo-125M" --tokenizer "EleutherAI/gpt-neo-125M" > output.log 2>&1 &
```
With DeepSpeed:

```bash
nohup deepspeed tune_gpt.py --deepspeed deepspeed.json --model "EleutherAI/gpt-neo-125M" --tokenizer "EleutherAI/gpt-neo-125M" > output.log 2>&1 &
```
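The "--deepspeed" flag points at a DeepSpeed configuration file. The repository's actual deepspeed.json is not reproduced here; as a hedged sketch, a minimal config might look like the following, where the "auto" values are filled in from the training arguments if tune_gpt.py uses the Hugging Face Trainer's DeepSpeed integration:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2
  }
}
```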
Available arguments for tune_gpt.py:
- "--limit" - Limit the total number of problems
- "--model" - ID of the model from Hugging Face, e.g. "EleutherAI/gpt-neo-125M"
- "--tokenizer" - ID of the tokenizer from Hugging Face, e.g. "EleutherAI/gpt-neo-125M"
- "--upload-model" - Upload the fine-tuned model to Hugging Face
- "--stop-instance" - Stop the TensorDock instance after training
- "--local-rank" - Local rank for DeepSpeed; it should be 0 when not using DeepSpeed so that the model is saved
- "--upload-experiment" - Upload the experiment directory to a Hugging Face repo
- "--verbosity"
Training logs can be viewed with TensorBoard, e.g.:

```bash
tensorboard --logdir experiments/2022-10-15-9e416bbdeafeaea88e8747a0edd284f93d7551ea3cc387377269ceed52957730/logs
```
During fine-tuning, we pass the input data as the labels instead of just the answer labels. This is because we are training a language model, so we want the model to learn the pattern of the prompt and not just an answer class. In a sense, the model learns to predict the words of the input question + answer structured in the prompt, and in the process learns the code generation task (see the sketch after the references below).
- https://huggingface.co/docs/transformers/v4.23.1/en/glossary#labels
- https://huggingface.co/transformers/v4.8.2/model_doc/gpt_neo.html#transformers.GPTNeoForCausalLM.forward
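As a minimal, hedged sketch of this labeling scheme (the example prompt below is hypothetical, and the exact preprocessing in tune_gpt.py may differ):

```python
from transformers import AutoTokenizer, GPTNeoForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

# Hypothetical training example: the problem statement (prompt) followed by its solution.
prompt = "# Write a function that returns the sum of two numbers\n"
solution = "def add(a, b):\n    return a + b\n"

enc = tokenizer(prompt + solution, return_tensors="pt")

# The labels are simply the input ids; GPTNeoForCausalLM shifts them internally,
# so the loss covers the whole prompt + solution sequence, not just the answer.
outputs = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
print(outputs.loss)
```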
When using a multi-GPU environment, if the first GPU runs out of memory but more memory is available on the other GPUs, setting "max_split_size_mb" (via the PYTORCH_CUDA_ALLOC_CONF environment variable) might be useful.
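For example (the value 128 is an arbitrary illustration, not a recommendation from this project), the setting can be applied from Python:

```python
import os

# Assumption: PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator
# initializes, so set it before any tensor or model is placed on the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```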
- The generated code may not always be syntactically correct or runnable (see the sketch below).
- The model is only as good as the dataset it is trained on, so the quality of the generated code will depend on the diversity and quality of the training data.
- GPT-Neo models are large, so they require a powerful GPU and a lot of memory to train.
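For illustration, here is a hedged sketch of generating a completion and syntax-checking it with Python's ast module. The prompt and sampling settings are arbitrary assumptions, and the base 125M checkpoint is only a placeholder; point from_pretrained at your own fine-tuned model directory instead.

```python
import ast

from transformers import AutoTokenizer, GPTNeoForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

prompt = "# Write a function that reverses a string\n"
inputs = tokenizer(prompt, return_tensors="pt")
generated = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
code = tokenizer.decode(generated[0], skip_special_tokens=True)

# The output is not guaranteed to be valid Python, so check it before running it.
try:
    ast.parse(code)
    print("Syntactically valid:\n" + code)
except SyntaxError as err:
    print("Generated code has a syntax error:", err)
```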
Code generation with GPT-Neo models is a promising approach for automating repetitive coding tasks. With the right dataset and fine-tuning, it can be used to generate high-quality code in a variety of programming languages. However, it still has some limitations, and it is not a substitute for human programmers.