This is the official code for the paper Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation (accepted to EMNLP 2023).
Authors: Hailin Chen*, Amrita Saha*, Steven C.H. Hoi and Shafiq Joty
To install all dependencies and download the necessary model checkpoints:
```sh
conda env create -f environment.yml
source activate PersD
```
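Newer conda releases may not support `source activate`; in that case, `conda activate` (assuming conda's shell integration has been set up via `conda init`) does the same thing:

```sh
conda activate PersD
```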
Put the `openai_api_key.txt` and `openai_organization_id.txt` files inside a directory named `openai_creds` in your home folder.
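For example, the credentials directory can be set up as follows (a sketch; replace the placeholder values with your own API key and organization ID):

```sh
mkdir -p ~/openai_creds
echo "sk-..."  > ~/openai_creds/openai_api_key.txt
echo "org-..." > ~/openai_creds/openai_organization_id.txt
```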
To run a stage of the pipeline:

```sh
./scripts/chatgpt_ilf_pipeline_auto_feedback_alpaca.sh -n CCG_ALP -e {process_name}
```
The `process_name` can be one of the following (an example invocation is shown after the list):
- "gen_student_attempt"
- "eval_student_attempt"
- "get_personalized_refinement"
- "process_finetune_data"
- "finetune_StanD"
- "finetune_PersD"
- "finetune_PersD_combine"
- "finetune_PersD_refine"
- "evaluate_StanD"
- "evaluate_PersD"
- "evaluate_PersD_combine"
- "evaluate_PersD_refine"
In `models/CCG_ALP`, the finetuning data are already provided. To run PersD-combine finetuning:
```sh
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
sh ./scripts/chatgpt_ilf_pipeline_auto_feedback_alpaca.sh -n CCG_ALP -e finetune_PersD_combine
```
The model will be saved in `models/CCG_ALP/gold_chatgpt_only_finetune_lr5e-6_ga20_20epochs`.
To evaluate it, run:
```sh
sh ./scripts/chatgpt_ilf_pipeline_auto_feedback_alpaca.sh -n CCG_ALP -e evaluate_PersD_combine
```