
# Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

This is the official code for the paper *Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation* (accepted to EMNLP 2023).

Authors: Hailin Chen*, Amrita Saha*, Steven C.H. Hoi and Shafiq Joty

## Install

To install all dependencies and download the necessary model checkpoints:

```sh
conda env create -f environment.yml
source activate PersD
```

## Credentials

Place the `openai_api_key.txt` and `openai_organization_id.txt` files inside a directory named `openai_creds` in your home folder.
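
For example, a minimal setup sketch, assuming the home-folder location above (replace the placeholder values with your own credentials):

```sh
# Create the credentials directory in the home folder
mkdir -p ~/openai_creds

# Write the API key and organization ID (placeholders shown)
echo "sk-..."  > ~/openai_creds/openai_api_key.txt
echo "org-..." > ~/openai_creds/openai_organization_id.txt
```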

## Experiments

```sh
./scripts/chatgpt_ilf_pipeline_auto_feedback_alpaca.sh -n CCG_ALP -e {process_name}
```

Valid values of `process_name` are listed below; a typical end-to-end run of the data-preparation stages is sketched after the list.

  1. "gen_student_attempt"
  2. "eval_student_attempt"
  3. "get_personalized_refinement"
  4. "process_finetune_data"
  • "finetune_StanD"
  • "finetune_PersD"
  • "finetune_PersD_combine"
  • "finetune_PersD_refine"
  • "evaluate_StanD"
  • "evaluate_PersD"
  • "evaluate_PersD_combine"
  • "evaluate_PersD_refine"

The finetuning data are already provided in `models/CCG_ALP`. To run PersD-combine finetuning:

```sh
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7; sh ./scripts/chatgpt_ilf_pipeline_auto_feedback_alpaca.sh -n CCG_ALP -e finetune_PersD_combine
```

The finetuned model will be saved in `models/CCG_ALP/gold_chatgpt_only_finetune_lr5e-6_ga20_20epochs`.

To evaluate it, run:

```sh
sh ./scripts/chatgpt_ilf_pipeline_auto_feedback_alpaca.sh -n CCG_ALP -e evaluate_PersD_combine
```