This is an implementation of the model described in: Functionality-Aware Rankers for Code Generation via Contrastive Learning
Multi-sampling is a popular approach used in code generation to improve the likelihood of producing correct code. Despite its effectiveness, multi-sampling generates a vast number of candidate solutions, which can be challenging for users to evaluate. It is therefore imperative to train high-quality rankers that enable users to rapidly identify the best solution. Unfortunately, candidates produced by generation models tend to be highly homogeneous, which requires rankers to understand code functionality rather than simply rely on surface appearance. This paper proposes FareRanker, a novel contrastive-learning-based Functionality-aware Ranker for code generation.
We introduce three components to assist the ranker in comprehending program functionality thoroughly.
(1) A novel hard negative sample construction strategy, FareSample, to inject typical generator errors into correct code.
(2) A novel contrastive objective, FareObject, to align correct code of consistent functionality when perturbed by FareSample, while simultaneously ensuring that correct code receives the highest ranking scores.
(3) A novel functionality-aware neural code ranking framework, FareRanker, to make full use of the learned functional (in)consistency introduced by FareObject, enabling the ranker to capture subtle errors and produce more robust scores at inference time.
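As a rough illustration of the objective described in (2), the sketch below combines an InfoNCE-style alignment term with a margin ranking term. This is a simplified stand-in, not the paper's exact formulation: the function name, the margin/temperature values, and the way scores are passed in are all assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_ranking_loss(anchor, positive, negatives,
                             score_pos, scores_neg,
                             margin=1.0, tau=0.07):
    """Simplified FareObject-style loss (illustrative only).

    anchor, positive: embeddings of two functionally consistent correct codes, shape (d,)
    negatives: embeddings of hard negative codes, shape (k, d)
    score_pos: ranking score of the correct code (scalar tensor)
    scores_neg: ranking scores of the negatives, shape (k,)
    """
    # (a) Alignment: the anchor should be closer to its functionally
    # consistent positive than to any hard negative.
    sims = torch.cat([
        F.cosine_similarity(anchor, positive, dim=-1).unsqueeze(0),
        F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1),
    ]) / tau
    align = F.cross_entropy(sims.unsqueeze(0), torch.zeros(1, dtype=torch.long))
    # (b) Ranking: correct code must outscore each negative by a margin.
    rank = F.relu(margin - (score_pos - scores_neg)).mean()
    return align + rank
```

Both terms are non-negative, so the loss bottoms out when the positive dominates the similarity distribution and the correct code outscores every negative by the margin.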
- python3
- tree-sitter
- torch
- transformers==4.8.1
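One possible way to set up the environment from the list above (package names are assumptions; the tree-sitter Python bindings are published on PyPI as `tree-sitter`):

```shell
# Install the listed dependencies; pin transformers as the README specifies.
pip install torch transformers==4.8.1 tree-sitter
```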
You should first download the APPS dataset.
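One way to obtain APPS is via its official GitHub repository, whose README links to the dataset archive (the clone target and final dataset path are up to you; adjust the path arguments in the scripts below to match):

```shell
# Clone the official APPS benchmark repository and follow its
# instructions to download the dataset archive.
git clone https://github.com/hendrycks/apps.git
```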
We use the released checkpoint of the CodeRL model. For GPT-Neo-125M, please first fine-tune the model for 2 epochs by running:
cd finetune
bash train.sh
You can sample code candidates for the APPS training and test sets by running (please change the dataset path arguments yourself):
cd ../sample
bash generate_coderl_apps.sh
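Because sampled candidates tend to be highly homogeneous (as noted above), a simple post-sampling step is to deduplicate near-identical programs before testing and ranking. The helper below is hypothetical, not part of this repository:

```python
def dedup_candidates(candidates):
    """Drop candidates that are identical up to whitespace.

    candidates: list of candidate program source strings.
    Returns the unique candidates in their original order.
    """
    seen, unique = set(), []
    for code in candidates:
        key = " ".join(code.split())  # collapse all whitespace runs
        if key not in seen:
            seen.add(key)
            unique.append(code)
    return unique
```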
To execute the unit tests and obtain test outcomes, we adapt the official implementation of the APPS benchmark. You can run the following commands, configuring the parameters as needed:
cd ../run_test
bash test_one_solution.sh
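Conceptually, this step runs each candidate program against a problem's input/output test cases and records whether it passes. The sketch below shows the idea with a hypothetical helper; the repository itself adapts the official APPS harness, which handles many more edge cases:

```python
import subprocess
import sys

def passes_tests(candidate_src, io_pairs, timeout=4):
    """Run a candidate program on each (stdin, expected stdout) pair.

    Returns True only if every test case's output matches.
    """
    for stdin_text, expected in io_pairs:
        try:
            run = subprocess.run(
                [sys.executable, "-c", candidate_src],
                input=stdin_text, capture_output=True,
                text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # infinite loops count as failures
        if run.returncode != 0 or run.stdout.strip() != expected.strip():
            return False
    return True
```

The pass/fail outcomes from this step are what later distinguish positive from negative ranking samples.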
Then, you can build the positive and negative samples and obtain the final training and test datasets by running:
cd ../data_preprocess
bash data_preprocess.sh
cd ../generate_final_dataset
python dataset.py
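The core of this step is to split candidates by their test outcomes and pair them up for contrastive training. The sketch below assumes a hypothetical `(code, passed)` format, which may differ from the scripts' actual intermediate files:

```python
def build_pairs(candidates):
    """Form (positive, negative) ranking pairs from tested candidates.

    candidates: list of (code, passed) tuples, where `passed` is True
    if the candidate passed all unit tests in the previous step.
    """
    positives = [code for code, ok in candidates if ok]
    negatives = [code for code, ok in candidates if not ok]
    return [(p, n) for p in positives for n in negatives]
```

Note that FareSample additionally constructs hard negatives by injecting typical generator errors into correct code, which goes beyond the naive pass/fail split shown here.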
Finally, you can train FareRanker by running:
cd ../train_ranker
bash run_train.sh