FareRanker

This is an implementation of the model described in: Functionality-Aware Rankers for Code Generation via Contrastive Learning

Multi-sampling is a popular approach used in code generation to improve the likelihood of producing correct code. Despite its effectiveness, multi-sampling generates a vast number of candidate solutions, which can be challenging for users to evaluate. It is therefore imperative to train high-quality rankers that let users rapidly identify the best solution. Unfortunately, candidates produced by generation models tend to be highly homogeneous, which requires rankers to understand code functionality rather than simply rely on surface appearance. This paper proposes FareRanker, a novel contrastive-learning-based Functionality-aware Ranker for code generation.

We introduce three components to assist the ranker in comprehending program functionality thoroughly.

(1) A novel hard negative sample construction strategy, FareSample, to inject typical generator errors into correct code (see the toy sketch after this list).

(2) A novel contrastive objective, FareObject, to align correct code of consistent functionality when perturbed by FareSample, while simultaneously ensuring that correct code receives the highest ranking scores.

(3) A novel functionality-aware neural code ranking framework, FareRanker, to fully exploit the functional (in)consistency learned through FareObject, enabling the ranker to capture subtle errors and provide more robust scores at inference.
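
As a toy illustration of the hard-negative idea behind FareSample (not the actual strategy from the paper), one could inject a typical generator error, such as an off-by-one loop bound, into an otherwise correct solution:

# Toy example only: mimic one "typical generator error" (an off-by-one loop bound)
# to turn a correct program into a surface-similar but functionally wrong candidate.
import re

def make_hard_negative(correct_code: str) -> str:
    # Rewrite the first `range(x)` as `range(x - 1)`.
    return re.sub(r"range\((\w+)\)", r"range(\1 - 1)", correct_code, count=1)

correct = "def first_n_sum(n):\n    s = 0\n    for i in range(n):\n        s += i + 1\n    return s\n"
print(make_hard_negative(correct))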

Requirements

  • python3
  • tree-sitter
  • torch
  • transformers==4.8.1
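
The repository does not ship a full environment file; a plausible install command for the packages above (everything except the pinned transformers version is an assumption) is:

pip install tree-sitter torch transformers==4.8.1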

Dataset

You should first download the APPS dataset.
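
One convenient way to obtain it is the Hugging Face mirror (the dataset name codeparrot/apps is an assumption here; the original release from the APPS authors works as well):

# Sketch: fetch APPS through the `datasets` library and cache it locally.
# Depending on your `datasets` version you may need to pass trust_remote_code=True.
from datasets import load_dataset

apps_train = load_dataset("codeparrot/apps", split="train")
apps_test = load_dataset("codeparrot/apps", split="test")
print(len(apps_train), len(apps_test))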

Obtain the Code Generation Model

We use the released checkpoint of the CodeRL model. For GPT-Neo-125M, please first fine-tune the model for 2 epochs by running:

cd finetune
bash train.sh
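
For reference, this step amounts to standard causal-LM fine-tuning; a rough sketch of such a script (hyperparameters and dataset handling below are assumptions, not the contents of train.sh) could look like:

# Rough sketch of 2-epoch causal-LM fine-tuning of GPT-Neo-125M (illustrative settings).
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

args = TrainingArguments(
    output_dir="finetuned_gpt_neo_125m",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
)
# `train_dataset` should contain tokenized APPS (problem prompt + reference solution) sequences.
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
# trainer.train()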

Sample Code Candidates

You can sample code candidates for the APPS training set and test set by running (please change the dataset path arguments yourself):

cd ../sample
bash generate_coderl_apps.sh
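
At a high level, the sampling script draws multiple candidates per problem; a simplified sketch (model name, prompt format, and decoding settings are assumptions) is:

# Sketch: nucleus sampling of several candidate programs for one APPS-style prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M").eval()

prompt = "QUESTION:\nRead an integer n and print the sum 1 + 2 + ... + n.\nANSWER:\n"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        max_length=512,
        num_return_sequences=8,        # number of candidates per problem
        pad_token_id=tokenizer.eos_token_id,
    )
prompt_len = inputs["input_ids"].shape[1]
candidates = [tokenizer.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs]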

Run the Unit Tests

To execute the unit tests and obtain test outcomes, we adapt the official implementation of the APPS benchmark. You can run the following commands, configuring the parameters as needed:

cd ../run_test
bash test_one_solution.sh
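
Conceptually, each candidate is executed against the problem's input/output pairs; a much-simplified version of that check (not the APPS harness itself) looks like:

# Simplified sketch: run one candidate program against a single input/output
# test case in a subprocess, with a timeout to catch infinite loops.
import subprocess

def passes_test(candidate_path: str, stdin_text: str, expected_stdout: str, timeout: int = 10) -> bool:
    try:
        result = subprocess.run(
            ["python", candidate_path],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.stdout.strip() == expected_stdout.strip()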

Obtain the Final Datasets

Then, you can build the positive and negative samples and generate the final training and test datasets by running:

cd ../data_preprocess
bash data_preprocess.sh
cd ../generate_final_dataset
python dataset.py
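
The labeling idea is straightforward: candidates that pass all unit tests become positives and the rest negatives (the interface below is an assumption; the real dataset layout is produced by data_preprocess.sh and dataset.py):

# Sketch of the positive/negative split: a candidate is positive only if it
# passed every unit test for its problem.
def split_candidates(candidates, passed_all_tests):
    positives, negatives = [], []
    for code, passed in zip(candidates, passed_all_tests):
        (positives if passed else negatives).append(code)
    return positives, negatives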

Train FareRanker

Finally, you can train FareRanker by running:

cd ../train_ranker
bash run_train.sh
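
For intuition, a pairwise margin ranking loss in the spirit of FareObject (the actual objective is defined in the paper; the margin value and scoring interface here are assumptions) can be written as:

# Illustrative sketch: correct candidates should score higher than incorrect ones by a margin.
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor, margin: float = 0.5):
    # pos_scores: (P,) ranker scores of correct candidates
    # neg_scores: (N,) ranker scores of incorrect candidates
    diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)   # (P, N) pairwise score gaps
    return F.relu(margin - diff).mean()

loss = pairwise_ranking_loss(torch.tensor([0.9, 0.7]), torch.tensor([0.4, 0.2, 0.1]))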
