- Python 3.9.17
- PyTorch 2.0.1
- Huggingface transformers 4.35.2
- wandb
- peft 0.6.2
- accelerate 0.24.1
- datasets 2.13.0
- trl
- fire
- nvitop
- Java 8
- We have released 4 PEFT weights for each base model on HuggingFace, trained on the APR-INSTRUCTION dataset.
- PEFT Weights here
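The released adapters should attach to their base models via the `peft` library. The sketch below is an assumption about usage, not code from this artifact; the adapter repo id is a placeholder, since the actual HuggingFace paths are only given by the weights link above.

```python
def load_apr_adapter(base_id: str, adapter_id: str):
    """Attach one of the released PEFT adapters to its base model.

    Sketch only: imports are lazy so the function can be defined
    without transformers/peft installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    # Wrap the base model with the released adapter weights.
    model = PeftModel.from_pretrained(model, adapter_id)
    return tokenizer, model

# Usage (downloads weights; "<org>/<adapter>" is a placeholder, not a real path):
# tokenizer, model = load_apr_adapter("codellama/CodeLlama-7b-hf", "<org>/<adapter>")
```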
The file structure of the artifact is as follows:
- contains the source code for constructing APR-INSTRUCTION, based on an existing APR dataset [1]
- output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, Prefix Tuning, $(IA)^3$) and by Full-model Fine-tuning
- results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by codellama-7b-hf and by codellama-7b-hf with PEFT weights, together with the validation results of the generated patches
- output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, Prefix Tuning, $(IA)^3$)
- results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by codellama-13b-hf and by codellama-13b-hf with PEFT weights, together with the validation results of the generated patches
- output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, Prefix Tuning, $(IA)^3$)
- results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by Deepseek-Coder Base 6.7B and by Deepseek-Coder Base 6.7B with PEFT weights, together with the validation results of the generated patches
- output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, Prefix Tuning, $(IA)^3$)
- results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by Llama-2-7b-hf and by Llama-2-7b-hf with PEFT weights, together with the validation results of the generated patches
- Instruction datasets used in this paper
  - apr_instruction_30k.json: the APR instruction dataset constructed in this paper
  - oss_instrcution_30k.json: a random selection of 30k instances from the OSS-Instruction dataset
  - code_alpaca_20k.json: the Code Alpaca instruction dataset
- The remaining data files are used in RQ3 to explore the impact of training-data size on performance; they are partitioned into 10k, 15k, 20k, and 25k subsets
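The exact JSON schema of these files isn't spelled out above; instruction datasets in this family (e.g. Code Alpaca) typically use `instruction`/`input`/`output` records. A minimal sketch of reading one such record, assuming that layout (the record content below is invented for illustration):

```python
import json

# Assumed Alpaca-style record layout; the actual schema of
# apr_instruction_30k.json may differ.
sample = json.loads("""
{
  "instruction": "Fix the bug in the following Java method.",
  "input": "public int add(int a, int b) { return a - b; }",
  "output": "public int add(int a, int b) { return a + b; }"
}
""")

print(sorted(sample.keys()))  # → ['input', 'instruction', 'output']
```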
- This directory contains the source code used for patch generation and validation of the LLMs
| file name | description |
| --- | --- |
| defects4j_patch_validate.py | patch generation and validation on the Defects4J benchmark |
| humaneval_patch_validate.py | patch generation and validation on the HumanEval-Java benchmark |
| quixbugs_patch_validate.py | patch generation and validation on the QuixBugs benchmark |
| peft_patch_validation.py | entry point for validating models with PEFT methods; dispatches to the per-benchmark scripts |
| fmft_generate_patch.py | entry point for validating models with Full-model Fine-tuning; dispatches to the per-benchmark scripts |
| generate_patch_infill.py | entry point for validating CodeLlama 7B without fine-tuning, using infill templates; dispatches to the per-benchmark scripts |
| prompter.py | converts benchmark instances into instructions |
| result_look.py | records $pass@k$ for each validation |
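The README does not reproduce how $pass@k$ is computed. A common choice is the unbiased estimator from the Codex evaluation setup; the sketch below is that standard formula, not necessarily the exact code in `result_look.py` (the function name is mine):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c of them correct) passes."""
    if n - c < k:
        # Every size-k draw must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 patches sampled per bug, 2 of them plausible, pass@1:
print(round(pass_at_k(10, 2, 1), 2))  # → 0.2
```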
- This directory contains the bash scripts used for patch generation and validation of the LLMs
- each script is named `<model name>_<fine-tuning method>_<instruction dataset of fine-tuning>_validation.sh`
- This directory contains the bash scripts used for LLM training
- each script is named `<model name>_instruction_<fine-tuning method>_<hyper-parameters (optional)>_train_<instruction dataset of fine-tuning>_validation.sh`
- This directory contains the source code used for LLM training

| file name | description |
| --- | --- |
| sfttrain_peft.py | training code for the PEFT methods |
| sfttrain_ft.py | training code for Full-model Fine-tuning |
| prompter.py | adds an additional prompt to each instruction |
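`prompter.py` wraps each instruction in an additional prompt, but the template itself isn't reproduced in this README. A minimal sketch, assuming the widely used Alpaca-style template (the template text and function name below are assumptions, not the artifact's actual code):

```python
# Assumed Alpaca-style template; the template in prompter.py may differ.
TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str, input_text: str) -> str:
    """Render one instruction/input pair into a training prompt."""
    return TEMPLATE.format(instruction=instruction, input=input_text)

prompt = build_prompt("Fix the bug.", "return a - b;")
print(prompt.endswith("### Response:\n"))  # → True
```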
- This directory contains the results of patch generation and validation for the RQ3 experiments
- Because the fine-tuning weights are too large, we have not uploaded them to GitHub for now
- Considering the anonymous review, we will release the weights after the review
[1] Zhu, Qihao, et al. "A syntax-guided edit decoder for neural program repair." Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 2021.