This is the repo for CodeGRAG.
For C++ code:
- AST/CFG/GraphView
sh run_ast_cfg_graph.sh
- Build DGL Graph
python build_graph.py
- Code embedding
We use CodeT5 to encode the code. Under the utils folder:
python code_enc.py --path [codepath]
You should first prepare the model weights under the model_weights/ folder. The following weights are included:
- codet5p-110m-embedding
- unixcoder-base-nine
Under the test folder:
For single-round generation, please run:
python run_raw_multilingual.py --lang [programming_language] --output [your_output_path]
python run_with_code.py --lang [programming_language] --output [your_output_path] --ret_method [retrieval_model] --datapath [retrieval_pool]
python run_with_graph.py --lang [programming_language] --output [your_output_path] --ret_method [retrieval_model] --datapath [retrieval_pool]
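The --ret_method and --datapath options above select a retrieval model and a retrieval pool. As a rough illustration of what embedding-based retrieval over such a pool can look like, here is a minimal sketch that ranks pool entries by cosine similarity to a query vector. It is an assumption for illustration only, not the code in run_with_code.py or run_with_graph.py; the names cosine and top_k and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          pool: Sequence[Sequence[float]],
          k: int = 1) -> List[Tuple[int, float]]:
    """Return (pool_index, score) pairs for the k pool vectors most similar to query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(pool)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-d vectors standing in for encoder outputs (hypothetical data).
pool = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.9, 0.1]
best = top_k(query, pool, k=2)
print([index for index, _ in best])
```

In practice the vectors would come from whichever encoder is chosen, and the retrieved entries would be added to the generation prompt.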
Two result files are included:
- cpp_results/samples_with_graph.jsonl
- python_results/samples_with_graph.jsonl
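The result files are JSON Lines files, so they can be inspected with a few lines of Python. The sketch below assumes only that each non-empty line is one JSON object. It makes no assumption about which keys the records contain, and load_jsonl is a hypothetical helper, not part of this repo.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def load_jsonl(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Example usage (path taken from the list above):
# records = load_jsonl("cpp_results/samples_with_graph.jsonl")
# print(len(records), sorted(records[0].keys()))
```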
See the soft_prompt folder.
If you find this repo useful, please cite our paper:
@article{du2024codegrag,
  title={CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation},
  author={Du, Kounianhua and Rui, Renting and Chai, Huacan and Fu, Lingyue and Xia, Wei and Wang, Yasheng and Tang, Ruiming and Yu, Yong and Zhang, Weinan},
  journal={arXiv preprint arXiv:2405.02355},
  year={2024}
}