KounianhuaDu/CodeGRAG

CodeGRAG

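This is the repository for CodeGRAG, the method described in the paper "CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation" (arXiv:2405.02355).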

Requirements

Data Preparation

For C++ code:

  • AST/CFG/GraphView
sh run_ast_cfg_graph.sh
  • Build DGL Graph
python build_graph.py
  • Code embedding: we use CodeT5 to encode the code. Under the utils folder, run:
python code_enc.py --path [codepath]

Model Weights

First place the model weights under the model_weights/ folder. The following weights are included:

  • codet5p-110m-embedding
  • unixcoder-base-nine

Run

Under the test folder:

Meta Graph Prompt

For single-round generation, run one of the following:

python run_raw_multilingual.py --lang [programming_language] --output [your_output_path]
python run_with_code.py --lang [programming_language] --output [your_output_path] --ret_method [retrieval_model] --datapath [retrieval_pool]
python run_with_graph.py --lang [programming_language] --output [your_output_path] --ret_method [retrieval_model] --datapath [retrieval_pool]
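
The --ret_method and --datapath options suggest that examples are retrieved from a pool before generation. As a rough illustration only, the sketch below ranks pool entries by cosine similarity to a query embedding. It is not the repository's implementation: the function names, the toy vectors, and the choice of cosine similarity are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, pool, k=1):
    """Return the ids of the k pool entries most similar to query_vec.

    `pool` is a hypothetical dict mapping an example id to its embedding.
    """
    scored = [(cosine_similarity(query_vec, vec), item_id)
              for item_id, vec in pool.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Toy retrieval pool with made-up 3-dimensional embeddings.
pool = {
    "sort_array": [0.9, 0.1, 0.0],
    "parse_json": [0.0, 0.2, 0.9],
    "binary_search": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]
top = retrieve_top_k(query, pool, k=2)
print(top)
```

In a real run the vectors would come from the chosen retrieval model rather than being written by hand.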

Two result files are included:

  • cpp_results/samples_with_graph.jsonl
  • python_results/samples_with_graph.jsonl
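
The result files use the .jsonl extension, which normally means one JSON object per line. A minimal sketch for inspecting such a file follows. The record schema is not documented here, so the helpers only list whichever field names are present. The function names are made up for this example.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of text lines, one JSON object per non-empty line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a .jsonl file from disk and return its records as a list."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_jsonl(handle)


def field_names(records):
    """Sorted union of the keys found across all dict records."""
    names = set()
    for record in records:
        if isinstance(record, dict):
            names.update(record.keys())
    return sorted(names)


# Example usage (the path is taken from the list above):
# samples = load_jsonl("cpp_results/samples_with_graph.jsonl")
# print(len(samples), field_names(samples))
```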

Soft Prompting

See the soft_prompt folder.

Citation

If you find this repo useful, please cite our paper:

@article{du2024codegrag,
  title={CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation},
  author={Du, Kounianhua and Rui, Renting and Chai, Huacan and Fu, Lingyue and Xia, Wei and Wang, Yasheng and Tang, Ruiming and Yu, Yong and Zhang, Weinan},
  journal={arXiv preprint arXiv:2405.02355},
  year={2024}
}
