yuchen814/CodeHalu

CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification

Dataset Description

CodeHaluEval is a benchmark for evaluating code hallucinations in Large Language Models (LLMs) during code generation. It contains 8,883 samples drawn from 699 programming tasks and is designed to quantify how often, and in what ways, LLMs produce code hallucinations and other errors. Together with the CodeHalu dynamic detection algorithm, which verifies generated code by executing it, the benchmark lets researchers identify and categorize these errors and improve model reliability in real-world programming environments.
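
To make the idea of execution-based verification concrete, here is a minimal sketch. It is not the CodeHalu algorithm itself: it only runs a candidate program on one input in a fresh interpreter and reports whether it passed, printed the wrong output, raised an error, or timed out. The function name `run_candidate` and the status labels are illustrative assumptions, not names from this repository.

```python
import subprocess
import sys


def run_candidate(code, stdin_text, expected, timeout=5.0):
    """Run `code` in a separate Python process and classify the outcome.

    Hypothetical helper for illustration; not part of the CodeHalu codebase.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"

    if proc.returncode != 0:
        # The last stderr line normally names the exception,
        # for example "NameError: name 'x' is not defined".
        lines = proc.stderr.strip().splitlines()
        last = lines[-1] if lines else "unknown error"
        return "error: " + last.split(":", 1)[0]

    if proc.stdout.strip() != expected.strip():
        return "wrong_output"
    return "pass"
```

A real detector would run many test cases per task and map the observed failures onto hallucination categories; this sketch stops at the raw execution outcome.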

For a more detailed introduction to the data, see the 🤗 Hugging Face dataset.

Getting Started

Set Up

To use model APIs, set the corresponding variables in models.py:

  • erniebot_api_key: Your Paddle API key.
  • gemini_api_key: Your Google API key.
  • openai_api_key: Your OpenAI API key.
  • claude_api_key: Your Claude API key.
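
If you would rather not hardcode keys, one option is to read them from environment variables and assign the results to the variables above. The sketch below assumes that approach; the helper `load_api_keys` and the environment variable names are hypothetical and are not defined by this repository.

```python
import os

# Maps each models.py variable to an assumed environment variable name.
ENV_NAMES = {
    "erniebot_api_key": "ERNIEBOT_API_KEY",
    "gemini_api_key": "GEMINI_API_KEY",
    "openai_api_key": "OPENAI_API_KEY",
    "claude_api_key": "CLAUDE_API_KEY",
}


def load_api_keys(env=None):
    """Return a dict of key values, using "" for any variable that is unset."""
    if env is None:
        env = os.environ
    return {var: env.get(env_name, "") for var, env_name in ENV_NAMES.items()}


keys = load_api_keys()
openai_api_key = keys["openai_api_key"]  # repeat for the other providers as needed
```

Only the keys for the models you actually run need to be set.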

Inference

An example for GPT-4 generation:

python generation.py \
    --model gpt4 \
    --data_path <path_to_the_test_set> \
    --save_path "results/gpt4_codehalu_test.jsonl"
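
Before evaluating, it can help to sanity-check the saved file. The sketch below assumes only that the output is in JSON Lines format (one JSON object per line), as the `.jsonl` extension suggests; it makes no assumption about field names, and `summarize_jsonl` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file."""
    count = 0
    keys = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

For example, `summarize_jsonl("results/gpt4_codehalu_test.jsonl")` reports how many generations were written and which fields each record carries.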

Evaluation

To evaluate the results generated by GPT-4, run:

python eval.py \
    --halu_type <hallucination_type_to_evaluate> \
    --generation_file <path_to_the_generation_file>
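
To evaluate several hallucination types in one go, the command can be assembled programmatically. The sketch below uses only the two flags shown above; `build_eval_commands` is a hypothetical helper, and the valid `--halu_type` values are not listed here, so they must come from the repository itself.

```python
import sys


def build_eval_commands(halu_types, generation_file, python=sys.executable):
    """Build one eval.py argument list per hallucination type."""
    return [
        [python, "eval.py", "--halu_type", halu_type,
         "--generation_file", generation_file]
        for halu_type in halu_types
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.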

Citation

Please consider citing if you find our work useful:

@misc{tian2024codehaluinvestigatingcodehallucinations,
      title={CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification}, 
      author={Yuchen Tian and Weixiang Yan and Qian Yang and Xuandong Zhao and Qian Chen and Wen Wang and Ziyang Luo and Lei Ma and Dawn Song},
      year={2024},
      eprint={2405.00253},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.00253}, 
}
