This tool evaluates Dify's Code Generator functionality. Use it to evaluate and improve Code Generator prompts by:
- Testing different prompt variations
- Analyzing code generation accuracy
- Identifying areas for prompt optimization
It also compares code generation accuracy across different LLM models to:
- Measure code quality metrics
- Assess performance across different programming languages (Python and JavaScript)
Clone the repository and copy the example environment file:

```bash
git clone git@github.com:Kota-Yamaguchi/dify-codegenerator-evaluator.git
cd dify-codegenerator-evaluator
cp .env.example .env
```
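The exact variables depend on your setup, so treat the following as a purely hypothetical illustration; the variable names below are placeholders, and `.env.example` defines the real keys:

```bash
# Placeholder names -- consult .env.example for the actual variables.
DIFY_API_KEY=your-dify-api-key
DIFY_API_BASE_URL=http://localhost:5001
```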
Note: For instructions on setting up the Dify Backend API locally, please refer to the Dify Self-hosted Installation Guide.
- Ensure Go 1.20 or higher is installed
- Verify `GOPATH` is properly configured (e.g., with `go env GOPATH`)
- Grant execution permissions to the build script, then run it:
```bash
chmod +x build.sh
./build.sh
```
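The script is expected to cross-compile the platform binaries listed below. As a rough sketch of what it might contain (assuming standard `GOOS`/`GOARCH` cross-compilation, not the verified contents of `build.sh`):

```bash
# Hypothetical sketch of build.sh: cross-compile one binary per target platform.
GOOS=linux   GOARCH=amd64 go build -o bin/evaluate-code-linux .
GOOS=darwin  GOARCH=amd64 go build -o bin/evaluate-code-mac .
GOOS=darwin  GOARCH=arm64 go build -o bin/evaluate-code-mac-arm64 .
GOOS=windows GOARCH=amd64 go build -o bin/evaluate-code.exe .
```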
Execute the evaluator based on your platform:
```bash
# For Linux
./bin/evaluate-code-linux

# For macOS (Intel)
./bin/evaluate-code-mac

# For macOS (Apple Silicon)
./bin/evaluate-code-mac-arm64

# For Windows
./bin/evaluate-code.exe
```
To add new test cases, add them to `testdata/testcases.json`.
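The evaluator defines the actual schema, so treat this as a hypothetical entry; the field names (`id`, `language`, `prompt`, `expected_output`) are assumptions, not the verified format:

```json
{
  "id": "sum-two-numbers",
  "language": "python",
  "prompt": "Write a function that returns the sum of two integers.",
  "expected_output": "def add(a, b):\n    return a + b"
}
```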
The evaluator can also:
- Generate comparative accuracy graphs for different LLM models
- Output accuracy metrics based on code complexity levels