loghound is a fault localization approach that pinpoints faulty code at the method level by combining VSM (Vector Space Model) calculations, path construction, test coverage, and stack trace information.
The analysis is static: call graphs are built without dynamic execution or source code compilation. The individual signals are combined through a set of parameters and weights.
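To make the VSM component concrete, the following is a minimal sketch of ranking methods by TF-IDF cosine similarity against a bug report. It is an illustration under assumptions, not the code in `process/vsm_construction.py`: the tokenizer, the smoothed IDF formula, and the sample method names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphanumeric tokens."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF, an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_methods(report, methods):
    """methods maps a method name to its source text; returns (name, score) pairs, best first."""
    names = list(methods)
    vectors = tfidf_vectors([report] + [methods[name] for name in names])
    query, rest = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical methods and bug report text, for illustration only.
methods = {
    "SessionManager.closeSession": "void closeSession(long sessionId) { sessions.remove(sessionId); log.warn('closing session'); }",
    "DiskWriter.flush": "void flush() { buffer.writeTo(disk); buffer.clear(); }",
}
ranking = rank_methods("NullPointerException when closing an expired session", methods)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a full pipeline this similarity would be only one of several signals; how loghound weights it against coverage and stack trace evidence is not shown here.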
- Python 3.x
- Required packages: install with `pip install -r requirements.txt`
project_root/
├── analyzer/
│ ├── analyzer.py
│ ├── extract_bug_reports.py
│ ├── tonic.py
│ ├── type_resolver.py
│ └── preprocess_bug_report.py
├── bug_reports/
├── classes/
├── conf/
│ └── conf.yml
├── dataset/
│ └── docs_output_file/
│ └── dbugset_crawl.py
│ └── get-target-system.sh
├── logRestore/
│ ├── src/
│ │ └── main/
│ │ └── java/
│ │ └── org/
│ │ └── example/
│ │ ├── LogRestore.java
│ │ └── Main.java
├── process/
│ ├── cal_final_score.py
│ ├── evaluation.py
│ ├── generate_call_graph.py
│ ├── log_extract.py
│ ├── param_lib.py
│ ├── parse_report.py
│ ├── preprocess_bug_report.py
│ ├── process_path.py
│ ├── process_source_code.py
│ ├── process_stack_traces_and_logs.py
│ ├── process_tock.py
│ └── vsm_construction.py
├── ql/
│ ├── sanity/
│ │ └── sanity-pack/
│ └── sanity-coverage/
│ ├── codeql-pack.lock.yml
│ ├── qlpack.yml
│ ├── static_coverage_summary_cas.ql
│ ├── static_coverage_summary_ha020.ql
│ ├── static_coverage_summary_ha021.ql
│ ├── static_coverage_summary_ha023.ql
│ ├── static_coverage_summary_hb90.ql
│ ├── static_coverage_summary_hb95.ql
│ └── static_coverage_summary_zk.ql
├── tgt_sys/
│ ├── build_dy.sh
│ ├── get_target_system.sh
│ └── run_coverage.sh
├── dbugset_resolve.xlsx
├── structuration_info.json
├── app.py
├── cover_cal.py
├── eval.py
├── read_version_json.py
├── requirements.txt
├── smp.py
└── README.md
First, modify the LLM API settings in the configuration file:
- Navigate to `conf/conf.yaml`
- Update the LLM `api`, `model`, and `base_url` parameters
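Before running the pipeline, it can help to confirm that the three LLM parameters are actually present. The sketch below checks an already-loaded configuration (for example, the result of parsing the YAML file with a YAML library) for the `api`, `model`, and `base_url` keys. The nesting under an `llm` section in the example is an assumption; the real layout of the configuration file is not shown in this README, so the check searches the whole structure.

```python
REQUIRED_KEYS = ("api", "model", "base_url")


def find_keys(node, found=None):
    """Collect every mapping key that appears anywhere in a nested config."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            find_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            find_keys(item, found)
    return found


def missing_llm_keys(config):
    """Return the required LLM keys that do not appear in the config."""
    present = find_keys(config)
    return [key for key in REQUIRED_KEYS if key not in present]


# Hypothetical config contents; the "llm" section name is an assumption.
example = {"llm": {"api": "<your-api-key>", "model": "<model-name>"}}
print(missing_llm_keys(example))  # prints ['base_url']
```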
Go to the target system directory tgt_sys/ and execute the following commands:
cd ./tgt_sys/
# Download and prepare the target system source code
zsh ./get-target-system.sh
# Build CodeQL database
zsh build_db.sh
# Run coverage script (generate static coverage information)
sh run_coverage.sh
cd ..
# Calculate coverage results
python cover_cal.py
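The preparation steps above can also be driven from a single script. The following sketch lists the same four commands with the directory each one runs in, and only executes them when `dry_run` is set to `False`. The script names are copied from the commands above; the `run_steps` helper itself is hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# (working directory relative to the project root, command) in execution order.
STEPS = [
    ("tgt_sys", ["zsh", "./get-target-system.sh"]),  # download and prepare sources
    ("tgt_sys", ["zsh", "build_db.sh"]),             # build the CodeQL database
    ("tgt_sys", ["sh", "run_coverage.sh"]),          # generate static coverage info
    (".", ["python", "cover_cal.py"]),               # calculate coverage results
]


def run_steps(project_root, dry_run=True):
    """Return the planned (cwd, command) pairs; execute them unless dry_run is True."""
    root = Path(project_root)
    planned = []
    for subdir, cmd in STEPS:
        cwd = root / subdir
        planned.append((str(cwd), " ".join(cmd)))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned


for cwd, cmd in run_steps("."):
    print(f"[{cwd}] {cmd}")
```

Stopping at the first failure matters here because each step consumes the output of the previous one.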
Run the following command to perform static analysis of the distributed system source code and build the call graph:
python smp.py -sc source_code_list # Replace with actual source code path
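For readers unfamiliar with call graphs, the sketch below shows one simple way such a graph can be represented and queried: a dictionary from caller to callees, searched breadth-first to find methods within a given call distance of a starting method. This is a generic illustration; the data structure that `smp.py` actually produces is not described in this README, and the method names are hypothetical.

```python
from collections import deque


def reachable_within(call_graph, start, max_depth):
    """Breadth-first search: map each method reachable from start to its call distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        caller = queue.popleft()
        if distances[caller] == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in call_graph.get(caller, ()):
            if callee not in distances:
                distances[callee] = distances[caller] + 1
                queue.append(callee)
    return distances


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "Server.handle": ["Parser.parse", "Store.write"],
    "Store.write": ["Disk.flush"],
    "Disk.flush": [],
}
print(reachable_within(call_graph, "Server.handle", 1))
# prints {'Server.handle': 0, 'Parser.parse': 1, 'Store.write': 1}
```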
Run the complete fault localization process with:
python app.py -bp bug_reports -t docx -si structuration-info.json -sc source_code_list -l java
Replace `source_code_list` with the actual source code path.
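The file passed with `-si` is described in the options table as a JSON list of objects with the fields `file`, `title`, `version`, `description`, `logs`, and `stack_traces`. The sketch below builds one such entry and checks that every field is present. All values are placeholders, and the `validate_entries` helper is hypothetical; it is not part of the repository.

```python
import json

REQUIRED_FIELDS = ("file", "title", "version", "description", "logs", "stack_traces")
LIST_FIELDS = ("logs", "stack_traces")


def validate_entries(entries):
    """Return (index, field) pairs for missing fields or list fields with a wrong type."""
    problems = []
    for index, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append((index, field))
        for field in LIST_FIELDS:
            if field in entry and not isinstance(entry[field], list):
                problems.append((index, field))
    return problems


# Placeholder values only; real entries come from your own bug reports.
entries = [
    {
        "file": "example-report.docx",
        "title": "Example bug title",
        "version": "x.y.z",
        "description": "Short description of the failure.",
        "logs": ["example log line"],
        "stack_traces": ["at org.example.Main.main(Main.java)"],
    }
]

text = json.dumps(entries, indent=2)
print(validate_entries(json.loads(text)))  # prints []
```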
To check the evaluation results, use:
python eval.py -a dbugset_resolve.xlsx -n 5 # -n specifies the top N results to view
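As a point of reference for what a top-N evaluation typically measures, the sketch below computes Top-N accuracy: the fraction of bugs for which at least one truly faulty method appears among the first N ranked results. This is a common fault localization metric and is shown here as an assumption; the exact metrics reported by `eval.py` are not listed in this README, and the sample data is hypothetical.

```python
def top_n_hit(ranked_methods, faulty_methods, n=None):
    """True if any faulty method is among the first n ranked methods (all if n is None)."""
    candidates = ranked_methods if n is None else ranked_methods[:n]
    return any(method in faulty_methods for method in candidates)


def top_n_accuracy(results, n=None):
    """results holds one (ranked_methods, faulty_methods) pair per bug."""
    if not results:
        return 0.0
    hits = sum(top_n_hit(ranked, faulty, n) for ranked, faulty in results)
    return hits / len(results)


# Hypothetical rankings and ground truth for two bugs.
results = [
    (["A.a", "B.b", "C.c"], {"B.b"}),
    (["D.d", "E.e"], {"Z.z"}),
]
print(top_n_accuracy(results, n=1))  # prints 0.0
print(top_n_accuracy(results, n=5))  # prints 0.5
```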
| Short | Long | Required | Type | Description |
|---|---|---|---|---|
| `-a` | `--answer` | ✅ | str | Path to the ground truth file (evaluation reference) |
| `-n` | | ❌ | int | Number of top results to evaluate (default: not limited) |
| `-bp` | `--bug-reports` | ✅ | str | Folder containing bug reports |
| `-t` | `--report-type` | ✅ | str | Bug report file type (json, doc, docx, txt) |
| `-si` | `--structuration-info` | ✅ | str | JSON file with structured bug report info in format: `[{file: xx, title: xx, version: xx, description: xx, logs: [], stack_traces: []}]` |
| `-sc` | `--source-code` | ✅ | str | Path to source code for parsing |
| `-l` | `--language` | ✅ | str | Programming language of the source code |
| `-v` | `--version` | ❌ | flag | Show version information |
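The options used by `app.py` in the table above map naturally onto Python's `argparse`. The sketch below mirrors the listed flags, types, and required markers. It is a reconstruction from the table, not the parser in the repository: the help strings are paraphrased and the version text is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the app.py options listed in the table."""
    parser = argparse.ArgumentParser(prog="app.py")
    parser.add_argument("-bp", "--bug-reports", required=True,
                        help="folder containing bug reports")
    parser.add_argument("-t", "--report-type", required=True,
                        choices=["json", "doc", "docx", "txt"],
                        help="bug report file type")
    parser.add_argument("-si", "--structuration-info", required=True,
                        help="JSON file with structured bug report info")
    parser.add_argument("-sc", "--source-code", required=True,
                        help="path to source code for parsing")
    parser.add_argument("-l", "--language", required=True,
                        help="programming language of the source code")
    # Placeholder version text; the real version string is not given here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s: version unknown")
    return parser


args = build_parser().parse_args([
    "-bp", "bug_reports", "-t", "docx",
    "-si", "structuration-info.json",
    "-sc", "source_code_list", "-l", "java",
])
print(args.report_type, args.language)  # prints docx java
```

Restricting `-t` with `choices` makes an unsupported report type fail at parse time rather than later in the pipeline.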