This is the artifact repository for the following paper:
Kangzheng Gu, Yuan Zhang, Jiajun Cao, Xin Tan, and Min Yang. 2024. How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE Companion ’24)
- Python environment: `conda env create -f syzbot_bisect_env.yaml -n syzbot_bisect`
- srcML
  - Download from https://www.srcml.org/#download.
  - Configure `~/.bashrc` (replace `/PATH/TO/srcml` with your installation directory):
    - `export PATH=/PATH/TO/srcml/bin:$PATH`
    - `export LD_LIBRARY_PATH=/PATH/TO/srcml/lib:$LD_LIBRARY_PATH`
- Create a MySQL database from `syzbot_data/syzbot_bug_info.sql`.
- Set the global variables in `analyses/config.py` according to your database configuration, for example (placeholder values):
  - `DATA_DIR = '../syzbot_data'`
  - `DB_IP = '1.1.1.1'`
  - `DB_PORT = 6603`
  - `DB_USER = 'a'`
  - `DB_PWD = 'a'`
  - `DB_NAME = 'a'`
`syzbot_data/` contains the data we used to perform the study.
Before running the analysis scripts below, check the paths and database address in `analyses/config.py`.
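A small pre-flight check can catch mistakes in these values before any analysis runs. The sketch below is hypothetical: `check_config` is not part of the repository, and it takes the values as arguments instead of importing `analyses/config.py`. It also assumes `DB_IP` is an IP address literal, as in the placeholder values; a hostname would be reported as a problem.

```python
import ipaddress
import os


def check_config(data_dir, db_ip, db_port):
    """Return a list of human-readable problems; an empty list means the values look sane."""
    problems = []
    # DATA_DIR is relative in the example ('../syzbot_data'), so it is
    # resolved against the current working directory.
    if not os.path.isdir(data_dir):
        problems.append(f"DATA_DIR does not exist: {os.path.abspath(data_dir)}")
    # Assumption: DB_IP is an IP literal; a hostname would need a DNS lookup instead.
    try:
        ipaddress.ip_address(db_ip)
    except ValueError:
        problems.append(f"DB_IP is not an IP address literal: {db_ip!r}")
    # TCP ports must lie in 1..65535.
    if not isinstance(db_port, int) or not 1 <= db_port <= 65535:
        problems.append(f"DB_PORT out of range: {db_port!r}")
    return problems
```

Assuming `config` is imported from `analyses/config.py`, calling `check_config(config.DATA_DIR, config.DB_IP, config.DB_PORT)` from the `analyses` directory should return an empty list when the paths and address look sane.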
- `gt_filter.py`: filters out unreasonable `Fixes:` tags. See the code comments for details.
- `ground_truth.py`: builds the ground truth. Note: instead of building it from scratch, you can directly load `syzbot_data/syzbot_bug_info.sql` into your database.
- `data.py`: calculates the following statistics:
  - overall performance;
  - impact on bug-fixing time;
  - distribution of result commits.
- `efficiency_analysis.py`: calculates the following statistics:
  - average build time and test time per commit;
  - analysis of the tested commits that took longer than the average, i.e., the expected number of commits to test vs. the actual number.
- `failure_cause_analysis.py`: analyzes the failure causes (C1/C2/T1/T2/T3).
- `dreamutil.py`: extracts the files and functions modified by a commit. Only C/C++ is supported; parsing is based on srcML.
- `file_maintainer.py`: obtains the maintainer information of the guilty file and writes it to `maintainers_crash_report.json`.
- `relation_between_two_commits.py`: examines the relationship between the result commit and the patch commit, including developer assignment and code location (line, function, file), separately for correct and incorrect bisection results.
- `time_limit.py`: calculates the average number of versions tested when a timeout occurs.
- `dataset_dist.py`: counts the version distribution of ground-truth commits and crash commits.