How Well Industry-level Cause Bisection Works in Real-world - A Study on Linux Kernel.

This is the artifact repository for:

Kangzheng Gu, Yuan Zhang, Jiajun Cao, Xin Tan, and Min Yang. 2024. How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE Companion ’24)

Prerequisites

Python environment

    conda env create -f syzbot_bisect_env.yaml -n syzbot_bisect
    conda activate syzbot_bisect

srcML

Download srcML from https://www.srcml.org/#download and add the following to your .bashrc:

    export PATH=/PATH/TO/srcml/bin:$PATH
    export LD_LIBRARY_PATH=/PATH/TO/srcml/lib:$LD_LIBRARY_PATH

Create a MySQL database and import syzbot_data/syzbot_bug_info.sql into it.

Set the global variables in analyses/config.py to match your database configuration:

    DATA_DIR = '../syzbot_data'
    DB_IP = '1.1.1.1'
    DB_PORT = 6603
    DB_USER = 'a'
    DB_PWD = 'a'
    DB_NAME = 'a'
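Before running the scripts, it can help to confirm that these values no longer hold the samples above. The following is a minimal sketch that is not part of the repository: `connection_params`, `check_config`, and `connect` are hypothetical helpers, and `connect` assumes the PyMySQL driver (`pymysql`), which may not be the driver the analyses actually use.

```python
import os

# Sample values shown above; seeing them at run time means config.py was not edited.
PLACEHOLDERS = {"1.1.1.1", "a"}


def connection_params(cfg):
    """Build keyword arguments for pymysql.connect() from a config module or object."""
    return {
        "host": cfg.DB_IP,
        "port": int(cfg.DB_PORT),
        "user": cfg.DB_USER,
        "password": cfg.DB_PWD,
        "database": cfg.DB_NAME,
    }


def check_config(cfg):
    """Return a list of human-readable problems with the configuration."""
    problems = []
    if not os.path.isdir(cfg.DATA_DIR):
        problems.append(f"DATA_DIR does not exist: {cfg.DATA_DIR}")
    for key, value in connection_params(cfg).items():
        if str(value) in PLACEHOLDERS:
            problems.append(f"{key} still has the sample value {value!r}")
    return problems


def connect(cfg):
    """Open a MySQL connection (assumes PyMySQL is installed in the environment)."""
    import pymysql  # imported lazily so check_config() works without the driver
    return pymysql.connect(**connection_params(cfg), charset="utf8mb4")
```

Run from the analyses directory, `import config; print(check_config(config))` lists any leftover sample values; once the list is empty, `connect(config)` should open a connection.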

Dataset (./syzbot_data)

The data used in the study.

Statistics (./analyses)

Before running any script, check the paths and database settings in analyses/config.py.

gt_filter.py

Filter out unreasonable Fixes: tags (the tags that link a fix commit to the commit it fixes).

See details in the code comments.
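As a rough illustration only, the sketch below pulls `Fixes:` tags out of a kernel commit message and keeps those in the format recommended by the kernel's patch-submission guidelines (an abbreviated SHA of at least 12 hex digits followed by the quoted subject). Treating that format check as the definition of "unreasonable" is an assumption; the actual rules are in gt_filter.py, and the helper names here are hypothetical.

```python
import re

# One "Fixes:" line: the first token is the SHA, the rest should be ("subject").
TAG_RE = re.compile(r"^Fixes:[ \t]*(\S+)[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE)
SHA_RE = re.compile(r"[0-9a-fA-F]{12,40}")
SUBJECT_RE = re.compile(r'\("(.+)"\)')


def parse_fixes_tags(message):
    """Return (sha, rest_of_line) for every Fixes: line in a commit message."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_RE.finditer(message)]


def is_well_formed(sha, rest):
    """Hypothetical check: 12-40 hex digits followed by ("subject")."""
    return bool(SHA_RE.fullmatch(sha)) and bool(SUBJECT_RE.fullmatch(rest))


def reasonable_fixes(message):
    """Keep only the lowercased SHAs of well-formed Fixes: tags."""
    return [sha.lower() for sha, rest in parse_fixes_tags(message)
            if is_well_formed(sha, rest)]
```

Tags with a too-short SHA or a URL instead of a commit reference are dropped, which is the kind of noise a ground truth built from Fixes: tags has to guard against.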

ground_truth.py

Build the ground truth.

Note: Instead of building the ground truth from scratch, you can load syzbot_data/syzbot_bug_info.sql directly into your database.

data.py

Calculate the following statistics:
1. overall performance;
2. impact on bug-fixing time;
3. distribution of result commits.
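A rough sketch of the first two statistics, assuming each bug record carries the bisection result commit, the ground-truth commit, and the report and fix dates. The `BugRecord` layout and the prefix matching of abbreviated SHAs are assumptions for illustration; data.py reads its data from the MySQL database and may compute these figures differently.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class BugRecord:  # hypothetical record layout
    result_commit: Optional[str]  # commit blamed by bisection (None if bisection failed)
    ground_truth: str             # culprit commit from the ground truth
    reported: date
    fixed: date


def same_commit(a, b):
    """Treat two (possibly abbreviated) SHAs as equal if one is a prefix of the other."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)


def is_correct(bug):
    return bug.result_commit is not None and same_commit(bug.result_commit, bug.ground_truth)


def accuracy(bugs):
    """Overall performance: share of bugs whose bisection result matches the ground truth."""
    return sum(is_correct(b) for b in bugs) / len(bugs)


def median_fix_days(bugs, correct):
    """Median days from report to fix, for correctly or incorrectly bisected bugs."""
    days = [(b.fixed - b.reported).days for b in bugs if is_correct(b) == correct]
    return median(days) if days else None
```

Comparing `median_fix_days(bugs, True)` with `median_fix_days(bugs, False)` gives one simple view of whether a correct bisection result goes along with faster fixes.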

efficiency_analysis.py

Calculate the following statistics:
1. average build time and test time per commit;
2. analysis of the cases that cost more time than average, i.e., the expected number of commits to test vs. the number actually tested.
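For context on "expected commits to test": a binary search over n candidate commits needs about ⌈log₂ n⌉ build-and-test rounds when every commit builds and tests cleanly, so extra rounds multiply the per-commit cost. The sketch below computes that estimate, the average per-commit times, and the surplus; the input layout and function names are hypothetical, and git bisect's real step count can differ on non-linear history or when commits are skipped.

```python
import math
from statistics import mean


def expected_steps(n_candidates):
    """Rounds a clean binary search needs to isolate one commit among n candidates."""
    return math.ceil(math.log2(n_candidates)) if n_candidates > 1 else 0


def per_commit_averages(tests):
    """tests: list of (build_seconds, test_seconds), one tuple per tested commit."""
    return mean(b for b, _ in tests), mean(t for _, t in tests)


def extra_steps(n_candidates, actually_tested):
    """How many more commits were tested than a clean bisection would need."""
    return actually_tested - expected_steps(n_candidates)
```

For example, 1000 candidate commits give an expected 10 rounds, so a bisection that tested 14 commits spent 4 extra build-and-test cycles.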

failure_cause_analysis.py

Analyze the failure causes (categories C1/C2/T1/T2/T3).

dreamutil.py

Extract the files and functions modified by a commit.

Only C/C++ is supported.

Parsing is based on srcML.

file_maintainer.py

Obtain the maintainer information of each guilty file and write it to maintainers_crash_report.json.
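One possible way to gather this information, assuming it comes from the kernel's scripts/get_maintainer.pl run with `-f` on each guilty file inside a kernel checkout; the study's script may collect it differently. The parser handles the usual `Name <email> (role)` lines and bare mailing-list `list@host (role)` lines; the function names and the JSON layout are hypothetical.

```python
import json
import re
import subprocess

# "Name <email> (role)" or "list@host (role)"; the role part is optional.
LINE_RE = re.compile(
    r"^(?:(?P<name>.*?)\s*<(?P<email>[^>]+)>|(?P<addr>\S+@\S+))"
    r"\s*(?:\((?P<role>[^)]*)\))?\s*$"
)


def parse_get_maintainer(output):
    """Turn get_maintainer.pl stdout into a list of {name, email, role} dicts."""
    entries = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        entries.append({
            "name": m.group("name") or "",
            "email": m.group("email") or m.group("addr"),
            "role": m.group("role") or "",
        })
    return entries


def maintainers_for(kernel_dir, path):
    """Run get_maintainer.pl on one file inside a kernel checkout."""
    proc = subprocess.run(
        ["./scripts/get_maintainer.pl", "-f", path],
        cwd=kernel_dir, capture_output=True, text=True, check=True,
    )
    return parse_get_maintainer(proc.stdout)


def write_report(report, out_path="maintainers_crash_report.json"):
    """report: {guilty_file: [maintainer entries]}, written as indented JSON."""
    with open(out_path, "w") as f:
        json.dump(report, f, indent=2)
```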

relation_between_two_commits.py

Examine the relationship between the result commit and the patch commit in terms of developer assignment and code location (line, function, file), separately for correct and incorrect bisection results.
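To make the comparison concrete, the sketch below reports whether a result commit and a patch commit share an author and overlap at file, function, and line granularity. The `CommitInfo` layout is hypothetical: its fields could be filled from git metadata and from a file/function extractor such as dreamutil.py, and the line-level check assumes line numbers have already been mapped onto a common revision.

```python
from dataclasses import dataclass, field


@dataclass
class CommitInfo:  # hypothetical summary of one commit
    author: str
    files: set = field(default_factory=set)  # modified file paths
    funcs: set = field(default_factory=set)  # (file, function) pairs
    lines: set = field(default_factory=set)  # (file, line) pairs on a common revision


def relation(result, patch):
    """Report which developer and code-location properties the two commits share."""
    return {
        "same_author": result.author == patch.author,
        "same_file": bool(result.files & patch.files),
        "same_func": bool(result.funcs & patch.funcs),
        "same_line": bool(result.lines & patch.lines),
    }
```

Aggregating these flags separately over correctly and incorrectly bisected bugs yields the kind of breakdown described above.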

time_limit.py

Calculate the average number of versions tested when a timeout occurs.
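A minimal sketch of this average, assuming each bisection record stores whether it hit the time limit and how many versions it had tested by then; the keys `timed_out` and `versions_tested` are hypothetical.

```python
from statistics import mean


def avg_versions_on_timeout(records):
    """Mean versions tested over the bisections that timed out (None if none did)."""
    tested = [r["versions_tested"] for r in records if r["timed_out"]]
    return mean(tested) if tested else None
```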

dataset_dist.py

Count the version distribution of ground-truth commits and crash commits.
