
This repository is for the paper of ICSE 2023: Regression Fuzzing for Deep Learning Systems



youhanmo/DRFuzz


DRFUZZ

Description

DRFuzz is a novel regression fuzzing framework for deep learning systems. It is designed to effectively generate high-fidelity test inputs that trigger diverse regression faults. To improve the fault-triggering capability of test inputs, DRFuzz adopts a Markov Chain Monte Carlo (MCMC) strategy to select mutation rules that are prone to trigger regression faults. Furthermore, to enhance the diversity of the generated test inputs, we propose a diversity criterion that guides DRFuzz toward triggering more diverse faulty behaviors. In addition, DRFuzz incorporates a GAN-based fidelity assurance method to guarantee the fidelity of test inputs. We conducted an empirical study to evaluate the effectiveness of DRFuzz in four regression scenarios (i.e., supplementary training, adversarial training, model fixing, and model pruning). The experimental results demonstrate the effectiveness of DRFuzz.
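The MCMC strategy for choosing mutation rules can be illustrated with a small Metropolis sampler. This is a sketch under stated assumptions, not DRFuzz's implementation: each rule is ranked by its observed success rate (the fraction of its mutations that triggered a regression fault), a candidate rule is proposed uniformly at random, and it replaces the current rule with probability min(1, (1 - p)^(k_cand - k_cur)), where k is the rank (0 is best) and p is a hypothetical tuning parameter. The class name and rule names are also made up for illustration.

```python
import random
from typing import Dict, List


class MCMCMutationSelector:
    """Metropolis selection of mutation rules ranked by success rate.

    Hypothetical sketch: the rule names, the geometric parameter `p`, and the
    success bookkeeping are illustrative assumptions, not DRFuzz's own code.
    """

    def __init__(self, rules: List[str], p: float = 0.1, seed: int = 0) -> None:
        if not 0.0 < p < 1.0:
            raise ValueError("p must be in the open interval (0, 1)")
        self.rules = list(rules)
        self.p = p
        self.rng = random.Random(seed)
        self.trials: Dict[str, int] = {r: 0 for r in self.rules}
        self.successes: Dict[str, int] = {r: 0 for r in self.rules}
        self.current = self.rng.choice(self.rules)

    def success_rate(self, rule: str) -> float:
        # Fraction of this rule's mutations that triggered a regression fault.
        tried = self.trials[rule]
        return self.successes[rule] / tried if tried else 0.0

    def _rank(self) -> Dict[str, int]:
        # Rank 0 = highest observed success rate (ties keep list order).
        ordered = sorted(self.rules, key=self.success_rate, reverse=True)
        return {rule: k for k, rule in enumerate(ordered)}

    def select(self) -> str:
        # Propose a rule uniformly at random; accept it with probability
        # min(1, (1 - p) ** (rank[candidate] - rank[current])), otherwise keep
        # the current rule. The target weight of a rule is (1 - p) ** rank.
        rank = self._rank()
        candidate = self.rng.choice(self.rules)
        accept = min(1.0, (1.0 - self.p) ** (rank[candidate] - rank[self.current]))
        if self.rng.random() < accept:
            self.current = candidate
        return self.current

    def update(self, rule: str, triggered_fault: bool) -> None:
        # Record whether the mutated input produced a regression fault.
        self.trials[rule] += 1
        if triggered_fault:
            self.successes[rule] += 1
```

A fuzzing loop would call select() to pick the rule for the next mutation and update() once it knows whether the mutated input triggered a regression fault. A smaller p flattens the preference across rules (more exploration), while a larger p concentrates selection on the best-ranked rules.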

The Structure

Here, we briefly introduce the usage/function of each directory:

  • coverages: baseline coverage criteria and the input evaluation used to calculate diversity in DRFuzz
  • dcgan: the GAN-based fidelity assurance technique (the DCGAN structure and prediction code)
  • models: the original models and their regression models. (Because some model files are large, only the original and regression models for MNIST-LeNet5 are provided here for reproduction.)
  • params: the parameters of DRFuzz for each model/dataset
  • src: the main algorithm of DRFuzz and the experimental scripts
  • kmnc_profile: the profile for the baseline approach DeepHunter, which saves the boundary values of each neuron. (Please note that this file is only used by DeepHunter-KMNC, following the implementation in its source code to improve the efficiency of KMNC; it is not in the scope of DRFuzz.)

Datasets/Models

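We use 4 popular DL models on 4 datasets as the initial seed models in DRFuzz, covering four regression scenarios (five settings in total, since adversarial training is performed with two attack methods). These models have been widely used in many existing studies. In the table below, M1_acc and M2_acc are the accuracies of the original model and of its regression model, and FM denotes Fashion-MNIST.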

ID model dataset M1_acc M2_acc Scenario
1 LeNet5 MNIST 85.87% 97.83% SUPPLY
2 LeNet5 MNIST 98.07% 97.50% ADV:BIM
3 LeNet5 MNIST 98.07% 98.30% ADV:CW
4 LeNet5 MNIST 98.07% 98.12% FIXING
5 LeNet5 MNIST 98.07% 98.12% PRUNE
6 VGG16 CIFAR10 87.67% 87.88% SUPPLY
7 VGG16 CIFAR10 87.92% 87.51% ADV:BIM
8 VGG16 CIFAR10 87.92% 88.00% ADV:CW
9 VGG16 CIFAR10 87.92% 88.40% FIXING
10 VGG16 CIFAR10 87.92% 76.27% PRUNE
11 AlexNet FM 89.33% 90.34% SUPPLY
12 AlexNet FM 91.70% 90.96% ADV:BIM
13 AlexNet FM 91.70% 91.87% ADV:CW
14 AlexNet FM 91.70% 92.90% FIXING
15 AlexNet FM 91.70% 91.54% PRUNE
16 ResNet18 SVHN 88.85% 91.93% SUPPLY
17 ResNet18 SVHN 92.05% 91.90% ADV:BIM
18 ResNet18 SVHN 92.05% 92.01% ADV:CW
19 ResNet18 SVHN 92.05% 92.10% FIXING
20 ResNet18 SVHN 92.05% 91.00% PRUNE

1: We design four regression scenarios: supplementary training (denoted as SUPPLY), adversarial training (denoted as ADV), white-box model fixing (denoted as FIXING), and model pruning (denoted as PRUNE).

2: For SUPPLY, we use 80% of the training set to train the original model and the remaining 20% for supplementary training.
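The 80/20 split for SUPPLY can be sketched as below, assuming the training data are NumPy arrays and a random (not stratified) split. The helper name split_for_supplementary and the fixed seed are illustrative assumptions, not taken from the DRFuzz code.

```python
import numpy as np


def split_for_supplementary(x, y, original_frac=0.8, seed=0):
    """Randomly split (x, y) into an 'original' part and a 'supplementary' part.

    Hypothetical helper: the original part (80% by default) trains the original
    model, and the held-out rest is reserved for supplementary training.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    rng = np.random.RandomState(seed)  # fixed seed for a reproducible split
    order = rng.permutation(len(x))
    cut = int(round(original_frac * len(x)))
    orig_idx, supp_idx = order[:cut], order[cut:]
    return (x[orig_idx], y[orig_idx]), (x[supp_idx], y[supp_idx])
```

Applied to MNIST's 60,000 training images, this would reserve 48,000 images for the original model and 12,000 for supplementary training. A stratified split (equal class proportions in both parts) is an alternative if class balance matters.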

3: For ADV, we perform adversarial training with two kinds of adversarial examples (C&W and BIM).

4: Due to limited space in the paper, the overall result is saved here as overall_result.jpg.

5: Please note that because the code logic of our baseline DiffChaser differs substantially from that of DRFuzz and DeepHunter, we directly use the source code released by its authors in our experiments rather than integrating it into our framework.

6: If you want to download other models, please check the link below:

The Requirements:

  • python==3.7 (we have tested both 3.7.5 and 3.7.16)

  • keras==2.3.1

  • tensorflow==1.15.0

  • cleverhans==3.0.1

  • h5py==2.10.0
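Before a long run, a small helper can compare the active interpreter and installed packages against the pins above. This is a hypothetical helper, not part of the repository. It relies on importlib.metadata, which is in the standard library from Python 3.8; on Python 3.7 it falls back to the importlib_metadata backport, which must be installed separately.

```python
import sys

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    import importlib_metadata as metadata

# Version pins copied from the requirements list above.
REQUIRED = {
    "keras": "2.3.1",
    "tensorflow": "1.15.0",
    "cleverhans": "3.0.1",
    "h5py": "2.10.0",
}


def check_environment(required=REQUIRED, python=(3, 7)):
    """Return human-readable mismatches; an empty list means everything matches."""
    problems = []
    running = tuple(sys.version_info[:2])
    if running != tuple(python):
        problems.append("python: running {}.{}, expected {}.{}".format(*running, *python))
    for package, wanted in required.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append("{}: not installed (expected {})".format(package, wanted))
            continue
        if installed != wanted:
            problems.append("{}: found {}, expected {}".format(package, installed, wanted))
    return problems
```

Inside a correctly configured environment, check_environment() should return an empty list. Only the major and minor Python versions are compared, so both 3.7.5 and 3.7.16 pass.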

Please note that if you encounter the error 'TypeError: Descriptors can not be created directly.', you may need to downgrade the protobuf package to 3.20.x or lower, e.g., with 'pip install protobuf==3.20.1'. This rarely happens. If you have trouble setting up the environment, we also provide a file named requirement.txt to simplify installation; you can install it with the command below (either pip or conda works).

pip install -r requirement.txt

Please note that if you get an error saying that a .so file is missing from your environment, you need to install the corresponding library, libglib2.0-dev, e.g., with the command below.

apt-get install libglib2.0-dev

We strongly suggest running this project on Linux; we implemented the entire project on Linux (kernel version 4.18.0-15-generic). We will add Windows configuration instructions to this description in the future. We also provide a Docker image of DRFuzz at https://drive.google.com/file/d/1rIpzyY_jWFPp-ZuvKl_lZNH1y4HvRzHz/view?usp=sharing. In the image, the code is located in /home/share/DRFuzz-main; activate the conda environment drfuzz with the command below to use the artifact.

source activate drfuzz

Reproducibility

Environment

We conducted 20 experiments with DRFuzz. The basic steps to run them are as follows:

Step 0: Please install the above runtime environment.

Step 1: Clone this repository. Download the dataset and models from our Google Drive. Save the code, and unzip the datasets and models to /your/local/path/, e.g., /your/local/path/models and /your/local/path/dataset. (/your/local/path/ should be an absolute path on your server, e.g., /home/user_xxx/.)

Step 2: Train your own DCGAN models and save them to /your/local/path/dcgan. (Alternatively, use the ones we provide for reproducibility.)

Step 3: Edit the configuration files /your/local/path/src/experiment_builder.py and /your/local/path/dcgan/DCGAN_utils.py to set the dataset, model, and DCGAN model paths for DRFuzz.
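Before editing the configuration files, it can help to confirm that the directory layout from Steps 1 and 2 is in place. The helper below is hypothetical (it is not part of DRFuzz); it only checks that the root is an absolute path and that it contains the models, dataset, dcgan, and src subdirectories named in the steps above.

```python
import os

# Subdirectories expected under /your/local/path/ after Steps 1 and 2.
EXPECTED_SUBDIRS = ("models", "dataset", "dcgan", "src")


def check_layout(root, subdirs=EXPECTED_SUBDIRS):
    """Return a list of problems with the DRFuzz working directory `root` (hypothetical helper)."""
    problems = []
    if not os.path.isabs(root):
        # Step 1 asks for an absolute path such as /home/user_xxx/.
        problems.append("{} is not an absolute path".format(root))
    for name in subdirs:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems.append("missing directory: {}".format(path))
    return problems
```

For example, check_layout("/home/user_xxx") should return an empty list once the code, models, dataset, and DCGAN models are in place.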

Running DRFuzz

The DRFuzz artifacts are well organized; you can run DRFuzz with the following command:

python main.py --params_set mnist LeNet5 change drfuzz --dataset MNIST --model LeNet5 --coverage change --time 1440

main.py contains all the configurations for our experimentation.

--params_set selects the configuration and parameters for each dataset/model. Choose a setting that matches the files in the params directory. To use your own parameters, change the settings in /your/local/path/params.

--dataset selects the dataset. There are four choices: ["MNIST", "CIFAR10", "FM", "SVHN"].

--model selects the model. We designed 20 models in total, as listed in 'Datasets/Models'; their settings are defined in main.py. Choose according to 'Datasets/Models' and your experimental settings.

--coverage selects the coverage used to guide the fuzzing process. Set it to 'change' to use DRFuzz; the other choices correspond to compared approaches such as DeepHunter.

--time sets the running time of the experiment in minutes. We set it to 1440 minutes (24 hours) in our experiments. To quickly check your installation, you can set it to 5 minutes.
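To make the flags above concrete, here is a hypothetical argparse parser that mirrors them and parses the example command (with --time 5 for a quick try). It is an illustration only: the real parser lives in main.py and may declare different types, defaults, or additional options, and accepting any number of values for --params_set is an assumption.

```python
import argparse


def build_parser():
    # Hypothetical mirror of main.py's command-line options, for illustration only.
    parser = argparse.ArgumentParser(description="DRFuzz (illustrative parser)")
    parser.add_argument("--params_set", nargs="+", required=True,
                        help="parameter set, e.g. 'mnist LeNet5 change drfuzz'")
    parser.add_argument("--dataset", choices=["MNIST", "CIFAR10", "FM", "SVHN"], required=True)
    parser.add_argument("--model", required=True,
                        help="e.g. LeNet5, VGG16, AlexNet, ResNet18")
    parser.add_argument("--coverage", required=True,
                        help="'change' selects DRFuzz; other values select baselines")
    parser.add_argument("--time", type=int, required=True, help="fuzzing budget in minutes")
    return parser


# Parse an explicit argument list (not sys.argv) built from the example command.
args = build_parser().parse_args(
    "--params_set mnist LeNet5 change drfuzz --dataset MNIST --model LeNet5 "
    "--coverage change --time 5".split()
)
```

Because --dataset uses choices, a typo such as "Mnist" is rejected with a usage error before any fuzzing starts.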

Reusability

For users who want to evaluate if DRFuzz is effective on their own regression models, we also prepared an approach to reuse DRFuzz on new regression models and new datasets.

First, users need to update the paths of their own datasets and regression models under test in the functions _get_dataset and _get_models in src/experiment_builder.py so that DRFuzz can load them. If the dataset requires further preprocessing, users should also update the corresponding preprocessing method, picture_preprocess, in src/utility.py.
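If a new dataset needs preprocessing, the added logic might look like the following. This is a hypothetical sketch: the function name preprocess_new_dataset, the uint8 input format, the channels-last layout, and the [0, 1] scaling are assumptions, and the preprocessing must match whatever your own regression models were trained with.

```python
import numpy as np


def preprocess_new_dataset(images):
    """Hypothetical preprocessing for a new dataset of uint8 images.

    - grayscale input of shape (n, h, w) gets a trailing channel axis -> (n, h, w, 1)
    - color input of shape (n, h, w, c) keeps its shape
    - pixel values are scaled from [0, 255] to float32 in [0, 1]
    """
    x = np.asarray(images)
    if x.ndim == 3:
        x = x[..., np.newaxis]  # add a channel axis for grayscale images
    if x.ndim != 4:
        raise ValueError("expected (n, h, w) or (n, h, w, c), got shape %s" % (x.shape,))
    return x.astype("float32") / 255.0
```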

Second, users need to train a simple GAN discriminator (e.g., as in dcgan) to guarantee the fidelity of the generated test inputs. With these two steps done, DRFuzz can be adapted to the new regression models under test and will carry out the rest of the fuzzing process.
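The discriminator's role can be sketched as a filter: a generated input is kept only if the discriminator scores it as realistic enough. In the sketch below, discriminator is any callable that returns one score in [0, 1] per input (for a trained Keras discriminator this could wrap model.predict), and the 0.5 threshold is an illustrative assumption. DRFuzz's actual fidelity criterion is implemented in the dcgan directory and may differ.

```python
from typing import Callable

import numpy as np


def filter_by_fidelity(candidates: np.ndarray,
                       discriminator: Callable[[np.ndarray], np.ndarray],
                       threshold: float = 0.5) -> np.ndarray:
    """Keep only candidates whose discriminator 'realness' score reaches `threshold`.

    `discriminator` maps a batch of shape (n, ...) to scores of shape (n,) or (n, 1).
    """
    scores = np.asarray(discriminator(candidates)).reshape(-1)
    if scores.shape[0] != candidates.shape[0]:
        raise ValueError("discriminator must return one score per candidate")
    return candidates[scores >= threshold]
```

With a trained Keras discriminator d, one might call filter_by_fidelity(batch, lambda b: d.predict(b)). Raising the threshold trades fewer surviving inputs for higher fidelity.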

Please remember to name your own regression scenarios and datasets and to set the corresponding parameters (or configurations) in params, so that the experimental scripts can load them.

Extra Files

For users who want to fetch the kmnc_profile file, please see the link below:

Link: https://pan.baidu.com/s/1apWPrYnvrEutz7VA8gZZNg?pwd=z3an

Pin-code: z3an

Additional Results

RQ1

Due to limited space in the paper, the overall result is saved here as overall_result.jpg.

You can also refer to the table below for detailed information.

ID Dataset-Model Scenario Approach #RF #URF #Seed #GF
1 MNIST-LeNet5 SUPPLY DiffChaser 27,065 1,185 1,138 27,400
1 MNIST-LeNet5 SUPPLY DeepHunter 4,837 2,701 2,127 49,003
1 MNIST-LeNet5 SUPPLY DRFuzz 36,293 8,767 4,345 68,451
2 MNIST-LeNet5 ADV:BIM DiffChaser 14,994 303 277 15,602
2 MNIST-LeNet5 ADV:BIM DeepHunter 19,588 8,670 4,760 61,871
2 MNIST-LeNet5 ADV:BIM DRFuzz 41,376 13,342 5,948 82,377
3 MNIST-LeNet5 ADV:CW DiffChaser 531 81 73 1,601
3 MNIST-LeNet5 ADV:CW DeepHunter 3,799 2,408 1,791 53,178
3 MNIST-LeNet5 ADV:CW DRFuzz 25,727 7,996 3,972 117,329
4 MNIST-LeNet5 FIXING DiffChaser 4,799 582 513 6,927
4 MNIST-LeNet5 FIXING DeepHunter 2,760 2,279 1,906 35,139
4 MNIST-LeNet5 FIXING DRFuzz 32,858 11,075 4,990 116,021
5 MNIST-LeNet5 PRUNE DiffChaser 19,047 2,832 2,426 33,838
5 MNIST-LeNet5 PRUNE DeepHunter 6,763 3,689 2,547 68,281
5 MNIST-LeNet5 PRUNE DRFuzz 26,470 9,071 4,571 100,967
6 CIFAR10-VGG16 SUPPLY DiffChaser 8,356 2,204 1,739 23,790
6 CIFAR10-VGG16 SUPPLY DeepHunter 983 630 516 8,089
6 CIFAR10-VGG16 SUPPLY DRFuzz 41,422 16,105 6,505 331,701
7 CIFAR10-VGG16 ADV:BIM DiffChaser 8,976 1,562 1,259 17,888
7 CIFAR10-VGG16 ADV:BIM DeepHunter 857 664 519 7,272
7 CIFAR10-VGG16 ADV:BIM DRFuzz 58,192 17,222 6,925 321,619
8 CIFAR10-VGG16 ADV:CW DiffChaser 3,230 780 628 13,489
8 CIFAR10-VGG16 ADV:CW DeepHunter 247 185 163 4,228
8 CIFAR10-VGG16 ADV:CW DRFuzz 33,412 11,471 5,004 374,531
9 CIFAR10-VGG16 FIXING DiffChaser 18,220 2,614 1,877 27,192
9 CIFAR10-VGG16 FIXING DeepHunter 2,126 1,202 854 7,360
9 CIFAR10-VGG16 FIXING DRFuzz 74,644 22,576 7,338 249,626
10 CIFAR10-VGG16 PRUNE DiffChaser 152,702 6,989 4,037 169,086
10 CIFAR10-VGG16 PRUNE DeepHunter 6,885 2,051 1,212 11,394
10 CIFAR10-VGG16 PRUNE DRFuzz 115,099 22,333 7,883 228,750
11 FM-AlexNet SUPPLY DiffChaser 13,315 325 285 21,326
11 FM-AlexNet SUPPLY DeepHunter 6,248 2,872 2,088 39,535
11 FM-AlexNet SUPPLY DRFuzz 63,981 12,711 5,743 260,899
12 FM-AlexNet ADV:BIM DiffChaser 26,157 750 557 45,619
12 FM-AlexNet ADV:BIM DeepHunter 6,130 2,886 1,875 26,690
12 FM-AlexNet ADV:BIM DRFuzz 114,729 17,886 6,806 382,470
13 FM-AlexNet ADV:CW DiffChaser 4,995 491 407 26,169
13 FM-AlexNet ADV:CW DeepHunter 2,178 1,473 1,191 33,401
13 FM-AlexNet ADV:CW DRFuzz 50,801 13,803 5,759 389,195
14 FM-AlexNet FIXING DiffChaser 32,580 1,292 861 44,825
14 FM-AlexNet FIXING DeepHunter 9,574 5,225 3,044 25,493
14 FM-AlexNet FIXING DRFuzz 176,104 27,352 7,999 377,966
15 FM-AlexNet PRUNE DiffChaser 52,384 1,483 1,063 66,501
15 FM-AlexNet PRUNE DeepHunter 17,302 8,150 3,963 35,696
15 FM-AlexNet PRUNE DRFuzz 168,029 26,229 7,885 306,389
16 SVHN-ResNet18 SUPPLY DiffChaser 1,220 250 221 1,599
16 SVHN-ResNet18 SUPPLY DeepHunter 1,731 1,126 878 10,790
16 SVHN-ResNet18 SUPPLY DRFuzz 31,364 15,980 8,493 170,618
17 SVHN-ResNet18 ADV:BIM DiffChaser 1,088 83 71 1,487
17 SVHN-ResNet18 ADV:BIM DeepHunter 1,492 1,057 864 9,908
17 SVHN-ResNet18 ADV:BIM DRFuzz 29,779 18,131 9,558 169,835
18 SVHN-ResNet18 ADV:CW DiffChaser 370 64 60 1,074
18 SVHN-ResNet18 ADV:CW DeepHunter 264 225 213 5,773
18 SVHN-ResNet18 ADV:CW DRFuzz 10,943 8,509 5,610 178,922
19 SVHN-ResNet18 FIXING DiffChaser 663 199 184 1,198
19 SVHN-ResNet18 FIXING DeepHunter 941 742 627 8,815
19 SVHN-ResNet18 FIXING DRFuzz 22,612 16,431 8,741 168,544
20 SVHN-ResNet18 PRUNE DiffChaser 712 626 532 1,200
20 SVHN-ResNet18 PRUNE DeepHunter 1,888 1,118 887 5,429
20 SVHN-ResNet18 PRUNE DRFuzz 34,561 18,266 10,420 105,750

RQ3 (including the accuracy change on the test set)

Model Scenario Trained-on Tested-on-DRFuzz Tested-on-DeepHunter Tested-on-DiffChaser Acc-Change
LeNet5 SUPPLY DRFuzz 84.58% 59.42% 83.36% 0.78%
LeNet5 SUPPLY DeepHunter 58.99% 80.71% 71.29% 0.90%
LeNet5 SUPPLY DiffChaser 53.42% 34.27% 72.92% 0.96%
AlexNet SUPPLY DRFuzz 84.68% 67.56% 77.56% 0.79%
AlexNet SUPPLY DeepHunter 51.85% 73.56% 42.57% 1.18%
AlexNet SUPPLY DiffChaser 41.25% 37.18% 62.28% -1.95%
VGG16 SUPPLY DRFuzz 90.70% 90.54% 88.84% -0.33%
VGG16 SUPPLY DeepHunter 68.65% 71.20% 65.75% -0.12%
VGG16 SUPPLY DiffChaser 72.27% 75.77% 71.24% -0.22%
ResNet18 SUPPLY DRFuzz 79.97% 78.84% 43.25% 0.12%
ResNet18 SUPPLY DeepHunter 61.01% 65.85% 68.25% -2.22%
ResNet18 SUPPLY DiffChaser 46.44% 51.25% 62.01% -2.69%
LeNet5 ADV:BIM DRFuzz 87.69% 73.38% 75.94% 0.84%
LeNet5 ADV:BIM DeepHunter 45.47% 79.42% 58.76% 0.56%
LeNet5 ADV:BIM DiffChaser 49.58% 57.97% 61.75% 0.42%
AlexNet ADV:BIM DRFuzz 85.68% 74.40% 76.94% 1.30%
AlexNet ADV:BIM DeepHunter 67.44% 76.04% 70.93% 1.08%
AlexNet ADV:BIM DiffChaser 49.29% 51.99% 72.66% 0.09%
VGG16 ADV:BIM DRFuzz 88.76% 85.09% 78.09% 0.34%
VGG16 ADV:BIM DeepHunter 73.74% 71.53% 76.35% 0.65%
VGG16 ADV:BIM DiffChaser 71.11% 62.68% 76.64% 0.61%
ResNet18 ADV:BIM DRFuzz 85.97% 86.63% 91.74% 0.75%
ResNet18 ADV:BIM DeepHunter 69.83% 73.99% 81.31% 0.35%
ResNet18 ADV:BIM DiffChaser 65.38% 68.90% 80.79% 0.43%
LeNet5 ADV:CW DRFuzz 75.76% 35.00% 65.87% 0.00%
LeNet5 ADV:CW DeepHunter 42.12% 69.01% 26.59% -0.08%
LeNet5 ADV:CW DiffChaser 52.55% 30.65% 78.77% -0.63%
AlexNet ADV:CW DRFuzz 86.72% 69.36% 63.21% 0.04%
AlexNet ADV:CW DeepHunter 53.82% 62.61% 59.50% -0.15%
AlexNet ADV:CW DiffChaser 46.21% 42.98% 65.77% 0.42%
VGG16 ADV:CW DRFuzz 88.98% 82.49% 85.92% -0.63%
VGG16 ADV:CW DeepHunter 77.69% 73.74% 74.59% -0.01%
VGG16 ADV:CW DiffChaser 66.04% 64.65% 69.49% 0.02%
ResNet18 ADV:CW DRFuzz 81.44% 83.71% 92.49% 0.34%
ResNet18 ADV:CW DeepHunter 60.36% 68.37% 95.83% 0.41%
ResNet18 ADV:CW DiffChaser 58.00% 63.26% 95.83% -0.80%
LeNet5 FIXING DRFuzz 85.41% 66.48% 65.64% 0.01%
LeNet5 FIXING DeepHunter 54.25% 70.46% 62.47% -0.01%
LeNet5 FIXING DiffChaser 50.83% 49.23% 64.78% -0.12%
AlexNet FIXING DRFuzz 76.08% 56.88% 39.46% -0.20%
AlexNet FIXING DeepHunter 41.96% 62.41% 39.28% -1.14%
AlexNet FIXING DiffChaser 37.86% 35.28% 60.27% -4.65%
VGG16 FIXING DRFuzz 71.98% 66.42% 45.52% -1.21%
VGG16 FIXING DeepHunter 54.42% 56.56% 58.67% -2.64%
VGG16 FIXING DiffChaser 55.56% 59.13% 44.45% -4.38%
ResNet18 FIXING DRFuzz 71.56% 67.55% 58.11% 0.40%
ResNet18 FIXING DeepHunter 64.19% 68.09% 74.32% 0.00%
ResNet18 FIXING DiffChaser 50.98% 61.73% 73.27% -1.78%
LeNet5 PRUNE DRFuzz 75.21% 39.57% 70.27% 0.09%
LeNet5 PRUNE DeepHunter 38.77% 72.11% 44.41% -0.06%
LeNet5 PRUNE DiffChaser 34.54% 30.69% 76.54% -0.03%
AlexNet PRUNE DRFuzz 83.85% 71.00% 63.38% 0.78%
AlexNet PRUNE DeepHunter 53.46% 71.88% 54.12% 0.68%
AlexNet PRUNE DiffChaser 35.12% 33.04% 64.31% -1.51%
VGG16 PRUNE DRFuzz 86.53% 87.51% 80.22% 11.75%
VGG16 PRUNE DeepHunter 74.90% 83.67% 79.62% 12.00%
VGG16 PRUNE DiffChaser 72.49% 78.22% 79.65% 12.13%
ResNet18 PRUNE DRFuzz 80.54% 83.41% 83.53% 3.54%
ResNet18 PRUNE DeepHunter 71.82% 76.74% 77.22% 3.17%
ResNet18 PRUNE DiffChaser 71.68% 80.26% 81.94% 4.06%
