DeepVulMatch: Learning and Matching Latent Vulnerability Representations for Dual-Granularity Vulnerability Detection
(Replication Package)

How to reproduce

Environment Setup

First, clone this repository to your local machine and enter its root directory via the following commands:

git clone https://github.com/awsm-research/DeepVulMatch.git
cd DeepVulMatch

Then install the Python dependencies via the following command:

pip install -r requirements.txt

We highly recommend checking the official installation guide for the "torch" library so that you install the version appropriate for your device.
To use a GPU (optional), you also need to install the CUDA library; see its installation guide for details.
Python 3.9.7 is recommended; it has been fully tested without issues.
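
Before installing the dependencies, it can help to confirm which interpreter is active. The snippet below is a minimal sketch and not part of this repository: "check_python_version" is a hypothetical helper that compares the interpreter's major.minor version with the recommended 3.9 series.

```shell
# Hypothetical helper (not part of this repository): report whether the
# given interpreter belongs to the recommended Python 3.9 series.
check_python_version() {
    # $1: interpreter to check (defaults to python3)
    py="${1:-python3}"
    # Ask the interpreter for its major.minor version, e.g. "3.9".
    ver=$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 2
    if [ "$ver" = "3.9" ]; then
        echo "ok: Python $ver"
    else
        echo "warning: Python $ver found; 3.9.7 is the tested version"
        return 1
    fi
}
```

Calling "check_python_version" with no argument checks "python3" on your PATH. Once the dependencies are installed, running python -c "import torch; print(torch.cuda.is_available())" prints True when the installed "torch" build can see a CUDA device.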

Reproduction of Experiments

Download and unzip the necessary data via the following commands:

cd data
sh download_data.sh
cd ..

Reproduce Main Results (Table 1 in the paper)

OPTIMATCH (proposed approach)

Inference

cd our_method/optimatch/saved_models/checkpoint-best-f1
sh download_models.sh
cd ../..
sh test_phase_2_150pat.sh
cd ../..

Retrain Phase 1 Model

cd our_method/optimatch
sh train_phase_1.sh
cd ../..

Retrain Phase 2 Model

cd our_method/optimatch
sh train_phase_2_150pat.sh
cd ../..
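
The command sequences in this section each change into a folder, run a script, and then step back out. As an alternative, here is a minimal sketch (not part of this repository) that runs a script inside a subshell so your working directory never changes. "run_in" is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this repository): run a script inside a
# folder using a subshell, so the caller's working directory is unchanged.
run_in() {
    dir="$1"      # folder that contains the script
    script="$2"   # script file name, relative to that folder
    ( cd "$dir" && sh "$script" )
}
```

For example, from the repository root, "run_in our_method/optimatch train_phase_1.sh" would retrain the Phase 1 model and leave you in the root directory afterwards.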

Baselines

To reproduce the baseline approaches, follow the steps below:
- Step 1: cd to the "./baselines" folder
- Step 2: cd to the folder of the baseline you wish to reproduce, e.g., "statement_codebert"
- Step 3: cd to the models folder, e.g., "saved_models/checkpoint-best-f1"
- Step 4: download the models via "sh download_models.sh", then "cd ../.."
- Step 5: find the shell script named "train_xyz.sh" (e.g., train_multi_task_baseline_codebert.sh) and run it via "sh train_xyz.sh"
To run inference, find the shell script named "test_xyz.sh" and run it via "sh test_xyz.sh".
If "test_xyz.sh" does not exist, remove the "do_test" command in "train_xyz.sh" and run inference via "sh train_xyz.sh".
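
The script-selection rule above can be sketched as a small shell function. "pick_inference_script" is a hypothetical helper, not part of this repository. It only reports which script applies to a baseline folder; any editing of "train_xyz.sh" is still done by hand.

```shell
# Hypothetical helper (not part of this repository): given a baseline folder,
# print the script to use for inference, following the rule described above.
pick_inference_script() {
    dir="$1"
    # Prefer a dedicated test script when one exists.
    for f in "$dir"/test_*.sh; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    # Otherwise fall back to the training script, which must be edited first.
    for f in "$dir"/train_*.sh; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no train_*.sh or test_*.sh found in $dir" >&2
    return 1
}
```

For example, "pick_inference_script baselines/statement_codebert" prints the path of the first matching script in that folder.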

A concrete example is provided as follows:
- Statement-Level CodeBERT
  - Retrain
```
cd baselines/statement_codebert/saved_models/checkpoint-best-f1
sh download_models.sh
cd ../..
sh train_multi_task_baseline_codebert.sh
cd ../..
```

Reproduce Ablation Study (Table 2 in the paper)

To reproduce the "w/o vulnerability codebook & matching" variant, run the following commands:
- Retrain (skip "sh train_phase_one.sh" if running inference only)
```
cd our_method/optimatch/saved_models/checkpoint-best-f1
sh download_models.sh
cd ../..
sh train_phase_one.sh
sh test_phase_one.sh
cd ../..
```

Each ablation trial (except w/o vulnerability codebook & matching) consists of Phase 1 and Phase 2 training, like our OPTIMATCH approach. First, cd to the folder that contains the trial you are interested in. To retrain the model for either phase, run "train_xyz.sh". To run inference for either phase, run "test_xyz.sh".

To reproduce w/o RNN embedding (mean pooling applied), cd to "./ablation/token_embedding_pooling_mean"
To reproduce w/o RNN embedding (max pooling applied), cd to "./ablation/token_embedding_pooling_max"
To reproduce OPTIMATCH with N vulnerability centroids, cd to "./ablation/num_patterns"
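
To see which "train_xyz.sh" and "test_xyz.sh" scripts each ablation folder provides, a short listing loop can help. The sketch below is not part of this repository: "list_ablation_scripts" is a hypothetical helper, and it assumes only the three folder names given above.

```shell
# Hypothetical helper (not part of this repository): list the train/test
# scripts found in each ablation folder named in this README.
list_ablation_scripts() {
    # $1: path to the ablation folder (defaults to ./ablation)
    root="${1:-./ablation}"
    for trial in token_embedding_pooling_mean token_embedding_pooling_max num_patterns; do
        echo "== $trial"
        for f in "$root/$trial"/train_*.sh "$root/$trial"/test_*.sh; do
            # Skip unmatched glob patterns and missing folders.
            [ -f "$f" ] && echo "$(basename "$f")"
        done
    done
    return 0
}
```

Run "list_ablation_scripts" from the repository root, then cd to the trial folder and run the listed scripts with "sh".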

Citation

Under review at IEEE TDSC.
