This repository contains metadata and usage instructions for the ARVO vulnerability dataset described in our paper ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software.
The code to generate the ARVO dataset will be published soon. The generated dataset and related metadata are kept up to date in this repository. Each report file describes one vulnerability found by OSS-Fuzz.
Run the following command to feed the proof-of-concept (PoC) input to the vulnerable binary for a vulnerability found on this page. You should see an AddressSanitizer (ASAN) report for a heap overflow bug.
docker run -it n132/arvo:25402-vul arvo
ARVO uses source metadata from OSS-Fuzz to address the problems that make vulnerabilities hard to reproduce and to build a reproducible dataset. Each vulnerability can be compiled from source at its vulnerable version and triggered using the PoC input found by the fuzzer. It can then be compiled at the patched version, and the patch can be verified by checking that the PoC input no longer triggers the bug.
The meta folder includes metadata for all the recompilable vulnerabilities. You can find the original report on the OSS-Fuzz issue tracker. The patching commits are identified by ARVO, with over 80% correctness in our evaluation. Additionally, we provide an interactive recompiling environment on our Docker Hub Repository.
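The check described above (the PoC triggers on the vulnerable build and no longer triggers on the patched build) could be scripted roughly as follows. This is only a sketch under assumptions: the function names `arvo_image`, `poc_triggers`, and `verify_patch` are hypothetical helpers, not part of ARVO; it assumes `arvo` prints the sanitizer report to stdout or stderr; and it only recognizes ASAN reports by looking for the string `AddressSanitizer`.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of ARVO.
# Assumption: `arvo` prints the sanitizer report to stdout or stderr, and
# an ASAN report contains the string "AddressSanitizer".

DOCKER="${DOCKER:-docker}"   # container CLI; overridable for testing

# arvo_image ID VARIANT -> prints the image tag, e.g. n132/arvo:25402-vul
arvo_image() {
    printf 'n132/arvo:%s-%s\n' "$1" "$2"
}

# poc_triggers ID VARIANT -> exit 0 if the run output contains an ASAN report
poc_triggers() {
    case "$("$DOCKER" run --rm "$(arvo_image "$1" "$2")" arvo 2>&1)" in
        *AddressSanitizer*) return 0 ;;
        *) return 1 ;;
    esac
}

# verify_patch ID -> exit 0 if the PoC triggers on -vul and not on -fix
verify_patch() {
    poc_triggers "$1" vul && ! poc_triggers "$1" fix
}
```

After sourcing the file, `verify_patch 25402 && echo "patch verified"` would run both images in turn. Vulnerabilities detected by other sanitizers would need a different match pattern.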
- Select interesting vulnerabilities from the meta folder (e.g., 25402).
- Run a Docker container to create an interactive environment for these vulnerabilities:
docker run -it n132/arvo:25402-vul bash # vulnerable version
docker run -it n132/arvo:25402-fix bash # fixed version
- [Optional] Modify the code or change the compile settings and recompile it:
# Run this command inside the Docker container
arvo compile
- Feed the PoC to the vulnerable/fixed binary to verify the vulnerability/fix:
# Run this command inside the Docker container
arvo
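The steps above can also be chained without an interactive shell. The sketch below is an illustration under assumptions: `recompile_and_run` and `run_many` are hypothetical helper names, `LOG_DIR` is a variable introduced here, and it assumes `arvo compile` and `arvo` behave the same under `bash -c` as they do when typed inside the container. The meaning of the exit status reported by `arvo` is not documented here, so the script only prints it.

```shell
#!/bin/sh
# Sketch only: helper names and LOG_DIR are hypothetical, not part of ARVO.
# Assumption: `arvo compile` and `arvo` work the same under `bash -c`
# as in the interactive shell used in the steps above.

DOCKER="${DOCKER:-docker}"   # container CLI; overridable for testing
LOG_DIR="${LOG_DIR:-.}"      # directory where per-run logs are written

# recompile_and_run ID VARIANT -> rebuild the target, then feed the PoC
recompile_and_run() {
    "$DOCKER" run --rm "n132/arvo:$1-$2" bash -c 'arvo compile && arvo'
}

# run_many VARIANT ID... -> run each ID, saving output to <id>-<variant>.log
run_many() {
    variant="$1"
    shift
    for id in "$@"; do
        recompile_and_run "$id" "$variant" > "$LOG_DIR/${id}-${variant}.log" 2>&1
        echo "${id}-${variant}: exit status $?"
    done
}
```

For example, `run_many vul 25402` would rebuild and run the vulnerable image for 25402 and leave the sanitizer output in `25402-vul.log` for inspection.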
In the patches folder, we provide the patches ARVO located for each vulnerability.
If you find a case that is not reproducible, please open an issue for it.