Moonshine: Fuzzing Corpus Design and Construction

Moonshine is a research project investigating efficient methods for designing a corpus, typically for use in fuzzing campaigns. We call this design process distillation.

Corpus distillation takes a very large collection of files of a given type, called seeds (e.g. 100,000 PDF files), and chooses a much smaller subset to fuzz with (say 1,000 PDF files). The smaller corpus should have the same "expressive power" as the original large corpus: we don't want to lose any information in the distillation process.

Currently, "expressive power" is measured by code coverage, which distillation aims to maximise. We also plan to support other measures of interest, such as execution complexity and usage of specific libraries or system calls.
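To make the idea concrete, here is a minimal sketch of coverage-preserving distillation using a greedy set-cover heuristic. It is an illustration only: this README does not state which algorithm Moonshine uses, and the `distill` function, the trace layout (a dict mapping each seed name to the set of coverage element IDs it hits) and the file names are all assumptions made for this example.

```python
from typing import Dict, List, Set


def distill(traces: Dict[str, Set[int]]) -> List[str]:
    """Greedy sketch: keep picking seeds until all original coverage is retained.

    `traces` is a hypothetical layout, not Moonshine's actual trace format.
    """
    # Everything the full collection covers; the subset must cover it too.
    uncovered: Set[int] = set().union(*traces.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the seed that adds the most not-yet-covered elements.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(traces), key=lambda s: len(traces[s] & uncovered))
        chosen.append(best)
        uncovered -= traces[best]
    return chosen


# Toy collection of four seeds covering elements 1..4.
traces = {
    "a.pdf": {1, 2, 3},
    "b.pdf": {2, 3},
    "c.pdf": {3, 4},
    "d.pdf": {4},
}
subset = distill(traces)
```

In this toy collection two of the four seeds are enough to retain every coverage element, so the other two can be dropped without losing any coverage information.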

The project name "Moonshine" is a pun on the word distillation. We use the term distillation in preference to the commonly used term reduction because, in fuzzing parlance, "reduction" is also used in the crash triage process.

We have included five benchmark corpora in this repo. The following table shows the level of distillation you should expect when you run the tool on each of them. The Improvement Gain column is approximately the collection size divided by the distilled size.

File Type	Collection Size	Distilled Size	Improvement Gain
PDF	42,056	664	63
DOC	7,836	745	10
PNG	4,831	94	51
TTF	5,666	27	210
HTML	69,991	410	171

Now if you were fuzzing Adobe Acrobat, instead of using all 42,056 PDF files you would only need the much smaller subset of 664 files.

Sometimes what matters is not just the number of seeds in your corpus but their total weight (or size) in bytes. This is particularly important if your fuzzer is I/O bound. In this case, corpus design and construction means choosing seeds that not only maximise code coverage but also have the smallest total weight.

Moonshine can perform both weighted and unweighted distillations.
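A weighted distillation can be sketched in the same greedy style by preferring seeds that add the most new coverage per byte. As before, this is a hedged illustration rather than Moonshine's actual method: `distill_weighted`, the `traces` and `weights` dicts, and the file names are hypothetical, and every weight is assumed to be a positive size in bytes.

```python
from typing import Dict, List, Set


def distill_weighted(traces: Dict[str, Set[int]],
                     weights: Dict[str, int]) -> List[str]:
    """Greedy sketch: retain all coverage while favouring light seeds.

    Assumes every weight is a positive number of bytes.
    """
    uncovered: Set[int] = set().union(*traces.values())
    chosen: List[str] = []
    while uncovered:
        # Score each seed by newly covered elements per byte of seed size.
        best = max(
            sorted(traces),
            key=lambda s: len(traces[s] & uncovered) / weights[s],
        )
        chosen.append(best)
        uncovered -= traces[best]
    return chosen


# One large seed covers everything; two small seeds cover it between them.
traces = {
    "big.pdf": {1, 2, 3, 4},
    "small1.pdf": {1, 2},
    "small2.pdf": {3, 4},
}
weights = {"big.pdf": 1000, "small1.pdf": 100, "small2.pdf": 100}
subset = distill_weighted(traces, weights)
total_bytes = sum(weights[s] for s in subset)
```

An unweighted greedy pass would pick the single large seed here; the weighted pass instead picks the two small seeds, which give the same coverage at a fifth of the total size. That is the trade-off an I/O-bound fuzzer cares about.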

For more information on the performance of Moonshine against the current state of the art, see RESULTS.md.

Installation

See COMPILE.md.

Usage

See USAGE.md.

Trace Data

For more information on the expected input format of seed trace data, see DATA.md.
