Commit 2cfd200

Update OSS-Fuzz Scripts to Use New QA-Assets Repo Structure
This change is required to support the changes to the seed data repo structure introduced in gitpython-developers/qa-assets#2.

It moves most of the seed-data-related build steps into the OSS-Fuzz Docker image build via `container-environment-bootstrap.sh`. This includes moving the dictionaries into that repo.

The fuzzing/README.md here should be updated in a follow-up with a link to the qa-assets repo (and probably some context about corpora in general), but I have opted to defer that: the functionality added by the seed data improvements is valuable as is and shouldn't be blocked by documentation writer's block.
1 parent 2493c3a commit 2cfd200

5 files changed, +53 -114 lines changed

Diff for: fuzzing/README.md

-19
@@ -76,25 +76,6 @@ Contains Python files for each fuzz test.
 reason, fuzz tests should gracefully handle anticipated exception cases with a `try`/`except` block to avoid false
 positives that halt the fuzzing engine.

-### Dictionaries (`dictionaries/`)
-
-Provides hints to the fuzzing engine about inputs that might trigger unique code paths. Each fuzz target may have a
-corresponding `.dict` file. For information about dictionary syntax, refer to
-the [LibFuzzer documentation on the subject](https://llvm.org/docs/LibFuzzer.html#dictionaries).
-
-**Things to Know**:
-
-- OSS-Fuzz loads dictionary files per fuzz target if one exists with the same name, all others are ignored.
-- Most entries in the dictionary files found here are escaped hex or Unicode values that were recommended by the fuzzing
-  engine after previous runs.
-- A default set of dictionary entries are created for all fuzz targets as part of the build process, regardless of an
-  existing file here.
-- Development or updates to dictionaries should reflect the varied formats and edge cases relevant to the
-  functionalities under test.
-- Example dictionaries (some of which are used to build the default dictionaries mentioned above) can be found here:
-  - [AFL++ dictionary repository](https://github.com/AFLplusplus/AFLplusplus/tree/stable/dictionaries#readme)
-  - [Google/fuzzing dictionary repository](https://github.com/google/fuzzing/tree/master/dictionaries)
-
 ### OSS-Fuzz Scripts (`oss-fuzz-scripts/`)

 Includes scripts for building and integrating fuzz targets with OSS-Fuzz:

Diff for: fuzzing/dictionaries/fuzz_blob.dict

-1
This file was deleted.

Diff for: fuzzing/dictionaries/fuzz_config.dict

-56
This file was deleted.

Diff for: fuzzing/oss-fuzz-scripts/build.sh

+3 -24
@@ -7,34 +7,13 @@ set -euo pipefail

 python3 -m pip install .

-# Directory to look in for dictionaries, options files, and seed corpora:
-SEED_DATA_DIR="$SRC/seed_data"
-
-find "$SEED_DATA_DIR" \( -name '*_seed_corpus.zip' -o -name '*.options' -o -name '*.dict' \) \
-  ! \( -name '__base.*' \) -exec printf 'Copying: %s\n' {} \; \
+find "$SRC" -maxdepth 1 \
+  \( -name '*_seed_corpus.zip' -o -name '*.options' -o -name '*.dict' \) \
+  -exec printf '[%s] Copying: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" {} \; \
   -exec chmod a-x {} \; \
   -exec cp {} "$OUT" \;

 # Build fuzzers in $OUT.
 find "$SRC/gitpython/fuzzing" -name 'fuzz_*.py' -print0 | while IFS= read -r -d '' fuzz_harness; do
   compile_python_fuzzer "$fuzz_harness" --add-binary="$(command -v git):."
-
-  common_base_dictionary_filename="$SEED_DATA_DIR/__base.dict"
-  if [[ -r "$common_base_dictionary_filename" ]]; then
-    # Strip the `.py` extension from the filename and replace it with `.dict`.
-    fuzz_harness_dictionary_filename="$(basename "$fuzz_harness" .py).dict"
-    output_file="$OUT/$fuzz_harness_dictionary_filename"
-
-    printf 'Appending %s to %s\n' "$common_base_dictionary_filename" "$output_file"
-    if [[ -s "$output_file" ]]; then
-      # If a dictionary file for this fuzzer already exists and is not empty,
-      # we append a new line to the end of it before appending any new entries.
-      #
-      # LibFuzzer will happily ignore multiple empty lines in a dictionary but fail with an error
-      # if any single line has incorrect syntax (e.g., if we accidentally add two entries to the same line.)
-      # See docs for valid syntax: https://llvm.org/docs/LibFuzzer.html#id32
-      echo >>"$output_file"
-    fi
-    cat "$common_base_dictionary_filename" >>"$output_file"
-  fi
 done
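
The effect of the new `-maxdepth 1` copy step can be tried locally with a minimal sketch like the one below. It is an illustration only: `demo_root` and the file names are hypothetical, and throwaway temporary directories stand in for the `$SRC` and `$OUT` directories that OSS-Fuzz provides. The logging and `chmod` steps from `build.sh` are left out for brevity.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for the OSS-Fuzz $SRC and $OUT directories.
demo_root="$(mktemp -d)"
SRC="$demo_root/src"
OUT="$demo_root/out"
mkdir -p "$SRC/nested" "$OUT"

# Two top-level seed data files (expected to be copied)
# and one nested file (expected to be skipped because of -maxdepth 1).
: >"$SRC/fuzz_config.dict"
: >"$SRC/fuzz_config_seed_corpus.zip"
: >"$SRC/nested/fuzz_blob.dict"

# Same name filters as in build.sh, restricted to the top level of $SRC.
find "$SRC" -maxdepth 1 \
  \( -name '*_seed_corpus.zip' -o -name '*.options' -o -name '*.dict' \) \
  -exec cp {} "$OUT" \;

# List what ended up in the output directory.
ls "$OUT"
```

Only files directly under `$SRC` reach `$OUT`; anything nested deeper is ignored. That is presumably why the bootstrap script writes the prepared zips and dictionaries to the top level of `$SRC`.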

Diff for: fuzzing/oss-fuzz-scripts/container-environment-bootstrap.sh

+50 -14
@@ -9,23 +9,20 @@ set -euo pipefail
 # Prerequisites #
 #################

-for cmd in python3 git wget rsync; do
+for cmd in python3 git wget zip; do
   command -v "$cmd" >/dev/null 2>&1 || {
     printf '[%s] Required command %s not found, exiting.\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >&2
     exit 1
   }
 done

-SEED_DATA_DIR="$SRC/seed_data"
-mkdir -p "$SEED_DATA_DIR"
-
 #############
 # Functions #
 #############

 download_and_concatenate_common_dictionaries() {
   # Assign the first argument as the target file where all contents will be concatenated
-  target_file="$1"
+  local target_file="$1"

   # Shift the arguments so the first argument (target_file path) is removed
   # and only URLs are left for the loop below.
@@ -38,22 +35,61 @@ download_and_concatenate_common_dictionaries() {
   done
 }

-fetch_seed_corpora() {
-  # Seed corpus zip files are hosted in a separate repository to avoid additional bloat in this repo.
-  git clone --depth 1 https://github.com/gitpython-developers/qa-assets.git qa-assets &&
-    rsync -avc qa-assets/gitpython/corpra/ "$SEED_DATA_DIR/" &&
-    rm -rf qa-assets # Clean up the cloned repo to keep the Docker image as slim as possible.
+create_seed_corpora_zips() {
+  local seed_corpora_dir="$1"
+  local output_zip
+  for dir in "$seed_corpora_dir"/*; do
+    if [ -d "$dir" ] && [ -n "$dir" ]; then
+      output_zip="$SRC/$(basename "$dir")_seed_corpus.zip"
+      printf '[%s] Zipping the contents of %s into %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$dir" "$output_zip"
+      zip -jur "$output_zip" "$dir"/*
+    fi
+  done
+}
+
+prepare_dictionaries_for_fuzz_targets() {
+  local dictionaries_dir="$1"
+  local fuzz_targets_dir="$2"
+  local common_base_dictionary_filename="$WORK/__base.dict"
+
+  printf '[%s] Copying .dict files from %s to %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$dictionaries_dir" "$SRC/"
+  cp -v "$dictionaries_dir"/*.dict "$SRC/"
+
+  download_and_concatenate_common_dictionaries "$common_base_dictionary_filename" \
+    "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/utf8.dict" \
+    "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/url.dict"
+
+  find "$fuzz_targets_dir" -name 'fuzz_*.py' -print0 | while IFS= read -r -d '' fuzz_harness; do
+    if [[ -r "$common_base_dictionary_filename" ]]; then
+      # Strip the `.py` extension from the filename and replace it with `.dict`.
+      fuzz_harness_dictionary_filename="$(basename "$fuzz_harness" .py).dict"
+      local output_file="$SRC/$fuzz_harness_dictionary_filename"
+
+      printf '[%s] Appending %s to %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$common_base_dictionary_filename" "$output_file"
+      if [[ -s "$output_file" ]]; then
+        # If a dictionary file for this fuzzer already exists and is not empty,
+        # we append a new line to the end of it before appending any new entries.
+        #
+        # LibFuzzer will happily ignore multiple empty lines in a dictionary but fail with an error
+        # if any single line has incorrect syntax (e.g., if we accidentally add two entries to the same line.)
+        # See docs for valid syntax: https://llvm.org/docs/LibFuzzer.html#id32
+        echo >>"$output_file"
+      fi
+      cat "$common_base_dictionary_filename" >>"$output_file"
+    fi
+  done
 }

 ########################
 # Main execution logic #
 ########################
+# Seed corpora and dictionaries are hosted in a separate repository to avoid additional bloat in this repo.
+# We clone into the $WORK directory because OSS-Fuzz cleans it up after building the image, keeping the image small.
+git clone --depth 1 https://github.com/gitpython-developers/qa-assets.git "$WORK/qa-assets"

-fetch_seed_corpora
+create_seed_corpora_zips "$WORK/qa-assets/gitpython/corpora"

-download_and_concatenate_common_dictionaries "$SEED_DATA_DIR/__base.dict" \
-  "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/utf8.dict" \
-  "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/url.dict"
+prepare_dictionaries_for_fuzz_targets "$WORK/qa-assets/gitpython/dictionaries" "$SRC/gitpython/fuzzing"

 # The OSS-Fuzz base image has outdated dependencies by default so we upgrade them below.
 python3 -m pip install --upgrade pip
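
The newline guard inside `prepare_dictionaries_for_fuzz_targets` matters when a target's own dictionary does not end with a newline: without it, the first base entry would be glued onto the last existing entry. A minimal sketch of that step follows, assuming made-up dictionary entries and hypothetical file names in a scratch directory (the real script works in `$WORK` and `$SRC` and downloads the base entries).

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory standing in for $WORK and $SRC.
demo_dir="$(mktemp -d)"
base_dict="$demo_dir/__base.dict"
target_dict="$demo_dir/fuzz_config.dict"

# Made-up entries. The target dictionary deliberately lacks a trailing newline.
printf '%s\n' '"\xEF\xBB\xBF"' >"$base_dict"
printf '%s' '"[core]"' >"$target_dict"

# Same guard as in the script: add a separator only when the file is non-empty.
if [ -s "$target_dict" ]; then
  echo >>"$target_dict"
fi
cat "$base_dict" >>"$target_dict"

# Show the merged dictionary, one entry per line.
cat "$target_dict"
```

If the target dictionary already ends with a newline, the guard leaves an empty line between the two parts, which the script's own comment notes LibFuzzer ignores.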
