Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added fuzzing support #3007

Merged
merged 14 commits into from
Dec 6, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 58 additions & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ If you're looking for guidance on filing a bug report or getting support, please
- [Bumping Java dependencies](#bumping-java-dependencies)
- [Rake tasks](#rake-tasks)
- [Making a release](#making-a-release)
- [Fuzzing your gumbo parser changes](#fuzzing-your-gumbo-parser-changes)

<!-- tocstop -->

Expand Down Expand Up @@ -382,7 +383,7 @@ To modify or add a dependency, a few things needs to be in sync:
A quick summary of what this looks like for you, the developer:

1. edit the `requirements` in the gemspec
2. run `bundle exec rake vendor_jars` which updates everything under `lib/nokogiri/jruby`
2. run `bundle exec rake vendor_jars` which updates everything under `lib/nokogiri/jrubfuzzing-your-changesy`
fuzzy-boiii23a marked this conversation as resolved.
Show resolved Hide resolved
3. run `bundle exec rake check_manifest` and if necessary update the gemspec `files`
4. make sure to check everything under `lib/nokogiri/jruby` into git, including the jar files

Expand All @@ -408,3 +409,59 @@ A quick checklist:
- [ ] submit a PR to https://github.com/rubysec/ruby-advisory-db
- [ ] update nokogiri.org
- [ ] bump `lib/nokogiri/version/constant.rb` to a prerelease version like `v1.14.0.dev`

## Fuzzing your gumbo parser changes

When making changes or adding new features to `gumbo-parser`, it's recommended to run [libfuzzer](https://llvm.org/docs/LibFuzzer.html) against `gumbo-parser` using various [sanitizers](https://github.com/google/sanitizers/wiki). This can be done by navigating to the `nokogiri/gumbo-parser` directory and executing `make fuzzing` in order to build the `gumbo-parser` fuzzer. Once built, navigate to the `nokogiri/gumbo-parser/fuzzer/build` directory and execute one of the following binaries in this directory with no arguments to start fuzzing:

- parse_fuzzer (standard fuzzer with no sanitizer)
- parse_fuzzer-address (fuzzer built using [ASAN](https://clang.llvm.org/docs/AddressSanitizer.html))
- parse_fuzzer-memory (fuzzer built using [MSAN](https://clang.llvm.org/docs/MemorySanitizer.html))
- parse_fuzzer-undefined (fuzzer built using [UBSAN](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html))

If the binary executed successfully you should now be seeing the following output filling up your terminal:

```
INFO: Seed: 4156947595
INFO: Loaded 1 modules (7149 inline 8-bit counters): 7149 0x58a462, 0x58c04f,
INFO: Loaded 1 PC tables (7149 PCs): 7149 0x53beb0,0x557d80,
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 24Mb
NEW_FUNC[1/44]: 0x429840 in gumbo_parse_with_options (/home/user/nokogiri/gumbo-parser/fuzzer/build/parse_fuzzer+0x429840)
NEW_FUNC[2/44]: 0x42c0d0 in destroy_node (/home/user/nokogiri/gumbo-parser/fuzzer/build/parse_fuzzer+0x42c0d0)
#721 NEW cov: 180 ft: 181 corp: 2/12b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 4 ChangeByte-ChangeByte-ChangeBit-InsertRepeatedBytes-
#722 NEW cov: 186 ft: 196 corp: 3/23b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 1 ChangeBit-
#723 NEW cov: 186 ft: 228 corp: 4/34b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 1 ChangeBinInt-
#724 NEW cov: 188 ft: 241 corp: 5/45b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 1 ChangeBit-
#725 NEW cov: 188 ft: 254 corp: 6/56b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 1 ChangeByte-
#726 NEW cov: 188 ft: 270 corp: 7/67b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 1 CopyPart-
#732 NEW cov: 188 ft: 279 corp: 8/78b lim: 11 exec/s: 0 rss: 27Mb L: 11/11 MS: 1 ChangeBit-
NEW_FUNC[1/1]: 0x441de0 in gumbo_token_destroy (/home/user/nokogiri/gumbo-parser/fuzzer/build/parse_fuzzer+0x441de0)
```

However, if the fuzzer finds a "crash" (indicating that a bug has been found) it will stop fuzzing and the following output would be expected:

```
INFO: Seed: 1523017872
INFO: Loaded 1 modules (16 guards): 0x744e60, 0x744ea0,
INFO: -max_len is not provided, using 64
INFO: A corpus is not provided, starting from an empty corpus
#0 READ units: 1
#1 INITED cov: 3 ft: 2 corp: 1/1b exec/s: 0 rss: 24Mb
#3811 NEW cov: 4 ft: 3 corp: 2/2b exec/s: 0 rss: 25Mb L: 1 MS: 5 ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte-
#3827 NEW cov: 5 ft: 4 corp: 3/4b exec/s: 0 rss: 25Mb L: 2 MS: 1 CopyPart-
#3963 NEW cov: 6 ft: 5 corp: 4/6b exec/s: 0 rss: 25Mb L: 2 MS: 2 ShuffleBytes-ChangeBit-
#4167 NEW cov: 7 ft: 6 corp: 5/9b exec/s: 0 rss: 25Mb L: 3 MS: 1 InsertByte-
==31511== ERROR: libFuzzer: deadly signal
...
artifact_prefix='./'; Test unit written to ./crash-b13e8756b13a00cf168300179061fb4b91fefbed
```

The above indicates that a crash has been identified and it can be reproduced by feeding the `crash-b13e8756b13a00cf168300179061fb4b91fefbed` file back into the binary used for fuzzing (e.g. parse-fuzzer) using the following command:

```
parse_fuzzer crash-b13e8756b13a00cf168300179061fb4b91fefbed
```

If you'd like to learn more about libfuzzer please give https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md a try.
2 changes: 2 additions & 0 deletions gumbo-parser/.gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
build
googletest
src/*.o
fuzzer/build
src/libgumbo.a
15 changes: 15 additions & 0 deletions gumbo-parser/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,20 @@ LDFLAGS := -pthread

all: check

fuzzing: fuzzer-normal fuzzer-asan fuzzer-ubsan fuzzer-msan

fuzzer-normal:
cd fuzzer && ./build.sh && cd -

fuzzer-asan:
cd fuzzer && SANITIZER=address ./build.sh && cd -

fuzzer-ubsan:
cd fuzzer && SANITIZER=undefined ./build.sh && cd -

fuzzer-msan:
cd fuzzer && SANITIZER=memory ./build.sh && cd -

# don't try to regenerate ragel or gperf files in CI, that should be a development-only action and
# the generated files should be committed to SCM
ifneq ($(CI),true)
Expand Down Expand Up @@ -81,6 +95,7 @@ coverage:

clean:
$(RM) -r build
$(RM) -r fuzzer/build

build/src/flags: | build/src
@echo 'old_CC := $(CC)' > $@
Expand Down
50 changes: 50 additions & 0 deletions gumbo-parser/fuzzer/build.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
export SANITIZER_OPTS=""
export SANITIZER_LINK=""

if [ -x "$(command -v llvm-config)" ]; then
fuzzy-boiii23a marked this conversation as resolved.
Show resolved Hide resolved
export LLVM_CONFIG=$(which llvm-config)
fi

if [ -z "${LLVM_CONFIG}" ]
then
echo 'llvm-config could not be found and $LLVM_CONFIG has not been set, expecting "export LLVM_CONFIG=/usr/bin/llvm-config-12" assuming clang-12 is installed, however any clang version works'
exit
fi

if [ ! -d "build" ]
then
mkdir build
fi

export CC="$($LLVM_CONFIG --bindir)/clang"
export CXX="$($LLVM_CONFIG --bindir)/clang++"
export CXXFLAGS="-fsanitize=fuzzer-no-link"
export CFLAGS="-fsanitize=fuzzer-no-link"
export ENGINE_LINK="$(find $($LLVM_CONFIG --libdir) -name libclang_rt.fuzzer-x86_64.a | head -1)"

if [ "$SANITIZER" = "undefined" ]
then
export SANITIZER_OPTS="-fsanitize=undefined"
export SANITIZER_LINK="$(find $($LLVM_CONFIG --libdir) -name libclang_rt.ubsan_standalone_cxx-x86_64.a | head -1)"
fi
if [ "$SANITIZER" = "address" ]
then
export SANITIZER_OPTS="-fsanitize=address"
export SANITIZER_LINK="$(find $($LLVM_CONFIG --libdir) -name libclang_rt.asan_cxx-x86_64.a | head -1)"
fi
if [ "$SANITIZER" = "memory" ]
then
export SANITIZER_OPTS="-fsanitize=memory -fPIE -pie -Wno-unused-command-line-argument"
export SANITIZER_LINK="$(find $($LLVM_CONFIG --libdir) -name libclang_rt.msan_cxx-x86_64.a | head -1)"
fi

export CXXFLAGS="-O3 $CXXFLAGS $SANITIZER_OPTS"
export CFLAGS="-O3 $CFLAGS $SANITIZER_OPTS"
cd ../src && make clean && make && cd -

if [ -z "${SANITIZER}" ]
then
$CXX $CXXFLAGS -o build/parse_fuzzer parse_fuzzer.cc ../src/libgumbo.a $ENGINE_LINK $SANITIZER_LINK
else
$CXX $CXXFLAGS -o build/parse_fuzzer-$SANITIZER parse_fuzzer.cc ../src/libgumbo.a $ENGINE_LINK $SANITIZER_LINK
fi
68 changes: 68 additions & 0 deletions gumbo-parser/fuzzer/parse_fuzzer.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
#include "../src/nokogiri_gumbo.h"
#include <stdint.h>

int SanityCheckPointers(const char* input, size_t input_length, const GumboNode* node, int depth) {
if (node->type == GUMBO_NODE_DOCUMENT || depth > 400) {
return -1;
}
if (node->type == GUMBO_NODE_ELEMENT) {
const GumboElement* element = &node->v.element;
const GumboVector* attributes = &element->attributes;

for (unsigned int i = 0; i < attributes->length; ++i) {
const GumboAttribute* attribute = static_cast<const GumboAttribute*>(attributes->data[i]);
if (!attribute)
{
return -1;
}
}
const GumboVector* children = &element->children;
for (unsigned int i = 0; i < children->length; ++i) {
const GumboNode* child = static_cast<const GumboNode*>(children->data[i]);
if (!child)
{
return -1;
}
SanityCheckPointers(input, input_length, child, depth + 1);
}
} else {
const GumboText* text = &node->v.text;
if (!text)
{
return -1;
}
}

return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
if (size < 10)
{
return 0;
}

GumboOptions options = kGumboDefaultOptions;
GumboOutput* output;
GumboNode* root;

output = gumbo_parse_with_options(&options, (char*)data, size);
root = output->document;

int result = SanityCheckPointers((char*)data, size, output->root, 0);

if (result < 0)
{
if (output) {
gumbo_destroy_output(output);
}

return -1;
}

if (output) {
gumbo_destroy_output(output);
}

return 0;
}