Merged
37 commits
e7e5123
Removed python version 3.5 from build system since it is end-of-life
pd-fkie Jun 24, 2021
62f8c15
Added bytecode instrumentation functionality
pd-fkie Jun 25, 2021
954571e
Added atheris.Instrument() to get better control of what gets instrum…
pd-fkie Jun 25, 2021
03d9500
Bug fix: Force constant selection of co_consts to be of same type
pd-fkie Jun 25, 2021
06d21c6
Changed from global TARGET_PACKAGES in import_hook.py to a local vari…
pd-fkie Jun 25, 2021
89423dc
Updated .md files with instrumentation information
pd-fkie Jun 25, 2021
0a8ad1c
Apparently co_stacksize sometimes is too large. Take the largest valu…
pd-fkie Jun 25, 2021
bf169fa
Allow num_counters to be 0
pd-fkie Jun 25, 2021
a39d5be
Don't allow atheris to instrument itself
pd-fkie Jun 25, 2021
3022c6e
Updated example_fuzzers to include atheris.Instrument()
pd-fkie Jun 25, 2021
0499848
Added -ldl flag to asan_with_fuzzer.so
pd-fkie Jun 25, 2021
95fd68a
Separate loop for calculation of stack size in Instrumentor.to_code
pd-fkie Jun 28, 2021
c7632f2
Corrected code that parses lnotab
pd-fkie Jun 28, 2021
b41e3fb
Added trace_dataflow argument to atheris.Instrument()
pd-fkie Jun 29, 2021
e5833b7
Added copyright notice for Fraunhofer FKIE and updated notices to the…
pd-fkie Jun 29, 2021
f551ad2
Changed atheris.Instrument() to atheris.instrument()
pd-fkie Jun 30, 2021
2b78510
Changed atheris.instrument()'s arguments to `include` and `exclude`
pd-fkie Jun 30, 2021
d378dfe
Added notice about atheris 1.0 when python version is too old
pd-fkie Jun 30, 2021
07e2c56
Updated comment of ujson_fuzzer
pd-fkie Jun 30, 2021
7c805f0
Renamed _loc to _trace_branch and _reg to _reserve_counters
pd-fkie Jun 30, 2021
644f7f0
Added support for instrumenting modules after atheris.Fuzz() has been…
pd-fkie Jun 30, 2021
abdf201
Documented TraceCompareOp
pd-fkie Jun 30, 2021
cc0dd60
Got rid of floating point equality
pd-fkie Jun 30, 2021
fa07027
Got rid of floating point equality
pd-fkie Jun 30, 2021
f6c2a25
Added exception handling to _cmp and cleaned up libfuzzer.cc
pd-fkie Jul 1, 2021
67b49f3
Added PCTable creation
pd-fkie Jul 8, 2021
5b731f4
Renamed _cmp to _trace_cmp
pd-fkie Jul 8, 2021
5828203
Restructured the atheris package and added `internal_libfuzzer` argum…
pd-fkie Jul 8, 2021
1f4941d
Added atheris.path()
pd-fkie Jul 8, 2021
2594241
Updated documentation to reflect the new changes
pd-fkie Jul 8, 2021
b70b55a
Updated example_fuzzers
pd-fkie Jul 8, 2021
4a6d37c
Fixed indentation
pd-fkie Jul 8, 2021
6ada953
Updated copyright info
pd-fkie Jul 8, 2021
3094e7e
Merge branch 'google:master' into master
pd-fkie Jul 8, 2021
3b296fa
Sending instrumentation output to stderr
pd-fkie Jul 8, 2021
0ad5eec
Bug fix: Use `pybind11::module` instead of `pybind11::module_`
pd-fkie Jul 8, 2021
d58e219
Bug fix ? Cast `pybind11::detail::item_accessor` to `pybind11::module`
pd-fkie Jul 8, 2021
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,3 +1,6 @@
dist
atheris.egg-info
.hypothesis
.hypothesis
/.eggs
/build
/tmp
File renamed without changes.
73 changes: 26 additions & 47 deletions README.md
@@ -4,7 +4,8 @@ Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Pytho

## Installation Instructions

Atheris supports Linux (32- and 64-bit) and Mac OS X.
Atheris supports Linux (32- and 64-bit) and Mac OS X.
Only Python versions 3.6 through 3.9 are supported.

### Linux

@@ -39,29 +40,22 @@ CLANG_BIN="$(pwd)/bin/clang" pip3 install atheris
### Example:

```python
import atheris
import sys
import atheris

with atheris.instrument():
    import some_library

def TestOneInput(data):
    if data == b"bad":
        raise RuntimeError("Badness!")
    some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```

Atheris supports fuzzing Python code, and uses Python code coverage information for this purpose.

### Fuzzing Python Code

While Atheris supports Python 2.7 and Python 3.3+, its Python code coverage support is *significantly better* when used with Python 3.8+, as it supports opcode-by-opcode coverage. If fuzzing Python code, we strongly recommend using Python 3.8+ where possible.

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.

Be sure to pass `enable_python_coverage=True` as an argument to `Setup()`. You can additionally pass `enable_python_opcode_coverage=[True/False]` to turn on and off opcode coverage. Opcode coverage is typically beneficial, but may provide more performance impact than benefit on large Python projects. This option defaults to `True` on Python 3.8+, or `False` otherwise.

Opcode coverage must be enabled to support features like intelligent string comparison fuzzing for Python code.

### Fuzzing Native Extensions

In order for native fuzzing to be effective, such native extensions must be built with Clang, using the argument `-fsanitize=fuzzer-no-link`. They should be built with the same `clang` as was used when building Atheris.
@@ -82,55 +76,40 @@ Atheris is fully supported by [OSS-Fuzz](https://github.com/google/oss-fuzz), Go

## API

### Main Interface

The `atheris` module provides two key functions: `Setup()` and `Fuzz()`.
The `atheris` module provides three key functions: `instrument()`, `Setup()` and `Fuzz()`.

In your source file, define a fuzzer entry point function, and pass it to `atheris.Setup()`, along with the fuzzer's arguments (typically `sys.argv`). Finally, call `atheris.Fuzz()` to start fuzzing. Here's an example:
In your source file, import every library you want to fuzz inside a `with atheris.instrument():` block, like this:
```py
# library_a will not get instrumented
import library_a

```python
def Setup(args, callback, enable_python_coverage=True, enable_python_opcode_coverage=True):
with atheris.instrument():
    # library_b will get instrumented
    import library_b
```
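
The `with` block above only affects imports that happen inside it. One way such scoping can work in general is a context manager that installs an import hook on entry and removes it on exit. The sketch below illustrates that idea with the standard library only; it is not Atheris's implementation, and `RecordingFinder` and `record_imports` are hypothetical names. The hook here merely records module names instead of rewriting bytecode.

```python
import sys
from contextlib import contextmanager


class RecordingFinder:
    """Hypothetical meta-path hook that only notes which modules get imported."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # Defer to the regular finders for the actual import.


@contextmanager
def record_imports():
    """Install the hook for the duration of the with-block, then remove it."""
    finder = RecordingFinder()
    sys.meta_path.insert(0, finder)
    try:
        yield finder
    finally:
        sys.meta_path.remove(finder)


# Make sure the module is not cached, otherwise no finder is consulted.
sys.modules.pop("colorsys", None)
with record_imports() as finder:
    import colorsys

print(finder.seen)
```

Modules that are already in `sys.modules` never reach a meta-path finder, which is consistent with the README's advice to place the imports themselves inside the block.
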
Define a fuzzer entry point function and pass it to `atheris.Setup()` along with the fuzzer's arguments (typically `sys.argv`). Finally, call `atheris.Fuzz()` to start fuzzing. You must call `atheris.Setup()` before `atheris.Fuzz()`.

#### `instrument(include=[], exclude=[])`
- `include`: A list of fully-qualified module names to instrument. If omitted, every module is instrumented.
- `exclude`: A list of fully-qualified module names that must not be instrumented.

Configure the Atheris Python Fuzzer. You must call `atheris.Setup()` before `atheris.Fuzz()`.
This must be used as a context manager in a `with` statement.
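
To make the interaction of `include` and `exclude` concrete, here is a minimal sketch of one plausible selection rule. The matching semantics are an assumption, not taken from this change: a name matches an entry if it equals the entry or is a submodule of it, and `exclude` takes precedence. `should_instrument` is a hypothetical helper, not part of Atheris.

```python
def should_instrument(module_name, include=(), exclude=()):
    """Return True if a module would be selected under the assumed rule."""

    def matches(name, entries):
        # Assumption: an entry covers the module itself and its submodules.
        return any(name == e or name.startswith(e + ".") for e in entries)

    if matches(module_name, exclude):
        return False
    # An empty include list means "everything that is not excluded".
    return not include or matches(module_name, include)


print(should_instrument("pkg.sub", include=["pkg"]))
print(should_instrument("pkg.sub", include=["pkg"], exclude=["pkg.sub"]))
```

Under this rule, `include=["pkg"]` does not select an unrelated module such as `pkgextra`, because matching is done on whole dotted components rather than raw string prefixes.
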

Args:
#### `Setup(args, test_one_input, internal_libfuzzer=True)`
- `args`: A list of strings: the process arguments to pass to the fuzzer, typically `sys.argv`. This argument list may be modified in-place, to remove arguments consumed by the fuzzer.
See [the LibFuzzer docs](https://llvm.org/docs/LibFuzzer.html#options) for a list of such options.
- `test_one_input`: your fuzzer's entry point. Must take a single `bytes` argument (`str` in Python 2). This will be repeatedly invoked with a single bytes container.
- `test_one_input`: your fuzzer's entry point. Must take a single `bytes` argument. This will be repeatedly invoked with a single bytes container.
- `internal_libfuzzer`: Indicates whether libFuzzer is provided by Atheris (`True`, the default) or by an external library (see [using_sanitizers.md](./using_sanitizers.md)).
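
As a rough illustration of which arguments count as "consumed by the fuzzer", the following sketch mirrors the filtering loop in `Setup()` in `atheris.cc` from this change: arguments that start with a single dash are treated as libFuzzer options and dropped, while double-dash arguments and positional arguments are kept. `strip_libfuzzer_args` is a hypothetical Python stand-in, not an Atheris function.

```python
def strip_libfuzzer_args(args):
    """Drop single-dash options, keeping everything else in order."""
    remaining = []
    for arg in args:
        # Same test as the C++ loop: length > 1, first char '-', second not '-'.
        if len(arg) > 1 and arg[0] == "-" and arg[1] != "-":
            continue
        remaining.append(arg)
    return remaining


print(strip_libfuzzer_args(["fuzzer.py", "-runs=100", "--verbose", "corpus/"]))
```

A lone `-` is kept, since the length check requires at least two characters.
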

Optional Args:
- `enable_python_coverage`: boolean. Controls whether to collect coverage information on Python code. Defaults to `True`. If fuzzing a native extension with minimal Python code, set to `False` for a performance increase.
- `enable_python_opcode_coverage`: boolean. Controls whether to collect Python opcode trace events. You typically want this enabled. Defaults to `True` on Python 3.8+, and `False` otherwise. Ignored if `enable_python_coverage=False`, or if using a version of Python prior to 3.8.

```python
def Fuzz():
```
#### `Fuzz()`

This starts the fuzzer. You must have called `Setup()` before calling this function. This function does not return.

In many cases `Setup()` and `Fuzz()` could be combined into a single function, but they are
separated because you may want the fuzzer to consume the command-line arguments it handles
before passing any remaining arguments to another setup function.

```python
def TraceThisThread(enable_python_opcode_coverage=True):
```

While we don't recommend using threads during fuzzing if you can avoid it,
Atheris does support it.

This function enables the collection of coverage information for the current
thread. Python coverage collection must be enabled in `Setup()` or this has no
effect. (Thread coverage still works if this function is called before
`Setup()`, and `Setup()` is subsequently called with
`enable_python_coverage=True`).

Optional Args:
- `enable_python_opcode_coverage`: boolean. Controls whether to collect Python opcode trace events for this thread. You typically want this enabled. Defaults to `True` ; ignored and unsupported if using a version of Python prior to 3.8.


### FuzzedDataProvider
#### `FuzzedDataProvider`

Often, a `bytes` object is not convenient input to your code being fuzzed. Similar to libFuzzer, we provide a FuzzedDataProvider to translate these bytes into other input forms.
Alternatively, you can use [Hypothesis](https://hypothesis.readthedocs.io/) as described below.
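
To illustrate what "translating bytes into other input forms" means, here is a small standard-library stand-in. It is a sketch of the general idea only and does not reproduce the behavior or byte layout of Atheris's `FuzzedDataProvider`; the class name `TinyProvider` and its methods other than `remaining_bytes` are hypothetical.

```python
class TinyProvider:
    """Hypothetical, simplified data provider over a bytes buffer."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def remaining_bytes(self) -> int:
        return len(self._data) - self._pos

    def consume_bytes(self, count: int) -> bytes:
        # Returns fewer bytes than requested once the buffer runs out.
        chunk = self._data[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

    def consume_int_in_range(self, low: int, high: int) -> int:
        if low > high:
            raise ValueError("low must not exceed high")
        span = high - low + 1
        # Take just enough bytes to cover the range, then fold into it.
        nbytes = ((span - 1).bit_length() + 7) // 8
        raw = self.consume_bytes(nbytes)
        return low + int.from_bytes(raw, "big") % span


provider = TinyProvider(b"\x05hello")
length = provider.consume_int_in_range(1, 10)
payload = provider.consume_bytes(3)
print(length, payload, provider.remaining_bytes())
```

Because every value is derived deterministically from the input bytes, the same fuzzer input always reproduces the same structured arguments, which keeps crashes reproducible.
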
123 changes: 109 additions & 14 deletions atheris.cc
@@ -1,4 +1,5 @@
// Copyright 2020 Google LLC
// Copyright 2021 Fraunhofer FKIE
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -14,35 +15,131 @@

#include "atheris.h"

#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

#include <Python.h>
#include <exception>
#include <iostream>
#include <sstream>
#include <limits>

#include "fuzzed_data_provider.h"
#include "macros.h"
#include "pybind11/functional.h"
#include "pybind11/pybind11.h"
#include "pybind11/stl.h"
#include "tracer.h"
#include "util.h"
#include "atheris.h"

namespace atheris {

namespace py = pybind11;

namespace {

std::function<void(py::bytes data)>& test_one_input_global =
    *new std::function<void(py::bytes data)>([](py::bytes data) -> void {
      std::cerr << "You must call Setup() before Fuzz()." << std::endl;
      _exit(-1);
    });

std::vector<std::string>& args_global = *new std::vector<std::string>();
unsigned long long num_counters = 0;
bool internal_libfuzzer = true;
bool setup_called = false;

} // namespace

NO_SANITIZE
void _trace_branch(unsigned long long idx) {

}

NO_SANITIZE
void _reserve_counters(unsigned long long num) {
  num_counters += num;
}

NO_SANITIZE
py::handle _trace_cmp(py::handle left, py::handle right, int opid, unsigned long long idx, bool left_is_const) {
  PyObject* ret = PyObject_RichCompare(left.ptr(), right.ptr(), opid);

  if (ret == nullptr) {
    throw py::error_already_set();
  } else {
    return ret;
  }
}

NO_SANITIZE
std::vector<std::string> Setup(
    const std::vector<std::string>& args,
    const std::function<void(py::bytes data)>& test_one_input,
    py::kwargs kwargs) {
  if (setup_called) {
    std::cerr << Colorize(STDERR_FILENO,
                          "Setup() must not be called more than once.")
              << std::endl;
    exit(1);
  }
  setup_called = true;

  args_global = args;
  test_one_input_global = test_one_input;

  // Strip libFuzzer arguments (single dash).
  std::vector<std::string> ret;
  for (const std::string& arg : args) {
    if (arg.size() > 1 && arg[0] == '-' && arg[1] != '-') {
      continue;
    }
    ret.push_back(arg);
  }

  if (kwargs.contains("internal_libfuzzer")) {
    internal_libfuzzer = kwargs["internal_libfuzzer"].cast<bool>();
  }

  return ret;
}

NO_SANITIZE
void Fuzz() {
  if (!setup_called) {
    std::cerr << Colorize(STDERR_FILENO,
                          "Setup() must be called before Fuzz() can be called.")
              << std::endl;
    exit(1);
  }

  py::module atheris = (py::module) py::module::import("sys").attr("modules")["atheris"];
  py::module core;

  if (internal_libfuzzer) {
    core = py::module::import("atheris.core_with_libfuzzer");
  } else {
    core = py::module::import("atheris.core_without_libfuzzer");
  }

  atheris.attr("_trace_cmp") = core.attr("_trace_cmp");
  atheris.attr("_reserve_counters") = core.attr("_reserve_counters");
  atheris.attr("_trace_branch") = core.attr("_trace_branch");

  core.attr("start_fuzzing")(args_global, test_one_input_global, num_counters);
}

#ifndef ATHERIS_MODULE_NAME
#define ATHERIS_MODULE_NAME atheris
#error Need ATHERIS_MODULE_NAME
#endif // ATHERIS_MODULE_NAME

PYBIND11_MODULE(ATHERIS_MODULE_NAME, m) {
  Init();

  m.def("Setup", &Setup);
  m.def("Fuzz", &Fuzz);
  m.def("TraceThisThread", [](pybind11::kwargs kwargs){
    bool enable_python_opcode_coverage = true;
    if (kwargs.contains("enable_python_opcode_coverage")) {
      enable_python_opcode_coverage =
          kwargs["enable_python_opcode_coverage"].cast<bool>();
    }
    TraceThisThread(enable_python_opcode_coverage);
  });
  m.def("_trace_branch", &_trace_branch);
  m.def("_reserve_counters", &_reserve_counters);
  m.def("_trace_cmp", &_trace_cmp, py::return_value_policy::move);

  py::class_<FuzzedDataProvider>(m, "FuzzedDataProvider")
      .def(py::init<py::bytes>())
@@ -73,8 +170,6 @@ PYBIND11_MODULE(ATHERIS_MODULE_NAME, m) {
      .def("remaining_bytes", &FuzzedDataProvider::remaining_bytes)
      .def("buffer", &FuzzedDataProvider::buffer);
  m.attr("ALL_REMAINING") = std::numeric_limits<size_t>::max();

  m.def("path", &GetDynamicLocation);
}

} // namespace atheris
13 changes: 9 additions & 4 deletions atheris.h
@@ -1,5 +1,6 @@
/*
* Copyright 2020 Google LLC
* Copyright 2021 Fraunhofer FKIE
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -30,16 +31,20 @@
#include "pybind11/stl.h"

namespace atheris {

void Init();
namespace py = pybind11;

std::vector<std::string> Setup(
const std::vector<std::string>& args,
const std::function<void(pybind11::bytes data)>& test_one_input,
pybind11::kwargs kwargs);
const std::function<void(py::bytes data)>& test_one_input,
py::kwargs kwargs);

void Fuzz();

py::handle _trace_cmp (py::handle left, py::handle right, int opid, unsigned long long idx, bool left_is_const);
void _reserve_counters(unsigned long long num);
void _trace_branch(unsigned long long idx);

} // namespace atheris

#endif // THIRD_PARTY_PY_ATHERIS_LIBFUZZER_H_
17 changes: 17 additions & 0 deletions atheris/__init__.py
@@ -0,0 +1,17 @@
# Copyright 2021 Fraunhofer FKIE
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .atheris import Setup, Fuzz, FuzzedDataProvider, _trace_branch, _reserve_counters, _trace_cmp
from .import_hook import instrument
from .utils import path