-
Notifications
You must be signed in to change notification settings - Fork 42
Under Constrained Symbolic Execution IRAD
-
Under-constrained Symbolic Execution IR&D
- Prior Work: "Under-Constrained Symbolic Execution: Correctness Checking for Real Code"
- Design Notes
The purpose of this IR&D project is to extend Crucible with the capability to perform under-constrained symbolic execution. Namely, to generate inputs for, and symbolically execute, arbitrary functions inside a program without starting from the program entrypoint, and without requiring significant specification effort from a user.
This page consists of Langston Barrett's notes on this project.
An extension of KLEE (called UC-KLEE) that analyzes LLVM bitcode.
Software bugs are a well-known source of security vulnerabilities. One technique for finding bugs, symbolic execution, considers all possible inputs to a program but suffers from scalability limitations. This paper uses a variant, under-constrained symbolic execution, that improves scalability by directly checking individual functions, rather than whole programs. We present UC-KLEE, a novel, scalable framework for checking C/C++ systems code, along with two use cases. First, we use UC-KLEE to check whether patches introduce crashes. We check over 800 patches from BIND and OpenSSL and find 12 bugs, including two OpenSSL denial-of-service vulnerabilities. We also verify (with caveats) that 115 patches do not introduce crashes. Second, we use UC-KLEE as a generalized checking framework and implement checkers to find memory leaks, uninitialized data, and unsafe user input. We evaluate the checkers on over 20,000 functions from BIND, OpenSSL, and the Linux kernel, find 67 bugs, and verify that hundreds of functions are leak free and that thousands of functions do not access uninitialized data.
Symbolc execution suffers from path explosion, and so is often unable to reach code deep in the control-flow graph.
The key difference with classical symbolic execution seems to be that underconstrained symbolic execution operations on one function at a time, rather than whole programs.
without requiring a manual specification or even a single testcase
UC-KLEE works on LLVM bitcode.
Individual functions have preconditions that the symbolic execution tool can't know, which means the inputs that it generates are under-constrained. This means there may be false positives, where the tool generates an input that doesn't conform to the function's preconditions.
Users may manually annotate code (annotations are written in C).
All pointers begin in an unbound state. When an unbound pointer is compared to null, UC-KLEE will fork. When an unbound pointer is loaded from or stored to, UC-KLEE lazily allocates a block of memory.
To prevent infinite execution, UC-KLEE takes a command-line flag with a static depth bound on all generated datastructures.
UC-KLEE will never construct inputs with aliasing or cyclical datastructures.
UC-KLEE can check that patches don't introduce bugs. It works via crash equivalence, i.e. it will only report errors that introduce new crashes relative to the old code.
UC-KLEE uses a static analysis to find paths in the new code that cannot execute differing sequences of instructions from the old code, and soundly prunes them from consideration for performance.
UC-KLEE uniques errors by PC and error type.
In the build system, UC-KLEE overrides the following C macros:
__LINE__
VERSION
SRCID
DATE
OPENSSL_VERSION_NUMBER
If there is a control-flow path such that it's infeasible to avoid an error after you've followed that path, that's a must-fail bug.
TODO I need a deeper understanding of this.
With some caveats, UC-KLEE can verify that a patch doesn't introduce any new bugs (up to the specified depth) by exhaustively checking every path.
The authors claim that there are three main classes of missing preconditions:
- Data structure invariants
- State machine invariants (???)
- API invariants
They provide an example of a buffer and it's length bundled in a struct
where
UC-KLEE has no awareness that the integer represents the length of the buffer.
-
Annotations
These false positives may be partially alleviated with annotations written in C.
For one of the programs, the authors wrote 400 lines of annotations. Seems like a lot, especially considering the number/rate of false positives.
TODO catalog these
-
Heuristics
- Must-fail: See above
- Belief-fail: A bug is more likely to be "true" if it involves a function "contradicting itself", i.e. dereferencing a pointer where a locally-added path constraint says the pointer is null. A form of Belief Analysis.
- Concrete-fail: assertion failure or memory error occurred with a concrete condition or pointer, respectively.
only 8.6% of the belief-fail errors for BIND and 20% of those for OpenSSL were true bugs.
So belief-fail doesn't sound super effective…
UC-KLEE provides a C++ interface for generic checkers.
When UC-KLEE encounters a call to a function that doesn't have a definition in the bitcode module, it skips it. This under-approximates the postcondition of the function (which may write to memory), so may miss bugs. LB: I think it also may cause false positives? Because you might have a concrete error later that would have been prevented if that function had written to its arguments?
The authors report that over-approximation (havocing every global and piece of memory reachable in the points-to graph from the function's arguments) caused significant performance problems.
UC-KLEE also just skips inline assembly and symbolic function pointers.
The authors say
in practice, loads of uninitialized data are often intentional; they frequently arise within calls to
memcpy
or when code manipulates bit fields within a Cstruct
.
While data generated by UC-KLEE must be considered under-constrained, data received from the network, or read from files or environment variables can be considered fully-constrained, because it really could be anything.
UC-KLEE tracks fully-constrained data via shadow memory, reporting its use in bugs.
UC-KLEE conservatively assumes that the appearance of comparisons between fully-constrained and under-constrained data in path conditions represents sanitization of the fully-constrained data.
The leak checker had a false positive rate of 90%. Seems not much better than a compiler warning or linter at that point…
The authors claim that the combination of bugs found with the ability to verify significant numbers of functions represents a significant step forward.
Chou observed that data swapped from network byte order to host byte order is generally untrusted.
When generating input data,
If the size is too small, later accesses to this object may trigger out-of-bounds memory errors. On the other hand, a size that is too large can hide legitimate errors.
-
Strategy 1: Backtracking
One strategy was to pick an initial size based on type information where available, and backtrack whenever an out-of-bounds access happened.
-
Strategy 2: Symbolic Sizes
Another strategy is to use symbolic sizes for each object.
Unlike traditional symbolic execution, underconstrained symex can't report whole-program inputs (basically, bytes), but instead generates pointer-rich datastructures. The authors have a clever way to report errors.
The authors use what Crucible calls "overrides", i.e. specialized models, for
functions like strlen
.
Instead of eagerly querying SMT solvers at branches to see if one of them is infeasible, UC-KLEE sometimes replaces the branch condition with a fresh symbolic variable and a lazy constraint that it is equal to the branch condition. In the case that no errors are reported on one of the branches, no query is necessary. If an error is reported, the lazy constraint is added before a query is omitted.
Emulation of object-oriented code using structs and function pointers proved problematic. There were some manual annotations that alleviated some of this.
A minimal initial prototype would consist of a command-line program that takes as input:
- an LLVM bitcode module,
- the (mangled) name of a function to execute, and
- an input depth and/or size bound,
and produces as output either an "all clear" message, or some representation of inputs to that function that trigger some kind of undesirable behavior.
An great final product would also include (in roughly decreasing priority order):
- The ability to generate source-language-level programs that trigger the bug
- Advanced heuristics for detecting false positives
- Cross-language (C, C++, Java, Rust) support
- Performance optimizations?
Rather than pursue lazy initialization (a la UC-KLEE), the prototype will instead use an iterative, type- and error-driven approach. Instead of allocating memory dynamically as it's used by the program, it will instead generate program inputs from the program's type signature.
Specifically, the first input will be the smallest possible input of the correct type. The tool will then execute the function on this input, and iteratively expand it:
-
If the symbolic execution succeeded with no safety condition violations, it will increase the input depth (up to the user-specified bound) and execute it again.
-
If some safety conditions were (potentially) violated, the tool will examine the error, and either:
- report it to the user (and optionally continue executing to find other problems), or
- use heuristics to decide that the error was likely a false positive (i.e., due to a
unsatisfied precondition, such as an uninitialized global variable). The
tool may then use further heuristics to generate further inputs or a program
state/memory layout that satisfies the precondition. These may include:
- Allocating memory to back previously unmapped pointers
- Expanding the sizes of allocations
- Initializing global variables
How complex of preconditions would we like to be able to deduce? Generally, as we invest in deducing preconditions we may end up with fewer false positives, at the cost of higher complexity.
Generally, any of the "relational" preconditions below would be considerably more difficult to deduce in the general case, and somewhat more difficult to generate arguments for.
Note that pointers in crucible-llvm
are represented by a (natural, bitvector)
pair where the natural is the allocation ID, and the bitvector is its size.
-
Nullness
This is possibly the simplest precondition, stating that a pointer should be non-null. For example, we should deduce that the following function has a precondition that
x
is not null.int deref(int *x) { return *x; }
This precondition also expresses depth constraints. If you have a linked list
typedef struct node { int val; struct node *next; } node_t;
You could express that it has length greater than three by saying that
l->next, =l->next->next
, andl->next->next->next
are all non-null.You could deduce that a function (probably) has this sort of precondition if
- execution failed with a null pointer dereference, and
- the generated arguments contained a null pointer.
-
Minimal Allocation Size
If a function indexes a constant amount into an array, it likely has a precondition that said array has at least that much space.
-
Constant Minimal Allocation Size
In this example, we could likely deduce that the allocation pointed to by
x
needs to have a size larger than2
.int get_const(int *x) { return x[2]; }
You could deduce that a function (probably) has this sort of precondition if
- execution failed with an out-of-bounds read or write,
- the offset of the read or write is constant, and
- the allocation being pointed to was allocated during argument generation.
-
Relational Minimal Allocation Size
In some similar cases, we may be able to deduce that the allocation pointed to by
x
needs to have a size larger thani
:int get(int *x, size_t i) { return x[i]; }
If the index is involved in some kind of calculation, it would be more difficult, if not impossible. This may limit the utility of this kind of heuristic.
int get(int *x, size_t i) { return x[i*2]; }
You could deduce that a function (probably) has this sort of precondition if
- execution failed with an out-of-bounds read or write,
- the allocation being pointed to was allocated during argument generation, and
- the offset of the read or write was a bitvector that appears inside one of the function's arguments.
-
-
Bound by Allocation Size
If a function indexes an array with an argument, it likely expects that the argument is within bounds of that array.
-
Bound by Constant Allocation Size
In this case, we could probably deduce fairly easily that
i
needs to be less than5
.int glob[5]; int get(int i) { return glob[i]; }
You could deduce that a function (probably) has this sort of precondition if:
- execution failed with an out-of-bounds read or write,
- the offset of the read or write was a bitvector that appears inside one of the function's arguments,
- the size of the allocation was concrete, and
- the allocation was not allocated as part of argument generation.
-
Relational Bound by Allocation Size
This is the converse case to Relational Minimal Allocation Size.
-
-
Required Write
-
To Arguments
This function requires not only that it's argument has the right size (as in Constant Minimal Allocation Size), but also that there was a previous write to
x[2]
.int get_const(int *x) { return x[2]; }
For contrast, the following function does not require such a write:
void write_const(int *x) { x[2] = 0; }
You could deduce that a function (probably) has this sort of precondition if:
- execution failed with an uninitialized read, and
- the allocation being pointed to was allocated during argument generation.
-
To Globals
This case is analogous to the above, but for global variables. In this case, the required write may just be initialization (Crucible doesn't always automatically allocate and initialize all global variables).
-