-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds the symbolic executor #1105
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Generators now are able to produce objects of random width modulo that the random value underneath the hood is still an integer. TODO: drop the width parameter from the byte generator.
We're now using a much more efficient MSG algorithm with the implementation that uses the new Bitvec module, so that we can now generate arbitrary number of random bits and specify any range we like.
it produces lots of false positives due to unwanted aliasing.
It uses malloc to allocate a place for the register and lisp parameters to store the location. We should also consider clearing errno as soon as possible as it is the source of aliasing and lots of UAF and DF false positives.
basically, the idea is that each function that defines a buffer or a string (e.g., malloc, strdup, memcpy, etc) imposes constraints on the buffer that we may track. For example, if we break the bounds of the strdup generated string it could be a soft alarm.
it is just a warning, there is no need to overreact
coerces boolean operations to bitvectors (we will keep booleans only for actual logic formulas, i.e., for meta logic - to answer questions whether a path is feasible and so on).
Now each generator will generate its own sequence. Note, we still need to create a separate generator for each independent data location.
TODO: document it TODO: publish bitvector interfaces
this plugin will control all the random generators... as much as they could be controlled
forgot to commit it
Not only it is possible to enable/disable it, but it is also now possible to provide alternative competing runtime support without modifying the existing code. However, the main motivaiton to move it was the desire to get access to the project data structure that enables a much more robust runtime identification that works even in the absence of symbols.
fixes a type error in strcat fixes an error in strncat
1) the greedy-promiscuous-executor was still mentioned by other systems 2) yet another type error in strings
instead of the parameters
The convention is for each concrete feature `xxx` that needs a symbolic counterpart to provide the `symbolic-xxx` feature. The stdlib is right now empty we will start to fill it in with the stuff after the release.
It is the same as the old `allocate` (which is still available, no worries), but takes two words instead of one word and int, which makes it possible to create regions whose size do not fit into OCaml int, which is quite possible during execution in a randomized mode. And makes the code much eaiser to write as now we don't need to deal with conversion errors.
it will no longer fail on big numbers
so it is possible to map the whole 2^64 memory in one run
gitoleg
approved these changes
May 28, 2020
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Finally, BAP can boast with its own symbolic executor! The executor
itself comprise a rather small part of this PR (roughly 25%, or 1kLoC)
the rest of the PR includes necessary changes that enable the symbolic
executor.
Before we delve into details, let's give a high-level picture of the
symbolic executor. It is a plugin (no library interface at this
release) which when enabled will track all inputs and compute their
symbolic values. Inputs are either automatically identified or
explicitly set using Lisp primitives,
symbolic-value
andsymbolic-memory
. Two more primitives,symbolic-assert
andsymbolic-assume
could be used to specify verification conditions andfurther refine symbolic models of various subsystems (aka symbolic
stubs). In addition to the symbolic computer we have the symbolic
executor itself that computes inputs trying to reach all parts of the
program. The algorithm is rather simple, we follow the concrete
execution path, and on each condition evaluation we enqueue a task
that refutes it. Each location is tried up to the certain number of
times, set with
--primus-symbolic-executor-cutoff
command-lineargument.
To get the idea of the symbolic executor capabilities you can look at
our current testsuite (see testsuite/src/symexec-*.c files), for
example this test is the hardened version of angr's fauxware (the
original version was too easy to break). In this test (as well as in
all others we can reach all parts of the program and compute inputs
that will lead to them, including the SOSNEAKY password).
Since the symbolic executor is driven by the concrete executor we rely
on the state randomization and Primus generators a lot. We implemented
a new primus-random plugin with a sophisticated interface that allows
specifying generators for registers, memory regions, and function
values. We also rewrote our random generator completely from scratch
to prevent unecessary aliasing. Another important addition is that now
we can generate words of abitrary sizes (previously we could generate
only bytes).
One of the significant limitation (if not to say a bug) of the Primus
Lisp primitives was that many operations were opaque and not
observable via the Primus interpreter observation interface, therefore
the symbolic computer wasn't able to track values to their origins. We
rewrote signficant number of primitives and expressed them in terms of
the Primus interpreter. We also added
ite
,branch
, andrepeat
operations to the Primus interpreter so that the primitives author can
express directly structured control flow. As the result, all Primus
Lisp control operators are correctly translated into path constraints,
so Primus Lisp could be now seen as an advanced interface to SMT with
theory that includes branches and loops, e.g.,
Besides, there is now the
eval-lisp
command so you can run the abovecode with
Notice that the function was evaluated twice, since it has two paths.
Since the Primus Lisp interpreter and Primus Lisp primitives are now
using the main Primus interpreter for all operations and our Primus
Lisp type system is less strict than the Bil type system we decided to
relax the interpreter and injected a few coercions here and there. So
now it is ok to add two words with different sizes, the interpreter
will just coerce the to the size of the bigger word, and so one with
other operations.
While evaluation the new symbolic executor on various tests we added
many new stubs that handle various posix operations. We also
implemented the symbolic version of stdio, so now input files and
streams are treated symbolically. To enable reuse of the concrete
versions of function stubs we added an ability to overload functions
based on the system and used components. (We always had the
overloading facility, which was used to provide different
implementations for different ABI and architectures, now we added an
ability to overload based on the components that comprise the system
or system itself), for example adding
to a definition will make it applicable only in the context of a system that
has this component.
The symbolic executor uses z3 underneath the hood and is limited to
QF_FD logic. One of the drawbacks is that z3 requires custom runtime,
so baptop is currently broken. But we already fixed this issue
upstream, see Z3Prover/z3#4468, and backported
this patch via opam-repository to 4.8.8 so that we can use the latest
released version without having to wait for 4.8.9 (see
ocaml/opam-repository#16535).
Below is the summary of changes. If you have any questions about them
feel free to ask for clarifications. The commit messages are also
usually quite informative. As always reviews and comments are very
welcome.
List of changes:
fini
operation to System.runite
,branch
, andrepeat
Primus operationssymbol-of-string
primitiveeval-lisp
bap command