-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
brings knowledge and systems to Primus #1075
Merged
Merged
+2,572
−496
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
deprecates the old add_component in favor of the new Components module, starts updating the registration functions. Though tests should work now.
except the observations, as I will touch them in the other branch.
there is no other way to preserve the user interface.
those signals are much better for the cleanup/teardown tasks as they are executed in the restricted mode
we are getting more interested in terms that we didn't visit rather than in terms we have visted.
we have three xfails, which we will rectify later.
So that if a term is already visited we won't fork on it and won't loose statistics.
Run will now report progress after each new subroutine and, if run in the isolation mode, will not enter the already visited entry points
it was called in a context of a different machine
the restriction was checked using the local state, not the local, therefore only a fork was restricted.
the slowdown is only 30%
apparently it boosts performance both on vanilla and flambda compilers by about 25-30%
and use the existing runners.
the idea is that static analysis before run can mark certain entry points as interesting and they then will be automatically chosen by the run plugin for further analysis if `marked-subroutines` is used as the entry point.
rectifies names and descriptions
so that we can render the progress bar correctly.
via the primus-observations command :) They are also now packaged.
as well as rewires existing analyses to more applicable observations.
Both run and taint tests are now run twice, one for the bap:legacy-main system and once for the system that corresponds to that test. For run it is `bap:stubbed-executor` and for taint it is `bap:reflective-taint-analyzer`.
Probably, we do not have enough symbolic information. But in any case, we will investigate it later. (The original idea was to add the --read-symbols-from option, but we don't have symbols for windows binaries in the testsuite)
All finalization procedures should be invoked in the restricted mode (i.e., use either machine-kill or system-stop) observations, as otherwise they risk to have some unpleasant interactions with other components, e.g., with bap:limit that will kill the machine when it does any new operation (which increases the machine clock). In this case it was an infinite loop, as the taint finalization was attached to the halting observation and was reflected to Primus Signals which had methods that were doing quite a few operations, which, in turn, woke up the limited that was using the halt operation to halt them, and here we go again.
gitoleg
approved these changes
Mar 9, 2020
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Introduction
This is a big update of the Primus Framework that brings some of the feature that we were planning for Primus 2.0 but didn't manage to do it in the right time. First of all it brings the Knowledge monad to Primus, so that we can now access the Knowledge Base directly from the computation. The KB is now used as the communication media between different Primus analyses. Next, it improves the flexibility of the Primus framework (fixing #910) as now it is possible to build your own Primus machines and run them independently in your own analysis or using the run module, which is also significantly extended. This PR also rectifies a few issues with the order of component evaluation and non-determinism happening in unexpected places. It also improves the user experience by improving the documentation and provides bap commands that will list registered components, observations, and so on. Last but not least, it significantly improves the performance of Primus analysis. As a teaser, when I started working on this branch it took more that 3 minutes to run the UAF analysis on
/bin/true
on mine (2GHz) machine. Now it takes 20 seconds!Below there are lot of details, but before delving into them, take a look at our revamped Primus documentation. It now has a nice and fully rewritten from scratch intro page with ToC (fixes #1030).
Motivations
Flexibility
Before this patch we had only one Primus Machine ([Primus.Machine.Main]) that was composed of all ever registered components. Using command line parameters we could enable/disable some of them, but we still had only one main machine per
bap
invocation. We couldn't add or delete components and couldn't really define what we want to run programmatically. For example, we couldn't use Primus for analyzing branch destinations during disassembly. Thus the whole framework was basically destined to be used byrun
plugin and only it.Performance
The big Primus Main Machine was not only inflexible, but it was also rather slow. Indeed, it was composed of more than 50 components and not all of them are always necessary. By designating a specific set of components that we need to run our analysis we can speed up things a lot.
Knowledge
Since the BAP 2.0 release we switched to the KB as our main storage of program properties, instead of the [Project.t] data structure however, we weren't able to use it inside of the Primus computation. Moreover, it was quite inconvenient to return any results from the Primus Machine after it was run as we had to use the old Project data structure, which imposes quite hard restrictions on what information could be put into it (e.g., it should be binable).
Robustness
One of the recurring problems that we had with Primus is the dependency on the order of components initialization. This problem was especially hard, when components were using non-determinism or exceptions, or posted their own observations. For example, schedulers usually attach their behavior to the
fini
observation as well as many other analysis that would like to summarize the results of analysis when a computation finishes, e.g., Beagle. However, depending on the order of initialization of the components, the scheduler could be run before the analysis finalization procedure and since it will switch the machines and terminate the computation the finalization procedure won't be ever called. Or, an even harder to catch situation, is when the finalization procedure is touching an interpreter and the limiter component is killing it because we ran out of gas. In that case, the summary wouldn't be computed and, even worse, the scheduler won't be run, if the finalization procedure were called before the scheduler. Such issues were a nightmare to track and didn't have a clear solution.Solution
Primus Systems
Following the established tradition of borrowing ideas from the Common Lisp world, we introduce the new notion to the Primus Framework nomenclature. A system is a composition of components and other systems. A system is basically a Primus Observation that we can, multiple times if necessary and with different parameters. A system could be defined either programmatically, using the Primus.System.define or in an
.asd
file that has the same syntax (as subset of it) as the Common Lisp ASDF system definition files. For a concrete example, see core.asd in which we define our core systems (we will later refine those systems and add more components, consider this as the base point).To enable backward compatibility with the existing interface, we introduce the
bap:legacy-main
system that is composed of all components that are registered with the (now deprecated)Machine.add_component
function. That brings us a seamless integration of the old main machine into the new abstractions, it is just a system now.The
run
plugin now takes the--run-system(s)
option that takes a list of systems to run. It defaults to run thebap:legacy-main
system, so there is no need to change any command line parameters (or code) after this is merged. But if we would like to just run a binary without any promiscuity, e.g., for our tests, we could use a corresponding system that only runs binaries, e.g.,bap ./echo --run --run-system=bap:stubbed-executor
, or if we need to run a UAF analysis we can dobap ./echo --run --run-system=bap:greedy-promiscuous-executor --other-options
. And this brings us to the next question: how to dig in all those names? Including the component names and system names?As an answer to this question we now have three new bap commands:
primus-systems
,primus-components
, andprimus-observations
. They output an itemized list of corresponding Primus registry entities. The output of these commands is provided at the end of this post.Primus Analysis
The Primus monad is a transformer, while the Knowledge monad is not. Therefore we can't just mix-in the Knowledge monad to the implementation of the Primus monad without sacrificing the transformer part or turning the Knowledge itself into a transformer. The latter is not acceptable, because we don't want to generalize our knowledge base over the infinite set of all possible monads, the former will break the interface but in general is acceptable, as making Primus a transformer was, let's say it straight, was a premature generalization. This was the road that we choose when we started our work in Primus 2.0 and this was a dead end, mostly because we have about 70 components and a dozen of libraries depending on the current version of Primus and rewriting them all (even in the direction of a simpler interface) was something that couldn't afford at that time (and now too). The big idea was to turn the limitations of the Primus Transformer to our advantage. Since it is a transformer we can apply it to the Knowledge monad and bless it is our new main specialization, so let's welcome a new module called
Primus.Analysis
, which is just thePrimus.Machine.Make(Knowledge)
with a couple of the Knowledge operations lifted for our convenience. So we can now use the Knowledge empowered computation in our components (which in addition to be more powerful is not required to be a functor, what a syntactic convenience!) and keep all existing components working and backward-compatible. But how we can register a new component that is specialized to the Knowledge monad, given that so far our components were first-class functors?The answer is simple. Our new repository of components (meet a new module Primus.Components) has two registration functions:
register_generic
that takes the good old generic first-class functor, aka(module Primus.Machine.Component)
andregister
that just takes aunit Analysis.t
computation. We now call the specialized components analyses and the old components generics.Generics and analyses are stored in different namespaces, so that you can have in the registry two implementations of the same component, one specialized and one general. When a system is run via the Primus.System.run function the specialized components take precedence over the generalized components.
It is still possible to run an instantiation of the Primus Monad over any other monad, via the
Primus.System.Generic
functor that creates therun
function for the provided monad. In particular this is how we implement the oldPrimus.Main.run
which now just runs thebap:legacy-main
system using the Generic System interface.So what about robustness? We address this issue by clearly specifying the timeline of the system life that prescribes what should be done at which stage and by introducing a new restricted mode. In the restricted mode observations and non-determinism are disabled. In other words, Primus operates as it was a simple deterministic state monad. A system boots in the restricted mode, which prevents havoc during the system initialization when components could create forks or trigger each other in asynchronous manner via the observations. After the system boots we enable non-determinism and observations and Primus operates in its normal mode. Finally, when machine terminates the
machine-kill
observation is posted for each killed machine. Observers of this observation are called in the restricted mode and they could easily access the interpreter or any other state of the killed machine without fear of being interrupted by any other components. This is an ideal place to calmly summarize the analysis. Finally, when the whole system finishes, themachine-stop
observation is made and we again enter the restricted mode. At this time we can summarize the run of the whole system, and this is that long-asked the final finalization observation.To summarize we now have two more events at the machine startup: [init] and [start], which split the machine timeline into three parts, before-init, the booting stage, after [init], and finally the [start] (we specifically introduced the new [start] observation to distinguish between system booting, initialization, and actually running). And we now have two more observation that designate the end of computation
machine-killed
, which is basically a destructor for each machine, andsystem-stop
which is the global destructor. This was a short and informal description, but refer to the Primus.System for the details.Since now we can run many Primus Machines it is natural to have some sort of the queue, the same as we have with passes, but for Primus Analyses. So meet a yet another new abstraction called Primus.Job which is a system that is yet to be run, and Primus.Jobs which is a global queue of jobs, that could be run either directly, using
Primus.Jobs.run
or by therun
plugin, that will run all the jobs from the queue. Besides, unlike the pass pipeline the Job queue is more like a worklist, as a new job could be added while other jobs are running, and the run function will continue until the queue is exhausted (or until explicitly stopped). Each job could be parameterized via theinit
andstart
functions (so it can have its own constructor and entry point). The overall idea, is that we can now write static analyses that will identify a points of interest in the program and register Primus jobs that will later analyze it. As a specialization of this idea, the static analysis can just tag a subroutine with themark
attribute and therun
plugin will pick it automatically (see the newall-marked
designator of the--run-entry-points
command-line parameter. Using this approach static analyses don't even need to depend on Primus in order to run Primus. Moreover, the existing analysis can now trigger Primus without modification (via themap-terms
facility).There are probably some other new features that I've forgot to mention (like the revamped testsuite), but the features above are the most important.
All registered entities of the Primus Framework
Below is the output of the
bap primus-systems
,bap primus-components
, andbap primus-observations
commands for the reference. We plan to add this output to the set of automatically generated documentation, but so far, either use this commands or consult this document.Systems
bap:legacy-main:
The legacy Primus system that contains all components registered with the
Machine.add_component function.
bap:string-deobfuscator:
Uses greedy-promiscuous-executor to find obfuscated strings.
bap:taint-analyzer:
Uses greedy-promiscuous-executor for taint analysis.
Propagates taint by computation.
bap:binary-executor:
Executes a binary program.
bap:exact-taint-analyzer:
Uses greedy-promiscuous-executor for taint analysis.
Propagates taint exactly.
bap:constant-tracker:
Uses greedy-promiscuous-executor for constant tracking.
bap:terminating-stubbed-executor:
Executes a binary together with the Lisp Machine that
is guaranteed to terminate.
bap:reflective-taint-analyzer:
A taint analyzer that reflects taints to/from BIR terms
bap:multi-analyzer:
Runs several analyses in parallel.
bap:greedy-promiscuous-executor:
Executes all linearly independent paths and never fails.
bap:stubbed-executor:
Executes a binary together with the Lisp Machine
bap:base-taint-analyzer:
Uses greedy-promiscuous-executor for taint analysis.
No policy is specified
bap:base-lisp-machine:
Executes Primus Lisp program.
Components
bap:round-robin-scheduler:
[generic]
Enables the round-robin scheduler (experimental).
bap:load-binary:
[generic]
Links the binary program into the Primus machine. All symbols of the binary
program are linked weakly, i.e., if a symbol is already linked, then this
component will not override it.
bap:mark-visited:
[generic]
Marks visited (by Primus) program terms with the [Term.visited] attribute
and unvisited with the [Term.dead] attribute. Note, that the attributes are
attached only when the system exits
bap:propagate-taint-exact:
[generic]
Enables the exact taint propagation policy. In this policy, a taint is only
propagated between values when one value is derived from another using some
transparent (not changing the value computation), e.g., store, load, cast,
concat, extract.
bap:load-lisp-library:
[generic]
Loads the Primus Library. Links all functions defined as external into the
Primus Machine. Symbols are assumed to be strong, i.e., if the symbol is
already linked, then it will be overriden by the corresponding Lisp
implemenetation
bap:constant-tracker:
[generic]
Tracks constants, using the following policy. A value is a static constant
if it was initialized from a constant value or computed from static
constant values.
bap:lisp-ieee754:
[generic]
Provides primitives for IEE754 floating-point arithemtic.
bap:greedy-scheduler: (b,None). In that case (b,None) is
[generic]
Enables the greedy scheduler. The greedy scheduling strategy will continue
with the same state, unless the machine reaches a termination state, i.e.,
when the
another alive state that has a
chosen. If such state doesn't exist, then the Machine finally terminates.
Thus this strategy will perform a depth-first traversal of the state tree,
and guarantees that all paths are explored. The greedy scheduler will
attempt to reschedule every time a basic block is evaluated.
bap:lisp-core:
[generic]
Initializes Primus Lisp core. Forwards Lisp message to the BAP log
subsystem and enables propagation of observations to signals.
bap:promiscuous-mode:
[generic]
Enables the promiscuous mode. Requires the Trivial Condition Form. When
this mode is enabled the Primus Machine will venture into paths with
unsatisfied constraints. Basically, it means that on every branch the state
is duplicated.
bap:lisp-basic-io:
[generic]
Provides basic IO primitives to Primus Lisp.
bap:monitor:
[generic]
Monitors the specified set of observations and prints a trace of recorded
observations when a machine finishes.
bap:taint-marker:
[generic]
Translates tainted values of the Taint engine to tainted-regs and
tainted-ptrs term attributes of the IR.
bap:powerpc-init:
[generic]
Initializes PPC registers.
bap:lisp-incidents:
[generic]
Exposes the incident reporting facitilites to Primus Lisp.
bap:lisp-type-checker:
[generic]
Typechecks program and outputs the summary in the standard output.
bap:lisp-type-error-printer:
[generic]
Prints Primus Lisp type errors into the standard output.
bap:taint-mapper:
[generic]
Maintains a mapping between tainted objects and term identifiers.
bap:program-loader:
[generic]
Initializes the runtime environment. This is a generic loader that should
work with most of the ABIs. It loads memory segments, including virtual,
sets up [brk], [end], [edata], [etext], and [environ] variables (see any
unix man page for the description of these symbols). Note that ([edata] and
[etext] are not guaranteed, while [end] and [brk] are). The loader also
initializes all CPU registers to zero and setups and alloocates the main
stack frame (copies arguments from the ctxt to the stack), and points the
stack-pointer register to the command line arguments. (The environment
variables follows the arguments).
bap:taint-primitives:
[generic]
Exposes Primus Lisp primitives that controls the Primus Taint Engine.
bap:taint-intro:
[generic]
Introduces a taint when a term marked with tainted-reg or tainted-ptr is
entered.
bap:taint-signals:
[generic]
Reflects Primus Taint observations to Lisp signals.
bap:conservative-garbage-collector:
[generic]
Enables taint garbage collector. The component scans the memory and
registers and deletes taints that are no longer reachable.
bap:wondering-scheduler:
[generic]
Enables the random wonderning scheduler (experimental). The scheduler will
pick between the states randomly.
bap:propagate-taint-by-computation:
[generic]
Enables the propagate-by-computation taint propagation policy. In this
policy, a value is tainted if it was computed using a tainted value.
bap:x86-setup-plt:
[generic]
Sets up PLT entries.
bap:constant-tracker-primitives:
[generic]
Exposes the constant tracker primitives to Primus Lisp.
bap:beagle-hunter:
[generic]
Monitors execution and detects data that looks like words from the provided
dictionary.
bap:exploring-scheduler:
[generic]
Enables the exploring scheduler (experimental). The exploring scheduler
will prioritize clones that will wonder into not yet explored or less
explored areas. More specifically, from a set of machine clones, it will
choose those, that will proceed to a basic block that was visited the least
amount of times. The round-robin scheduler will switch the context after
each basic block. It will count the number of time the block was evaluated
bap:lisp-dictionary:
[generic]
Provides a key-value storage for Primus Lisp programs. Dictionaries are
represented with symbols and it is a responsibility of user to prevent name
clashing between different dictionaries.
bap:lisp-primitives:
[generic]
Provides the core set of Primus Lisp primitives.
bap:lisp-regions:
[generic]
Provides a set of operations to store and manipulate interval trees. The
module provides a persistent storage for intervals, denoted in the module
as regions, since these intervals often represent memory regions. Intervals
are stored in interval sets, that are implemented as efficient interval
tree data structures (using AVL trees underneath the hood). Each interval
set is denoted with a symbol, and it is possible to create arbitrary number
of sets, as well as move regions from one set to another.
bap:x86-flag-initializer:
[generic]
Intializes x86 flags to zero.
bap:incident-location-recorder:
[generic]
Records tracepoints for incident reporting.
bap:limit:
[generic]
Enables program termination by limiting the maximum life time of each
Primus Machine.
Observations
leave-phi:
Occurs after the given term is evaluated.
machine-kill:
Occurs after the given machine is killed.
Observations are in the restricted mode.
leave-pos:
Occurs after the given position is evaluated.
concat:
Occurs on each concat operation.
stored:
Occurs after the value is stored at the address.
beagle-prey:
Occurs when the sequence of characters is detected.
pc-changed:
Occurs after the program counter is changed.
enter-jmp:
Occurs before the given term is evaluated.
call:
Occurs when the call is made.
enter-def:
Occurs before the given term is evaluated.
lisp-call:
Occurs when the call to Lisp code is made.
pagefault:
Occurs on pagefault.
leave-jmp:
Occurs after the given term is evaluated.
enter-sub:
Occurs before the given term is evaluated.
call-return:
Occurs when the call returns.
enter-pos:
Occurs before the given position is evaluated.
extract:
Occurs on each extract operation.
clock:
Occurs on each clock (operation) of the interpreter.
binop:
Occurs on each binary operation.
loaded:
Occurs after the given address is read.
leave-def:
Occurs after the given term is evaluated.
lisp-primitive:
Occurs when the Lisp primitive is invoked.
lisp-message:
Occurs with X when (msg x) is evaluated.
enter-phi:
Occurs before the given term is evaluated.
halting:
Occurs before the halt operation is posted.
system-stop:
Occurs after the system is stopped. Observations
are in the restricted mode
fini:
Occurs when machine finishes.
leave-top:
Occurs after the given term is evaluated.
leave-arg:
Occurs after the given term is evaluated.
cast:
Occurs on each cast.
interrupt:
Occurs before an interrupt.
undefined:
Occurs when an undefined value is created.
taint-finalize:
Occurs when the taint becomes unreachable.
jumping:
Occurs before the jump to the given destination.
writing:
Occurs before the value is assigned to the variable.
lisp-call-return:
OCcurs when the call to Lisp code returns.
system-start:
Occurs after the system start.
machine-switch:
Occurs when machines are switched.
leave-sub:
Occurs after the given term is evaluated.
storing:
Occurs before the value is stored at the address.
memory-switch:
Occurs when the memory bank is switched.
exception:
Occurs just before the Machine exception is raised.
cfi-violation:
Occurs before the control flow violation is commited.
enter-top:
Occurs before the given term is evaluated.
beagle:
Occurs when the sequence of characters is classified as text.
linker-exec:
Occurs before code with the given name is executed.
segfault:
Occurs just before the segmentation fault.
ite:
Occurs on each ite expression.
init:
Occurs when all components of the machine are initialized.
read:
Occurs after the given variable is evaluated to the value.
enter-exp:
Occurs before the given exp is evaluated.
lisp-type-error:
Occurs when the Lisp type error is detected by the type checker
division-by-zero:
Occurs before the division by zero happens.
machine-fork:
Occurs after a machine fork is created.
enter-blk:
Occurs before the given term is evaluated.
unop:
Occurs on each unary operation.
taint-attached:
Occurs when a taint is attached to the value.
enter-arg:
Occurs before the given term is evaluated.
const:
Occurs on each constant.
reading:
Occurs before the given variable is read.
eval-cond:
Occurs when the jump condition is evaluated.
leave-exp:
Occurs after the given exp is evaluated.
leave-term:
Occurs after the given term is evaluated.
written:
Occurs after the value is assigned to the variable.
loading:
Occurs just before the given address is dereferenced.
enter-term:
Occurs before the given term.
incident-location:
Occurs when the location of a possible incident is created.
incident:
Occurs when the incident is reported that involves the given locations.
linker-unresolved:
Occurs when a linker hits the unresolved name.
leave-blk:
Occurs after the given term is evaluated.