Primus Lisp enhancements #798

Merged
merged 21 commits into from
Mar 19, 2018

@ivg ivg commented Mar 16, 2018

This PR provides various enhancements to the Primus Lisp subsystem that aim to make it faster and easier to use.

Documentation generator

This PR provides a very crude, quickly written autodoc generator that dumps docstrings in the dot format (which I manually translate to HTML using Emacs); an example can be found here. This is very preliminary, and only the API index is generated (no documentation for modules, a.k.a. features). Anyway, I believe it is much better than nothing.

Makes the Primus Lisp interface public

Basically, the whole interface is just two functions: eval_fun and eval_method. Now it is possible to run Lisp programs without actually having the IR.

Adds runtime parameters

This is a new feature in Primus Lisp. A module can now specify parameters using the defparameter form. Parameters are like any other global variables, except that they can be documented and have a default form. The default form is not evaluated until the parameter is used, and it is not evaluated at all if the parameter was set by a user. Probably, the best place to initialize parameters is the init method.
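The evaluation rule for defaults can be modeled in plain OCaml. This is only a sketch under assumptions, not the Primus Lisp implementation: the `parameter` type and the `make`, `set`, and `get` functions are hypothetical names, and OCaml's `Lazy` stands in for the deferred default form.

```ocaml
(* Hypothetical model of a parameter: an optional user-supplied value
   plus a default that is computed at most once, and only on demand. *)
type 'a parameter = {
  mutable user : 'a option;  (* value set by a user, if any *)
  default : 'a Lazy.t;       (* the default form, not yet evaluated *)
}

let make default = { user = None; default }

let set p v = p.user <- Some v

(* The default is forced only when no user value is present. *)
let get p = match p.user with
  | Some v -> v
  | None -> Lazy.force p.default
```

With this model, creating a parameter with `make (lazy (expensive ()))` does not call `expensive`; it is called on the first `get`, and never if `set` was called first.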

Enhanced malloc Model

Using the new parameters mechanism, we enhanced the malloc model with several new features:

The new implementation is also using a more efficient calloc model with the O(1) memory and time complexity (unlike previous O(N)).

More precise control flow observations

The new jumping observation is much more precise and easier to use than the existing enter/leave-jmp, as it operates on a lower level and doesn't lose the information about the origins of values. This event also made it possible to remove the TCF precondition from the Primus Lisp and Run modules/passes. (We still keep the TCF pass as a dependency: although it is not necessary, it makes the analysis easier.)

Bug fixes

  • A fix for a bug in the method dispatching procedure that basically made all methods except one useless
  • A fix for a potential race condition between primitives initialization and program linking
  • A fix for a potential bug due to the lack of reindexing of method and parameter bodies

Makes Primus Lisp easier to bootstrap

Previously, some passes had to be run to initialize the Primus Lisp subsystem, and it was quite unclear how all the machinery interacts and what the task of each pass is. Now this is rectified: all Primus Lisp plugins perform their effects at load time, so it is only necessary to call Plugins.run ~provides:["primus"] () to get the Primus subsystem up and running... err, not running, fortunately, but ready to be run.

Implementation optimization

This PR proposes a more efficient implementation of the Primus local variables stack, which makes the check for a locally bound variable O(log N) instead of O(N), and removes unnecessary stuff from the stack to keep it lighter.
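The idea can be sketched in plain OCaml. This is a minimal illustration under assumptions, not the Primus code: the names `env`, `push`, `pop`, `is_local`, and `lookup` are hypothetical, and the standard library `Map` stands in for whatever balanced tree the implementation uses. It follows the description in the commit messages: next to the stack, a map records how many bindings of each variable name are currently on the stack.

```ocaml
module Vars = Map.Make(String)

type 'a env = {
  stack : (string * 'a) list;  (* innermost binding first *)
  count : int Vars.t;          (* name -> number of bindings on the stack *)
}

let empty = { stack = []; count = Vars.empty }

(* Logarithmic in the number of unique names on the stack. *)
let is_local env name = Vars.mem name env.count

let push env name value =
  let n = match Vars.find_opt name env.count with
    | None -> 0
    | Some n -> n in
  { stack = (name, value) :: env.stack;
    count = Vars.add name (n + 1) env.count }

(* Popping decrements the counter and drops the name when it reaches zero. *)
let pop env = match env.stack with
  | [] -> env
  | (name, _) :: rest ->
    let count = match Vars.find_opt name env.count with
      | Some n when n > 1 -> Vars.add name (n - 1) env.count
      | _ -> Vars.remove name env.count in
    { stack = rest; count }

(* The list is walked only when the name is known to be local. *)
let lookup env name =
  if is_local env name then List.assoc_opt name env.stack else None
```

In this sketch a global variable never triggers a traversal of the stack, which is the common case that the association list handled poorly.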

For further details please consider reading the commit messages, and thanks for reading and reviewing.

ivg added 18 commits March 13, 2018 15:35
The jumping observation is made just before a jump is taken and has
two parameters: the jump condition and the jump destination. This is
a much more precise and useful observation than the
`{enter,leave}-term` one, as it doesn't lose the origin of the values
that define the jump (i.e., we can track (taint) both the condition
and the destination).

We also added three functions to the Linker interface, mostly for
convenience. These functions resolve code names into addresses,
symbolic names, and tid.
So far, the plugin has been loading some stuff at link time, some
after the configuration is available, and the rest during the analysis
phase, when it is invoked as a pass. It could even be invoked twice.

Now it is rectified. Everything that doesn't require access to the
project data structure is loaded automatically just after the
configuration phase has finished. The rest (e.g., linking with the
program) is done during the analysis.
If the parameter is not set, then the allocated memory chunk is not
filled (i.e., it is filled with random values); if it is set, then we
fill the chunk with the specified value.

The motivation is to make calloc more efficient.
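One way to get a fill that costs neither time nor memory is to record the fill value with the chunk and materialize only the bytes that are actually written. The following is a sketch under that assumption; it is not the Primus memory model, and the `chunk` type, the `cell` type, and the `allocate`, `store`, and `load` functions are hypothetical.

```ocaml
module Addr = Map.Make(struct type t = int let compare = compare end)

type chunk = {
  base : int;
  size : int;
  fill : int option;     (* None: the contents are unspecified *)
  written : int Addr.t;  (* only the bytes that were stored explicitly *)
}

type cell = Unmapped | Unspecified | Byte of int

(* Allocation is O(1) regardless of the size: nothing is filled eagerly. *)
let allocate ?fill ~base ~size () =
  { base; size; fill; written = Addr.empty }

(* No bounds check here, to keep the sketch short. *)
let store chunk addr byte =
  { chunk with written = Addr.add addr byte chunk.written }

let load chunk addr =
  if addr < chunk.base || addr >= chunk.base + chunk.size then Unmapped
  else match Addr.find_opt addr chunk.written, chunk.fill with
    | Some byte, _ -> Byte byte
    | None, Some byte -> Byte byte
    | None, None -> Unspecified
```

Under this model a calloc of a megabyte is `allocate ~fill:0 ~base ~size:1_000_000 ()`, and every read of an untouched byte yields `Byte 0` without the zeros ever being stored.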
and applies the reindexer to all runtime definitions.
previously only definitions were printed
To be in line with Common Lisp we've decided to put docstring and
declarations after the value of defparameter and defconstant. This
indeed looks more natural.
1. Provides an efficient stack that is now equipped with a map data
structure that allows us to efficiently query for locally bound
variables.

In Primus Lisp variables live in two scopes:
 1. the Primus global such as CPU registers, parameters, etc.
 2. the Primus Lisp local scope that is populated by lexically scoped
    local variables such as function parameters and let bound
    variables.

Any Lisp operation on a variable involves checking whether it is a
local variable or a global variable. Since we're using lexical scoping
we should be able to distinguish between them at compile time. And
that's what we will eventually do in the future. But so far it
requires too many changes to the interpreter. Previously, we were
using an assoc list that was traversed every time we read or set a
variable. The new implementation uses a balanced tree that maps
variables to the total number of their occurrences on the stack. That
makes the check logarithmic in the number of unique variable names on
the stack.

Few other small optimizations:

a) a faster function frame constructor that is not monadic (as it
shouldn't be), that does not preserve the order of arguments (so
that we can append them in the reversed order, using fold left), and
that counts the total number of arguments, so that we don't call
List.length when we're popping the frame off.

b) a faster frame pushing mechanism - instead of just appending
arguments, we are prepending them in the reverse order. This is not
only faster, but since the frame is in the reverse order already, it
preserves the existing order (so no changes in semantics)

2. Fixes a nasty bug in the signal dispatcher. When a signal is
dispatched to several methods, each method ignores the effects of the
other methods, as it resets the state to what it was before the
dispatcher was entered.
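The dispatcher bug described in item 2 can be modeled in a few lines of OCaml. This is only an illustration under assumptions, not the dispatcher code: a method is modeled as a hypothetical function from state to state, and `dispatch_resetting` and `dispatch_threading` are made-up names for the behavior before and after the fix.

```ocaml
(* Hypothetical model: the state is a log of effects, newest first. *)
type state = string list
type meth = state -> state

let log msg : meth = fun s -> msg :: s

(* Before the fix: every method starts from the state as it was when
   the dispatcher was entered, so only the last method leaves a trace. *)
let dispatch_resetting (methods : meth list) (init : state) : state =
  List.fold_left (fun _ m -> m init) init methods

(* After the fix: the state is threaded through all the methods. *)
let dispatch_threading (methods : meth list) (init : state) : state =
  List.fold_left (fun s m -> m s) init methods
```

With two methods `log "a"` and `log "b"`, the resetting dispatcher ends in a state that contains only `"b"`, while the threading one keeps both effects.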
1. We do not want primitives that were registered by other components
to be dropped when a new program is linked. In fact, primitives are a
completely different beast than other definitions, as they are defined
by OCaml modules (i.e., plugins) rather than by the Lisp code, so
probably they shouldn't be a part of the program data structure at
all. Though having them in the program makes things much easier.

Besides, when we drop primitives at program linking time, we are
discarding all primitives that were registered prior to the call to
program_link. So far, just by accident, the link_program function
was called before any other components (since it was registered last,
after all other lisp components). This is so fragile that I consider
it a bug. A nasty one.

2. The `init_env` function in the Primus.Lisp module was adding
special variables of the form `%<id>` and `@<id>` that hold addresses
of the correspondingly named program terms. It was using the local
variables stack. It should neither use the local stack for storing
these variables (as they are globals by their nature), nor do this
at the linking time of the program. This function is now moved to the
loader (that is the component that's responsible for setting up the
environment).
Parameters are lazy: they are not evaluated unless they are used and
were not set before.
This implementation adds the following features, that are useful
during the analysis:

- the upper limit to the maximum size of the allocated memory chunk
- the upper limit to the maximum size of the malloc heap
- malloc guards with optional coloring
- efficient calloc that doesn't
  a) take memory
  b) take time
There are two entities in Primus Lisp that could be evaluated:
functions and methods. Both are now runnable directly from OCaml via
the `eval_fun` and `eval_method` functions.
So far we can generate only the index, but that's still not that bad.
- merges different definitions of the same entity
- removes quotations
- removes duplicate whitespace
@ivg ivg requested a review from gitoleg March 16, 2018 20:24
@ivg ivg merged commit 8b04e0f into BinaryAnalysisPlatform:master Mar 19, 2018
ivg added a commit to ivg/bap that referenced this pull request Apr 4, 2018
One of the previous commits [1] in PR BinaryAnalysisPlatform#798 was claiming that we do not
need to preserve the order of the arguments, so we can provide a more
efficient frame allocation.

Apparently, we do need to preserve the order, at least for the proper
signaling of the `call` and `call-return` messages. This commit
reverses the order of the arguments, since they were reversed.

[1]: BinaryAnalysisPlatform@6d5afcd
ivg added a commit that referenced this pull request Apr 9, 2018
* skips evaluations of the mem variable in the store operation

since this variable doesn't have any meaningful value

* fixes the cleanup procedure after term evaluation

in case of abnormal termination the cleanup procedure wasn't called.

* ignores return statements

this is a big change actually, and we will return to it later, to make
it more robust.

If we don't ignore them, then we will have double returns: the first
one when a function returns based on the IR return statement (if
such is present and is well defined), and the second one when function
exec finishes and we finally call the return destination. That was a
disaster, and this is the fast solution. I will evolve it later.

* adds machine-switch and machine-fork observations

also simplifies the switch and fork operators. For some reason we were
holding a machine identifier in the stored continuation; probably
this is a remnant of the debugging stage. In any case, we don't
need it anymore.

* halts machine after switch in the greedy scheduler

it shouldn't have any observable effect, though it is better to make
it explicit.

* disables call bypassing

When we make a call, we add a failsafe machine that will resume the
computation in case the call doesn't return. We shouldn't use
this failsafe machine if everything went all right.

* handles empty blocks with branches correctly

* few cosmetic changes

* fixes the parameter order in linked lisp stubs

One of the previous commits [1] in PR #798 was claiming that we do not
need to preserve the order of the arguments, so we can provide a more
efficient frame allocation.

Apparently, we do need to preserve the order, at least for the proper
signaling of the `call` and `call-return` messages. This commit
reverses the order of the arguments, since they were reversed.

[1]: 6d5afcd

* fixes strncpy, strncmp and adds strpbrk summaries

* parametrizes memory allocator with the zero-sentinel

that's a value that is returned when malloc is called with the zero
argument.

* implements a buffer overflow check

This check is a part of the memcheck check suite and it verifies that
functions from the string.h API are called correctly. The correctness
property is the following: if the pointer to the beginning of a string
that is passed to the function belongs to some heap region, then the
pointer to the end of this string must belong to the same heap region.
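The property can be written down as a small predicate. The sketch below is an assumption-laden model in plain OCaml, not the memcheck code: heap regions are hypothetical records with inclusive bounds, the heap is a plain list, and `check` is a made-up name.

```ocaml
(* Hypothetical heap region with inclusive bounds. *)
type region = { lower : int; upper : int }

let contains r p = r.lower <= p && p <= r.upper

let find_region heap p = List.find_opt (fun r -> contains r p) heap

(* If the first byte belongs to a heap region, then the last byte must
   belong to the same region. Pointers outside of the heap are not
   constrained by this property. *)
let check heap ~start ~len =
  match find_region heap start with
  | None -> true
  | Some r -> len = 0 || contains r (start + len - 1)
```

For a heap with a 16-byte region starting at 0x1000, `check heap ~start:0x1000 ~len:16` holds, while a length of 17 violates the property because the last byte falls outside the region that contains the first one.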
@ivg ivg deleted the primus-lisp-enhancements branch September 13, 2018 12:06