The interpreter is designed for testing, debugging, and validation of static analyses and code transforms. This page first describes how it can be used for this purpose, and then its design.
The interpreter can be invoked from the command line via the `--interpret` flag. By default this prints a trace and checks that the interpreter exited in a non-error stop state.
```
./mill run -i src/test/correct/indirect_call/gcc/indirect_call.adt -r src/test/correct/indirect_call/gcc/indirect_call.relf --interpret
```

```
[INFO] Interpreter Trace:
StoreVar(#5,Local,0xfd0:bv64)
StoreMem(mem,HashMap(0xfd0:bv64 -> 0xf0:bv8, 0xfd6:bv64 -> 0x0:bv8, 0xfd2:bv64 -> 0x0:bv8, 0xfd3:bv64 -> 0x0:bv8, 0xfd4:bv64 -> 0x0:bv8, 0xfd7:bv64 -> 0x0:bv8, 0xfd5:bv64 -> 0x0:bv8, 0xfd1
...
[INFO] Interpreter stopped normally.
```
The `--verbose` flag can also be used to print interpreter trace events as they are executed. Note that this output may not correspond to the actual execution trace, and may contain additional events that do not correspond to the program; for example, it includes the memory initialisation events that precede the program's execution. This is mainly useful for debugging the interpreter.
The interpreter is invoked with `interpret(p: IRContext)` to interpret normally and return an `InterpreterState` object containing the final state.
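For instance, a minimal usage sketch (how the `IRContext` is obtained is elided here, and the final-state inspection is only illustrative):

```scala
// Minimal sketch: run the interpreter over an IRContext and keep the final state.
val ctx: IRContext = ???              // e.g. produced by the usual BASIL loading pipeline
val finalState: InterpreterState = interpret(ctx)
// finalState holds the variable and memory state at the point the interpreter stopped.
```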
There is also `interpretTrace(p: IRContext)`, which returns a tuple `(InterpreterState, Trace(t: List[ExecEffect]))`, where the second component contains the list of all events generated by the interpreter, in order. This is useful for asserting a stronger equivalence between program executions, but in most cases events describing "unobservable" behaviour, such as register accesses, should be filtered out of this list before comparison.
To see an example of this being used to validate the constant propagation analysis, see /src/test/scala/DifferentialAnalysis.scala.
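A hedged sketch of such a comparison is shown below; the `isObservable` predicate is a hypothetical helper, and in practice you would pattern-match on the actual `ExecEffect` cases (e.g. keep memory stores and calls, drop register accesses):

```scala
// Sketch: compare two executions up to unobservable events.
def observable(tr: Trace): List[ExecEffect] =
  tr.t.filter(isObservable)                   // isObservable is hypothetical; filter on ExecEffect cases

val (_, trace1) = interpretTrace(ctxBefore)   // e.g. before a transform
val (_, trace2) = interpretTrace(ctxAfter)    // e.g. after a transform
assert(observable(trace1) == observable(trace2))
```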
Finally, `interpretBreakPoints(p: IRContext, breakpoints: List[BreakPoint])` is used to run an interpreter and perform additional actions at specified code points. For example, it may be invoked as follows:
```scala
val watch = IRWalk.firstInProc((program.procedures.find(_.name == "main")).get).get
val bp = BreakPoint("entrypoint", BreakPointLoc.CMD(watch), BreakPointAction(saveState=true, stop=true, evalExprs=List(("R0", Register("R0", 64))), log=true))
val res = interpretBreakPoints(program, List(bp))
```
The structure of a breakpoint is as follows:
```scala
case class BreakPoint(name: String = "", location: BreakPointLoc, action: BreakPointAction)

// the place to perform the breakpoint action
enum BreakPointLoc:
  case CMD(c: Command)                      // at a command c
  case CMDCond(c: Command, condition: Expr) // at a command c, when condition evaluates to TrueLiteral

// describes what to do when the breakpoint is triggered
case class BreakPointAction(
  saveState: Boolean = true,                // stash the state of the interpreter
  stop: Boolean = false,                    // stop the interpreter with an error state
  evalExprs: List[(String, Expr)] = List(), // evaluate the rhs of the list of expressions and stash them (lhs is an arbitrary human-readable name)
  log: Boolean = false                      // print a log message about passing the breakpoint, describing the results of this action
)
```
To see an example of this being used to validate the constant propagation analysis, see /src/test/scala/InterpretTestConstProp.scala.
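Conditional breakpoints use `BreakPointLoc.CMDCond`; a sketch follows, where the constructors used to build the condition (`BinaryExpr`, `BVEQ`, `BitVecLiteral`) are assumptions about the BASIL IR expression API rather than confirmed signatures:

```scala
// Sketch: stop at `watch` only when R0 evaluates to zero.
// The condition expression constructors here are assumed, not verified.
val condBp = BreakPoint(
  "r0-is-zero",
  BreakPointLoc.CMDCond(watch, BinaryExpr(BVEQ, Register("R0", 64), BitVecLiteral(0, 64))),
  BreakPointAction(saveState = true, stop = true, log = true)
)
val result = interpretBreakPoints(program, List(condBp))
```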
To avoid the interpreter running forever on infinite loops, the interpreter can be killed in an error state once a specified instruction count is reached. This is done simply with the function `interpretRLimit`, which automatically ignores the initialisation instructions.
```scala
def interpretRLimit(p: IRContext, instructionLimit: Int): InterpreterState
```
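For example (the instruction limit here is an arbitrary value):

```scala
// Stop in an error state after at most 10000 instructions (limit chosen arbitrarily).
val finalState = interpretRLimit(ctx, 10000)
```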
It can also be combined with other interpreters as shown:
```scala
def interp(p: IRContext, instructionLimit: Int): (InterpreterState, Trace) = {
  val interpreter = LayerInterpreter(tracingInterpreter(NormalInterpreter), EffectsRLimit(instructionLimit))
  val initialState = InterpFuns.initProgState(NormalInterpreter)(p, InterpreterState())
  BASILInterpreter(interpreter).run((initialState, Trace(List())), 0)._1
}
```
The interpreter implementation is organised across the following files:

- `Bitvector.scala`
  - Evaluation of bitvector operations; throws `IllegalArgumentException` on violation of contract (e.g. negative divisor, type mismatch)
- `ExprEval.scala`
  - Evaluation of expressions, defined in terms of partial evaluation down to a `Literal`
  - This can also be used to evaluate expressions in static analyses, by passing a function to query variable assignments and memory state from the value domain.
- `Interpreter.scala`
  - Definition of the core `Effects[S, E]` and `Interpreter[S, E]` types describing state transitions in the interpreter
  - Instantiation/definition of `Effects` for the concrete state `InterpreterState`
- `InterpreterProduct.scala`
  - Definition of product and layering composition of generic `Effects[S, E]` interpreters
- `InterpretBasilIR.scala`
  - Definition of the `Eval` object defining expression evaluation in terms of `Effects[S, InterpreterError]`
  - Definition of the `Interpreter` instance for BASIL IR, using a generic `Effects` instance and concrete state
  - Definition of ELF initialisation in terms of generic `Effects[S, InterpreterError]`
- `InterpretBreakpoints.scala`
  - Definition of a generic interpreter with a breakpoint checker layered on top
- `interpretRLimit.scala`
  - Definition of a layered interpreter which terminates after a specified cycle count
- `InterpretTrace.scala`
  - Definition of a generic interpreter which records a trace of calls to the `Effects[]` instance
The interpreter is structured for compositionality. At its core is the `Effects[S, E]` type, defined in Interpreter.scala. This type defines a small set of functions describing all the possible state transformations over a concrete state `S`, with error type `E` (always `InterpreterError` in practice).
This is implemented using the state monad `State[S, V, E]`, where `S` is the state, `V` the value, and `E` the error type. This is a flattened `State[S, Either[E, V]]`, defined in util/functional.scala. `Effects` methods return delayed computations: functions from an input state (`S`) to a resulting state and a value (`(S, Either[E, V])`). These are sequenced using `flatMap` (monad bind), or the `for {} yield ()` syntactic sugar for `flatMap`.
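As a rough illustration of this sequencing, the sketch below chains two effect calls into one delayed computation; the `loadVar`/`storeVar` method names, their signatures, and the `Scope.Global` argument are assumptions based on the trace events shown earlier, not the exact `Effects` interface:

```scala
// Hypothetical sketch: read R0 and copy its value into R1 as one sequenced computation.
def copyR0toR1[S](f: Effects[S, InterpreterError]): State[S, Unit, InterpreterError] =
  for {
    v <- f.loadVar("R0")                     // assumed accessor for a variable's value
    _ <- f.storeVar("R1", Scope.Global, v)   // assumed store effect (scope argument assumed)
  } yield ()
```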
This `Effects[S, E]` is instantiated for a given concrete state, the main example of which is `NormalInterpreter <: Effects[InterpreterState, InterpreterError]`, also defined in Interpreter.scala. The memory component of the state is abstracted further into the `MemoryState` object.
The actual execution of code is defined on top of this, in the `Interpreter[S, E]` type, which takes an instance of `Effects` as a parameter and defines both the small step (`interpretOne`) over one instruction, and the fixed point to termination from some initial state in `run()`. The fact that the stepping is defined outside the effects is important, as it allows concrete states, and state transitions over them, to be composed somewhat arbitrarily, with the interpretation of the language compiled down to calls to the resulting `Effects` instance.
This is defined in InterpretBasilIR.scala. `BASILInterpreter` defines an `Interpreter` over an arbitrary instance of `Effects[S, InterpreterError]`, encoding BASIL IR commands as effects. This file also contains definitions of the initial memory state setup of the interpreter, based on the ELF sections and symbol table.
There are two ways to compose `Effects`: product and layer. Both produce an instance of `Effects[(L, R), E]`, where `L` and `R` are the concrete state types of the two `Effects` being composed. Product runs the two effects, over two different concrete state types, simultaneously and without interaction. Layer runs the `before` effect first, and passes its state to the `inner` effect, whose value is returned.
```scala
case class ProductInterpreter[L, T, E](val inner: Effects[L, E], val before: Effects[T, E]) extends Effects[(L, T), E] {

case class LayerInterpreter[L, T, E](val inner: Effects[L, E], val before: Effects[(L, T), E])
```
Examples of using these are the `interpretTrace` and `interpretWithBreakPoints` interpreters, respectively. Note that this only works because of the aforementioned requirement that all effect calls come from outside the `Effects[]` instance itself. In the simple case, the `Interpreter` instance is the only object calling `Effects`. This means that effects triggered by an inner `Effects[]` instance do not flow back to the `ProductInterpreter`, but only appear when the `Interpreter` above the `ProductInterpreter` interprets the program via effect calls. For this reason, if, for example, `NormalInterpreter` makes effect calls, they will not appear in a trace emitted by `interpretTrace`.
Most of the interpret functions are overloaded such that there is a version taking a program, `interpret(p: Program)`, and a version taking an `IRContext`. The variant taking `IRContext` uses the ELF symbol information to initialise the memory before interpretation. If you are interpreting a real program (i.e. not a synthetic example created through the DSL), this is most likely required.
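For a synthetic program, a sketch might look like the following; the DSL helpers (`prog`, `proc`, `block`, `ret`) and the `LocalAssign`/`BitVecLiteral` constructors are assumptions about the test DSL rather than guaranteed signatures:

```scala
// Hypothetical sketch: interpret a DSL-built program directly via the Program overload.
val p: Program = prog(
  proc("main",
    block("entry",
      LocalAssign(Register("R0", 64), BitVecLiteral(1, 64)),  // assumed constructors
      ret
    )
  )
)
val finalState = interpret(p)  // Program overload: no ELF-based memory initialisation
```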
We initialise:

- The general interpreter state: stack and memory regions, the stack pointer, and a symbolic mapping from addresses to functions
- The initial and readonly memory sections stored in `Program`
- The `.bss` section to zero
- The relocation table: at each listed offset we store the address of either a real procedure in the program, or a location storing a symbolic function pointer to an intrinsic function

`.bss` is generally the top of the initialised data, the ELF symbol `__bss_end__` being equal to the symbol `__end__`.
Above this we can choose somewhat arbitrarily where to put things: usually the heap is above, followed by dynamically linked symbols, then the stack. There is currently no stack-overflow checking or heap implemented in the interpreter. Unfortunately these details are defined by the load-time linker and the system's linker script, and it is hard to find a good description of their behaviour. Some details are described here https://refspecs.linuxfoundation.org/elf/elf.pdf, and here https://dl.acm.org/doi/abs/10.1145/2983990.2983996.
- There is functionality to implement external function calls via intrinsics written in Scala code, but currently only basic printf style functions are implemented as no-ops. These can be extended to use a file IO abstraction, where a memory region is created for each file (e.g. stdout), with a variable to keep track of the current write-point such that a file write operation stores to the write-point address, and increments it by the size of the store. Importantly, an implementation of malloc() and free() is needed, which can implement a simple greedy allocation algorithm.
- Despite the presence of procedure parameters in the current IR, they are not used by the Boogie translation and are hence similarly ignored in the interpreter.
- The interpreter's immutable state representation is motivated by the ability to easily implement a sound approach to non-determinism, e.g. to implement GoTos with guessing and rollback rather than look-ahead. This is more useful for checking specification constructs than executing real programs, so is not yet implemented.
- The trace does not clearly distinguish internal vs external calls, or observable and non-observable behaviour.
- While the interpreter semantics supports memory regions, we do not initialise the memory regions (or the initial memory state) based on those present in the program; we simply assume a flat `mem` and `stack` memory partitioning.