The interpreter is designed for testing, debugging, and validation of static analyses and code transforms. This page first describes how it can be used for this purpose, and then its design.
The interpreter can be invoked from the command line via the `--interpret` flag. By default this prints a trace and checks that the interpreter exited in a non-error stop state.
```
./mill run -i src/test/correct/indirect_call/gcc/indirect_call.adt -r src/test/correct/indirect_call/gcc/indirect_call.relf --interpret
```

```
[INFO] Interpreter Trace:
StoreVar(#5,Local,0xfd0:bv64)
StoreMem(mem,HashMap(0xfd0:bv64 -> 0xf0:bv8, 0xfd6:bv64 -> 0x0:bv8, 0xfd2:bv64 -> 0x0:bv8, 0xfd3:bv64 -> 0x0:bv8, 0xfd4:bv64 -> 0x0:bv8, 0xfd7:bv64 -> 0x0:bv8, 0xfd5:bv64 -> 0x0:bv8, 0xfd1
...
[INFO] Interpreter stopped normally.
```
The `--verbose` flag can also be used to print interpreter trace events as they are executed. Note that this output may not correspond to the actual execution trace, and may contain additional events that do not correspond to the program; for example, it includes the memory initialisation events that precede the program's execution. This is mainly useful for debugging the interpreter.
The interpreter is invoked with `interpret(p: IRContext)` to interpret normally and return an `InterpreterState` object containing the final state.
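For instance, a minimal usage sketch (how the `IRContext` is obtained is elided here, and the final-state inspection is only illustrative):

```scala
// Minimal sketch: run the interpreter over an IRContext and keep the final state.
val ctx: IRContext = ???              // e.g. produced by the usual BASIL loading pipeline
val finalState: InterpreterState = interpret(ctx)
// finalState holds the variable and memory state at the point the interpreter stopped.
```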
There is also `interpretTrace(p: IRContext)`, which returns a tuple `(InterpreterState, Trace(t: List[ExecEffect]))`, where the second component contains the list of all events generated by the interpreter, in order. This is useful for asserting a stronger equivalence between program executions, but in most cases events describing "unobservable" behaviour, such as register accesses, should be filtered out of this list before comparison.
To see an example of this being used to validate the constant propagation analysis, see /src/test/scala/DifferentialAnalysis.scala.
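A hedged sketch of such a comparison is shown below; the `isObservable` predicate is a hypothetical helper, and in practice you would pattern-match on the actual `ExecEffect` cases (e.g. keep memory stores and calls, drop register accesses):

```scala
// Sketch: compare two executions up to unobservable events.
def observable(tr: Trace): List[ExecEffect] =
  tr.t.filter(isObservable)                   // isObservable is hypothetical; filter on ExecEffect cases

val (_, trace1) = interpretTrace(ctxBefore)   // e.g. before a transform
val (_, trace2) = interpretTrace(ctxAfter)    // e.g. after a transform
assert(observable(trace1) == observable(trace2))
```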
Finally, `interpretBreakPoints(p: IRContext, breakpoints: List[BreakPoint])` is used to run an interpreter and perform additional actions at specified code points. For example, it may be invoked as follows:
```scala
val watch = IRWalk.firstInProc((program.procedures.find(_.name == "main")).get).get
val bp = BreakPoint("entrypoint", BreakPointLoc.CMD(watch), BreakPointAction(saveState=true, stop=true, evalExprs=List(("R0", Register("R0", 64))), log=true))
val res = interpretBreakPoints(program, List(bp))
```
The structure of a breakpoint is as follows:
```scala
case class BreakPoint(name: String = "", location: BreakPointLoc, action: BreakPointAction)

// the place to perform the breakpoint action
enum BreakPointLoc:
  case CMD(c: Command)                      // at a command c
  case CMDCond(c: Command, condition: Expr) // at a command c, when condition evaluates to TrueLiteral

// describes what to do when the breakpoint is triggered
case class BreakPointAction(
  saveState: Boolean = true,                // stash the state of the interpreter
  stop: Boolean = false,                    // stop the interpreter with an error state
  evalExprs: List[(String, Expr)] = List(), // evaluate the rhs of the list of expressions and stash them (lhs is an arbitrary human-readable name)
  log: Boolean = false                      // print a log message about passing the breakpoint, describing the results of this action
)
```
To see an example of this being used to validate the constant propagation analysis, see /src/test/scala/InterpretTestConstProp.scala.
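Conditional breakpoints use `BreakPointLoc.CMDCond`; a sketch follows, where the constructors used to build the condition (`BinaryExpr`, `BVEQ`, `BitVecLiteral`) are assumptions about the BASIL IR expression API rather than confirmed signatures:

```scala
// Sketch: stop at `watch` only when R0 evaluates to zero.
// The condition expression constructors here are assumed, not verified.
val condBp = BreakPoint(
  "r0-is-zero",
  BreakPointLoc.CMDCond(watch, BinaryExpr(BVEQ, Register("R0", 64), BitVecLiteral(0, 64))),
  BreakPointAction(saveState = true, stop = true, log = true)
)
val result = interpretBreakPoints(program, List(condBp))
```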
To avoid the interpreter running forever on infinite loops, the interpreter can be killed in an error state once a specified instruction count is reached. This is done simply with the function `interpretRLimit`, which automatically ignores the initialisation instructions.
```scala
def interpretRLimit(p: IRContext, instructionLimit: Int): InterpreterState
```
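For example (the instruction limit here is an arbitrary value):

```scala
// Stop in an error state after at most 10000 instructions (limit chosen arbitrarily).
val finalState = interpretRLimit(ctx, 10000)
```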
It can also be combined with other interpreters as shown:
```scala
def interp(p: IRContext, instructionLimit: Int): (InterpreterState, Trace) = {
  val interpreter = LayerInterpreter(tracingInterpreter(NormalInterpreter), EffectsRLimit(instructionLimit))
  val initialState = InterpFuns.initProgState(NormalInterpreter)(p, InterpreterState())
  BASILInterpreter(interpreter).run((initialState, Trace(List())), 0)._1
}
```
The interpreter implementation is organised across the following files:

- `Bitvector.scala`
  - Evaluation of bitvector operations; throws `IllegalArgumentException` on violation of contract (e.g. negative divisor, type mismatch)
- `ExprEval.scala`
  - Evaluation of expressions, defined in terms of partial evaluation down to a `Literal`
  - This can also be used to evaluate expressions in static analyses, by passing a function to query variable assignments and memory state from the value domain.
- `Interpreter.scala`
  - Definition of the core `Effects[S, E]` and `Interpreter[S, E]` types describing state transitions in the interpreter
  - Instantiation/definition of `Effects` for the concrete state `InterpreterState`
- `InterpreterProduct.scala`
  - Definition of product and layering composition of generic `Effects[S, E]` interpreters
- `InterpretBasilIR.scala`
  - Definition of the `Eval` object defining expression evaluation in terms of `Effects[S, InterpreterError]`
  - Definition of the `Interpreter` instance for BASIL IR, using a generic `Effects` instance and concrete state
  - Definition of ELF initialisation in terms of generic `Effects[S, InterpreterError]`
- `InterpretBreakpoints.scala`
  - Definition of a generic interpreter with a breakpoint checker layered on top
- `interpretRLimit.scala`
  - Definition of a layered interpreter which terminates after a specified cycle count
- `InterpretTrace.scala`
  - Definition of a generic interpreter which records a trace of calls to the `Effects[]` instance
The interpreter is structured for compositionality. At its core is the `Effects[S, E]` type, defined in Interpreter.scala. This type defines a small set of functions describing all the possible state transformations over a concrete state `S`, with error type `E` (always `InterpreterError` in practice).
This is implemented using the state monad `State[S, V, E]`, where `S` is the state, `V` the value, and `E` the error type. This is a flattened `State[S, Either[E, V]]`, defined in util/functional.scala. `Effects` methods return delayed computations: functions from an input state (`S`) to a resulting state and a value (`(S, Either[E, V])`). These are sequenced using `flatMap` (monad bind), or the `for {} yield ()` syntactic sugar for `flatMap`.
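As a rough illustration of this sequencing, the sketch below chains two effect calls into one delayed computation; the `loadVar`/`storeVar` method names, their signatures, and the `Scope.Global` argument are assumptions based on the trace events shown earlier, not the exact `Effects` interface:

```scala
// Hypothetical sketch: read R0 and copy its value into R1 as one sequenced computation.
def copyR0toR1[S](f: Effects[S, InterpreterError]): State[S, Unit, InterpreterError] =
  for {
    v <- f.loadVar("R0")                     // assumed accessor for a variable's value
    _ <- f.storeVar("R1", Scope.Global, v)   // assumed store effect (scope argument assumed)
  } yield ()
```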
This `Effects[S, E]` is instantiated for a given concrete state, the main example of which is `NormalInterpreter <: Effects[InterpreterState, InterpreterError]`, also defined in Interpreter.scala. The memory component of the state is abstracted further into the `MemoryState` object.
The actual execution of code is defined on top of this, in the `Interpreter[S, E]` type, which takes an instance of `Effects` as a parameter and defines both the small step (`interpretOne`) over one instruction, and the fixed point to termination from some initial state in `run()`. The fact that the stepping is defined outside the effects is important, as it allows concrete states, and state transitions over them, to be composed somewhat arbitrarily, with the interpretation of the language compiled down to calls to the resulting `Effects` instance.
This is defined in InterpretBasilIR.scala. `BASILInterpreter` defines an `Interpreter` over an arbitrary instance of `Effects[S, InterpreterError]`, encoding BASIL IR commands as effects. This file also contains definitions of the initial memory state setup of the interpreter, based on the ELF sections and symbol table.
There are two ways to compose `Effects`: product and layer. Both produce an instance of `Effects[(L, R), E]`, where `L` and `R` are the concrete state types of the two `Effects` being composed. Product runs the two effects, over two different concrete state types, simultaneously and without interaction. Layer runs the `before` effect first, and passes its state to the `inner` effect, whose value is returned.
```scala
case class ProductInterpreter[L, T, E](val inner: Effects[L, E], val before: Effects[T, E]) extends Effects[(L, T), E] {

case class LayerInterpreter[L, T, E](val inner: Effects[L, E], val before: Effects[(L, T), E])
```
Examples of using these are the `interpretTrace` and `interpretWithBreakPoints` interpreters, respectively. Note that this only works because of the aforementioned requirement that all effect calls come from outside the `Effects[]` instance itself. In the simple case, the `Interpreter` instance is the only object calling `Effects`. This means that effects triggered by an inner `Effects[]` instance do not flow back to the `ProductInterpreter`, but only appear when the `Interpreter` above the `ProductInterpreter` interprets the program via effect calls. For this reason, if, for example, `NormalInterpreter` makes effect calls, they will not appear in a trace emitted by `interpretTrace`.
Most of the interpret functions are overloaded such that there is a version taking a program, `interpret(p: Program)`, and a version taking an `IRContext`. The variant taking `IRContext` uses the ELF symbol information to initialise the memory before interpretation. If you are interpreting a real program (i.e. not a synthetic example created through the DSL), this is most likely required.
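For a synthetic program, a sketch might look like the following; the DSL helpers (`prog`, `proc`, `block`, `ret`) and the `LocalAssign`/`BitVecLiteral` constructors are assumptions about the test DSL rather than guaranteed signatures:

```scala
// Hypothetical sketch: interpret a DSL-built program directly via the Program overload.
val p: Program = prog(
  proc("main",
    block("entry",
      LocalAssign(Register("R0", 64), BitVecLiteral(1, 64)),  // assumed constructors
      ret
    )
  )
)
val finalState = interpret(p)  // Program overload: no ELF-based memory initialisation
```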
We initialise:

- The general interpreter state: stack and memory regions, the stack pointer, and a symbolic mapping from addresses to functions
- The initial and readonly memory sections stored in `Program`
- The `.bss` section to zero
- The relocation table: at each listed offset we store the address of either a real procedure in the program, or a location storing a symbolic function pointer to an intrinsic function

`.bss` is generally the top of the initialised data, the ELF symbol `__bss_end__` being equal to the symbol `__end__`.
Above this we can choose somewhat arbitrarily where to put things: usually the heap is above, followed by dynamically linked symbols, then the stack. There is currently no stack-overflow checking or heap implemented in the interpreter. Unfortunately these details are defined by the load-time linker and the system's linker script, and it is hard to find a good description of their behaviour. Some details are described here https://refspecs.linuxfoundation.org/elf/elf.pdf, and here https://dl.acm.org/doi/abs/10.1145/2983990.2983996.
- There is functionality to implement external function calls via intrinsics written in Scala code, but currently only basic printf style functions are implemented as no-ops. These can be extended to use a file IO abstraction, where a memory region is created for each file (e.g. stdout), with a variable to keep track of the current write-point such that a file write operation stores to the write-point address, and increments it by the size of the store. Importantly, an implementation of malloc() and free() is needed, which can implement a simple greedy allocation algorithm.
- Despite the presence of procedure parameters in the current IR, they are not used by the Boogie translation and are hence similarly ignored in the interpreter.
- The interpreter's immutable state representation is motivated by the ability to easily implement a sound approach to non-determinism, e.g. to implement GoTos with guessing and rollback rather than look-ahead. This is more useful for checking specification constructs than executing real programs, so is not yet implemented.
- The trace does not clearly distinguish internal vs external calls, or observable and non-observable behaviour.
- While the interpreter semantics supports memory regions, we do not initialise the memory regions (or the initial memory state) based on those present in the program; we simply assume a flat `mem` and `stack` memory partitioning.