Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbolic tracing #157

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ set(SHARED_RUNTIME_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/RuntimeCommon.cpp
${CMAKE_CURRENT_SOURCE_DIR}/LibcWrappers.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Shadow.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Tracer.cpp
${CMAKE_CURRENT_SOURCE_DIR}/GarbageCollection.cpp)

if (${QSYM_BACKEND})
Expand Down
4 changes: 4 additions & 0 deletions runtime/GarbageCollection.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,10 @@
/// A list of memory regions that are known to contain symbolic expressions.
std::vector<ExpressionRegion> expressionRegions;

std::vector<ExpressionRegion>& getExpressionRegions() {
return expressionRegions;
}

void registerExpressionRegion(ExpressionRegion r) {
expressionRegions.push_back(std::move(r));
}
Expand Down
1 change: 1 addition & 0 deletions runtime/GarbageCollection.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ using ExpressionRegion = std::pair<SymExpr *, size_t>;
/// Add the specified region to the list of places to search for symbolic
/// expressions.
void registerExpressionRegion(ExpressionRegion r);
std::vector<ExpressionRegion>& getExpressionRegions();

/// Return the set of currently reachable symbolic expressions.
std::set<SymExpr> collectReachableExpressions();
Expand Down
8 changes: 8 additions & 0 deletions runtime/RuntimeCommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include "GarbageCollection.h"
#include "RuntimeCommon.h"
#include "Shadow.h"
#include "Tracer.h"

namespace {

Expand Down Expand Up @@ -486,3 +487,10 @@ SymExpr _sym_build_bit_to_bool(SymExpr expr) {
return _sym_build_not_equal(expr,
_sym_build_integer(0, _sym_bits_helper(expr)));
}

void _sym_trace_execution(uintptr_t pc) { Tracer::traceStep(pc);
}

void _sym_finalize_tracing() {
Tracer::writeTraceToDisk();
}
2 changes: 2 additions & 0 deletions runtime/RuntimeCommon.h
Original file line number Diff line number Diff line change
Expand Up @@ -211,6 +211,8 @@ void _sym_notify_basic_block(uintptr_t site_id);
*/
const char *_sym_expr_to_string(SymExpr expr); // statically allocated
bool _sym_feasible(SymExpr expr);
void _sym_trace_execution(uintptr_t pc);
void _sym_finalize_tracing(void);

/*
* Garbage collection
Expand Down
113 changes: 113 additions & 0 deletions runtime/Tracer.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@

#include "Tracer.h"
#include "Shadow.h"
#include "llvm/Support/raw_ostream.h"

nlohmann::json Tracer::currentTrace;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In RuntimeCommon.cpp there is a goal to do the runtime thread local. You use here the global state. Could you refactor your code so that it can be run in multiple threads in future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand. Are we talking about threads of the tested program ? In my understanding, the shadow memory pages, the set of accumulated path constraints, etc. should be shared across threads of the target program.

As this tracing feature simply takes snapshots of this state, I do not see how it could be thread local.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, the shadow memory pages, the set of accumulated path constraints, etc. should be shared across threads of the target program.

I think that in case of multi-thread tested programs the set of accumulated path constraints should be thread local, because each thread will execute potentially different path, so the set should be different. According to the shadow memory, you are right.

My idea was to have the possibility to execute the multiple runtimes in different threads to implement #14 without forking, but I don't know if that has a much sense.

Nevertheless if there is a possibility to doesn't have global state we should work in that direction, because programs with such state doesn't scale well.

nlohmann::json Tracer::expressions;
nlohmann::json Tracer::pathConstraints;
const std::string Tracer::BACKEND_TRACE_FILE = "/tmp/backend_trace.json";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be either configurable or created in more generic way (e.g. mktemp).


void Tracer::traceStep(uintptr_t pc) {

nlohmann::json newEntry;
newEntry["pc"] = pc;
newEntry["memory_to_expression_mapping"] = nlohmann::json::object();

/* dump shadow pages */
for (auto const &[pageAddress, _] : g_shadow_pages) {
for (auto byteAddress = pageAddress; byteAddress < pageAddress + kPageSize;
byteAddress++) {
auto byteExpr = _sym_read_memory((u_int8_t *)byteAddress, 1, true);
if (byteExpr != nullptr && !byteExpr->isConcrete()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can compare the current memory state with previous one to don't duplicate information in each step and only print diffs? This will reduce number of IO operations and needed memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would certainly be a useful improvement but unfortunately I will not have time to work on this.


newEntry["memory_to_expression_mapping"][std::to_string(reinterpret_cast<uintptr_t>(byteAddress))] =
getExpressionID(byteExpr);
}
}
}

/* dump regions that the client has registered with _sym_register_expression_region
* for SymQEMU, this is env_exprs which contains the expressions for the global TCG variables (i.e., guest registers) */
for (auto &expressionRegion : getExpressionRegions()){
for (auto byteAddress = expressionRegion.first; byteAddress < expressionRegion.first + expressionRegion.second / sizeof(byteAddress);
byteAddress++) {
auto byteExpr = *byteAddress;
if (byteExpr != nullptr && !byteExpr->isConcrete()) {

newEntry["memory_to_expression_mapping"][std::to_string(reinterpret_cast<uintptr_t>(byteAddress))] =
getExpressionID(byteExpr);
}
}
}

currentTrace.push_back(newEntry);
}


void Tracer::tracePathConstraint(SymExpr constraint, bool taken) {
if (pathConstraints.empty()) {
symcc_set_test_case_handler(
reinterpret_cast<TestCaseHandler>(traceNewInput));
}

nlohmann::json newEntry;
newEntry["expression"] = getExpressionID(constraint);
newEntry["after_step"] = currentTrace.size() - 1;
newEntry["new_input_value"] = nlohmann::json();
newEntry["taken"] = taken;

pathConstraints.push_back(newEntry);
}

void Tracer::traceNewInput(const unsigned char *input, size_t size) {
for (size_t i = 0; i < size; i++) {
pathConstraints[pathConstraints.size() - 1]["new_input_value"].push_back(
input[i]);
}
}

void Tracer::writeTraceToDisk() {
for (auto const &[_, expressionPtr] : getAllocatedExpressions()) {
recursivelyCollectExpressions(expressionPtr);
}

nlohmann::json dataToSave;
dataToSave["trace"] = currentTrace;
dataToSave["expressions"] = expressions;
dataToSave["path_constraints"] = pathConstraints;

std::ofstream o(BACKEND_TRACE_FILE);
o << std::setw(4) << dataToSave << std::endl;
}

void Tracer::recursivelyCollectExpressions(const shared_ptr<qsym::Expr>&expressionPtr) {
string expressionID = getExpressionID(expressionPtr.get());
if (expressions.count(expressionID) > 0) {
return;
}

expressions[expressionID]["operation"]["kind"] = expressionPtr->kind();
expressions[expressionID]["operation"]["properties"] = nlohmann::json::object();
if (expressionPtr->kind() == qsym::Constant){
auto value_llvm_int = static_pointer_cast<qsym::ConstantExpr>(expressionPtr)->value();
std::string value_str;
llvm::raw_string_ostream rso(value_str);
value_llvm_int.print(rso, false);

expressions[expressionID]["operation"]["properties"]["value"] = value_str;
}
expressions[expressionID]["size_bits"] = expressionPtr->bits();
expressions[expressionID]["input_byte_dependency"] = *expressionPtr->getDependencies();
expressions[expressionID]["args"] = nlohmann::json::array();
for (int child_i = 0; child_i < expressionPtr->num_children(); child_i++) {
shared_ptr<qsym::Expr> child = expressionPtr->getChild(child_i);
string childID = getExpressionID(child.get());
expressions[expressionID]["args"].push_back(childID);
recursivelyCollectExpressions(child);
}
}

string Tracer::getExpressionID(SymExpr expression) {
return std::to_string(reinterpret_cast<uintptr_t>(expression));
}
58 changes: 58 additions & 0 deletions runtime/Tracer.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@

#ifndef SYMRUNTIME_TRACER_H
#define SYMRUNTIME_TRACER_H

#include "GarbageCollection.h"
#include "Runtime.h"
#include <cstdint>
#include <fstream>
#include <nlohmann/json.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Json library has not been added to SymCC dependencies. Please update the README, dockerfile and github workflows.


/**
* Generates a trace of the symbolic execution.
*
* A trace contains a list of symbolic memory snapshots, a list of path constraints and a list of expressions.
* The trace is written to a json file at the end of the execution.
*/
class Tracer {
public:
/**
* Adds a dump of the current symbolic state to the trace.
*/
static void traceStep(uintptr_t pc);

/**
* Adds a path constraint to the trace.
*/
static void tracePathConstraint(SymExpr constraint, bool taken);

/**
* Adds an input generated by the symbolic backend.
* The input is associated to the last path constraint.
*/
static void traceNewInput(const unsigned char* input, size_t size);

/**
* Generates a json file containing the trace.
* Must be called before the program exits.
*/
static void writeTraceToDisk();

private:
/**
* Collects all expressions reachable from the given expression.
*/
static void recursivelyCollectExpressions(const shared_ptr<qsym::Expr>&expressionPtr);

/**
* Generates a string ID from the address of the given expression.
*/
static string getExpressionID(SymExpr expression);

static nlohmann::json currentTrace;
static nlohmann::json expressions;
static nlohmann::json pathConstraints;
static const std::string BACKEND_TRACE_FILE;
};

#endif // SYMRUNTIME_TRACER_H
8 changes: 7 additions & 1 deletion runtime/qsym_backend/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,13 @@ target_include_directories(SymRuntime SYSTEM PRIVATE
# We need to get the LLVM support component for llvm::APInt.
llvm_map_components_to_libnames(QSYM_LLVM_DEPS support)

target_link_libraries(SymRuntime ${Z3_LIBRARIES} ${QSYM_LLVM_DEPS})
include(FetchContent)

FetchContent_Declare(json URL https://github.com/nlohmann/json/releases/download/v3.11.2/json.tar.xz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Why this is in qsym_backend specific directory? Why you fetch the code instead of require nlohmann-json to be available in operating system? This is very popular package and is available on major distrubiutions, e.g:
Ubuntu https://packages.ubuntu.com/jammy/nlohmann-json3-dev
Debian https://packages.debian.org/bookworm/nlohmann-json3-dev
Arch Linux https://archlinux.org/packages/extra/any/nlohmann-json/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no experience working with CMake, I simply used a solution that was suggested in the README of nlohmann-json. If this is a bad solution maybe you could change it ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to prepare a PR till the end of week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
I created a PR: damienmaier#1

FetchContent_MakeAvailable(json)


target_link_libraries(SymRuntime ${Z3_LIBRARIES} ${QSYM_LLVM_DEPS} nlohmann_json::nlohmann_json)

# We use std::filesystem, which has been added in C++17. Before its official
# inclusion in the standard library, Clang shipped the feature first in
Expand Down
8 changes: 7 additions & 1 deletion runtime/qsym_backend/Runtime.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@

#include "Runtime.h"
#include "GarbageCollection.h"
#include "Tracer.h"

// C++
#if __has_include(<filesystem>)
Expand Down Expand Up @@ -154,6 +155,10 @@ EnhancedQsymSolver *g_enhanced_solver;

} // namespace

std::map<SymExpr, qsym::ExprRef> &getAllocatedExpressions() {
return allocatedExpressions;
}

using namespace qsym;

#if HAVE_FILESYSTEM
Expand Down Expand Up @@ -321,6 +326,7 @@ void _sym_push_path_constraint(SymExpr constraint, int taken,
if (constraint == nullptr)
return;

Tracer::tracePathConstraint(constraint, taken != 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be calls to tracer put into compiler plugin code? So that it will be runtime independent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. What do you mean by compiler plugin code ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiler directory in symcc repository root (https://github.com/eurecom-s3/symcc/tree/master/compiler). There is a code, which is executed on compilation phase as a plugin to clang.

Copy link
Contributor

@rmalmain rmalmain Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho it should also not be put in compiler, otherwise, SymQEMU will not be able to use tracing.
I guess the best option would be to have _sym_push_path_constraint wrapped into a runtime-agnostic function calling to the tracer and the runtime-specific corresponding function.
The extra call indirection should be optimized out at compile-time.

g_solver->addJcc(allocatedExpressions.at(constraint), taken != 0, site_id);
}

Expand Down Expand Up @@ -449,7 +455,7 @@ bool _sym_feasible(SymExpr expr) {
//

void _sym_collect_garbage() {
if (allocatedExpressions.size() < g_config.garbageCollectionThreshold)
// if (allocatedExpressions.size() < g_config.garbageCollectionThreshold)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you left that comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tracer identifies each expression with a unique ID, that is used in the JSON file. Currently, I used the memory address of the expression as ID and thus to avoid ID collision we need to disable the garbage collection.

As discussed above, I will put the code of this PR inside #ifdefs such that garbage collection is disabled only when performing tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe changing way of creating IDs will be a good idea? I am afraid that after turning off garbage collection the memory usage can explode.

return;

#ifdef DEBUG_RUNTIME
Expand Down
2 changes: 2 additions & 0 deletions runtime/qsym_backend/Runtime.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,6 @@
typedef qsym::Expr *SymExpr;
#include <RuntimeCommon.h>

std::map<SymExpr, qsym::ExprRef> &getAllocatedExpressions();

#endif
4 changes: 4 additions & 0 deletions util/symbolic_trace/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
build
symcctrace.egg-info
.idea
__pycache__
7 changes: 7 additions & 0 deletions util/symbolic_trace/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
This is a python package for parsing a trace generated by the tracing feature of SymCC.

Install it with

```
pip install .
```
3 changes: 3 additions & 0 deletions util/symbolic_trace/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[project]
name = "symcctrace"
version = "1.0"
2 changes: 2 additions & 0 deletions util/symbolic_trace/symcctrace/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
from .parsing import parse_trace
from .data import TraceData
Loading
Loading