-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Symbolic tracing #157
base: master
Are you sure you want to change the base?
Symbolic tracing #157
Changes from 13 commits
4f425fb
22373f0
f3bc3bb
e1e9b5c
eeb921a
f4c799a
0c43f2b
eed93ba
cd03543
a2ee76c
fd687bc
b6962fd
85fff40
e84cf1a
d3b4ca5
ae6161b
e3f3793
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
|
||
#include "Tracer.h" | ||
#include "Shadow.h" | ||
#include "llvm/Support/raw_ostream.h" | ||
|
||
nlohmann::json Tracer::currentTrace; | ||
nlohmann::json Tracer::expressions; | ||
nlohmann::json Tracer::pathConstraints; | ||
const std::string Tracer::BACKEND_TRACE_FILE = "/tmp/backend_trace.json"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be either configurable or created in more generic way (e.g. |
||
|
||
void Tracer::traceStep(uintptr_t pc) { | ||
|
||
nlohmann::json newEntry; | ||
newEntry["pc"] = pc; | ||
newEntry["memory_to_expression_mapping"] = nlohmann::json::object(); | ||
|
||
/* dump shadow pages */ | ||
for (auto const &[pageAddress, _] : g_shadow_pages) { | ||
for (auto byteAddress = pageAddress; byteAddress < pageAddress + kPageSize; | ||
byteAddress++) { | ||
auto byteExpr = _sym_read_memory((u_int8_t *)byteAddress, 1, true); | ||
if (byteExpr != nullptr && !byteExpr->isConcrete()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we can compare the current memory state with previous one to don't duplicate information in each step and only print diffs? This will reduce number of IO operations and needed memory. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would certainly be a useful improvement but unfortunately I will not have time to work on this. |
||
|
||
newEntry["memory_to_expression_mapping"][std::to_string(reinterpret_cast<uintptr_t>(byteAddress))] = | ||
getExpressionID(byteExpr); | ||
} | ||
} | ||
} | ||
|
||
/* dump regions that the client has registered with _sym_register_expression_region | ||
* for SymQEMU, this is env_exprs which contains the expressions for the global TCG variables (i.e., guest registers) */ | ||
for (auto &expressionRegion : getExpressionRegions()){ | ||
for (auto byteAddress = expressionRegion.first; byteAddress < expressionRegion.first + expressionRegion.second / sizeof(byteAddress); | ||
byteAddress++) { | ||
auto byteExpr = *byteAddress; | ||
if (byteExpr != nullptr && !byteExpr->isConcrete()) { | ||
|
||
newEntry["memory_to_expression_mapping"][std::to_string(reinterpret_cast<uintptr_t>(byteAddress))] = | ||
getExpressionID(byteExpr); | ||
} | ||
} | ||
} | ||
|
||
currentTrace.push_back(newEntry); | ||
} | ||
|
||
|
||
void Tracer::tracePathConstraint(SymExpr constraint, bool taken) { | ||
if (pathConstraints.empty()) { | ||
symcc_set_test_case_handler( | ||
reinterpret_cast<TestCaseHandler>(traceNewInput)); | ||
} | ||
|
||
nlohmann::json newEntry; | ||
newEntry["expression"] = getExpressionID(constraint); | ||
newEntry["after_step"] = currentTrace.size() - 1; | ||
newEntry["new_input_value"] = nlohmann::json(); | ||
newEntry["taken"] = taken; | ||
|
||
pathConstraints.push_back(newEntry); | ||
} | ||
|
||
void Tracer::traceNewInput(const unsigned char *input, size_t size) { | ||
for (size_t i = 0; i < size; i++) { | ||
pathConstraints[pathConstraints.size() - 1]["new_input_value"].push_back( | ||
input[i]); | ||
} | ||
} | ||
|
||
void Tracer::writeTraceToDisk() { | ||
for (auto const &[_, expressionPtr] : getAllocatedExpressions()) { | ||
recursivelyCollectExpressions(expressionPtr); | ||
} | ||
|
||
nlohmann::json dataToSave; | ||
dataToSave["trace"] = currentTrace; | ||
dataToSave["expressions"] = expressions; | ||
dataToSave["path_constraints"] = pathConstraints; | ||
|
||
std::ofstream o(BACKEND_TRACE_FILE); | ||
o << std::setw(4) << dataToSave << std::endl; | ||
} | ||
|
||
void Tracer::recursivelyCollectExpressions(const shared_ptr<qsym::Expr>&expressionPtr) { | ||
string expressionID = getExpressionID(expressionPtr.get()); | ||
if (expressions.count(expressionID) > 0) { | ||
return; | ||
} | ||
|
||
expressions[expressionID]["operation"]["kind"] = expressionPtr->kind(); | ||
expressions[expressionID]["operation"]["properties"] = nlohmann::json::object(); | ||
if (expressionPtr->kind() == qsym::Constant){ | ||
auto value_llvm_int = static_pointer_cast<qsym::ConstantExpr>(expressionPtr)->value(); | ||
std::string value_str; | ||
llvm::raw_string_ostream rso(value_str); | ||
value_llvm_int.print(rso, false); | ||
|
||
expressions[expressionID]["operation"]["properties"]["value"] = value_str; | ||
} | ||
expressions[expressionID]["size_bits"] = expressionPtr->bits(); | ||
expressions[expressionID]["input_byte_dependency"] = *expressionPtr->getDependencies(); | ||
expressions[expressionID]["args"] = nlohmann::json::array(); | ||
for (int child_i = 0; child_i < expressionPtr->num_children(); child_i++) { | ||
shared_ptr<qsym::Expr> child = expressionPtr->getChild(child_i); | ||
string childID = getExpressionID(child.get()); | ||
expressions[expressionID]["args"].push_back(childID); | ||
recursivelyCollectExpressions(child); | ||
} | ||
} | ||
|
||
string Tracer::getExpressionID(SymExpr expression) { | ||
return std::to_string(reinterpret_cast<uintptr_t>(expression)); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
|
||
#ifndef SYMRUNTIME_TRACER_H | ||
#define SYMRUNTIME_TRACER_H | ||
|
||
#include "GarbageCollection.h" | ||
#include "Runtime.h" | ||
#include <cstdint> | ||
#include <fstream> | ||
#include <nlohmann/json.hpp> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Json library has not been added to SymCC dependencies. Please update the README, dockerfile and github workflows. |
||
|
||
/** | ||
* Generates a trace of the symbolic execution. | ||
* | ||
* A trace contains a list of symbolic memory snapshots, a list of path constraints and a list of expressions. | ||
* The trace is written to a json file at the end of the execution. | ||
*/ | ||
class Tracer { | ||
public: | ||
/** | ||
* Adds a dump of the current symbolic state to the trace. | ||
*/ | ||
static void traceStep(uintptr_t pc); | ||
|
||
/** | ||
* Adds a path constraint to the trace. | ||
*/ | ||
static void tracePathConstraint(SymExpr constraint, bool taken); | ||
|
||
/** | ||
* Adds an input generated by the symbolic backend. | ||
* The input is associated to the last path constraint. | ||
*/ | ||
static void traceNewInput(const unsigned char* input, size_t size); | ||
|
||
/** | ||
* Generates a json file containing the trace. | ||
* Must be called before the program exits. | ||
*/ | ||
static void writeTraceToDisk(); | ||
|
||
private: | ||
/** | ||
* Collects all expressions reachable from the given expression. | ||
*/ | ||
static void recursivelyCollectExpressions(const shared_ptr<qsym::Expr>&expressionPtr); | ||
|
||
/** | ||
* Generates a string ID from the address of the given expression. | ||
*/ | ||
static string getExpressionID(SymExpr expression); | ||
|
||
static nlohmann::json currentTrace; | ||
static nlohmann::json expressions; | ||
static nlohmann::json pathConstraints; | ||
static const std::string BACKEND_TRACE_FILE; | ||
}; | ||
|
||
#endif // SYMRUNTIME_TRACER_H |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,7 +83,13 @@ target_include_directories(SymRuntime SYSTEM PRIVATE | |
# We need to get the LLVM support component for llvm::APInt. | ||
llvm_map_components_to_libnames(QSYM_LLVM_DEPS support) | ||
|
||
target_link_libraries(SymRuntime ${Z3_LIBRARIES} ${QSYM_LLVM_DEPS}) | ||
include(FetchContent) | ||
|
||
FetchContent_Declare(json URL https://github.com/nlohmann/json/releases/download/v3.11.2/json.tar.xz) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why? Why this is in qsym_backend specific directory? Why you fetch the code instead of require nlohmann-json to be available in operating system? This is very popular package and is available on major distrubiutions, e.g: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have no experience working with CMake, I simply used a solution that was suggested in the README of nlohmann-json. If this is a bad solution maybe you could change it ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will try to prepare a PR till the end of week. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi, |
||
FetchContent_MakeAvailable(json) | ||
|
||
|
||
target_link_libraries(SymRuntime ${Z3_LIBRARIES} ${QSYM_LLVM_DEPS} nlohmann_json::nlohmann_json) | ||
|
||
# We use std::filesystem, which has been added in C++17. Before its official | ||
# inclusion in the standard library, Clang shipped the feature first in | ||
|
@@ -92,5 +98,5 @@ target_link_libraries(SymRuntime ${Z3_LIBRARIES} ${QSYM_LLVM_DEPS}) | |
# some current LTS distributions ship a GCC that requires libstdc++fs for | ||
# std::filesystem - we catch this case in order to enable users of such systems | ||
# to build with the default compiler. | ||
find_package(Filesystem COMPONENTS Final Experimental) | ||
target_link_libraries(SymRuntime std::filesystem) | ||
#find_package(Filesystem COMPONENTS Final Experimental) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this is commented out? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was not intended to go in this PR, I reverted it |
||
#target_link_libraries(SymRuntime std::filesystem) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ | |
|
||
#include "Runtime.h" | ||
#include "GarbageCollection.h" | ||
#include "Tracer.h" | ||
|
||
// C++ | ||
#if __has_include(<filesystem>) | ||
|
@@ -75,11 +76,6 @@ z3::context *g_z3_context; | |
|
||
} // namespace qsym | ||
|
||
namespace { | ||
|
||
/// Indicate whether the runtime has been initialized. | ||
std::atomic_flag g_initialized = ATOMIC_FLAG_INIT; | ||
|
||
/// A mapping of all expressions that we have ever received from QSYM to the | ||
/// corresponding shared pointers on the heap. | ||
/// | ||
|
@@ -91,6 +87,17 @@ std::atomic_flag g_initialized = ATOMIC_FLAG_INIT; | |
/// workload. | ||
std::map<SymExpr, qsym::ExprRef> allocatedExpressions; | ||
|
||
std::map<SymExpr, qsym::ExprRef> &getAllocatedExpressions() { | ||
return allocatedExpressions; | ||
} | ||
|
||
namespace { | ||
|
||
/// Indicate whether the runtime has been initialized. | ||
std::atomic_flag g_initialized = ATOMIC_FLAG_INIT; | ||
|
||
|
||
|
||
SymExpr registerExpression(const qsym::ExprRef &expr) { | ||
SymExpr rawExpr = expr.get(); | ||
|
||
|
@@ -135,18 +142,30 @@ class EnhancedQsymSolver : public qsym::Solver { | |
} | ||
|
||
void saveValues(const std::string &suffix) override { | ||
auto values = getConcreteValues(); | ||
if (auto handler = g_test_case_handler) { | ||
auto values = getConcreteValues(); | ||
|
||
// The test-case handler may be instrumented, so let's call it with | ||
// argument expressions to meet instrumented code's expectations. | ||
// Otherwise, we might end up erroneously using whatever expression was | ||
// last registered for a function parameter. | ||
_sym_set_parameter_expression(0, nullptr); | ||
_sym_set_parameter_expression(1, nullptr); | ||
handler(values.data(), values.size()); | ||
} else { | ||
Solver::saveValues(suffix); | ||
} | ||
// } else { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why you left that comment? |
||
Solver::saveValues(suffix); | ||
puts("New testcase hex : "); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why you added these prints? I suppose that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again this is no supposed to come with this PR, I reverted it |
||
for (unsigned char value : values) { | ||
printf("%02x", value); | ||
} | ||
puts(""); | ||
puts("New testcase ascii : "); | ||
for (unsigned char value : values) { | ||
printf("%c", value); | ||
} | ||
puts(""); | ||
// } | ||
} | ||
}; | ||
|
||
|
@@ -321,6 +340,7 @@ void _sym_push_path_constraint(SymExpr constraint, int taken, | |
if (constraint == nullptr) | ||
return; | ||
|
||
Tracer::tracePathConstraint(constraint, taken != 0); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't be calls to tracer put into compiler plugin code? So that it will be runtime independent? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't get it. What do you mean by compiler plugin code ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Imho it should also not be put in |
||
g_solver->addJcc(allocatedExpressions.at(constraint), taken != 0, site_id); | ||
} | ||
|
||
|
@@ -449,7 +469,7 @@ bool _sym_feasible(SymExpr expr) { | |
// | ||
|
||
void _sym_collect_garbage() { | ||
if (allocatedExpressions.size() < g_config.garbageCollectionThreshold) | ||
// if (allocatedExpressions.size() < g_config.garbageCollectionThreshold) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why you left that comment? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The tracer identifies each expression with a unique ID, that is used in the JSON file. Currently, I used the memory address of the expression as ID and thus to avoid ID collision we need to disable the garbage collection. As discussed above, I will put the code of this PR inside There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So maybe changing way of creating IDs will be a good idea? I am afraid that after turning off garbage collection the memory usage can explode. |
||
return; | ||
|
||
#ifdef DEBUG_RUNTIME | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
build | ||
symcctrace.egg-info | ||
.idea | ||
__pycache__ |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You added information how to install it, but not how to use it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I will improve the README |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
This is a python package for parsing a trace generated by the tracing feature of SymCC. | ||
|
||
Install it with | ||
|
||
``` | ||
pip install . | ||
``` |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be good to add tests. Also type and format checking will be useful (e.g. pyright, black). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[project] | ||
name = "symcctrace" | ||
version = "1.0" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
from .parsing import parse_trace | ||
from .data import TraceData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In
RuntimeCommon.cpp
there is a goal to do the runtime thread local. You use here the global state. Could you refactor your code so that it can be run in multiple threads in future?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure to understand. Are we talking about threads of the tested program ? In my understanding, the shadow memory pages, the set of accumulated path constraints, etc. should be shared across threads of the target program.
As this tracing feature simply takes snapshots of this state, I do not see how it could be thread local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in case of multi-thread tested programs the set of accumulated path constraints should be thread local, because each thread will execute potentially different path, so the set should be different. According to the shadow memory, you are right.
My idea was to have the possibility to execute the multiple runtimes in different threads to implement #14 without forking, but I don't know if that has a much sense.
Nevertheless if there is a possibility to doesn't have global state we should work in that direction, because programs with such state doesn't scale well.