i#5505 kernel trace: Upgrade dr$sim to record syscall's kernel PT (#5594)

Updates drpt2ir and drpttracer and adds a new module syscall_pt_trace that provides an API for syscall's kernel PT tracing:

(1) Updates drpttracer extension.
Adds drpttracer_create_tracer() and drpttracer_destroy_tracer().
In the original implementation, the client obtained a tracer by calling drpttracer_start_tracing(), and the tracer was destroyed by drpttracer_end_tracing(). A client that traces many times therefore paid the tracer creation and destruction cost on every trace, which adds significant overhead. With this change, the client calls drpttracer_create_tracer() to get a tracer, uses drpttracer_start_tracing() and drpttracer_end_tracing() to start and stop each tracing session, and finally calls drpttracer_destroy_tracer() to destroy the tracer. This allows multiple traces to share one tracer, which should reduce the overhead to an acceptable level.

(2) Adds the syscall_pt_trace module for dr$sim to record syscall's kernel PT trace.
This module wraps a PT tracer in a class and provides APIs for thread-local syscall tracing. It also provides APIs to query the sysnum of the syscall currently being traced and the id of the last recorded syscall. In this PR, syscall_pt_trace creates a trace handle for each syscall, which keeps the decoding process simple for the decoder.
In the current implementation, the syscall_pt_trace module can cause significant overhead. A pending optimization reduces the overhead caused by pttracer initialization: the module uses one pttracer handle for all syscalls in a thread and dumps the PT segments into one file. This approach requires upgrading drpt2ir to support decoding PT fragments. We will merge the optimized code after the drpt2ir patch is finished.

(3) Updates dr$sim to support kernel tracing.
In the current implementation, every monitored thread creates a tracer at thread start, and dr$sim starts the syscall kernel tracing in the pre-syscall callback and stops it in the post-syscall callback. Every syscall dumps its PT data and metadata to files. Also, before the process ends, dr$sim uses the 'perf record --kcore' command to copy out the kcore and kallsyms used by the PT post-processor.

(4) Updates drpt2ir.
The current implementation supports initializing an instance of pt2ir_config_t from the metadata file and replaces some C pointer parameters with C++ reference parameters.

(5) Updates drpt2trace.
Adds a new option to read a metadata file directly to initialize pt2ir_config_t. In the current implementation, drpt2trace first uses the metadata file to initialize the pt2ir_config_t instance. Then, if the user specifies any options, they overwrite the corresponding fields in that instance.

(6) Adds a new test for dr$sim to check that the kernel PT output is correct.
Defines a new macro for the kernel feature check and adds a simple test for dr$sim. The macro runs dr$sim to generate kernel PT and then runs drpt2trace to check whether the generated kernel PT can be decoded.
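
The tracer lifecycle described in item (1) can be sketched as follows. This is a minimal, self-contained illustration and not the drpttracer API: `mock_tracer_t`, the `mock_*` functions, and the setup counter are all hypothetical stand-ins, and the real functions' signatures are not reproduced here. It only shows why creating one tracer and reusing it across start/stop pairs avoids paying the setup cost on every trace.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a PT tracer handle; not the drpttracer API.
struct mock_tracer_t {
    bool tracing = false;
    std::size_t segments = 0; // Number of completed start/stop pairs.
};

// Counts how many times the (assumed expensive) setup ran.
static std::size_t g_setup_count = 0;

mock_tracer_t *
mock_create_tracer()
{
    ++g_setup_count; // Stands in for whatever makes tracer creation costly.
    return new mock_tracer_t();
}

void
mock_start_tracing(mock_tracer_t *tracer)
{
    assert(tracer != nullptr && !tracer->tracing);
    tracer->tracing = true;
}

void
mock_end_tracing(mock_tracer_t *tracer)
{
    assert(tracer != nullptr && tracer->tracing);
    tracer->tracing = false;
    ++tracer->segments;
}

void
mock_destroy_tracer(mock_tracer_t *tracer)
{
    delete tracer;
}

// Old pattern: a tracer is created and destroyed around every trace.
// Returns the number of setups performed.
std::size_t
trace_with_tracer_per_call(std::size_t num_traces)
{
    const std::size_t before = g_setup_count;
    for (std::size_t i = 0; i < num_traces; ++i) {
        mock_tracer_t *tracer = mock_create_tracer();
        mock_start_tracing(tracer);
        mock_end_tracing(tracer);
        mock_destroy_tracer(tracer);
    }
    return g_setup_count - before;
}

// New pattern: one tracer is created up front and shared by all traces.
// Returns the number of setups performed.
std::size_t
trace_with_shared_tracer(std::size_t num_traces)
{
    const std::size_t before = g_setup_count;
    mock_tracer_t *tracer = mock_create_tracer();
    for (std::size_t i = 0; i < num_traces; ++i) {
        mock_start_tracing(tracer);
        mock_end_tracing(tracer);
    }
    mock_destroy_tracer(tracer);
    return g_setup_count - before;
}
```

Under these assumptions, tracing five times costs five setups with a tracer per trace and one setup with a shared tracer.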

Issue: #5505
dolanzhao authored Aug 12, 2022
1 parent 1e6f3ea commit c0985a2
Showing 20 changed files with 1,125 additions and 282 deletions.
2 changes: 2 additions & 0 deletions api/docs/CMake_doxyfile.cmake
@@ -56,6 +56,8 @@ endforeach (dir)

# Add drcachesim dirs (i#2006).
set(ext_dirs ${ext_dirs} "${proj_bindir}/clients/include/drmemtrace")
# Add drpt2trace dirs
set(ext_dirs ${ext_dirs} "${proj_srcdir}/clients/drcachesim/drpt2trace")

include("${srcdir}/CMake_doxyutils.cmake")
set(input_paths srcdir proj_srcdir header_dir gendox_dir outdir)
8 changes: 8 additions & 0 deletions api/docs/release.dox
@@ -156,6 +156,14 @@ Further non-compatibility-affecting changes include:
online filtering for only instruction or only data entries respectively. The
old option -L0_filter is deprecated but still supported for backward
compatibility. It simply sets both the new options.
- Added a new DR extension, namely "drpttracer", which provides clients with tracing
functionality via Intel's PT instruction tracing feature. This feature is still
experimental and available only on Intel processors.
- Added new drmemtrace options -enable_kernel_tracing that allow recording each
syscall's Kernel PT and write every syscall's PT and metadata to files in
-outdir/kernel.raw/ for later offline analysis. This feature is still experimental
and available only on Intel processors that support the Intel@ Processor Trace
feature.

The changes between version 9.0.1 and 9.0.0 include the following compatibility
changes:
10 changes: 9 additions & 1 deletion clients/drcachesim/CMakeLists.txt
@@ -126,7 +126,6 @@ add_exported_library(directory_iterator STATIC common/directory_iterator.cpp)
add_dependencies(directory_iterator api_headers)
target_link_libraries(directory_iterator drfrontendlib)

# Right now drpt2trace only works on Linux x86_64.
if (BUILD_PT_POST_PROCESSOR)
add_subdirectory(drpt2trace)
endif (BUILD_PT_POST_PROCESSOR)
@@ -303,6 +302,12 @@ macro(add_drmemtrace name type)
tracer/func_trace.cpp
${client_and_sim_srcs}
)
if (BUILD_PT_TRACER)
set(drmemtrace_srcs ${drmemtrace_srcs}
tracer/syscall_pt_trace.cpp
)
add_definitions(-DBUILD_PT_TRACER)
endif ()
if (libsnappy)
set(drmemtrace_srcs ${drmemtrace_srcs}
tracer/snappy_file_writer.cpp
@@ -322,6 +327,9 @@ macro(add_drmemtrace name type)
use_DynamoRIO_extension(${name} droption)
use_DynamoRIO_extension(${name} drcovlib${ext_sfx})
use_DynamoRIO_extension(${name} drbbdup${ext_sfx})
if (BUILD_PT_TRACER)
use_DynamoRIO_extension(${name} drpttracer${ext_sfx})
endif ()
if (libsnappy)
target_link_libraries(${name} snappy)
endif ()
10 changes: 10 additions & 0 deletions clients/drcachesim/common/options.cpp
@@ -640,3 +640,13 @@ droption_t<bool> op_enable_drstatecmp(
DROPTION_SCOPE_CLIENT, "enable_drstatecmp", false, "Enable the drstatecmp library.",
"When true, this option enables the drstatecmp library that performs state "
"comparisons to detect instrumentation-induced bugs due to state clobbering.");

#ifdef BUILD_PT_TRACER
droption_t<bool> op_enable_kernel_tracing(
DROPTION_SCOPE_ALL, "enable_kernel_tracing", false, "Enable Kernel Intel PT tracing.",
"By default, offline tracing only records a userspace trace. If this option is "
"enabled, offline tracing will record each syscall's Kernel PT and write every "
"syscall's PT and metadata to files in -outdir/kernel.raw/ for later offline "
"analysis. And this feature is available only on Intel CPUs that support Intel@ "
"Processor Trace.");
#endif
3 changes: 3 additions & 0 deletions clients/drcachesim/common/options.h
@@ -141,4 +141,7 @@ extern droption_t<unsigned int> op_miss_count_threshold;
extern droption_t<double> op_miss_frac_threshold;
extern droption_t<double> op_confidence_threshold;
extern droption_t<bool> op_enable_drstatecmp;
#ifdef BUILD_PT_TRACER
extern droption_t<bool> op_enable_kernel_tracing;
#endif
#endif /* _OPTIONS_H_ */
6 changes: 6 additions & 0 deletions clients/drcachesim/common/trace_entry.h
@@ -369,6 +369,12 @@ typedef enum {
*/
TRACE_MARKER_TYPE_PAGE_SIZE,

/**
* This marker is emitted prior to each system call when -enable_kernel_tracing is
* specified. The marker value contains a unique system call identifier.
*/
TRACE_MARKER_TYPE_SYSCALL_ID,

// ...
// These values are reserved for future built-in marker types.
// ...
143 changes: 71 additions & 72 deletions clients/drcachesim/drpt2trace/drpt2trace.cpp
@@ -85,6 +85,11 @@ static droption_t<unsigned long long> op_elf_base(
"This is an optional option in ELF Mode. Specifies the runtime load address of "
"the elf file. For kernel cases, this always should be 0x0, so it is not required. "
"But if -elf specified file's runtime load address is not 0x0, it must be set.");
static droption_t<std::string> op_raw_pt_metadata(
DROPTION_SCOPE_FRONTEND, "raw_pt_metadata", "",
"[Optional] Path to the metadata file of PT raw trace",
"Specifies the file path of the metadata file of PT raw trace. This file is "
"generated by the client drcachesim.");

static droption_t<std::string>
op_primary_sb(DROPTION_SCOPE_FRONTEND, "primary_sb", "",
@@ -117,74 +122,74 @@ static droption_t<std::string>
*/
static droption_t<int> op_pt_cpu_family(
DROPTION_SCOPE_FRONTEND, "pt_cpu_family", 0,
"[libipt Required] set cpu family for PT raw trace",
"[libipt Optional] set cpu family for PT raw trace",
"Set cpu family to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this option "
"from the data generated by the perf record command.");
static droption_t<int> op_pt_cpu_model(
DROPTION_SCOPE_FRONTEND, "pt_cpu_model", 0,
"[libipt Required] set cpu model for PT raw trace",
"[libipt Optional] set cpu model for PT raw trace",
"Set cpu model to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this option "
"from the data generated by the perf record command.");
static droption_t<int> op_pt_cpu_stepping(
DROPTION_SCOPE_FRONTEND, "pt_cpu_stepping", 0,
"[libipt Required] set cpu stepping for PT raw trace",
"[libipt Optional] set cpu stepping for PT raw trace",
"Set cpu stepping to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this option "
"from the data generated by the perf record command.");
static droption_t<int>
op_pt_mtc_freq(DROPTION_SCOPE_FRONTEND, "pt_mtc_freq", 0,
"[libipt Required] set mtc frequency for PT raw trace",
"[libipt Optional] set mtc frequency for PT raw trace",
"Set mtc frequency to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this "
"option from the data generated by the perf record command.");
static droption_t<int>
op_pt_nom_freq(DROPTION_SCOPE_FRONTEND, "pt_nom_freq", 0,
"[libipt Required] set nom frequency for PT raw trace",
"[libipt Optional] set nom frequency for PT raw trace",
"Set nom frequency to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this "
"option from the data generated by the perf record command.");
static droption_t<int> op_pt_cpuid_0x15_eax(
DROPTION_SCOPE_FRONTEND, "pt_cpuid_0x15_eax", 0,
"[libipt Required] set the value of cpuid[0x15].eax for PT raw trace",
"[libipt Optional] set the value of cpuid[0x15].eax for PT raw trace",
"Set the value of cpuid[0x15].eax to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this option from the "
"data generated by the perf record command.");
static droption_t<int> op_pt_cpuid_0x15_ebx(
DROPTION_SCOPE_FRONTEND, "pt_cpuid_0x15_ebx", 0,
"[libipt Required] set the value of cpuid[0x15].ebx for PT raw trace",
"[libipt Optional] set the value of cpuid[0x15].ebx for PT raw trace",
"Set the value of cpuid[0x15].ebx to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this option from the "
"data generated by the perf record command.");
static droption_t<std::string>
op_sb_sysroot(DROPTION_SCOPE_FRONTEND, "sb_sysroot", "",
"[libipt-sb Optional] set sysroot for sideband stream",
"Set sysroot to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this "
"option from the data generated by the perf record command.");
static droption_t<unsigned long long>
op_sb_sample_type(DROPTION_SCOPE_FRONTEND, "sb_sample_type", 0x0,
"[libipt-sb Required] set sample type for sideband stream",
"Set sample type to the given value(the given value must be a "
"hexadecimal integer and default: 0x0). Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this "
"option from the data generated by the perf record command.");
static droption_t<std::string>
op_sb_sysroot(DROPTION_SCOPE_FRONTEND, "sb_sysroot", "",
"[libipt-sb Optional] set sysroot for sideband stream",
"Set sysroot to the given value. Please run the "
"libipt/script/perf-get-opts.bash script to get the value of this "
"option from the data generated by the perf record command.");
static droption_t<unsigned long long>
op_sb_time_zero(DROPTION_SCOPE_FRONTEND, "sb_time_zero", 0,
"[libipt-sb Required] set time zero for sideband stream",
"[libipt-sb Optional] set time zero for sideband stream",
"Set perf_event_mmap_page.time_zero to the given value. Please run "
"the libipt/script/perf-get-opts.bash script to get the value of "
"this option from the data generated by the perf record command.");
static droption_t<unsigned int>
op_sb_time_shift(DROPTION_SCOPE_FRONTEND, "sb_time_shift", 0,
"[libipt-sb Required] set time shift for sideband stream",
"[libipt-sb Optional] set time shift for sideband stream",
"Set perf_event_mmap_page.time_shift to the given value. Please run "
"the libipt/script/perf-get-opts.bash script to get the value of "
"this option from the data generated by the perf record command.");
static droption_t<unsigned int>
op_sb_time_mult(DROPTION_SCOPE_FRONTEND, "sb_time_mult", 1,
"[libipt-sb Required] set time mult for sideband stream",
"[libipt-sb Optional] set time mult for sideband stream",
"Set perf_event_mmap_page.time_mult to the given value. Please run "
"the libipt/script/perf-get-opts.bash script to get the value of "
"this option from the data generated by the perf record command.");
@@ -208,9 +213,14 @@ static droption_t<unsigned long long> op_sb_kernel_start(
*/

static void
print_results(IN instrlist_t *ilist)
print_results(IN instrlist_autoclean_t &ilist)
{
instr_t *instr = instrlist_first(ilist);
if (ilist.data == nullptr) {
std::cerr << "The list to store decoded instructions is not initialized."
<< std::endl;
return;
}
instr_t *instr = instrlist_first(ilist.data);
uint64_t count = 0;
while (instr != NULL) {
count++;
@@ -219,7 +229,7 @@ print_results(IN instrlist_t *ilist)

if (op_print_trace.specified()) {
/* Print the disassemble code of the trace. */
instrlist_disassemble(GLOBAL_DCONTEXT, 0, ilist, STDOUT);
instrlist_disassemble(GLOBAL_DCONTEXT, 0, ilist.data, STDOUT);
}
std::cout << "Number of Instructions: " << count << std::endl;
}
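
The switch from a raw instrlist_t * to instrlist_autoclean_t in this file appears to follow the usual RAII idea: a holder owns the list and releases it when it goes out of scope, so early-return paths need no explicit cleanup call. The sketch below illustrates that idea with hypothetical types (`fake_list_t`, `fake_list_autoclean_t`) and a live-object counter. It is not the definition of instrlist_autoclean_t, of which this diff shows only the `data` field.

```cpp
#include <cassert>

// Hypothetical stand-in for an instruction list; not DynamoRIO's instrlist_t.
struct fake_list_t {
    int length;
};

// Tracks how many lists are currently allocated, to make leaks visible.
static int g_live_lists = 0;

fake_list_t *
fake_list_create()
{
    ++g_live_lists;
    return new fake_list_t { 0 };
}

void
fake_list_destroy(fake_list_t *list)
{
    assert(list != nullptr);
    --g_live_lists;
    delete list;
}

// Hypothetical RAII holder: destroys the owned list, if any, on scope exit.
struct fake_list_autoclean_t {
    fake_list_t *data = nullptr;

    fake_list_autoclean_t() = default;
    // Copying would lead to a double free, so it is disabled.
    fake_list_autoclean_t(const fake_list_autoclean_t &) = delete;
    fake_list_autoclean_t &
    operator=(const fake_list_autoclean_t &) = delete;

    ~fake_list_autoclean_t()
    {
        if (data != nullptr)
            fake_list_destroy(data);
    }
};

// A converter-style function that fills the holder by reference and may fail
// after the list has already been allocated.
bool
fake_convert(fake_list_autoclean_t &out, bool fail)
{
    out.data = fake_list_create();
    if (fail)
        return false;
    out.data->length = 3;
    return true;
}

// Mirrors the shape of a caller with an early-return error path.
// Returns the list length on success and -1 on failure.
int
run_once(bool fail)
{
    fake_list_autoclean_t ilist;
    if (!fake_convert(ilist, fail))
        return -1; // No explicit destroy call is needed on this path.
    return ilist.data->length;
}

int
live_lists()
{
    return g_live_lists;
}
```

In this sketch both the success path and the failure path of `run_once()` leave `live_lists()` at zero, which is the property a manual destroy call at the end of a function does not give on early returns.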
@@ -254,23 +264,11 @@ option_init(int argc, const char *argv[])
print_usage();
return false;
}
std::vector<droption_parser_t *> required_op_list;
required_op_list.push_back(&op_raw_pt);
required_op_list.push_back(&op_pt_cpu_family);
required_op_list.push_back(&op_pt_cpu_model);
required_op_list.push_back(&op_pt_cpu_stepping);
required_op_list.push_back(&op_pt_mtc_freq);
required_op_list.push_back(&op_pt_nom_freq);
required_op_list.push_back(&op_pt_cpuid_0x15_eax);
required_op_list.push_back(&op_pt_cpuid_0x15_ebx);

for (auto &op : required_op_list) {
if (!op->specified()) {
std::cerr << CLIENT_NAME << ": option " << op->get_name() << " is required."
<< std::endl;
print_usage();
return false;
}
if (!op_raw_pt.specified()) {
std::cerr << CLIENT_NAME << ": option " << op_raw_pt.get_name() << " is required."
<< std::endl;
print_usage();
return false;
}

/* Because Intel PT doesn't save instruction bytes or memory contents, the converter
@@ -296,23 +294,22 @@ option_init(int argc, const char *argv[])
}

/* Check if the required options for sideband mode are specified. */
if (op_primary_sb.specified()) {
std::vector<droption_parser_t *> sb_required_op_list;
sb_required_op_list.push_back(&op_sb_sample_type);
sb_required_op_list.push_back(&op_sb_time_zero);
sb_required_op_list.push_back(&op_sb_time_shift);
sb_required_op_list.push_back(&op_sb_time_mult);
for (auto &op : sb_required_op_list) {
if (!op->specified()) {
std::cerr << CLIENT_NAME << ": option " << op->get_name()
<< " is required in sideband mode." << std::endl;
print_usage();
return false;
}
}
if (op_primary_sb.specified() && !op_sb_sample_type.specified()) {
std::cerr << CLIENT_NAME << ": option " << op_sb_sample_type.get_name()
<< " is required in sideband mode." << std::endl;
print_usage();
return false;
}
return true;
}

#define IF_SPECIFIED_THEN_SET(__OP_VARIABLE__, __TO_SET_VARIABLE__) \
do { \
if (__OP_VARIABLE__.specified()) { \
__TO_SET_VARIABLE__ = __OP_VARIABLE__.get_value(); \
} \
} while (0)

/****************************************************************************
* Main Function
*/
@@ -326,6 +323,9 @@ main(int argc, const char *argv[])
}

pt2ir_config_t config = {};
if (op_raw_pt_metadata.specified()) {
config.init_with_metadata(op_raw_pt_metadata.get_value());
}
config.raw_file_path = op_raw_pt.get_value();
config.elf_file_path = op_elf.get_value();
config.elf_base = op_elf_base.get_value();
@@ -336,34 +336,34 @@ main(int argc, const char *argv[])
std::back_inserter(config.sb_secondary_file_path_list));
config.kcore_path = op_kcore.get_value();

config.pt_config.cpu.family = op_pt_cpu_family.get_value();
config.pt_config.cpu.model = op_pt_cpu_model.get_value();
config.pt_config.cpu.stepping = op_pt_cpu_stepping.get_value();
if (op_pt_cpu_family.get_value() != 0) {
config.pt_config.cpu.vendor = CPU_VENDOR_INTEL;
} else {
config.pt_config.cpu.vendor = CPU_VENDOR_UNKNOWN;
}
config.pt_config.cpuid_0x15_eax = op_pt_cpuid_0x15_eax.get_value();
config.pt_config.cpuid_0x15_ebx = op_pt_cpuid_0x15_ebx.get_value();
config.pt_config.mtc_freq = op_pt_mtc_freq.get_value();
config.pt_config.nom_freq = op_pt_nom_freq.get_value();
config.sb_config.sample_type = op_sb_sample_type.get_value();
config.sb_config.sysroot = op_sb_sysroot.get_value();
config.sb_config.time_zero = op_sb_time_zero.get_value();
config.sb_config.time_shift = op_sb_time_shift.get_value();
config.sb_config.time_mult = op_sb_time_mult.get_value();
config.sb_config.tsc_offset = op_sb_tsc_offset.get_value();
config.sb_config.kernel_start = op_sb_kernel_start.get_value();
/* If the user specifies the following options, drpt2trace will overwrite the
* corresponding fields in the config.
*/
IF_SPECIFIED_THEN_SET(op_pt_cpu_family, config.pt_config.cpu.family);
IF_SPECIFIED_THEN_SET(op_pt_cpu_model, config.pt_config.cpu.model);
IF_SPECIFIED_THEN_SET(op_pt_cpu_stepping, config.pt_config.cpu.stepping);
IF_SPECIFIED_THEN_SET(op_pt_cpuid_0x15_eax, config.pt_config.cpuid_0x15_eax);
IF_SPECIFIED_THEN_SET(op_pt_cpuid_0x15_ebx, config.pt_config.cpuid_0x15_ebx);
IF_SPECIFIED_THEN_SET(op_pt_mtc_freq, config.pt_config.mtc_freq);
IF_SPECIFIED_THEN_SET(op_pt_nom_freq, config.pt_config.nom_freq);
IF_SPECIFIED_THEN_SET(op_sb_sample_type, config.sb_config.sample_type);
IF_SPECIFIED_THEN_SET(op_sb_sysroot, config.sb_config.sysroot);
IF_SPECIFIED_THEN_SET(op_sb_time_zero, config.sb_config.time_zero);
IF_SPECIFIED_THEN_SET(op_sb_time_shift, config.sb_config.time_shift);
IF_SPECIFIED_THEN_SET(op_sb_time_mult, config.sb_config.time_mult);
IF_SPECIFIED_THEN_SET(op_sb_tsc_offset, config.sb_config.tsc_offset);
IF_SPECIFIED_THEN_SET(op_sb_kernel_start, config.sb_config.kernel_start);
config.pt_config.cpu.vendor =
config.pt_config.cpu.family != 0 ? CPU_VENDOR_INTEL : CPU_VENDOR_UNKNOWN;

/* Convert the PT raw trace to DR IR. */
std::unique_ptr<pt2ir_t> ptconverter(new pt2ir_t());
if (!ptconverter->init(config)) {
std::cerr << CLIENT_NAME << ": failed to initialize pt2ir_t." << std::endl;
return FAILURE;
}
instrlist_t *ilist = nullptr;
pt2ir_convert_status_t status = ptconverter->convert(&ilist);
instrlist_autoclean_t ilist = { GLOBAL_DCONTEXT, nullptr };
pt2ir_convert_status_t status = ptconverter->convert(ilist);
if (status != PT2IR_CONV_SUCCESS) {
std::cerr << CLIENT_NAME << ": failed to convert PT raw trace to DR IR."
<< "[error status: " << status << "]" << std::endl;
@@ -373,6 +373,5 @@ main(int argc, const char *argv[])
/* Print the count and the disassemble code of DR IR. */
print_results(ilist);

instrlist_clear_and_destroy(GLOBAL_DCONTEXT, ilist);
return SUCCESS;
}
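
The option handling in this file layers explicit command-line options over values loaded from the metadata file. Below is a minimal sketch of that precedence rule. It uses a hypothetical `opt_t` wrapper and `cpu_config_t` struct in place of droption_t and pt2ir_config_t, and a function template in place of the IF_SPECIFIED_THEN_SET macro. The field values in the usage function are made up for illustration.

```cpp
#include <cassert>

// Hypothetical option wrapper that remembers whether the user set it.
// It only mimics the specified()/get_value() pair used in the diff.
template <typename T> class opt_t {
public:
    explicit opt_t(T default_value)
        : value_(default_value)
    {
    }
    void
    set(T value)
    {
        value_ = value;
        specified_ = true;
    }
    bool
    specified() const
    {
        return specified_;
    }
    T
    get_value() const
    {
        return value_;
    }

private:
    T value_;
    bool specified_ = false;
};

// Hypothetical subset of the decoder configuration.
struct cpu_config_t {
    int family = 0;
    int model = 0;
    int stepping = 0;
};

// Overwrites the field only when the user explicitly passed the option.
template <typename T>
void
set_if_specified(const opt_t<T> &opt, T &field)
{
    if (opt.specified())
        field = opt.get_value();
}

// Starts from the metadata values, then applies explicit overrides.
cpu_config_t
build_config(const cpu_config_t &from_metadata, const opt_t<int> &family,
             const opt_t<int> &model, const opt_t<int> &stepping)
{
    cpu_config_t config = from_metadata;
    set_if_specified(family, config.family);
    set_if_specified(model, config.model);
    set_if_specified(stepping, config.stepping);
    return config;
}

// Usage: the metadata supplies all three fields and the user overrides only
// the model, so family and stepping keep their metadata values.
cpu_config_t
example_override()
{
    cpu_config_t from_metadata;
    from_metadata.family = 6;
    from_metadata.model = 85;
    from_metadata.stepping = 4;
    opt_t<int> family(0), model(0), stepping(0);
    model.set(106);
    cpu_config_t config = build_config(from_metadata, family, model, stepping);
    assert(config.family == 6 && config.model == 106 && config.stepping == 4);
    return config;
}
```

Because the check is on `specified()` rather than on the value, an option explicitly set to its default still overrides the metadata, while an option that was never passed leaves the metadata value untouched.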