pinball2elf is a tool to create stand-alone Linux binaries from checkpoints for arbitrary portions of execution of other Linux programs . The checkpoints it converts are called ‘pinballs’ and they are generated by “Program Record Replay Toolkit” (aka PinPlay kit) that is available at Intel PinPlay. pinball2elf was developed by Alexander Isaev between 2015-2017 while he was at Intel. Some enhancements and use-case development was done by Harish Patil (Intel) with input from Trevor Carlson(NUS), Karthik Sankaranarayanan(Intel), and Wim Heirman(Intel).
[8-threaded SPEC2017(OpenMP) ELFies] (TBD)
ELFie CGO2021 presentation recording
Make sure:'perf' is installed and /usr/include/linux/hw_breakpoint.h exists and is not empty.
pinball2elf has been tested with the following OS/tool/hardware combinations:
-
OS: Ubuntu 18.04, gcc: 7.5.0, g++: 7.5.0, 'perf' 4.15.18, GNU ld 2.30 : Skylake
-
OS: Ubuntu 16.04, gcc: 5.4.0, g++: 5.4.0, 'perf' 4.4.228, GNU ld 2.26.1 : Broadwell
-
OS: Ubuntu 16.04, gcc: 8.4.0, g++: 8.4.0, GNU ld 2.26.1
Loader errors: relocation truncated to fit: R_X86_64_32S against
data
-
AMX registers introduced in Sapphire Rapids are not yet handled by pinball2elf. Build your test binaries without AMX instructions and when generating pinballs, pass "-skl" to sde/sde64 specifying 'SkyLake' as the architecture to emulate.
Loader errors: relocation truncated to fit: R_X86_64_32S against
data
-
Large disk space usage while generating ELFies (being debugged).
-
Build pinball2elf
cd src
make all
cd ..
-
Test with a single-threaded pinball
cd examples/ST
./testST.sh
This shows:
Running ../../scripts/pinball2elf.basic.sh pinball.st/log_0
Running ./pinball.st/log_0.elfie
ELFIE_COREBASE=0
core_base: 0
pid: 4203
process_callback() [ inside ELFie] called. Num_threads: 1
tid: 4203
thread_callback() [ inside ELFie] called for thread 0
Hello world 1
---------------------
Running ../../scripts/pinball2elf.perf.sh pinball.st/log_0 st
export ELFIE_PERFLIST=0:0,0:1,1:1
[ based on /usr/include/linux/perf_event.h
Comma separated pairs 'perftype:counter'
perftype: 0 --> HW 1 --> SW
HW counter: 0 --> PERF_COUNT_HW_CPU_CYCLES
HW counter: 1 --> PERF_COUNT_HW_CPU_INSTRUCTIONS
SW counter: 0 --> PERF_COUNT_SW_CPU_CLOCK
... <see perf_event.h:'enum perf_hw_ids' and 'enum perf_sw_ids']
Running ./pinball.st/log_0.perf.elfie
ELFIE_COREBASE=0
ELFIE_PERFLIST=0:0,0:1,1:1
Will set affinity using core_base: 0
elfie tid: 0 core id: 0
------------------------------------------------
------------------------------------------------
Hello world 1
graceful exit SUCCESS
Performance counters reported in st.0.perf.txt
ROI start: TSC 74744470196596112
Thread start: TSC 74744470231500862
------------------------------------------------
Simulation end: TSC 74744470232488632
Sim-end-icount 3922
hw_cpu_cycles:45584 hw_instructions:3964 sw_task_clock:134934
------------------------------------------------
Thread end: TSC 74744470232782076
ROI end: TSC 74744470232944672
hw_cpu_cycles:49566 hw_instructions:4647 sw_task_clock:184142
The tool pinball2elf, supports three types of callbacks:
-p FUNC, --process-cbk FUNC name of process start callback
-e FUNC, --process-exit-cbk FUNC name of process exit callback
-t FUNC, --thread-cbk FUNC name of thread start callback
scripts/
├── pinball2elf.basic.sh
├── pinball2elf.perf.sh
└── pinball2elf.sim.sh
These scripts use the environment variable P2E_TEMP as the location (default '/tmp') of the directory to use to store intermediate temporary files.
instrumentation/
├── basic_callbacks.c
├── environ.c
├── perf_callbacks.c
├── set_heap.c
└── sim_callbacks.c
environ.c : supports the processing of ELFIE_* environment variables.
set_heap.c : supports the handling of brk() system call behavior based on SYSSTATE
A custom callback file is created for each ELFie using the instrumentation files above. That file is copied to the current working directory for debugging/viewing.
- basic ELFie
script: pinball2elf.basic.sh
Usage: scripts/pinball2elf.basic.sh pinball
instrumentation: basic_callbacks.c
callbacks used:
-p elfie_on_start: outputs informational messages; processes SYSSTATE
-t elfie_on_thread_start: outputs informational messages
Graceful exit : NO
Customized basic_callbacks.c file is copied to the working directory for debugging/viewing.
- Simulator(sim) ELFie
script: pinball2elf.sim.sh
Usage: scripts/pinball2elf.sim.sh pinball <message>
instrumentation: sim_callbacks.c
callbacks used:
-p elfie_on_start: processes SYSSTATE; checks if ELFIE_COREBASE is a positive number
-t elfie_on_thread_start: sets affinity for thread etid to 'COREBASE+apptid'
Graceful exit : NO ( the simulator is supposed to end ELFie execution)
Customized sim_callbacks.c file is copied to the working directory for debugging/viewing.
- Performance monitoring(perf) ELFie
script: pinball2elf.perf.sh
Usage : scripts/pinball2elf.perf.sh pinball filename-prefix <use_warmup>
instrumentation: perf_callbacks.c
callbacks used:
-p elfie_on_start: processes SYSSTATE; checks if ELFIE_COREBASE is a positive number
Opens per-thread perf stats files
-t elfie_on_thread_start: sets affinity for thread etid to 'COREBASE+apptid'
* sets up warmup/simulation end handlers
* Uses "ELFIE_WARMUP" to decide whether to use warmup.
* Uses "ELFIE_PCCONT" to decide how to end warmup/simulation regions
* sets up performance counters to end warmup/simulation regions
either using instruction counts or PC + count
* Based on ELFIE_PERFLIST, enables performance counting
( based on /usr/include/linux/perf_event.h
perftype: 0 --> HW 1 --> SW
HW counter: 0 --> PERF_COUNT_HW_CPU_CYCLES
HW counter: 1 --> PERF_COUNT_HW_CPU_INSTRUCTIONS
SW counter: 0 --> PERF_COUNT_SW_CPU_CLOCK
... <see perf_event.h:'enum perf_hw_ids' and 'enum perf_sw_ids')
-e elfie_on_exit: runs in the monitor thread, outputs final performance counter
values for each application thread.
A monitor thread is first created, it creates the main application thread and waits
for it to exit. The main application thread creates other application threads if
needed
Graceful exit : YES: either icount of pc+count used for exiting each thread.
Customized perf_callbacks.c file is copied to the working directory for debugging/viewing.
User-level checkpoints known as pinballs are input to the pinbll2elf tool. These are created using the PinPlay tool kit. A pinball is a collection of files capturing the execution state of a "Region of Interest" (ROI) from an application execution. A typical sequence of commands to generate and test a pinball suitable for pinball2elf is shown below:
(Assuming 'bash' shell)
export SDE_BUILD_KIT=<absolute path to the SDE kit (version 9.0 or higher>
export LD_BIND_NOW=1 #to load any application dynamic libraries early
$SDE_BUILD_KIT/sde -log -log:fat -log:mt -log:compressed bzip2 -log:basename pinball/foo <ROI specification> -- <application arguments>
This step makes sure the pinball can be replayed without injections of any system call side-effects.
$SDE_BUILD_KIT/sde -replay -replay:addr_trans -replay:injection 0 -replay:basename pinball/foo -- $SDE_BUILD_KIT/intel64/nullapp
If the ROI does any file inpout, this replay step may fail. You will need to extract from the pinball the necessary OS state that the ROI uses.
- Build the pintool pinball-sysstate.so
setenv SDE_BUILD_KIT to point to SDE version 9.0 or higher
cd *pinball2elf-directory*
cd pintool/PinballSYSState
make
This will create $SDE_BUILD_KIT/intel64/sde-pinball-sysstate.so.
$SDE_BUILD_KIT/sde64 -t64 $SDE_BUILD_KIT/intel64/sde-pinball-sysstate.so -replay -replay:addr_trans -replay:basename pinball.st/log_0 -sysstate:out pinball.st/log_0 -- $SDE_BUILD_KIT/intel64/nullapp
Will produce sysstate in pinball.st/log_0.sysstate
Producer pinball.st/log_0
Hello world 1
ROI used pre-existing FD : 1
Creates:
pinball.st/log_0.sysstate/
└── workdir
└── FD_1
- During multi-threaded pinball replay, an extra monitoring thread is created to detect 'deadlocks' hence tools relying on thread count may get confused. Adding "-replay:deadlock_timeout 0" during replay prevents the creation of the extra monitoring thread.
A 'pinball' has the register state captured at recording time. This is either for the native x86 micro-architecture (if generated with Pin) or the emulated micro-architecture (if generated with SDE). The pinball2elf tool transfers the register state in a pinball to the ELFie it creates. This register state gets restored before each ELFie-created thread jumps to its application code embedded inside the ELFie.
- An ELFie will only run correctly on machines where the embedded register state is valid.
- The binary used to generate a pinball is compiled for a specific micro-architecture. When creating a pinball using SDE, make sure to use the right micro-architecture flag (e.g. sde -skx for a binary compiled for Skylake server) to embedd the right register state in the pinball and hence in the corresponding ELFie. This is important to preserve the performance characteristics of the original binary.
t.tid.txt* files for pinball
- Build the sde/pintool event_counter.so
setenv SDE_BUILD_KIT to point to SDE version 9.0 or higher
cd *pinball2elf-directory*
cd pintool/EventCounter
make
This will create $SDE_BUILD_KIT/intel64/sde-event-counter.so
- warmup ends/simulation starts at icount 3000 : -control start:icount:3000:global
- simulation ends at icount 3500 : -control stop:icount:3500:global
$SDE_BUILD_KIT/sde64 -t64 $SDE_BUILD_KIT/intel64/sde-event-counter.so -thread_count 1 -prefix pinball.st/log_0 -control start:icount:3000:global -control stop:icount:3500:global -replay -replay:addr_trans -replay:basename pinball.st/log_0 -- $SDE_BUILD_KIT/intel64/nullapp
Hello world 1
global icount 2998 Sim-Start
global icount 3496 Sim-End
Creates event_icount file
cat pinball.st/log_0.event_icount.0.txt
Sim-Start global_icount: 2998
Sim-Start tid: 0 icount 2998
Sim-End global_icount: 3496
Sim-End tid: 0 icount 3496
Fini global_icount: 3922
Fini tid: 0 icount 3922
- Specify warmup/simulation end using PC+count
-control start:address:0x000814187:count84738:global -control stop:address:0x0009decb1:count34069:global
- Also monitor warmup-end and simulation-end PCs
-watch_addr 0x000814187 -watch_addr 0x0009decb1
RPB=<an 8-threaded pinball with known warmup-end/simulation end PC+count values>
$SDE_BUILD_KIT/sde64 -t $SDE_BUILD_KIT/sde-event-counter.so -thread_count 8 -prefix $RPB -watch_addr 0x000814187 -watch_addr 0x0009decb1 -control start:address:0x000814187:count84738:global -control stop:address:0x0009decb1:count34069:global -replay -replay:addr_trans -xyzzy -replay:deadlock_timeout 0 -replay:basename $RPB -- $SDE_BUILD_KIT/intel64/nullapp
Creates 8 $RPB.event_icount.tid.txt files: one for each thread
Example event_icount file:
cat $RPB.event_icount.3.txt
Sim-Start global_icount: 1597134619
Sim-Start tid: 3 icount 217651514
Marker 814187 global_addrcount: 84737
Marker 814187 tid: 3 addrcount 12816
Marker 9decb1 global_addrcount: 0
Marker 9decb1 tid: 3 addrcount 0
Sim-End global_icount: 2400630567
Sim-End tid: 3 icount 278389598
Marker 814187 global_addrcount: 116928
Marker 814187 tid: 3 addrcount 12960
Marker 9decb1 global_addrcount: 34067
Marker 9decb1 tid: 3 addrcount 4368
Fini global_icount: 2400722002
Fini tid: 3 icount 278420906
Addr 814187 global_addrcount: 116928
Addr 814187 tid: 3 addrcount 12960
Addr 9decb1 global_addrcount: 34070
Addr 9decb1 tid: 3 addrcount 4368
The pages inside an ELFie containing application code are marked as not loadable hence tools, such as the Gnu debugger (gdb), can not see application pages inside an ELFie right away after the initial loading of an ELFie. For setting a breakpoint at an application instruction, the suggested way is to first break at elfie_on_start() where all application pages are guaranteed to be in memory and then set a breakpoint at the desired application address(hex). Symbolic debugging of application code is currently not supported with ELFies although pinball2elf can be extended to add application debug information for symbolic debugging. ELFie generation scripts make sure debug information does exist for ELFie callback routines hence they can be debugged symbolically (use the customized callbacks.c file copied to the working directory). For debugging multi-threaded ELFies with gdb, first doing a set detach-on-fork off followed by *break elfie_on_thread_start’ and using info inferior and inferior N commands works well.
A perf ELFie, creates a monitor thread which spawns application main thread and waits for it. For debugging a perf help, use 'set follow-fork-mode' child to break on 'elfie_on_thread_start' routine which is executed in the application master thread.
ELFie execution sometimes ends pre-maturely (before reaching the expected instruction count). This could be because of many reasons:
- The execution diverges and tries to access code/data pages that were not captured.
- Operating system state is not fully captured for the system calls in the region to work correctly.
A common workaround is to find an alternate region which will hopefully avoid these issues. Contributions/suggestions to solve these open issues are most welcome!
- Help solve 'open issues' listed above.
- A more robust SYSSTATE capture: extend pintools/SYSSTATE/pinball-sysstate.cpp
- Extend pinball with OS state using ideas from CRIU
- Symbolic debugging of application code inside an ELFie Generate debug information for the application code inside the ELFie file.
- Similar tools for Windows and MacOS. Pinball generation is supported for Windows and MacOS as well. Consider writing a converter from pinball to Portable Executable (PE) format on Windows and Mach-O format on MacOS.
- Since the ELFie uses lots of mmap calls to allocate each 4KiB page, it's possible to overrun the max number of vm maps. You can "fix" this by adding vm.max_map_count = 2097152 to /etc/sysctl.conf (from Jason Lowe-Power). See the link to reload the configuration after setting the new value.
- Genearating pinballs for SPEC2017 ref inputs requires a lot of memory so if pinball generation fails because of memory issues, try a machine with a larger physical memory.
- Build your binary with for a generic x86 processor architecture: (gcc/g++) -march=x86_64 or -march=core2.
- Adding '-static' will make sure all libraries are linked in statically so no surprises on moving to a different machine.
- Generate pinball using "sde -nhm" so no new registers are saved in the initial pinball state.
- Public Release and Validation of SPEC CPU2017 PinPoints
- We have released the representative ELFies of a subset of the multi-threaded SPEC CPU2017 benchmarks here