-
Notifications
You must be signed in to change notification settings - Fork 0
Detecting some memory corruption errors in C by intercepting accesses to heap objects for bounds checking
License
michelrdagenais/MallocSan
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This is an experiment to use tainted pointers in order to check memory accesses. Based on some criteria (see the relevant environment variables below), pointers returned by malloc are tainted, adding a taint in the 2 MS bytes unused bits. When a tainted pointer is accessed, a SIGSEGV occurs. The SIGSEGV handler then needs to untaint the offending register, check the access versus the object bounds, make the access, and retaint the offending register. Getting back the control after the access, to retaint the register, is the tricky part. The access instruction is executed out of line such that the retainting code can be executed after. This is achieved using libpatch and libolx. As an optimisation, the instructions accessing tainted registers car be patched to jump to a "pre" handler to check and untaint the tainted registers, execute the access instruction out of line, and then jump to a "post" handler that retaints the tainted registers, before continuing with the instructions that follow the access. It is much faster than hitting a SIGSEGV. However, if only a small fraction of the pointers are tainted, saving registers and calling pre and post handlers every time for nothing may end up being longer. The following environment variables control the pointer tainting and the execution: DW_MIN_SIZE only malloc calls for size larger or equal to this number of bytes will be tainted. DW_MAX_SIZE only malloc calls for size smaller or equal to this number of bytes will be tainted. DW_MAX_NB_PROTECTED maximum number of malloc calls that will be tainted. DW_FIRST_PROTECTED first malloc call to be tainted, the calls before that number will not be tainted. DW_INSN_ENTRIES number of instructions that we expect to patch or trap. The hash table to store entries will be about twice that size. DW_LOG_LEVEL determine the verbosity of the output, from 0, no output, to 4, errors, warnings, info and debug output DW_STATS_FILE name of the file that will contain statistics about patched instructions, by default .taintstats.txt DW_STRATEGY patching strategy is TRAP for 0 and JUMP for 1. Currently, we concentrate on TRAP because of limitations in libpatch. DW_CHECK_HANDLING when 1, more consistency checking is performed, default is 0. A sample invocation of the application "simple" can look like the following: time LD_PRELOAD=./libmallocsan.so DW_STATS_FILE=stats.txt DW_STRATEGY=1 DW_LOG_LEVEL=0 ./simple 10 1000000 2>out.txt Debugging the application under GDB with LD_PRELOAD does not work very well. It is easier when library libmallocsan.so is forced as a dependency with: patchelf --add-needed <absolute path>/libmallocsan.so ./simple Even though running the application with gdb is not very practical, because of the repeated segmentation violations causing traps handling the access to tainted pointers, at least analyzing the core dump with gdb works well. There are currently a number of limitations. a) The current version of Libpatch does not handle patching adjacent short (smaller than 5 bytes) instructions. b) In addition, when a short instruction is patched, two or more instructions are replaced and executed out of line. However, the post handler is called after those instructions, instead of directly after the patched instruction. For this reason, the retainting may not work properly (retainting too late, missing a tainted access or corrupting a register that now contains something else). c) Furthermore, if a call or jump instruction accesses a tainted pointer (e.g. calling a pointer to function stored in a tainted malloc object), libpatch cannot call a post handler after executing the instruction out of line. Hopefully, those limitations will be removed in subsequent versions of libpatch. d) The pre and post handlers currently assume that memory accesses are performed using general purpose registers for the base and index. However, VEX instructions can use VSIB memory accesses and use a vector register as index. When this is used without a base, the vector register index contains several pointers, depending on the mask register. This case should be added to the pre and post handlers, untainting and retainting multiple pointers in a vector register. e) Another problem is that the tainted pointers can eventually reach the kernel and the SIGSEGV handler will not get called there. One solution is to use a special version of the Linux kernel that checks and untaints pointers. Another solution is to wrap all the libc functions that implement system calls to check and untaint pointers. A number of libc functions are wrapped in dw-wrap-glibc, but there are cases that are difficult to handle, and currently only a subset of functions are wrapped. Retainting can be done in two different ways. The tainted register can be saved, untainted, accessed, and restored to its previous value after the access. Alternatively, the taint can be extracted and saved, the register untainted and then accessed, and finally the taint can be readded. This is what is implemented. Indeed, there are cases where the tainted register is autoincremented, typically rsi and rdi for instructions with the rep prefix. With the TRAP strategy, this access checking library was able to execute correctly several benchmarks from the SPEC cpu2017 suite. The TRAP strategy does not suffer from some of the limitations, a) and b). The other limitations remain but did not affect the correct execution of the few benchmarks attempted. However, the TRAP strategy is much, much slower than the JUMP strategy. Nonetheless, with the TRAP strategy we can verify if the current untainting and retainting algorithm works well, and get statistics about the number of access instructions affected and the number of times they are executed. The more limited cases where we can use the JUMP strategy gives an idea of the performance that could be obtained if limitations a) and b) were removed and the JUMP strategy could be used for non-trivial programs.
About
Detecting some memory corruption errors in C by intercepting accesses to heap objects for bounds checking
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published