Proposal: Copy and Micropatch based target #24

calebh · 2024-04-05T18:01:09Z

My coworker, Phil Zucker, came up with a clever new method of creating micropatches using an ordinary C compiler. The strategy is called "copy and micropatch" and works similarly to copy-and-patch JITs. It is based on abusing the calling convention to force values into specific registers.

The easiest way to illustrate the concept is with an example (taken from Phil's blog):

#include <stdint.h>
uint64_t CALLBACK(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9);
uint64_t PATCHCODE(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9){
    // Some random patch code here
    if(rcx >= r8){
        rdi = rsi * rdx; 
    }
    // End patchcode
    return CALLBACK(rdi, rsi, rdx, rcx, r8, r9);
}

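The calling convention ensures that PATCHCODE receives its inputs in specific registers (under the x64 System V ABI, the six integer parameters arrive in rdi, rsi, rdx, rcx, r8, and r9, which is why the parameters are named after them), and the call to CALLBACK at the end ensures that the values are placed back into the correct registers when the patch code finishes.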
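To make the register mapping concrete, here is a small Python sketch (not part of Patcherex2 or Phil's tooling) that lists the System V x86-64 integer argument registers in order and checks that a wrapper's parameter names line up with them. It only covers the first six integer arguments, which is all the PATCHCODE/CALLBACK example relies on; the helper names are hypothetical.

```python
# System V AMD64 ABI: the first six integer/pointer arguments are passed
# in these registers, in this order.
SYSV_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def register_for_param(index: int) -> str:
    """Return the register that carries the integer parameter at `index`."""
    if not 0 <= index < len(SYSV_INT_ARG_REGS):
        raise ValueError(f"parameter {index} is not passed in a register")
    return SYSV_INT_ARG_REGS[index]


def params_match_registers(param_names: list[str]) -> bool:
    """True if each parameter is named after the register it arrives in."""
    return all(
        name == register_for_param(i) for i, name in enumerate(param_names)
    )


# PATCHCODE's parameters, in declaration order.
patchcode_params = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
print(params_match_registers(patchcode_params))  # True
print(register_for_param(3))  # rcx
```

Because CALLBACK takes the same parameter list, passing `rdi` back as the first argument is what puts the patched value into the real rdi before execution resumes.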

The code is compiled with an ordinary C compiler, and the body of PATCHCODE is extracted and inserted wherever there is free space in the binary. This requires tail-call optimization to be enabled, which turns the call to CALLBACK into a jump. Using a linker script, we can then place the CALLBACK symbol at the detour's return point.
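As a rough illustration of the linker-script step, the sketch below generates a GNU ld script fragment that assigns CALLBACK a fixed address, so the compiled tail-call jump resolves to the detour's return point. The helper name and the example address are hypothetical; only the `SYMBOL = EXPRESSION;` assignment syntax comes from GNU ld.

```python
def callback_linker_script(symbol: str, return_addr: int) -> str:
    """Build a GNU ld script fragment that pins `symbol` to `return_addr`.

    A top-level assignment of the form `SYMBOL = EXPRESSION;` defines the
    symbol at that address, so a `jmp CALLBACK` emitted by the compiler
    resolves to the detour's return point at link time.
    """
    if return_addr < 0:
        raise ValueError("address must be non-negative")
    return f"{symbol} = {return_addr:#x};\n"


# Hypothetical return point for the detour.
script = callback_linker_script("CALLBACK", 0x401136)
print(script, end="")  # CALLBACK = 0x401136;
```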

With the __attribute__((preserve_none)) attribute available in the latest development version of Clang, we can control many more registers (at least on x64). Note that preserve_none is brand new; I don't think it has landed in any released version of Clang yet. As an alternative to preserve_none, we could add shims that push/pop registers to ensure the data ends up in the right place.

For more info, see Phil's blog here: https://www.philipzucker.com/permutation_compile/

I'm willing to put the time into developing this target for integration into patcherex2. Is there anything that we need to know before forking and getting started? Using the version of Clang with support for preserve_none would be highly desirable.


DennyDai · 2024-04-06T19:36:12Z

@calebh Thanks for bringing this idea up! This does indeed look like a very clever way to do instruction-level patching using C code.

I agree that using preserve_none is much better/cleaner than adding code to push/pop registers. My main concern is that clang-19 hasn't been officially released yet, and preserve_none is currently only supported on x64. So I think it would be best to add clang-19 as a separate compiler component and keep the current clang compiler component unchanged.

I think a good way to implement this is to add an optional language argument to InsertInstructionPatch, since it will still ultimately behave like an instruction-level patch. Here's a rough idea of what the usage might look like:

from patcherex2 import *

p = Patcherex("some_binary", target_opts={"compiler": "clang19"}) # clang 19 component to be implemented

c_code = """
if(rcx >= r8){
    rdi = rsi * rdx; 
}
"""
p.patches.append(InsertInstructionPatch(0xdeadbeef, c_code, language="C"))

Let me know if you have a better idea on how to integrate it into patcherex2 :)

calebh · 2024-04-09T20:07:03Z

An initial implementation for x64 optionally using preserve_none is now working in our fork. See the example here: https://github.com/draperlaboratory/Patcherex2/blob/main/examples/insert_instruction_patch_c/patch.py

Here is the general strategy that I have implemented:

For the most part the logic is the same as an assembly InsertInstructionPatch, except that we compile C instead of assembly. The inserted code consists of the compiled C code concatenated with the moved instructions, followed by a jump back to just after the insertion point. The CALLBACK function called by the C code simply jumps one instruction ahead, to the moved instructions (so the jump is essentially a nop). The location of the extern CALLBACK itself is defined using a symbol passed to the linker script.
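The layout described above can be sketched in Python as: compiled micropatch bytes, then the moved instructions, then an x86-64 `jmp rel32` back to the resume address. This is a simplified illustration with hypothetical helper names, not Patcherex2's implementation; it assumes the moved instructions contain no RIP-relative operands (which would need fixing up when relocated).

```python
import struct


def jmp_rel32(from_addr: int, to_addr: int) -> bytes:
    """Encode an x86-64 `jmp rel32` (opcode E9) located at `from_addr`."""
    # The displacement is relative to the end of the 5-byte instruction.
    disp = to_addr - (from_addr + 5)
    return b"\xe9" + struct.pack("<i", disp)


def build_detour_body(base: int, micropatch: bytes, moved: bytes,
                      resume_addr: int) -> bytes:
    """Concatenate compiled C, the relocated original instructions, and a jump back.

    The micropatch's final tail call to CALLBACK is assumed to target
    `base + len(micropatch)`, i.e. the first moved instruction.
    """
    body = micropatch + moved
    return body + jmp_rel32(base + len(body), resume_addr)


# `add rdi, rdi` (48 01 ff) as a stand-in micropatch, one moved `nop`.
detour = build_detour_body(0x500000, b"\x48\x01\xff", b"\x90", 0x401150)
print(detour.hex())
```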

There are changes in a few different places:

In archinfo, the Amd64Info class now has calling convention and subregister information.
In the InsertInstructionPatch class, the apply method has been split into _apply_asm and _apply_c. The _apply_c function builds the C code required to compile the micropatch, then passes the code string to p.utils.insert_trampoline_code. insert_trampoline_code has been modified to additionally accept a C string as the instrs argument.

The user can also use subregisters by passing them as appropriate to c_in_regs and c_out_regs. For example, the following is okay:

from patcherex2 import *

p = Patcherex("add", target_opts={"compiler": "clang19"})

c_str = """
edi += edi;
edi += 5;
"""

p.patches.append(InsertInstructionPatch(0x114d, c_str, language="C", c_in_regs=["edi"], c_out_regs=["edi"]))
p.apply_patches()

p.binfmt_tool.save_binary()

However, you cannot use both rdi and edi in the same patch, since edi is the lower 32 bits of rdi.
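One way to enforce that restriction is to map every subregister back to its parent register and reject requests where two different names alias the same parent. The Python sketch below shows the idea with a small, hypothetical table covering only three registers; it does not reflect the actual subregister data in Patcherex2's archinfo.

```python
# Hypothetical subregister table: name -> (parent register, width in bits).
SUBREGISTERS = {
    "rdi": ("rdi", 64), "edi": ("rdi", 32), "di": ("rdi", 16), "dil": ("rdi", 8),
    "rsi": ("rsi", 64), "esi": ("rsi", 32), "si": ("rsi", 16), "sil": ("rsi", 8),
    "rax": ("rax", 64), "eax": ("rax", 32), "ax": ("rax", 16), "al": ("rax", 8),
}

# C type used for a variable of each width.
C_TYPES = {64: "uint64_t", 32: "uint32_t", 16: "uint16_t", 8: "uint8_t"}


def check_registers(names: list[str]) -> dict[str, str]:
    """Map each requested (sub)register to its C type, rejecting aliases."""
    seen: dict[str, str] = {}   # parent register -> name the user chose
    types: dict[str, str] = {}
    for name in names:
        parent, width = SUBREGISTERS[name]
        if parent in seen and seen[parent] != name:
            raise ValueError(f"{name} and {seen[parent]} both alias {parent}")
        seen[parent] = name
        types[name] = C_TYPES[width]
    return types


print(check_registers(["edi", "rsi"]))  # {'edi': 'uint32_t', 'rsi': 'uint64_t'}
```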

What remains to be done:

Add support for floating-point registers. On x64 these are xmm0 to xmm7.
Add support for more architectures by specifying their calling conventions. In particular, aarch64 will need to be updated when preserve_none lands in clang19. Note that copy and micropatch works fine without preserve_none; you just get far fewer registers under your control.
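To make the register budget concrete, the sketch below compares the x64 System V argument registers with the preserve_none parameter order used in the generated _MICROPATCH signature shown in this thread, and reports which requested registers cannot be passed in. The function name is hypothetical, and the preserve_none order is only a snapshot: it may change once the LLVM work on preserve_none register assignment lands.

```python
# x64 System V integer argument registers.
SYSV_ARGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
# Order taken from the generated _MICROPATCH signature in this thread.
PRESERVE_NONE_ARGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9",
                      "r11", "r12", "r13", "r14", "r15", "rax"]


def unreachable_registers(requested: list[str], preserve_none: bool) -> list[str]:
    """Return the requested registers that cannot be passed as parameters."""
    available = set(PRESERVE_NONE_ARGS if preserve_none else SYSV_ARGS)
    return [r for r in requested if r not in available]


print(len(SYSV_ARGS), len(PRESERVE_NONE_ARGS))  # 6 12
print(unreachable_registers(["rdi", "r12", "rax"], preserve_none=False))  # ['r12', 'rax']
print(unreachable_registers(["rdi", "r12", "rax"], preserve_none=True))   # []
```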

Primary files changed:

Here is the current generated C for the example program (the user never sees this):

#include <stdint.h>
extern void __attribute__((preserve_none)) _CALLBACK(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9, uint64_t r11, uint64_t r12, uint64_t r13, uint64_t r14, uint64_t r15, uint64_t rax);

#define return return _CALLBACK(rdi, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy)
void __attribute__((preserve_none)) _MICROPATCH(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9, uint64_t r11, uint64_t r12, uint64_t r13, uint64_t r14, uint64_t r15, uint64_t rax) {
uint64_t _dummy;

rdi += rdi;
rdi += 5;

return;
}
#undef return
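The generated wrapper above follows a simple template, so it can be produced mechanically. The Python sketch below reproduces that structure: the #define return trick rewrites any `return;` in the user's code into a tail call to _CALLBACK that passes the output registers through and fills the remaining argument slots with the uninitialized _dummy. The function name and signature are hypothetical; this is an illustration of the template, not the code in _apply_c.

```python
def build_micropatch_c(body: str, params: list[str], out_regs: list[str]) -> str:
    """Wrap user C code in a preserve_none function that tail-calls _CALLBACK."""
    attr = "__attribute__((preserve_none))"
    sig = ", ".join(f"uint64_t {r}" for r in params)
    # Output registers are forwarded; every other slot gets _dummy.
    cb_args = ", ".join(r if r in out_regs else "_dummy" for r in params)
    lines = [
        "#include <stdint.h>",
        f"extern void {attr} _CALLBACK({sig});",
        "",
        f"#define return return _CALLBACK({cb_args})",
        f"void {attr} _MICROPATCH({sig}) {{",
        "uint64_t _dummy;",
        "",
        body.strip(),
        "",
        "return;",
        "}",
        "#undef return",
    ]
    return "\n".join(lines) + "\n"


print(build_micropatch_c("rdi += rdi;\nrdi += 5;", ["rdi", "rsi", "rdx"], ["rdi"]))
```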

DennyDai · 2024-04-10T04:42:04Z

@calebh Thanks for the great work on this so far! The overall code looks pretty good to me. A few thoughts:

It would be great if the generated C code could be shown in the log (maybe at the DEBUG level) to make it easier to debug and understand what's being generated.
I'd personally prefer the get_cc and get_subregisters functions to be in the archinfo component instead of targets, as they provide architecture specific information rather than target specific information.
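Logging the generated source at DEBUG level only needs the standard logging module. In the sketch below, the logger name and helper function are hypothetical; the source is passed as a %-style argument so it is only formatted when DEBUG output is enabled.

```python
import logging

# Hypothetical logger name for the micropatch code path.
logger = logging.getLogger("patcherex2.micropatch")


def log_generated_c(c_code: str) -> None:
    """Emit the generated micropatch source at DEBUG level for troubleshooting."""
    logger.debug("Generated C for micropatch:\n%s", c_code)


# A user would opt in with, e.g., logging.basicConfig(level=logging.DEBUG).
log_generated_c("rdi += rdi;\nrdi += 5;")
```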

Let me know if there's anything you need from me to help wrap this up. Once you feel it's ready, please go ahead and open a PR against the main branch. I'll do a thorough code review and testing pass, and then we can get it merged.

Thanks again for driving this forward!

calebh · 2024-04-24T17:42:02Z

The fork is currently in good shape, nearly ready to merge back into the main repo. Most of the remaining tasks revolve around different architectures. What's left to do:

Change the preserve_none register list for x64 once this LLVM pull request lands: Try to use non-volatile registers for preserve_none parameters llvm/llvm-project#88333
Add support for Aarch64 (ARM64) once preserve_none for that platform lands. I have not seen any pull requests for this architecture, so the timeline for adding this seems unclear.
Test other architectures. The method is still somewhat useful on architectures where preserve_none is not supported, although in general you will have fewer registers under your control.

I do not currently have access to any systems that are not x64. Do you have any systems for testing non-x64 architectures?

DennyDai · 2024-04-24T19:20:28Z

@calebh Thank you for your efforts to make this happen!
I don't currently have access to any non-x64 systems either; for now, all the non-x64 archs are being tested with QEMU.
See, for example, tests/test_aarch64.py#L298-L303.
.github/actions/install-patcherex2/action.yml#L14-L20 lists the dependencies required for QEMU tests.

DennyDai · 2024-05-22T00:49:04Z

Implemented in #31

DennyDai added the enhancement New feature or request label Apr 6, 2024

calebh mentioned this issue Apr 26, 2024

Compiler crash when using float register annotation in conjunction with ASM blocks on Aarch64 llvm/llvm-project#90123

Open

calebh mentioned this issue May 21, 2024

Insert instruction patch, lang="C" via copy-and-micropatch strategy #31

Merged

DennyDai closed this as completed May 22, 2024
