dynamic: Time Travel Debugging (TTD) integration #1649

atxr · 2023-07-18T11:23:16Z

Summary

Develop a TTD extractor and add keywords to the rules so that capa can use trace files generated by TTD to improve its dynamic analysis and defeat packers.

Motivation

Because capa is trying to develop some dynamic analysis features, I would like to suggest using Microsoft TTD. Thanks to TTD, you can generate trace files that record the context of the binary at each instruction.
I could develop a TTD extractor that would add new features to capa from the trace file.

In the end, one could scan a binary sample with a TTD trace and use new rules to select a position in a trace where capa should work.
The TTD extractor would need time indicators to know when to scan in the timeline. These indicators can be TTD cursors (time position) or functions that would be hooked in the trace.
Also, the extractor would require a memory range to scan. Hence, several optimizations can be developed like scanning the heap, the module memory, the stack...

Here is a quick look at what a rule could look like:

...
- ttd:
  - time:
    - cursor: ["100:0", "200:a"]                                       # provide hardcoded time position
    - hook: ["ntdll!NtCreateThreadEx", "ntdll!NtCreateUserProcess"]    # provide functions to hook
  - memory: ["heap", "module"]                                         # select memory ranges to scan
...

This rule tries to detect thread and process creation, then scans the heap at those time positions to search for shellcode that a packer may have loaded, or for useful strings loaded dynamically.
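
As a rough illustration of the two kinds of time indicators, the sketch below turns the proposed `ttd` block (already loaded from YAML into Python lists and dicts) into a small scan plan. It assumes TTD positions are written as `sequence:step` in hexadecimal, the way WinDbg displays them. The `Cursor`, `ScanPlan`, `parse_cursor`, and `plan_from_rule` names are hypothetical and do not exist in capa or yara-ttd.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cursor:
    """A position in a TTD trace (assumed format: hex "sequence:step")."""

    sequence: int
    step: int


def parse_cursor(text: str) -> Cursor:
    """Parse a position such as "200:a" into its two hexadecimal parts."""
    sequence, sep, step = text.partition(":")
    if not sep:
        raise ValueError(f"expected 'sequence:step', got {text!r}")
    return Cursor(int(sequence, 16), int(step, 16))


@dataclass
class ScanPlan:
    """When to scan (cursors, hooks) and where to scan (memory ranges)."""

    cursors: list = field(default_factory=list)
    hooks: list = field(default_factory=list)
    memory: list = field(default_factory=list)


def merge(items: list) -> dict:
    """Flatten a YAML list of single-key mappings into one dict."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged


def plan_from_rule(ttd_items: list) -> ScanPlan:
    """Build a ScanPlan from the value of the hypothetical `ttd` rule key."""
    ttd = merge(ttd_items)
    time = merge(ttd.get("time", []))
    return ScanPlan(
        cursors=[parse_cursor(c) for c in time.get("cursor", [])],
        hooks=list(time.get("hook", [])),
        memory=list(ttd.get("memory", [])),
    )
```

Keeping cursors as parsed integers rather than strings would let an extractor sort and deduplicate positions before replaying the trace.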

Alternative projects

I'm currently working on https://github.com/airbus-cert/yara-ttd which aims to apply yara rules on TTD trace files thanks to these TTD bindings.
The tool is currently working and has many use cases when applying yara rules to packed binaries.

I also read the dynamic-feature-extraction branch you are working on to integrate CAPE.
The TTD integration could work alongside this project to provide a more precise analysis and an in-depth dynamic memory scan.

I haven't started implementing the extractor yet because I wanted some validation from the capa team first. Of course, I will welcome any kind of advice!


williballenthin · 2023-07-18T14:23:59Z

hey @atxr!

I am very keen on integrating TTD with capa. Like you said, the technology might make it easier to analyze samples that are packed.

It's an interesting idea to specify a collection of hooks/events at which point to analyze the state of the process and find capabilities. This makes me think of using capa against a memory dump (a good idea, but something we don't have today). So, I think this is feasible, and would enable some complementary enhancements.

I'm not yet convinced of the proposal to extend the rule format to specify the hook locations. This sort of thing seems orthogonal to the description of a capability. That is, the author describing how to find browser cookie stealing behavior shouldn't have to be an expert in VMProtect and decide where to hook. Instead, I'd recommend this be provided either as a CLI argument (in the case of the TTD cursor ID) or use some reasonable defaults (like ExitProcess, WriteFile, etc.).

For awareness, I had previously been thinking that we'd use TTD as a sort of sandbox that we can use to capture the API trace and feed into the dynamic analyzer that @yelhamer is working on. I think this idea can be independent of what you propose here and we should explore both.

mr-tz · 2023-07-18T19:36:09Z

I think capa + TTD could be amazing! We've also discussed this before with @xusheng6, so tagging him here.

atxr · 2023-07-19T07:59:27Z

Thank you for your feedback! I totally agree with your proposal!

To summarize this TTD integration:

It should extend capa to analyze memory dumps based on default hooks.
These hooks/positions should be tweakable through extra command-line arguments in capa.

In the end, it should look like:

capa sample.exe sample.run                                       # analyze TTD trace sample.run with the default hooks
capa sample.exe sample.run --ttd-hook ntdll!NtCreateUserProcess  # specify a hook
capa sample.exe sample.run --ttd-cursor 100:1A                   # specify a cursor position
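
A minimal sketch of how these command lines could be parsed with `argparse`, assuming the `--ttd-hook` and `--ttd-cursor` flags proposed above (neither exists in capa today). The default hook list is an assumption based on the `ExitProcess` and `WriteFile` examples suggested earlier in this thread; the `kernel32!` prefixes and the helper names are hypothetical.

```python
import argparse

# Assumed defaults, used only when the user gives neither a hook nor a cursor.
DEFAULT_HOOKS = ["kernel32!ExitProcess", "kernel32!WriteFile"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="capa")
    parser.add_argument("sample", help="path to the sample to analyze")
    parser.add_argument("trace", nargs="?", help="optional TTD trace (.run) file")
    # Both flags may be repeated to select several positions in the trace.
    parser.add_argument("--ttd-hook", action="append", dest="ttd_hooks",
                        metavar="MODULE!FUNCTION")
    parser.add_argument("--ttd-cursor", action="append", dest="ttd_cursors",
                        metavar="SEQUENCE:STEP")
    return parser


def resolve_positions(args: argparse.Namespace) -> tuple:
    """Return (hooks, cursors), falling back to the default hooks."""
    hooks = args.ttd_hooks or []
    cursors = args.ttd_cursors or []
    if not hooks and not cursors:
        hooks = list(DEFAULT_HOOKS)
    return hooks, cursors
```

With this layout, an explicit cursor or hook replaces the defaults instead of adding to them, which is one possible reading of "tweakable"; merging with the defaults would be an equally reasonable choice.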

I have a few questions, though, regarding the implementation:

Should I create a new TTD feature extractor like @yelhamer did for CAPE?
If so, should I base my PR on master or on the dynamic-feature-extraction branch? Even if CAPE and TTD aren't linked, I saw that you discussed and worked a lot on how to integrate these dynamic features in this branch, and I was wondering whether it contains code that I would need.

I'm still trying to familiarize myself with the project to figure out how I'll integrate TTD, so I might come back with other questions soon 🙂

williballenthin · 2023-07-19T09:03:45Z

To summarize this TTD integration:

It should extend capa to analyze memory dumps based on default hooks.

These hooks/positions should be tweakable through extra command-line arguments in capa.

Both of these sound great, and so do the proposed command lines.

Should I create a new TTD feature extractor like @yelhamer did for CAPE?

I think it should look more like the Binary Ninja feature extractor that @xusheng6 added in #1343. That's because I'd recommend that you focus on static analysis of memory snapshots, not dynamic analysis of API traces. Conceptually, capa's static analysis deals with things like functions, basic blocks, and instructions, while its (proposed) dynamic analysis deals with things like API calls found in threads and processes. In this issue, let's focus on static analysis of memory snapshots derived from TTD traces. I'll open another issue (#1655) to track the use of TTD for dynamic analysis. If you'd prefer to work on that feature, no problem! (Though, I'd suggest we wait until the CAPE implementation is done and lessons are learned.)

Given that the idea is to have capa analyze snapshots of memory at specific points of time in TTD traces, I wonder if we can start by building:

1. TTD memory snapshot exporter: given a TTD trace and cursor position (or later, hook specification), write a memory snapshot(s) and metadata to a file(s). Ideally we could use a common format, like minidump or similar, but not required.
2. a memory snapshot feature extractor for capa (static analysis of memory dumps to find capabilities, #1654)

These could be built in parallel as temporarily separate utilities. Then we can wire 1 and 2 within capa and add the CLI arguments, etc. The benefit is that we might also be able to provide memory snapshots from other systems, like sandboxes, which would be neat. I also suspect the TTD memory snapshot exporter might be generally useful for other things like dumping unpacked executables.
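
To make the boundary between the two utilities concrete, here is a minimal sketch of an on-disk snapshot format that an exporter could write and a feature extractor could read. It is only a stand-in: it uses a plain directory with a JSON manifest rather than minidump, it does not touch any TTD API, and every name in it (`MemoryRegion`, `write_snapshot`, `read_snapshot`, `manifest.json`) is hypothetical.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MemoryRegion:
    base: int    # virtual address of the first byte
    data: bytes  # raw contents captured at the trace position
    kind: str    # free-form tag, e.g. "heap", "module", "stack"


def write_snapshot(out_dir: Path, position: str, regions: list) -> Path:
    """Write one file per region plus a manifest describing them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"position": position, "regions": []}
    for region in regions:
        name = f"{region.base:016x}.bin"
        (out_dir / name).write_bytes(region.data)
        manifest["regions"].append({
            "base": region.base,
            "size": len(region.data),
            "kind": region.kind,
            "file": name,
        })
    path = out_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path


def read_snapshot(out_dir: Path) -> tuple:
    """Load the trace position and regions written by write_snapshot."""
    manifest = json.loads((out_dir / "manifest.json").read_text())
    regions = [
        MemoryRegion(r["base"], (out_dir / r["file"]).read_bytes(), r["kind"])
        for r in manifest["regions"]
    ]
    return manifest["position"], regions
```

Because the reader depends only on the manifest, snapshots produced by other systems, such as sandboxes, could be fed to the same extractor as long as they are converted to this layout.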

I'm just brainstorming here. What do you think?

williballenthin · 2023-07-19T09:11:18Z

added #1654 to track static analysis of memory snapshots

williballenthin · 2023-07-19T09:20:17Z

added #1655 to track dynamic analysis via TTD traces.

atxr · 2023-07-19T09:31:13Z

First of all, I think your ideas are really great!

TTD memory snapshot exporter: given a TTD trace and cursor position (or later, hook specification), write a memory snapshot(s) and metadata to a file(s). Ideally we could use a common format, like minidump or similar, but not required.

Just to be sure, should this snapshot exporter be part of capa or should it be a dependency that I could develop in another repo?
For the second point, I'll start by looking at the Binary Ninja feature extractor to understand it better.

williballenthin · 2023-07-19T10:13:59Z

should this snapshot exporter be part of capa or should it be a dependency that I could develop in another repo?

I think this can be up to you. If you can find other consumers for the library, then maybe it makes sense to be external. Or maybe capa is just a good central place to store and distribute this. shrug. It's also fine to start in your own repo and then merge into capa when you're happy.

For the second point, I'll start by looking at the Binary Ninja feature extractor to understand it better.

Great!

I'm also going to do a bit of background research on what we'd need to implement a memory snapshot feature extractor. At least so I can talk intelligently with you about it, and/or to write code alongside you :-)

atxr · 2023-07-19T10:40:57Z

Awesome! Then I'll start with an external repo and later see whether it makes sense to merge it into capa!
I saw your links in #1654; I'll take a look!
Thanks again for your interest in this feature!

N3mes1s · 2024-02-02T16:01:19Z

FYI I think you could use the programmatic api to instrument and run the capas

TTD live recorder API sample
This is a sample demonstrating how a program can use TTD's live recording API to record portions of itself.

https://github.com/microsoft/WinDbg-Samples/tree/master/TTD/LiveRecorderApiSample

williballenthin mentioned this issue Jul 19, 2023

static analysis of memory dumps to find capabilities #1654

Open

williballenthin mentioned this issue Jul 19, 2023

dynamic analysis via TTD traces #1655

Open

williballenthin added the enhancement New feature or request label Jul 19, 2023
