feature: instruction scope #930
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
CHANGELOG updated or no update needed, thanks! 😄
Really cool! This is definitely something we should eventually merge and a great preview for related I think I'd like to see

For
- operand[{0,1,n}].number: ...
- operand[{0,1,n}].offset: ...

For
- operand[{0,1,n}].number: ...
- operand[{0,1,n}].offset: ...
- operand[{0,1,n}].string: ...
- operand[{0,1,n}].substring: ...
- operand[{0,1,n}].bytes: ...

which enables rules like
- call:
  - api: WinExec
  - operand[0].string: "ipconfig.exe /all"

Also, where do the
- operand[1].number/x32: 0x100

My preference would be to eliminate the
- instruction:
  - arch: i386
  - mnemonic: mov
  - operand[1].offset: 0x10

versus
- instruction:
  - mnemonic: mov
  - operand[1].offset/x32: 0x10
this makes sense to me. I used |
I think this is pretty reasonable. Initially I had imagined trying to infer the type based on the value (e.g. I don't know that |
These are not supported for operands right now; they're just extracted as number/offset features. I expect
I think this might be a good idea now that we have
❯ rg "(offset|number)/x" -l
linking/runtime-linking/access-peb-ldr_data.yml
linking/runtime-linking/get-ntdll-base-address.yml
linking/runtime-linking/get-kernel32-base-address.yml
nursery/log-keystrokes-via-raw-input-data.yml
communication/socket/tcp/send/obtain-transmitpackets-callback-function-via-wsaioctl.yml
host-interaction/hardware/cpu/get-number-of-processors.yml
host-interaction/process/create/create-a-process-with-modified-io-handles-and-window.yml
host-interaction/process/get-process-heap-force-flags.yml
host-interaction/process/get-process-heap-flags.yml
lib/peb-access.yml
anti-analysis/anti-forensic/patch-process-command-line.yml
anti-analysis/anti-debugging/debugger-detection/check-for-peb-ntglobalflag-flag.yml
load-code/pe/enumerate-pe-sections.yml
load-code/pe/rebuild-import-table.yml
load-code/pe/parse-pe-header.yml
It'll also reduce the number of features we extract and match against. Created issue #932 to track this proposal.
@mike-hunhoff I've renamed the operand feature from "immediate" to "number" as you suggested. |
Once tests pass, I'm in favor of merging this!
I've xfail'd the SMDA tests in recognition of #937 |
awesome! |
closes #767
closes #931
This PR introduces support for a new matching scope, "instruction", and two features for matching operand values (immediate constants and memory offsets). The instruction scope is intended to enable matching of mnemonic + operand value combinations, such as `cmp ???, 0x11223344`. This should enable more precise rules to replace existing logic.
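A minimal sketch of the kind of conversion described here, assuming an existing rule that pairs a mnemonic with a number inside a basic block. Both fragments are illustrative, not taken from a real rule; the constant is the one from the `cmp ???, 0x11223344` example, placed at operand index 1 on the assumption that indices are zero-based and follow Intel operand order.

```yaml
# Hypothetical "before": the mnemonic and the number only need to
# co-occur somewhere within the same basic block.
- basic block:
  - and:
    - mnemonic: cmp
    - number: 0x11223344

# Hypothetical "after": both features must match on a single instruction,
# with the constant as the second operand (index 1).
- instruction:
  - mnemonic: cmp
  - operand[1].number: 0x11223344
```

The second form is stricter: a `cmp` and an unrelated use of `0x11223344` elsewhere in the block no longer satisfy it.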
You can use the instruction scope in the `rule.meta.scope` field or as a subscope (via the `instruction:` block) within another rule. When used as a subscope, a top-level `and:` is implied, so listing features directly under `instruction:` is equivalent to the more verbose form that wraps them in an explicit `and:`.
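The implied `and:` can be sketched with the `mov` / `operand[1].offset: 0x10` features from the review discussion. The two fragments below are an illustration of the equivalence described above, not the snippets from the original description.

```yaml
# Short form: features listed under `instruction:` are implicitly ANDed.
- instruction:
  - mnemonic: mov
  - operand[1].offset: 0x10

# Verbose form: the same logic with the `and:` written out.
- instruction:
  - and:
    - mnemonic: mov
    - operand[1].offset: 0x10
```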
And, of course, you can have complex logic in the instruction scope:
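As one hedged illustration, the sketch below uses the instruction scope through `rule.meta.scope` and nests `or:` under `and:`. The rule name, the choice of mnemonics, and the constant are hypothetical, and other meta fields that a complete rule would normally carry are left out.

```yaml
rule:
  meta:
    # hypothetical name, for illustration only
    name: compare against magic constant
    scope: instruction
  features:
    - and:
      # either comparison-style mnemonic is acceptable
      - or:
        - mnemonic: cmp
        - mnemonic: sub
      # the constant may appear as either operand
      - or:
        - operand[0].number: 0x11223344
        - operand[1].number: 0x11223344
```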
Within the instruction scope, you can reference all the existing features that were already extracted per instruction, like API, number, offset, string, bytes, and many characteristics (like "cross section flow"). You can also use two new features: operand number and operand offset. Both of these are specified with operand indices, like `operand[0].number`, which lets you match source and/or destination operands.

`operand[{0, 1, 2}].number` matches operands that are immediate constants, like `0x123` in the instruction `mov eax, 0x123`. Like the existing `number` feature, valid addresses are filtered out.

`operand[{0, 1, 2}].offset` matches the offset portion of memory reference operands, like `0x10` in the instruction `mov eax, [ebx+0x10]`. Like the existing `offset` feature, suspected stack variable references are filtered out.

Register, displacement, and computed address features are not supported at this time, since I haven't imagined any common use cases yet.
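To make the two operand features concrete, here is a sketch of subscope fragments intended to match the two example instructions above. It assumes operand indices are zero-based and follow Intel operand order, so the immediate and the memory reference are both at index 1.

```yaml
# Intended to match `mov eax, 0x123`:
# operand 1 is the immediate constant 0x123.
- instruction:
  - mnemonic: mov
  - operand[1].number: 0x123

# Intended to match `mov eax, [ebx+0x10]`:
# operand 1 is a memory reference whose offset portion is 0x10.
- instruction:
  - mnemonic: mov
  - operand[1].offset: 0x10
```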
This is a breaking change because old versions of capa will not understand the instruction scope.
performance impact
When considering mimikatz.exe, the vivisect backend extracts 322,647 total features before this change. This change adds 49,188 operand features (+15%): 16,661 numbers and 32,527 offsets.
Somehow, the total runtime against mimikatz doesn't change much: maybe 4% slower during feature extraction.
Once we convert some rules over to using the instruction scope, we should re-evaluate this.
In fact, there are fewer evaluations during matching with these changes, probably due to the fix for #931 (don't use global features for optimizer down selection), which perhaps makes up for the additional features generated.
TODO