-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM: add support for arm architecture #1803
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
🥳 so happy to see this reopened! |
We registered on Google CLA, what is the next step now ? |
The CLA check succeeds now, thanks! We'll review this shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a first set of comments. My main concern is around mixing the architectures and how to handle this to keep the code maintainable (little code duplication) and readable.
Great work overall and I look forward to other reviews and your feedback.
@@ -5,8 +5,11 @@ | |||
### New Features | |||
- ghidra: add Ghidra feature extractor and supporting code #1770 @colton-gabertan | |||
- ghidra: add entry script helping users run capa against a loaded Ghidra database #1767 @mike-hunhoff | |||
- binja: add support for forwarded exports #1646 @xusheng6 | |||
binja: add support for forwarded exports #1646 @xusheng6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binja: add support for forwarded exports #1646 @xusheng6 | |
- binja: add support for forwarded exports #1646 @xusheng6 |
if isinstance(dst, (envi.archs.i386.disasm.i386SibOper, envi.archs.i386.disasm.i386RegMemOper)): | ||
rname = dst._dis_regctx.getRegisterName(dst.reg) | ||
else: | ||
rname = dst.reg | ||
if rname not in ["ebp", "rbp", "esp", "rsp", envi.archs.arm.disasm.REG_SP, envi.archs.arm.disasm.REG_BP]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be simplified to just use the constant or the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we import envi.archs.arm.disasm.REG_SP as REG_SP and just use REG_SP for example ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mr-tz i'm not sure i agree, because then it won't be obvious to a reader if the SP/BP are related to arm vs x86 vs ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to use:
- envi.archs.i386.disasm.REG_EBP (or similar [untested])
- envi.archs.amd64.disasm.REG_RBP (or similar [untested])
- etc.
Instead of "ebp", "rbp", etc.
not isinstance(dst, envi.archs.i386.disasm.i386SibOper) | ||
and not isinstance(dst, envi.archs.i386.disasm.i386RegMemOper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not isinstance(dst, envi.archs.i386.disasm.i386SibOper) | |
and not isinstance(dst, envi.archs.i386.disasm.i386RegMemOper) | |
not isinstance(dst, (envi.archs.i386.disasm.i386SibOper, envi.archs.i386.disasm.i386RegMemOper)) |
@@ -390,10 +392,13 @@ def is_supported_format(sample: Path) -> bool: | |||
return len(list(capa.features.extractors.common.extract_format(taste))) == 1 | |||
|
|||
|
|||
def is_supported_arch(sample: Path) -> bool: | |||
def is_supported_arch(sample: Path) -> Union[str, int, float, bytes]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should still return bool
. get_arch
can return the string/handle the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change that to use get_arch instead
super().__init__() | ||
self.vw = vw | ||
self.path = path | ||
self.arm = arm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to use arch
here instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I will change it
add_library_function(extractor.vw, bb, get_library_function_name_arm) | ||
|
||
|
||
Sys_x64 = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to do this for the other backends as well, we can move the mappings to a file one directory up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want us to create this file now ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you don't mind, please... and thanks!
def interface_extract_instruction_XXX( | ||
fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle | ||
) -> Iterator[Tuple[Feature, Address]]: | ||
""" | ||
parse features from the given instruction. | ||
|
||
args: | ||
fh: the function handle to process. | ||
bbh: the basic block handle to process. | ||
ih: the instruction handle to process. | ||
|
||
yields: | ||
(Feature, Address): the feature and the address at which its found. | ||
""" | ||
raise NotImplementedError | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def interface_extract_instruction_XXX( | |
fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle | |
) -> Iterator[Tuple[Feature, Address]]: | |
""" | |
parse features from the given instruction. | |
args: | |
fh: the function handle to process. | |
bbh: the basic block handle to process. | |
ih: the instruction handle to process. | |
yields: | |
(Feature, Address): the feature and the address at which its found. | |
""" | |
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the separation, on the other hand seems like we have quite some duplicated code?
Do you have a feel for if only using insn.py
would be more maintainable/less duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We first began with a repository viv_arm/ and then merged it with viv/. insn.py was the most difficult one to merge. Many functions are very different but we import the common functions from insn.py. We can maybe merge other functions but I"m not sure if it is possible for all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be reasonable to:
- move most of
insn.py
today toinsn_x86.py
, then - add the new
insn_arm.py
, and then - add new
insn.py
that dispatches to the appropriateinsn_*
file
?
once we have two supported arches, i would generally prefer that they are treated the same, not "x86 is primary and ARM is second class".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think having some duplication is ok, especially when we don't expect every contributor to be an expert in both x86 and arm (i am certainly not!). if we keep the code together, then few people can maintain it. if we have duplicates, each set of experts can work on their own code. of course, we'll want to keep things in sync, but it shouldn't prevent us from doing a reasonably good job of fixing bugs and adding features.
if read_memory(f.vw, target, 4) != b"\x00\xc6\x8f\xe2": | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note, I did not review most of this file
Thank you for this first review. I tried to give a feedback for as many comments as possible but I need to consult my team for the other points. I will commit the changes when all the comments are answered. |
from capa.features.extractors.base_extractor import BBHandle, FeatureExtractor | ||
|
||
|
||
def getSyscallName(num: int, arch: bool) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def getSyscallName(num: int, arch: bool) -> str: | |
def get_syscall_name(num: int, arch: bool) -> str: |
and maybe document this can raise KeyError
when not found ?
Hey @gauthier-wiemann, I've created a new feature branch for ARM support: https://github.com/mandiant/capa/tree/arm-support |
Hi, we have no problem with it. |
The suggested enhancement involves adding support for ARM architecture in the CAPA tool and improving its capabilities on ELF files.
Here are the steps involved:
Changes to
capa/features/common.py
andcapa/main.py
: Simple changes have been made to add ARM architecture to the list of supported architectures in CAPA.Changes to
capa/features/extractors/viv/
: More complex changes have been made to this directory to enable disassembly of ARM binaries.Specifically, a new file,
capa/features/extractors/viv/insn_arm.py
, has been added.This file replicates the logic of
capa/features/extractors/viv/insn.py
, but adapts its functions to ARM mnemonics and patterns specific to ARM binaries.Testing: Each feature has been tested using binaries written in assembler or compiled with GCC (version 9.4.0), and the final result of the modifications has been evaluated over Linux Arm Malware samples, mostly Mirai.
Improving capabilities on ELF files: The modifications also aim to improve CAPA's capabilities on ELF files. Specifically, the following changes have been made:
capa/features/extractors/elf.py
.capa/features/extractors/viv/syscall.py
.Note:
While adding support for ARM architecture in CAPA, it was found that some ARM samples produce a lot of warnings in the provenance of the vivesct disassembler. However, these warnings do not seem to interfere with the final result of CAPA's analysis.
As for x86/x84 binaries, it was observed that the statement of some features is dependent on specific patterns, which may be related to the type and version of the compiler used. This dependency could potentially limit CAPA's abilities when analyzing these binaries.
Checklist
Sorry for the new PR, we had to do it with another organization.
This PR is related to this issue #1774 and this PR #1796