Extract api names from ELF debug symbols [vivisect] #1443

Merged
merged 39 commits into from
Jun 5, 2023
Changes from 34 commits
Commits
c71cb55
insn extractor: Add static api extraction using .symtab
yelhamer Apr 14, 2023
21f2cb6
Update CHANGELOG.md
yelhamer Apr 14, 2023
44254bf
Update CHANGELOG.md
yelhamer Apr 17, 2023
97c8fd0
Update CHANGELOG.md
yelhamer Apr 21, 2023
e7ccea4
Shdr: add a constructor for vivisect's shdr representation
yelhamer Apr 22, 2023
b766d95
insn.py: rewire symbol parsing to use SymTab instead of vivisect
yelhamer Apr 22, 2023
b32a8ca
insn.py: Get the symtab api extractor to yield FunctionName features …
yelhamer Apr 23, 2023
ee881ab
code style: Fix the format of the committed code
yelhamer Apr 23, 2023
695508a
insn.py: Update extract_insn_api_features() to optimize by means of v…
yelhamer Apr 25, 2023
c7b65cf
Shdr constructor: Use direct member access to get vstruct's section h…
yelhamer Apr 25, 2023
64ef2c8
add tests for vivisect's usage of debug symbols
yelhamer Jun 1, 2023
f10a43a
fix style issues
yelhamer Jun 1, 2023
0d42ac3
add missing function-name feature testing
yelhamer Jun 1, 2023
ce8e15a
Merge branch 'master' into feature-static-api-names
williballenthin Jun 1, 2023
5738681
use ELF class member instead of vsGetField()
yelhamer Jun 1, 2023
ffb1cb3
rename strtab to strtab_section
yelhamer Jun 1, 2023
ab089c0
fetch section data by offset (not name)
yelhamer Jun 1, 2023
f9291d4
extract symtab-api names before processing library functions
yelhamer Jun 1, 2023
994edf6
return the target's address for the function-name feature
yelhamer Jun 1, 2023
8d1e1cc
fix strtab naming
yelhamer Jun 1, 2023
d85d01e
use the function-handle's cache instead of the VivWorkspace file meta…
yelhamer Jun 1, 2023
1cec768
fix strtab renaming error
yelhamer Jun 1, 2023
64dca7d
Merge branch 'master' into feature-static-api-names
williballenthin Jun 2, 2023
dde76e3
add a method to construct SymTab objects from Elf objects
yelhamer Jun 2, 2023
9467ee6
add FunctionName extraction at the function scope
yelhamer Jun 2, 2023
41c5126
update symtab-based FunctionName feature extraction
yelhamer Jun 2, 2023
0b834a1
delete functionName extraction at instruction level
yelhamer Jun 2, 2023
4976375
elf.py: fix identation error
yelhamer Jun 2, 2023
151ef95
remove usage of vsGetField
yelhamer Jun 2, 2023
764fda8
add missing Shdr.from_viv() method
yelhamer Jun 2, 2023
6b2710a
fix broken logic in extract_function_symtab_names()
yelhamer Jun 2, 2023
5b903ca
add error handling to SymTab and its callers
yelhamer Jun 2, 2023
be5ada2
fix code style
yelhamer Jun 3, 2023
7dff76b
Merge branch 'master' into feature-static-api-names
yelhamer Jun 3, 2023
e971bc4
fix codestyle issues
yelhamer Jun 5, 2023
65f18ae
fix mypy typing issues
yelhamer Jun 5, 2023
103b384
fix viv/extractor.py codestyle imports
yelhamer Jun 5, 2023
9b0fb74
fix typo: "Elf" to "elf"
yelhamer Jun 5, 2023
5b260c0
fix symtab FunctionName feature scope address
yelhamer Jun 5, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
 # Change Log
 
 ## master (unreleased)
+- extract function and API names from ELF symtab entries @yelhamer https://github.com/mandiant/capa-rules/issues/736
 
 ### New Features
 - Utility script to detect feature overlap between new and existing CAPA rules [#1451](https://github.com/mandiant/capa/issues/1451) [@Aayush-Goel-04](https://github.com/aayush-goel-04)
36 changes: 36 additions & 0 deletions capa/features/extractors/elf.py
@@ -91,6 +91,21 @@ class Shdr:
     entsize: int
     buf: bytes
 
+    @classmethod
+    def from_viv(cls, section, buf: bytes) -> "Shdr":
+        return cls(
+            section.sh_name,
+            section.sh_type,
+            section.sh_flags,
+            section.sh_addr,
+            section.sh_offset,
+            section.sh_size,
+            section.sh_link,
+            section.sh_entsize,
+            buf,
+        )
+
+
 
 class ELF:
     def __init__(self, f: BinaryIO):
@@ -695,6 +710,27 @@ def get_symbols(self) -> Iterator[Symbol]:
         for symbol in self.symbols:
             yield symbol
 
+    @classmethod
+    def from_Elf(cls, ElfBinary) -> Optional["SymTab"]:
+        endian = "<" if ElfBinary.getEndian() == 0 else ">"
+        bitness = ElfBinary.bits
+
+        SHT_SYMTAB = 0x2
+        for section in ElfBinary.sections:
+            if section.sh_type == SHT_SYMTAB:  # the section kind is stored in sh_type, not sh_info
+                strtab_section = ElfBinary.sections[section.sh_link]
+                sh_symtab = Shdr.from_viv(section, ElfBinary.readAtOffset(section.sh_offset, section.sh_size))
+                sh_strtab = Shdr.from_viv(strtab_section, ElfBinary.readAtOffset(strtab_section.sh_offset, strtab_section.sh_size))
+
+        try:
+            return cls(endian, bitness, sh_symtab, sh_strtab)
+        except NameError:  # no symtab section was found, so sh_symtab was never bound
+            return None
+        except Exception:
+            # any other exception raised while parsing (cls._parse())
+            # implies a malformed symbol table.
+            raise CorruptElfFile("malformed symbol table")
+
 
 def guess_os_from_osabi(elf: ELF) -> Optional[OS]:
     return elf.ei_osabi
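For readers unfamiliar with what a symbol-table parser has to decode: each entry of a 64-bit `.symtab` is a fixed 24-byte `Elf64_Sym` record, and its `st_name` field is an offset into the linked string table. The sketch below illustrates only that layout; it is not capa's `SymTab` implementation. `Sym`, `parse_symtab64`, the sample bytes, and the address `0x401000` are hypothetical names and values chosen for the example.

```python
import struct
from typing import Iterator, NamedTuple

STT_FUNC = 0x2  # symbol type for functions (low nibble of st_info)


class Sym(NamedTuple):
    name: str
    value: int
    info: int


def parse_symtab64(symtab: bytes, strtab: bytes, endian: str = "<") -> Iterator[Sym]:
    """Yield symbols from raw Elf64_Sym entries (24 bytes each)."""
    # Elf64_Sym: st_name(u32) st_info(u8) st_other(u8) st_shndx(u16) st_value(u64) st_size(u64)
    fmt = endian + "IBBHQQ"
    entsize = struct.calcsize(fmt)
    for off in range(0, len(symtab) - entsize + 1, entsize):
        st_name, st_info, _other, _shndx, st_value, _size = struct.unpack_from(fmt, symtab, off)
        # names are NUL-terminated strings starting at offset st_name in the string table
        end = strtab.index(b"\x00", st_name)
        yield Sym(strtab[st_name:end].decode("utf-8"), st_value, st_info)


# hypothetical sample data: the null symbol plus two function symbols sharing one address
strtab = b"\x00connect\x00__libc_connect\x00"
symtab = b"".join(
    struct.pack("<IBBHQQ", *fields)
    for fields in [
        (0, 0x00, 0, 0, 0x0, 0),
        (1, 0x12, 0, 1, 0x401000, 0x20),  # STB_GLOBAL | STT_FUNC
        (9, 0x22, 0, 1, 0x401000, 0x20),  # STB_WEAK | STT_FUNC
    ]
)
symbols = list(parse_symtab64(symtab, strtab))
function_names = [s.name for s in symbols if (s.info & 0xF) == STT_FUNC]
```

Several names resolving to one address, as in the sample data, is why a single call target can yield more than one name.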
3 changes: 2 additions & 1 deletion capa/features/extractors/viv/extractor.py
@@ -49,8 +49,9 @@ def extract_file_features(self):
         yield from capa.features.extractors.viv.file.extract_features(self.vw, self.buf)
 
     def get_functions(self) -> Iterator[FunctionHandle]:
+        cache = {}
         for va in sorted(self.vw.getFunctions()):
-            yield FunctionHandle(address=AbsoluteVirtualAddress(va), inner=viv_utils.Function(self.vw, va))
+            yield FunctionHandle(address=AbsoluteVirtualAddress(va), inner=viv_utils.Function(self.vw, va), ctx={"cache": cache})
 
     def extract_function_features(self, fh: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
         yield from capa.features.extractors.viv.function.extract_features(fh)
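The single `cache` dictionary handed to every `FunctionHandle` lets extractors share expensive, file-wide state. A minimal sketch of that lazy, build-once pattern follows, assuming a failed build should be remembered as `None` so that it is not retried. `get_cached`, `build_symtab`, and `build_broken` are hypothetical helpers for illustration and are not part of capa.

```python
from typing import Any, Callable, Dict, List

_MISSING = object()  # sentinel: distinguishes "never built" from a cached None


def get_cached(cache: Dict[str, Any], key: str, build: Callable[[], Any]) -> Any:
    """Return cache[key], building it on first use; failures are cached as None."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        try:
            value = build()
        except Exception:
            value = None
        cache[key] = value
    return value


good_calls: List[int] = []
bad_calls: List[int] = []


def build_symtab() -> Dict[str, int]:
    good_calls.append(1)
    return {"connect": 0x401000}  # hypothetical parsed result


def build_broken() -> Dict[str, int]:
    bad_calls.append(1)
    raise ValueError("malformed symbol table")


shared: Dict[str, Any] = {}  # one dict shared by all handles
first = get_cached(shared, "symtab", build_symtab)
second = get_cached(shared, "symtab", build_symtab)

broken_cache: Dict[str, Any] = {}
failed_once = get_cached(broken_cache, "symtab", build_broken)
failed_twice = get_cached(broken_cache, "symtab", build_broken)
```

Because the dictionary is created once in `get_functions()` and passed by reference, a value stored through one handle is visible through all the others.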
28 changes: 27 additions & 1 deletion capa/features/extractors/viv/function.py
@@ -30,6 +30,28 @@ def interface_extract_function_XXX(fh: FunctionHandle) -> Iterator[Tuple[Feature
     raise NotImplementedError
 
 
+def extract_function_symtab_names(fh: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
+    if fh.inner.vw.metadata["Format"] == "elf":
+        # the file's symbol table is cached in the function handle's shared context.
+        # this avoids the overhead of re-parsing the symtab for each function.
+        if "symtab" not in fh.ctx["cache"]:
+            try:
+                fh.ctx["cache"]["symtab"] = SymTab.from_Elf(fh.inner.vw.parsedbin)
+            except Exception:
+                fh.ctx["cache"]["symtab"] = None
+
+        symtab = fh.ctx["cache"]["symtab"]
+        if symtab:
+            for symbol in symtab.get_symbols():
+                sym_name = symtab.get_name(symbol)
+                sym_value = symbol.value
+                sym_info = symbol.info
+
+                STT_FUNC = 0x2
+                if sym_value == fh.address and (sym_info & 0xF) == STT_FUNC:  # low nibble of st_info is the symbol type
+                    yield FunctionName(sym_name), fh.address
+
+
 def extract_function_calls_to(fhandle: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
     f: viv_utils.Function = fhandle.inner
     for src, _, _, _ in f.vw.getXrefsTo(f.va, rtype=vivisect.const.REF_CODE):
@@ -79,4 +101,8 @@ def extract_features(fh: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
             yield feature, addr
 
 
-FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop)
+FUNCTION_HANDLERS = (
+    extract_function_symtab_names,
+    extract_function_calls_to,
+    extract_function_loop,
+)
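Scanning the whole symbol table once per function is linear in the number of symbols each time. One possible alternative, sketched below under the assumption that symbols can be reduced to `(name, value, info)` tuples, is to build an address-to-names index once and look each function up in it. The helper names and the address `0x401000` are hypothetical; the low-nibble type decoding follows the `ELF64_ST_TYPE` macro of the ELF specification.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

STT_OBJECT = 0x1
STT_FUNC = 0x2
STT_SECTION = 0x3


def st_type(info: int) -> int:
    """Symbol type is the low nibble of st_info; the high nibble is the binding."""
    return info & 0xF


def index_function_names(symbols: Iterable[Tuple[str, int, int]]) -> Dict[int, List[str]]:
    """Map each address to the names of all STT_FUNC symbols defined there."""
    index: Dict[int, List[str]] = defaultdict(list)
    for name, value, info in symbols:
        if st_type(info) == STT_FUNC:
            index[value].append(name)
    return dict(index)


# hypothetical symbols: three aliases of one function, a data object, and a section symbol
sample = [
    ("__GI_connect", 0x401000, 0x02),  # STB_LOCAL | STT_FUNC
    ("__libc_connect", 0x401000, 0x12),  # STB_GLOBAL | STT_FUNC
    ("connect", 0x401000, 0x22),  # STB_WEAK | STT_FUNC
    ("errno_slot", 0x601000, 0x11),  # STB_GLOBAL | STT_OBJECT
    ("", 0x400000, 0x03),  # STB_LOCAL | STT_SECTION
]
index = index_function_names(sample)
names_at_target = index.get(0x401000, [])
```

Note that a plain bit test such as `info & STT_FUNC != 0` would also accept the section symbol above, since `STT_SECTION` (3) shares the `0x2` bit; comparing the masked low nibble avoids that.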
22 changes: 22 additions & 0 deletions capa/features/extractors/viv/insn.py
@@ -19,9 +19,11 @@
 
 import capa.features.extractors.helpers
 import capa.features.extractors.viv.helpers
+from capa.features.file import FunctionName
 from capa.features.insn import API, MAX_STRUCTURE_SIZE, Number, Offset, Mnemonic, OperandNumber, OperandOffset
 from capa.features.common import MAX_BYTES_FEATURE_SIZE, THUNK_CHAIN_DEPTH_DELTA, Bytes, String, Feature, Characteristic
 from capa.features.address import Address, AbsoluteVirtualAddress
+from capa.features.extractors.elf import Shdr, SymTab
 from capa.features.extractors.base_extractor import BBHandle, InsnHandle, FunctionHandle
 from capa.features.extractors.viv.indirect_calls import NotFoundError, resolve_indirect_call
 
@@ -109,6 +111,26 @@ def extract_insn_api_features(fh: FunctionHandle, bb, ih: InsnHandle) -> Iterato
     if not target:
         return
 
+    if f.vw.metadata["Format"] == "elf":
+        if "symtab" not in fh.ctx["cache"]:
+            # the symbol table is stored in the function handle's shared cache so that it is
+            # parsed only once, rather than every time this extractor runs.
+            try:
+                fh.ctx["cache"]["symtab"] = SymTab.from_Elf(f.vw.parsedbin)
+            except Exception:
+                fh.ctx["cache"]["symtab"] = None
+
+        symtab = fh.ctx["cache"]["symtab"]
+        if symtab:
+            for symbol in symtab.get_symbols():
+                sym_name = symtab.get_name(symbol)
+                sym_value = symbol.value
+                sym_info = symbol.info
+
+                STT_FUNC = 0x2
+                if sym_value == target and (sym_info & 0xF) == STT_FUNC:  # low nibble of st_info is the symbol type
+                    yield API(sym_name), ih.address
+
     if viv_utils.flirt.is_library_function(f.vw, target):
         name = viv_utils.get_function_name(f.vw, target)
         yield API(name), ih.address
41 changes: 41 additions & 0 deletions tests/fixtures.py
@@ -761,6 +761,47 @@ def parametrize(params, values, **kwargs):
     key=lambda t: (t[0], t[1]),
 )
 
+# this list should be merged into the one above (FEATURE_PRESENCE_TESTS)
+# once the debug symbol functionality has been added to all backends
+FEATURE_SYMTAB_FUNC_TESTS = [
+    (
+        "2bf18d",
+        "function=0x4027b3,bb=0x402861,insn=0x40286d",
+        capa.features.insn.API("__GI_connect"),
+        True,
+    ),
+    (
+        "2bf18d",
+        "function=0x4027b3,bb=0x402861,insn=0x40286d",
+        capa.features.insn.API("connect"),
+        True,
+    ),
+    (
+        "2bf18d",
+        "function=0x4027b3,bb=0x402861,insn=0x40286d",
+        capa.features.insn.API("__libc_connect"),
+        True,
+    ),
+    (
+        "2bf18d",
+        "function=0x40286d",
+        capa.features.file.FunctionName("__GI_connect"),
+        True,
+    ),
+    (
+        "2bf18d",
+        "function=0x40286d",
+        capa.features.file.FunctionName("connect"),
+        True,
+    ),
+    (
+        "2bf18d",
+        "function=0x40286d",
+        capa.features.file.FunctionName("__libc_connect"),
+        True,
+    ),
+]
+
 FEATURE_PRESENCE_TESTS_DOTNET = sorted(
     [
         ("b9f5b", "file", Arch(ARCH_I386), True),
2 changes: 1 addition & 1 deletion tests/test_viv_features.py
@@ -11,7 +11,7 @@
 
 @fixtures.parametrize(
     "sample,scope,feature,expected",
-    fixtures.FEATURE_PRESENCE_TESTS,
+    fixtures.FEATURE_PRESENCE_TESTS + fixtures.FEATURE_SYMTAB_FUNC_TESTS,
     indirect=["sample", "scope"],
 )
 def test_viv_features(sample, scope, feature, expected):