add Address abstraction #986

Merged
merged 110 commits into from
Jun 21, 2022
Changes from 100 commits
b5be876
feat: start dotnet detection (#955)
mr-tz Apr 6, 2022
97e76a8
fix: imports and add tests
mr-tz Apr 6, 2022
6555257
Update dotnet-main (#979)
mr-tz Apr 7, 2022
c8a772d
test: update dotnet dirs and sync master (#984)
mr-tz Apr 8, 2022
6355fb3
add Address abstraction to handle various ways of identifying things i…
williballenthin Apr 8, 2022
1b79aae
extractor: introduce standardized handles for function, bb, insn
williballenthin Apr 8, 2022
fc1709b
extractor: add types throughout
williballenthin Apr 8, 2022
bfb01e3
extractor: viv: use handles throughout
williballenthin Apr 8, 2022
9164713
Merge branch 'dotnet-main' of github.com:mandiant/capa into feature-981
williballenthin Apr 8, 2022
31977e6
changelog
williballenthin Apr 8, 2022
a3d1b14
address: fix min value for unsigned addresses
williballenthin Apr 8, 2022
7e7740c
viv: insn: use handles for code merged from master
williballenthin Apr 8, 2022
65b462f
render: format various address types differently
williballenthin Apr 8, 2022
43b8ad8
pefile: extract Addresses
williballenthin Apr 8, 2022
2b00bc0
pep8
williballenthin Apr 8, 2022
ae87fa1
elf: use addresses
williballenthin Apr 8, 2022
87d3d6c
smda: use Addresses
williballenthin Apr 8, 2022
ed10090
Merge branch 'master' of github.com:mandiant/capa into feature-981
williballenthin Apr 8, 2022
808b7fb
dnfile: fix types
williballenthin Apr 9, 2022
70c3487
address: better implement .NET token
williballenthin Apr 9, 2022
d9ede95
dnfile: use Address
williballenthin Apr 9, 2022
e029547
show-features: learn to use Addresses
williballenthin Apr 9, 2022
723efe1
address: better implement .NET token
williballenthin Apr 9, 2022
bfb6d4d
dn: fix access to ctx
williballenthin Apr 9, 2022
c236293
features: insn: number: allow floats, too
williballenthin Apr 9, 2022
a734a04
dnfile: address: use rva
williballenthin May 11, 2022
71cf19b
render: handle dn tokens
williballenthin May 11, 2022
78e9280
Merge branch 'master' into feature-981
williballenthin May 11, 2022
7b05fc4
pep8 + mypy
williballenthin May 11, 2022
716a73d
feat: add handles and type annotations
mr-tz May 12, 2022
d8c9941
fix: filter address
mr-tz May 12, 2022
b2853cc
feat: update dnfile tests and extractor
mr-tz May 12, 2022
83cae29
ci: temporarily test on PR
mr-tz May 12, 2022
8e1f710
fix: add __str__
mr-tz May 12, 2022
7642db3
Merge pull request #1029 from mandiant/feat/981-add-ida-handles
williballenthin May 17, 2022
a4f0c1c
fix: rule generator handles
mr-tz May 19, 2022
4a577fa
Merge pull request #1031 from mandiant/fix/ida-plugin
williballenthin May 23, 2022
e4caa1d
base extractor: use handles
williballenthin May 24, 2022
fc9681f
helpers: fix import loop
williballenthin May 24, 2022
6b6dd70
freeze: use address abstraction
williballenthin May 24, 2022
d728869
freeze: mypy and pep8
williballenthin May 24, 2022
b35fe6c
json, render: work with and serialize addresses
williballenthin May 24, 2022
a4003d7
tests: fix scripts using json document
williballenthin May 24, 2022
d7cfa4e
features: make features implement __lt__
williballenthin May 25, 2022
b1fa5be
show-features: render features in a tree to better group scopes
williballenthin May 25, 2022
adb425a
freeze: use pydantic for (de)serialization
williballenthin May 25, 2022
02cef82
pep8
williballenthin May 25, 2022
6b633ef
freeze: fix schema to support overlapping functions
williballenthin May 25, 2022
eb6de90
changelog
williballenthin May 25, 2022
3879e33
freeze: model each features separately
williballenthin May 25, 2022
b2318ce
features: remove freeze_(de)serialize with preference to freeze module
williballenthin May 25, 2022
9236a36
rule: factor out is subscope check
williballenthin May 26, 2022
4ae4bab
lint: use meta.authors
williballenthin May 26, 2022
2dec484
typing fixes
williballenthin May 31, 2022
8080752
freeze: pass descriptions around
williballenthin May 31, 2022
42e2c53
wip: pydantic result document
williballenthin May 31, 2022
5d6c12d
sync rules
williballenthin May 31, 2022
5084cb0
Merge branch 'feature-981' of github.com:mandiant/capa into feature-981
williballenthin May 31, 2022
867662b
rules: remove unused `rule-category` meta
williballenthin Jun 6, 2022
ab4177f
render: default: fix rendering of mbc/att&ck
williballenthin Jun 6, 2022
cb44704
features: bb: add description to BasicBlock feature
williballenthin Jun 6, 2022
f58966a
address: implement repr, not str
williballenthin Jun 6, 2022
afc2953
frz: address: make sortable
williballenthin Jun 6, 2022
59e0518
pep8/mypy
williballenthin Jun 6, 2022
5960f51
result document: fix type of statement node
williballenthin Jun 6, 2022
f8b10a2
render: verbose: update to use new result document
williballenthin Jun 6, 2022
dcdc70d
Merge branch 'feature-981' of github.com:mandiant/capa into feature-981
williballenthin Jun 6, 2022
1a290a3
Merge branch 'master' into feature-981
williballenthin Jun 6, 2022
dddcec4
setup: fix dep spec
williballenthin Jun 6, 2022
a66c6c9
setup: fix pydantic dep version
williballenthin Jun 6, 2022
1b951aa
*: remove unused imports
williballenthin Jun 6, 2022
9a8d28d
viv: remove old handle implementation
williballenthin Jun 6, 2022
c73db05
fixtures: add path to extractors
williballenthin Jun 6, 2022
0987141
tests: add tests demonstrating rendering of .NET samples
williballenthin Jun 6, 2022
9fdaa91
render: vverbose: fixup rendering of imports
williballenthin Jun 6, 2022
3ef126f
show-features: fix rendering addresses
williballenthin Jun 6, 2022
9c09923
main: fix .NET format detection
williballenthin Jun 6, 2022
bfda997
freeze: support Class and Namespace features, too
williballenthin Jun 6, 2022
f35a825
Update capa/features/extractors/dnfile/insn.py
williballenthin Jun 8, 2022
96b522c
Update capa/features/address.py
williballenthin Jun 8, 2022
9433d41
Update capa/features/address.py
williballenthin Jun 8, 2022
2767660
features: substring: correctly record no captures
williballenthin Jun 8, 2022
ad15349
address: implement __eq__
williballenthin Jun 8, 2022
c6144a1
freeze: address: fix .NET address sorting
williballenthin Jun 8, 2022
faf414e
tests: add more dotnet tests
williballenthin Jun 8, 2022
c3418fd
tests: json: fix address representation
williballenthin Jun 8, 2022
6a5271c
remove old file
williballenthin Jun 10, 2022
67221e5
freeze: fix (de)serialization of tokens
williballenthin Jun 10, 2022
5b5ac16
render: fix rendering of .NET tokens
williballenthin Jun 10, 2022
1c771da
pep8
williballenthin Jun 10, 2022
6568189
freeze: fix sorting of addresses
williballenthin Jun 10, 2022
3103307
tests: fix reference error
williballenthin Jun 10, 2022
8031be7
render: fix computation of subrule matches
williballenthin Jun 10, 2022
9c77488
ida: meta: extract os/format/platform
williballenthin Jun 14, 2022
269f056
ida: use new ResultDocument structures
williballenthin Jun 14, 2022
aff6191
ida: meta: provide [] as argv
williballenthin Jun 14, 2022
df101e5
Update capa/features/extractors/dnfile/extractor.py
williballenthin Jun 14, 2022
bb74c73
sync rules
williballenthin Jun 14, 2022
c417b5d
merge master
williballenthin Jun 14, 2022
f5b79c0
Update .github/workflows/tests.yml
williballenthin Jun 14, 2022
0ff3bf1
Update .github/workflows/tests.yml
williballenthin Jun 14, 2022
ee5c869
extractor: clarify base address handling
williballenthin Jun 14, 2022
6b5e125
extractors: mypy
williballenthin Jun 14, 2022
af9049d
dnfile: return NO_ADDRESS for base_address
williballenthin Jun 14, 2022
a5979d3
Merge branch 'feature-981' of github.com:fireeye/capa into feature-981
williballenthin Jun 14, 2022
246ef58
tests: fix render test for ATT&CK metadata
williballenthin Jun 20, 2022
a453258
tests: fix render test for MBC
williballenthin Jun 20, 2022
9ebea05
show-capabilities-by-function: use new ResultDocument
williballenthin Jun 20, 2022
e3804a0
main: add types for collect_metadata
williballenthin Jun 20, 2022
be2dffe
bulk-process: use new ResultDocument json
williballenthin Jun 20, 2022
3 changes: 2 additions & 1 deletion .github/workflows/tests.yml
@@ -4,7 +4,8 @@ on:
   push:
     branches: [ master ]
   pull_request:
-    branches: [ master ]
+    # TODO cleanup these
+    branches: [ master, dotnet-main, feature-981 ]

 # save workspaces to speed up testing
 env:
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -14,13 +14,16 @@
 - add unmanaged call characteristic for dotnet files #1023 @mike-hunhoff
 - add mixed mode characteristic feature extraction for dotnet files #1024 @mike-hunhoff
 - emit class and namespace features for dotnet files #1030 @mike-hunhoff
+- render: support Addresses that aren't simple integers, like .NET token+offset #981 @williballenthin

 ### Breaking Changes

 - instruction scope and operand feature are new and are not backwards compatible with older versions of capa
 - Python 3.7 is now the minimum supported Python version #866 @williballenthin
 - remove /x32 and /x64 flavors of number and operand features #932 @williballenthin
 - the tool now accepts multiple paths to rules, and JSON doc updated accordingly @williballenthin
+- extractors must use handles to identify functions/basic blocks/instructions #981 @williballenthin
+- the freeze file format schema was updated, including format version bump to v2 #986 @williballenthin

 ### New Rules (7)

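The breaking change "extractors must use handles to identify functions/basic blocks/instructions" can be pictured roughly as follows. This is a minimal sketch and not capa's actual definitions: the `FunctionHandle` dataclass, its `inner` field, the plain `int` used as an address, and the `get_functions` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass
class FunctionHandle:
    """hypothetical handle: an address plus the backend's own function object."""

    address: int  # stand-in for an Address instance
    inner: Any  # opaque, backend-specific object (e.g. a disassembler's function)


def get_functions(backend_functions: List[dict]) -> Iterator[FunctionHandle]:
    # wrap each backend-specific object instead of exposing a bare integer
    for f in backend_functions:
        yield FunctionHandle(address=f["va"], inner=f)


handles = list(get_functions([{"va": 0x401000, "name": "main"}, {"va": 0x401100, "name": "sub"}]))
addresses = [fh.address for fh in handles]
```

The idea is that callers only rely on `address`, while each backend can stash whatever it needs in `inner`.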
13 changes: 7 additions & 6 deletions capa/engine.py
@@ -13,6 +13,7 @@
 import capa.perf
 import capa.features.common
 from capa.features.common import Result, Feature
+from capa.features.address import Address

 if TYPE_CHECKING:
     # circular import, otherwise
@@ -26,7 +27,7 @@
 # to collect the locations of a feature, do: `features[Number(0x10)]`
 #
 # aliased here so that the type can be documented and xref'd.
-FeatureSet = Dict[Feature, Set[int]]
+FeatureSet = Dict[Feature, Set[Address]]


 class Statement:
@@ -257,10 +258,10 @@ def evaluate(self, ctx, **kwargs):
 # inspect(match_details)
 #
 # aliased here so that the type can be documented and xref'd.
-MatchResults = Mapping[str, List[Tuple[int, Result]]]
+MatchResults = Mapping[str, List[Tuple[Address, Result]]]


-def index_rule_matches(features: FeatureSet, rule: "capa.rules.Rule", locations: Iterable[int]):
+def index_rule_matches(features: FeatureSet, rule: "capa.rules.Rule", locations: Iterable[Address]):
     """
     record into the given featureset that the given rule matched at the given locations.

@@ -277,7 +278,7 @@ def index_rule_matches(features: FeatureSet, rule: "capa.rules.Rule", locations:
             namespace, _, _ = namespace.rpartition("/")


-def match(rules: List["capa.rules.Rule"], features: FeatureSet, va: int) -> Tuple[FeatureSet, MatchResults]:
+def match(rules: List["capa.rules.Rule"], features: FeatureSet, addr: Address) -> Tuple[FeatureSet, MatchResults]:
     """
     match the given rules against the given features,
     returning an updated set of features and the matches.
@@ -315,10 +316,10 @@ def match(rules: List["capa.rules.Rule"], features: FeatureSet, va: int) -> Tupl
             # sanity check
             assert bool(res) is True

-            results[rule.name].append((va, res))
+            results[rule.name].append((addr, res))
             # we need to update the current `features`
             # because subsequent iterations of this loop may use newly added features,
             # such as rule or namespace matches.
-            index_rule_matches(features, rule, [va])
+            index_rule_matches(features, rule, [addr])

     return (features, results)
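To illustrate what the `FeatureSet` change means for `index_rule_matches`, here is a simplified sketch. It is not the code from this PR: features are plain strings, addresses are plain integers, and the `"match:"` key prefix and the four-argument signature are assumptions made to keep the example self-contained. Only the `rpartition("/")` walk over the namespace mirrors the context line shown in the diff.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set

# simplified stand-ins: real capa keys are Feature objects and values are Address objects
FeatureSet = DefaultDict[str, Set[int]]


def index_rule_matches(features: FeatureSet, rule_name: str, namespace: str, locations: Iterable[int]) -> None:
    # record the rule itself, then every parent namespace, at the given locations
    locations = set(locations)
    features[f"match:{rule_name}"].update(locations)
    while namespace:
        features[f"match:{namespace}"].update(locations)
        # strip the last path component; yields "" once the root is reached
        namespace, _, _ = namespace.rpartition("/")


features: FeatureSet = defaultdict(set)
index_rule_matches(features, "create file", "host-interaction/file-system/create", [0x401000])
```

Because the values are sets, the location type only needs to be hashable, which is why each `Address` implementation has to provide `__hash__` and `__eq__`.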
110 changes: 110 additions & 0 deletions capa/features/address.py
@@ -0,0 +1,110 @@
+import abc
+
+from dncil.clr.token import Token
+
+
+class Address(abc.ABC):
+    @abc.abstractmethod
+    def __eq__(self, other):
+        ...
+
+    @abc.abstractmethod
+    def __lt__(self, other):
+        # implement < so that addresses can be sorted from low to high
+        ...
+
+    @abc.abstractmethod
+    def __hash__(self):
+        # implement hash so that addresses can be used in sets and dicts
+        ...
+
+    @abc.abstractmethod
+    def __repr__(self):
+        # implement repr to help during debugging
+        ...
+
+
+class AbsoluteVirtualAddress(int, Address):
+    """an absolute memory address"""
+
+    def __new__(cls, v):
+        assert v >= 0
+        return int.__new__(cls, v)
+
+    def __repr__(self):
+        return f"absolute(0x{self:x})"
+
+
+class RelativeVirtualAddress(int, Address):
+    """a memory address relative to a base address"""
+
+    def __repr__(self):
+        return f"relative(0x{self:x})"
+
+
+class FileOffsetAddress(int, Address):
+    """an address relative to the start of a file"""
+
+    def __new__(cls, v):
+        assert v >= 0
+        return int.__new__(cls, v)
+
+    def __repr__(self):
+        return f"file(0x{self:x})"
+
+
+class DNTokenAddress(Address):
+    """a .NET token"""
+
+    def __init__(self, token: Token):
+        self.token = token
+
+    def __eq__(self, other):
+        return self.token.value == other.token.value
+
+    def __lt__(self, other):
+        return self.token.value < other.token.value
+
+    def __hash__(self):
+        return hash(self.token.value)
+
+    def __repr__(self):
+        return f"token(0x{self.token.value:x})"
+
+
+class DNTokenOffsetAddress(Address):
+    """an offset into an object specified by a .NET token"""
+
+    def __init__(self, token: Token, offset: int):
+        assert offset >= 0
+        self.token = token
+        self.offset = offset
+
+    def __eq__(self, other):
+        return (self.token.value, self.offset) == (other.token.value, other.offset)
+
+    def __lt__(self, other):
+        return (self.token.value, self.offset) < (other.token.value, other.offset)
+
+    def __hash__(self):
+        return hash((self.token.value, self.offset))
+
+    def __repr__(self):
+        return f"token(0x{self.token.value:x})+(0x{self.offset:x})"
+
+
+class _NoAddress(Address):
+    def __eq__(self, other):
+        return True
+
+    def __lt__(self, other):
+        return False
+
+    def __hash__(self):
+        return hash(0)
+
+    def __repr__(self):
+        return "no address"
+
+
+NO_ADDRESS = _NoAddress()
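The reason every `Address` must implement `__lt__`, `__hash__`, and `__eq__` is easiest to see with a small usage example. The sketch below re-declares trimmed-down versions of two of the classes so it runs without `dncil`. `TokenOffsetAddress` is a hypothetical simplification that takes a plain `int` token instead of a `dncil.clr.token.Token`, and the token values are made up.

```python
import abc


class Address(abc.ABC):
    @abc.abstractmethod
    def __lt__(self, other): ...

    @abc.abstractmethod
    def __hash__(self): ...


class AbsoluteVirtualAddress(int, Address):
    # int supplies __eq__/__lt__/__hash__, so only repr needs overriding
    def __repr__(self):
        return f"absolute(0x{self:x})"


class TokenOffsetAddress(Address):
    # hypothetical simplification: token is a plain int here, not a dncil Token
    def __init__(self, token: int, offset: int):
        self.token = token
        self.offset = offset

    def __eq__(self, other):
        return (self.token, self.offset) == (other.token, other.offset)

    def __lt__(self, other):
        # order by token first, then by offset within the token
        return (self.token, self.offset) < (other.token, other.offset)

    def __hash__(self):
        return hash((self.token, self.offset))

    def __repr__(self):
        return f"token(0x{self.token:x})+(0x{self.offset:x})"


vas = sorted([AbsoluteVirtualAddress(0x401010), AbsoluteVirtualAddress(0x401000)])
tokens = sorted([TokenOffsetAddress(0x6000002, 0x4), TokenOffsetAddress(0x6000001, 0x10)])
unique = {TokenOffsetAddress(0x6000001, 0x10), TokenOffsetAddress(0x6000001, 0x10)}
```

Sorting works within one address kind; this sketch does not attempt to order an integer address against a token address.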
11 changes: 2 additions & 9 deletions capa/features/basicblock.py
@@ -10,18 +10,11 @@


 class BasicBlock(Feature):
-    def __init__(self):
-        super(BasicBlock, self).__init__(None)
+    def __init__(self, description=None):
+        super(BasicBlock, self).__init__(None, description=description)

     def __str__(self):
         return "basic block"

     def get_value_str(self):
         return ""
-
-    def freeze_serialize(self):
-        return (self.__class__.__name__, [])
-
-    @classmethod
-    def freeze_deserialize(cls, args):
-        return cls()
69 changes: 24 additions & 45 deletions capa/features/common.py
@@ -11,7 +11,7 @@
 import codecs
 import logging
 import collections
-from typing import TYPE_CHECKING, Set, Dict, List, Union
+from typing import TYPE_CHECKING, Set, Dict, List, Union, Optional, Sequence

 if TYPE_CHECKING:
     # circular import, otherwise
@@ -20,6 +20,7 @@
 import capa.perf
 import capa.features
 import capa.features.extractors.elf
+from capa.features.address import Address

 logger = logging.getLogger(__name__)
 MAX_BYTES_FEATURE_SIZE = 0x100
@@ -70,20 +71,13 @@ def __init__(
         success: bool,
         statement: Union["capa.engine.Statement", "Feature"],
         children: List["Result"],
-        locations=None,
+        locations: Optional[Set[Address]] = None,
     ):
-        """
-        args:
-          success (bool)
-          statement (capa.engine.Statement or capa.features.Feature)
-          children (list[Result])
-          locations (iterable[VA])
-        """
         super(Result, self).__init__()
         self.success = success
         self.statement = statement
         self.children = children
-        self.locations = locations if locations is not None else ()
+        self.locations = locations if locations is not None else set()

     def __eq__(self, other):
         if isinstance(other, bool):
@@ -98,7 +92,7 @@ def __nonzero__(self):


 class Feature(abc.ABC):
-    def __init__(self, value: Union[str, int, bytes], description=None):
+    def __init__(self, value: Union[str, int, float, bytes], description=None):
         """
         Args:
           value (any): the value of the feature, such as the number or string.
@@ -116,6 +110,15 @@ def __hash__(self):
     def __eq__(self, other):
         return self.name == other.name and self.value == other.value

+    def __lt__(self, other):
+        # TODO: this is a huge hack!
+        import capa.features.freeze.features
+
+        return (
+            capa.features.freeze.features.feature_from_capa(self).json()
+            < capa.features.freeze.features.feature_from_capa(other).json()
+        )
+
     def get_value_str(self) -> str:
         """
         render the value of this feature, for use by `__str__` and friends.
@@ -137,27 +140,10 @@ def __str__(self):
     def __repr__(self):
         return str(self)

-    def evaluate(self, ctx: Dict["Feature", Set[int]], **kwargs) -> Result:
+    def evaluate(self, ctx: Dict["Feature", Set[Address]], **kwargs) -> Result:
         capa.perf.counters["evaluate.feature"] += 1
         capa.perf.counters["evaluate.feature." + self.name] += 1
-        return Result(self in ctx, self, [], locations=ctx.get(self, []))
-
-    def freeze_serialize(self):
-        return (self.__class__.__name__, [self.value])
-
-    @classmethod
-    def freeze_deserialize(cls, args):
-        # as you can see below in code,
-        # if the last argument is a dictionary,
-        # consider it to be kwargs passed to the feature constructor.
-        if len(args) == 1:
-            return cls(*args)
-        elif isinstance(args[-1], dict):
-            kwargs = args[-1]
-            args = args[:-1]
-            return cls(*args, **kwargs)
-        else:
-            return cls(*args)
+        return Result(self in ctx, self, [], locations=ctx.get(self, set()))


 class MatchedRule(Feature):
@@ -230,7 +216,7 @@ def evaluate(self, ctx, short_circuit=True):
             # instead, return a new instance that has a reference to both the substring and the matched values.
             return Result(True, _MatchedSubstring(self, matches), [], locations=locations)
         else:
-            return Result(False, _MatchedSubstring(self, None), [])
+            return Result(False, _MatchedSubstring(self, {}), [])

     def __str__(self):
         return "substring(%s)" % self.value
@@ -244,11 +230,11 @@ class _MatchedSubstring(Substring):
     note: this type should only ever be constructed by `Substring.evaluate()`. it is not part of the public API.
     """

-    def __init__(self, substring: Substring, matches):
+    def __init__(self, substring: Substring, matches: Dict[str, Set[Address]]):
         """
         args:
-          substring (Substring): the substring feature that matches.
-          match (Dict[string, List[int]]|None): mapping from matching string to its locations.
+          substring: the substring feature that matches.
+          matches: mapping from matching string to its locations.
         """
         super(_MatchedSubstring, self).__init__(str(substring.value), description=substring.description)
         # we want this to collide with the name of `Substring` above,
@@ -327,7 +313,7 @@ def evaluate(self, ctx, short_circuit=True):
             # see #262.
             return Result(True, _MatchedRegex(self, matches), [], locations=locations)
         else:
-            return Result(False, _MatchedRegex(self, None), [])
+            return Result(False, _MatchedRegex(self, {}), [])

     def __str__(self):
         return "regex(string =~ %s)" % self.value
@@ -341,11 +327,11 @@ class _MatchedRegex(Regex):
     note: this type should only ever be constructed by `Regex.evaluate()`. it is not part of the public API.
     """

-    def __init__(self, regex: Regex, matches):
+    def __init__(self, regex: Regex, matches: Dict[str, Set[Address]]):
         """
         args:
-          regex (Regex): the regex feature that matches.
-          match (Dict[string, List[int]]|None): mapping from matching string to its locations.
+          regex: the regex feature that matches.
+          matches: mapping from matching string to its locations.
         """
         super(_MatchedRegex, self).__init__(str(regex.value), description=regex.description)
         # we want this to collide with the name of `Regex` above,
@@ -389,13 +375,6 @@ def evaluate(self, ctx, **kwargs):
     def get_value_str(self):
         return hex_string(bytes_to_str(self.value))

-    def freeze_serialize(self):
-        return (self.__class__.__name__, [bytes_to_str(self.value).upper()])
-
-    @classmethod
-    def freeze_deserialize(cls, args):
-        return cls(*[codecs.decode(x, "hex") for x in args])
-

 # other candidates here: https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types
 ARCH_I386 = "i386"
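The `Feature.__lt__` added in `capa/features/common.py` (flagged in the diff itself as a hack) orders features by comparing their serialized JSON rather than their raw values. The sketch below shows the general idea under simplifying assumptions: the flat `Feature` dataclass and its `to_json` method are hypothetical stand-ins, and the standard `json` module replaces the freeze model's `.json()` call.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Feature:
    # hypothetical, flattened stand-in for capa's Feature hierarchy
    name: str
    value: Union[str, int, float, bytes]

    def to_json(self) -> str:
        # bytes are not JSON-serializable, so render them as hex first
        value = self.value.hex() if isinstance(self.value, bytes) else self.value
        return json.dumps({"type": self.name, "value": value}, sort_keys=True)

    def __lt__(self, other: "Feature") -> bool:
        # compare serialized strings so values of different types never meet directly
        return self.to_json() < other.to_json()


features = sorted([Feature("string", "abc"), Feature("number", 16), Feature("bytes", b"\x90\x90")])
names = [f.name for f in features]
```

Comparing strings sidesteps `TypeError` from mixing `str`, `int`, and `bytes` values, at the cost of serializing on every comparison. The resulting order is stable but not numerically meaningful.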