Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use metadata type, not dict, in main #1411

Closed
williballenthin opened this issue Mar 28, 2023 · 6 comments
Closed

use metadata type, not dict, in main #1411

williballenthin opened this issue Mar 28, 2023 · 6 comments
Assignees
Labels
breaking-change introduces a breaking change that should be released in a major version enhancement New feature or request

Comments

@williballenthin
Copy link
Collaborator

williballenthin commented Mar 28, 2023

while this is what's required to make this work today, i don't like how it encourages us to keep using untyped Dicts to represent our intermediate results. i wonder if we should update the surrounding code (such as whats used in main.py) to use the result_document.Metadata objects instead.

Originally posted by @williballenthin in #1396 (comment)

specifically, routines like collect_metadata return a Dict rather than a specific type:

capa/capa/main.py

Lines 727 to 777 in af15008

def collect_metadata(
argv: List[str],
sample_path: str,
format_: str,
os_: str,
rules_path: List[str],
extractor: capa.features.extractors.base_extractor.FeatureExtractor,
):
md5 = hashlib.md5()
sha1 = hashlib.sha1()
sha256 = hashlib.sha256()
with open(sample_path, "rb") as f:
buf = f.read()
md5.update(buf)
sha1.update(buf)
sha256.update(buf)
if rules_path != [RULES_PATH_DEFAULT_STRING]:
rules_path = [os.path.abspath(os.path.normpath(r)) for r in rules_path]
format_ = get_format(sample_path) if format_ == FORMAT_AUTO else format_
arch = get_arch(sample_path)
os_ = get_os(sample_path) if os_ == OS_AUTO else os_
return {
"timestamp": datetime.datetime.now().isoformat(),
"version": capa.version.__version__,
"argv": argv,
"sample": {
"md5": md5.hexdigest(),
"sha1": sha1.hexdigest(),
"sha256": sha256.hexdigest(),
"path": os.path.normpath(sample_path),
},
"analysis": {
"format": format_,
"arch": arch,
"os": os_,
"extractor": extractor.__class__.__name__,
"rules": rules_path,
"base_address": extractor.get_base_address(),
"layout": {
# this is updated after capabilities have been collected.
# will look like:
#
# "functions": { 0x401000: { "matched_basic_blocks": [ 0x401000, 0x401005, ... ] }, ... }
},
},
}

we should maybe just return result_document.Metadata here and use it throughout the code.

this will be a breaking change, so it should be included in v6.

@williballenthin williballenthin added enhancement New feature or request breaking-change introduces a breaking change that should be released in a major version labels Mar 28, 2023
@manasghandat
Copy link
Contributor

@williballenthin Can you assign me this issue. Would like to work on it.

@Aayush-Goel-04
Copy link
Contributor

Aayush-Goel-04 commented May 31, 2023

@williballenthin , Regarding updating meta of type(Metadata) I think it would be better to create from_capa apis for other classes like Layout, Feature_Counts in result_document.py. Since while updating meta from dictionary, it causes type error between dict and tuple. What are your thoughts on this?

@Aayush-Goel-04
Copy link
Contributor

Aayush-Goel-04 commented May 31, 2023

we should maybe just return result_document.Metadata here and use it throughout the code.

If we are using meta as type Metadata then

capa/capa/main.py

Lines 1232 to 1239 in af15008

if args.json:
print(capa.render.json.render(meta, rules, capabilities))
elif args.vverbose:
print(capa.render.vverbose.render(meta, rules, capabilities))
elif args.verbose:
print(capa.render.verbose.render(meta, rules, capabilities))
else:
print(capa.render.default.render(meta, rules, capabilities))

Here in capa.render.json/verbose/vverbose/default.render meta is passed to ResultDocument.from_capa(meta) where it is processed as Metadata.from_capa(meta) thus meta should be of type dict. I guess some changes will also be required in render too.

@williballenthin
Copy link
Collaborator Author

I think it would be better to create from_capa apis for other classes like Layout, Feature_Counts in result_document.py

Yup, that sounds great.

@williballenthin
Copy link
Collaborator Author

I guess some changes will also be required in render too.

yes, i think thats right. we should prefer to use the Metadata type wherever possible and avoid the dict representation (unless actively serializing to a file).

@mr-tz
Copy link
Collaborator

mr-tz commented Jun 9, 2023

closed by #1502, thanks @Aayush-Goel-04!

@mr-tz mr-tz closed this as completed Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change introduces a breaking change that should be released in a major version enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants