Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract api names from ELF debug symbols [vivisect] #1443

Merged
merged 39 commits into from
Jun 5, 2023

Conversation

yelhamer
Copy link
Collaborator

@yelhamer yelhamer commented Apr 14, 2023

This is a draft PR to add support for utilizing the symbol table's entries to extract the names of statically linked apis. See discussion. also #1445.

I am using the vivisct engine to fetch the symbol names. Please let me know if this is the correct approach, and what are your thoughts on it.

Tests on sample 2bf18d0403677378adad9001b1243211:

before:
image

after:
image

Checklist

  • No CHANGELOG update needed

  • No new tests needed

  • No documentation update needed

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@github-actions github-actions bot dismissed their stale review April 14, 2023 03:25

CHANGELOG updated or no update needed, thanks! 😄

@yelhamer yelhamer changed the title Add support for api extraction from statically linked libraries. Add support for api extraction from statically linked libraries Apr 15, 2023
Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for asking for input early on. i've provided my vision inline, but i'm open to discussion on any of the topics. don't hesitate to suggest alternatives!

i'd recommend developing a few tests right away so we can agree what features should be extracted - and then we can dig into the implementation more. maybe assert that API features __GI_connect, __libc_connect, and connect are all found at the address of the call instruction (0x40286D in 72f1b91327ffda4cf18a2bf64913b673d39ebbff8cbe50c9cd354b1dcd312bcc).

also, (new feature) we should apply function-name features to the functions themselves. this is one thing we do with the FLIRT matches, so you can say thing like "look for this pattern, unless its found within OpenSSL::create_context" or similar. example here:

yield FunctionName(name), addr

CHANGELOG.md Outdated Show resolved Hide resolved
capa/features/extractors/viv/insn.py Outdated Show resolved Hide resolved
capa/features/extractors/viv/insn.py Outdated Show resolved Hide resolved
Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
CHANGELOG.md Show resolved Hide resolved
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
@yelhamer yelhamer force-pushed the feature-static-api-names branch 2 times, most recently from cd39f73 to 97c8fd0 Compare April 21, 2023 23:49
@yelhamer
Copy link
Collaborator Author

yelhamer commented Apr 22, 2023

I could not get api matches when using the FunctionName feature — as opposed to its API counterpart — for some reason. I will look into that and add tests on the next couple of commits.

@williballenthin
Copy link
Collaborator

williballenthin commented Apr 22, 2023 via email

@yelhamer
Copy link
Collaborator Author

I apologize for not providing much details @williballenthin. I have further elaborated bellow:

To my understanding, your previous recommendation to apply function-name features to the functions themselves meant using FunctionName() to yield the resolved functions, as opposed to API(). So in my case that means something like this on line 150 of my commit to insn.py:

...
if sym_value == target and sym_info & STT_FUNC != 0:
    yield FunctionName(sym_name), ih.address
...

However, doing this resulted in no capabilities — of symtab origin — being matched by our ruleset, while the usage of API to yield the functions resulted in the intended behavior of all the expected capabilities being extracted:

image

My questions are:

  • Could you please confirm whether I understood your recommendation of using function-name correctly?
  • and if I did, could you please direct me at what I should be doing in order to get this working as expected?

Thanks!

@williballenthin
Copy link
Collaborator

williballenthin commented Apr 22, 2023 via email

@yelhamer yelhamer force-pushed the feature-static-api-names branch 2 times, most recently from 5a4ca9c to b32a8ca Compare April 23, 2023 01:27
@yelhamer
Copy link
Collaborator Author

Thank you for the explanation! I wrongly assumed that function-name did the same job as API in this context, I take responsibility for that.

I have two more questions:

  • Regarding tests — should I combine the testing of symtab-based api extraction as well as the possible identification of common gcc wrappers (as discussed here) into a single test file? or should I test the api extraction using symtab information as part of the viv feature extraction tests here?
  • In my commit, I define a new function attribute in order to avoid redoing the symbol table extraction process each time the function gets called; however, this is causing the mypy code_style check to fail (error message). What do you think of this idea — usage of function attributes; is it correct? or should I look for an alternative?

Also, please let me know if you think there is anything else that I should be doing differently.

@mr-tz
Copy link
Collaborator

mr-tz commented Apr 24, 2023

Use whichever test you find more logical. Ideally, each functionality is tested separately.

For the optimization the first idea is to implement it similarly to get_imports.

…eader information

Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
@yelhamer yelhamer changed the title Add support for api extraction from statically linked libraries Extract api names from ELF debug symbols [vivisect] May 31, 2023
@yelhamer yelhamer marked this pull request as ready for review June 1, 2023 01:03
Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good progress, some changes still needed, though.

capa/features/extractors/viv/insn.py Outdated Show resolved Hide resolved
capa/features/extractors/viv/insn.py Outdated Show resolved Hide resolved
tests/fixtures.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some error handling remain and we're good to go. nice job @yelhamer

edit: and need to review code style. reach out if you don't have the linters configured and i can explain that.

capa/features/extractors/viv/insn.py Outdated Show resolved Hide resolved
capa/features/extractors/viv/insn.py Outdated Show resolved Hide resolved
capa/features/extractors/viv/function.py Outdated Show resolved Hide resolved
@yelhamer
Copy link
Collaborator Author

yelhamer commented Jun 3, 2023

I believe all requested changes have been addressed except for the code style review, I couldn't get the linter working properly and will be asking for explanation on that after the weekend.

Also, please check if the way I am doing error handling is correct. I am assuming that all Exceptions that get raised during SymTab construction are due to the symbol's table being faulty, and I am not handling the case of an empty symtab since that'd just result in an empty symbols' generator (for the for symbol in symtab.get_symbols(): part)

@yelhamer yelhamer requested a review from williballenthin June 3, 2023 00:45
@williballenthin
Copy link
Collaborator

williballenthin commented Jun 5, 2023

isort --profile black --length-sort --line-width 120 --skip-glob "*_pb2.py" . ; \
black -l 120 --extend-exclude ".*_pb2.py" . ; \
ruff check --config .github/ruff.toml . ; \
pycodestyle --exclude="*_pb2.py" --show-source capa/ scripts/ tests/ ; \
mypy --config-file .github/mypy/mypy.ini --check-untyped-defs capa/ scripts/ tests/

via here:

- name: Lint with ruff
run: ruff check --config .github/ruff.toml .
- name: Lint with isort
run: isort --profile black --length-sort --line-width 120 --skip-glob "*_pb2.py" -c .
- name: Lint with black
run: black -l 120 --extend-exclude ".*_pb2.py" --check .
- name: Lint with pycodestyle
run: pycodestyle --exclude="*_pb2.py" --show-source capa/ scripts/ tests/
- name: Check types with mypy
run: mypy --config-file .github/mypy/mypy.ini --check-untyped-defs capa/ scripts/ tests/

there's also a helper script here: https://github.com/mandiant/capa/blob/master/scripts/ci.sh
though to be honest, i use the above command and not the script, so im not sure if its out of date.

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code style passes and tests pass, woohoo!

nice work @yelhamer

@williballenthin williballenthin merged commit 5709517 into mandiant:master Jun 5, 2023
@yelhamer yelhamer added the enhancement New feature or request label Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

3 participants