Extract api names from ELF debug symbols [vivisect] #1443
Please add bug fixes, new features, breaking changes and anything else you think is worth mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed, add the following to the PR description: [x] No CHANGELOG update needed
CHANGELOG updated or no update needed, thanks! 😄
thanks for asking for input early on. i've provided my vision inline, but i'm open to discussion on any of the topics. don't hesitate to suggest alternatives!
i'd recommend developing a few tests right away so we can agree on what features should be extracted, and then we can dig into the implementation more. maybe assert that the API features `__GI_connect`, `__libc_connect`, and `connect` are all found at the address of the call instruction (0x40286D in 72f1b91327ffda4cf18a2bf64913b673d39ebbff8cbe50c9cd354b1dcd312bcc).
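The suggested test hinges on several symbol-table names sharing one address, so a call to that address should yield every name. A minimal sketch of that grouping, assuming the symbols are already available as hypothetical `(name, value, is_function)` tuples; the addresses below are made up and are not those of the sample:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical symbol record: (name, value/address, is_function).
Symbol = Tuple[str, int, bool]


def index_function_symbols(symbols: Iterable[Symbol]) -> Dict[int, Set[str]]:
    """Group function symbol names by address, so aliases end up together."""
    by_addr: Dict[int, Set[str]] = defaultdict(set)
    for name, value, is_func in symbols:
        if is_func and name:
            by_addr[value].add(name)
    return dict(by_addr)


def api_names_for_call(target: int, by_addr: Dict[int, Set[str]]) -> List[str]:
    """Return every name bound to the call target (sorted), or [] if unknown."""
    return sorted(by_addr.get(target, set()))


# Made-up addresses, for illustration only.
symbols = [
    ("__GI_connect", 0x401000, True),
    ("__libc_connect", 0x401000, True),
    ("connect", 0x401000, True),
    ("some_data", 0x402000, False),  # not a function: ignored
]
by_addr = index_function_symbols(symbols)
print(api_names_for_call(0x401000, by_addr))
# ['__GI_connect', '__libc_connect', 'connect']
```

Each returned name would then be emitted as its own API feature at the call instruction's address.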
also (new feature), we should apply function-name features to the functions themselves. this is one thing we do with the FLIRT matches, so you can say things like "look for this pattern, unless it's found within OpenSSL::create_context" or similar. example here:
capa/capa/features/extractors/viv/file.py, line 85 in 6ba5b2b:

    yield FunctionName(name), addr
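Applied to symbol-table names, the same idea might look roughly like the sketch below: for each function start that has symbol names, yield one function-name feature per name at the function's address. `FunctionName` here is a stand-in dataclass, not capa's class, and the function and symbol data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class FunctionName:
    """Stand-in for a function-name feature (not capa's real class)."""

    name: str


def extract_function_names(
    function_addrs: List[int], names_by_addr: Dict[int, Set[str]]
) -> Iterator[Tuple[FunctionName, int]]:
    """Yield (FunctionName, function address) for each symbol name at a function start."""
    for fva in function_addrs:
        # Functions with no symbol (e.g. stripped helpers) simply yield nothing.
        for name in sorted(names_by_addr.get(fva, set())):
            yield FunctionName(name), fva


names_by_addr = {0x401000: {"connect", "__libc_connect"}}
for feature, addr in extract_function_names([0x401000, 0x401100], names_by_addr):
    print(feature.name, hex(addr))
# __libc_connect 0x401000
# connect 0x401000
```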
Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
cd39f73 to 97c8fd0
I could not get API matches when using the FunctionName feature (as opposed to its API counterpart), for some reason. I will look into that and add tests in the next couple of commits.
Hm, I don’t quite understand. Do you mind explaining what didn’t work a little differently?
I apologize for not providing more detail, @williballenthin. I have elaborated further below. To my understanding, your previous recommendation to apply ...

    if sym_value == target and (sym_info & 0xF) == STT_FUNC:
        yield FunctionName(sym_name), ih.address

... However, doing this resulted in no capabilities (of symtab origin) being matched by our ruleset, while the usage of
My questions are:
Thanks!
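On the symbol-type check in the snippet: in ELF, `st_info` packs the binding into the high four bits and the type into the low four bits, so a function symbol is best detected by comparing the masked type with `STT_FUNC` (2). A plain bit test such as `sym_info & STT_FUNC != 0` is also true for other types, e.g. `STT_SECTION` (3) and `STT_TLS` (6). A small sketch; the helper names are hypothetical, while the constants and the masking follow the ELF specification:

```python
STB_GLOBAL = 1
STT_FUNC = 2
STT_SECTION = 3


def st_type(st_info: int) -> int:
    """Same as ELF32_ST_TYPE / ELF64_ST_TYPE: the low four bits."""
    return st_info & 0xF


def st_bind(st_info: int) -> int:
    """Same as ELF32_ST_BIND / ELF64_ST_BIND: the high four bits."""
    return st_info >> 4


def is_function_symbol(st_info: int) -> bool:
    return st_type(st_info) == STT_FUNC


global_func = (STB_GLOBAL << 4) | STT_FUNC  # 0x12
section_sym = STT_SECTION  # 0x03, local binding
print(is_function_symbol(global_func), is_function_symbol(section_sym))  # True False
print(bool(section_sym & STT_FUNC))  # True: the unmasked bit test misfires
```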
Thanks for the clarification! I had not explained clearly enough, so it’s my fault.
We should emit function-name features *in addition* to API features, like you were doing before. function-name features are not used to derive API features; they are independent.
API features let an author say: find a call to this function and that function together.
function-name features let an author say: find this logic when we know the name of the current function isn’t (for example) memcpy.
In practice, we might want to use function-name features when we develop the Linux raw syscall rules, because we’ll be able to do things like “look for syscall 3 when the current function isn’t already named `open`”.
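To make the independence concrete, here is a toy sketch: API features and the function-name feature sit side by side in the same feature set, and each rule-like check uses whichever it needs. The tuple-based features and both check functions are hypothetical and far simpler than capa's real rule engine.

```python
from typing import Set, Tuple

# Hypothetical feature representation: (kind, value) pairs collected within one function.
Feature = Tuple[str, object]


def calls_socket_and_connect(features: Set[Feature]) -> bool:
    """API-style check: a call to this function and that function together."""
    return ("api", "socket") in features and ("api", "connect") in features


def raw_syscall_3_outside_open(features: Set[Feature]) -> bool:
    """function-name-style check: syscall 3, unless the function is already named open."""
    return ("syscall", 3) in features and ("function-name", "open") not in features


# function-name is emitted alongside the API features; neither is derived from the other.
helper = {("api", "socket"), ("api", "connect"), ("syscall", 3), ("function-name", "my_helper")}
libc_open = {("syscall", 3), ("function-name", "open")}

print(calls_socket_and_connect(helper), raw_syscall_3_outside_open(helper))  # True True
print(calls_socket_and_connect(libc_open), raw_syscall_3_outside_open(libc_open))  # False False
```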
5a4ca9c to b32a8ca
Thank you for the explanation! I wrongly assumed that I have two more questions:
Also, please let me know if you think there is anything else that I should be doing differently.
Use whichever test you find more logical. Ideally, each functionality is tested separately. For the optimization, the first idea is to implement it similarly to
…iv rather than function attributes
…eader information
Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
good progress, some changes still needed, though.
just some error handling remains and we're good to go. nice job @yelhamer
edit: we also need to review code style. reach out if you don't have the linters configured and i can explain that.
Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
I believe all requested changes have been addressed except for the code style review: I couldn't get the linter working properly and will ask for an explanation on that after the weekend. Also, please check whether the way I am doing error handling is correct. I am assuming that all exceptions raised during SymTab construction are due to the symbol table being faulty, and I am not handling the case of an empty symtab, since that would just result in an empty symbols generator (for the
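The error-handling approach described here (treat any exception while building the symbol table as a malformed table, and let an empty table simply produce no symbols) might be sketched as follows. `SymTab`, `parse_symtab`, and `iter_symbols` are hypothetical placeholders, not the names or logic used in the PR.

```python
import logging
from typing import Iterator, List, Optional, Tuple

logger = logging.getLogger(__name__)

Symbol = Tuple[str, int]


class SymTab:
    """Hypothetical symbol table wrapper; raises on malformed input."""

    def __init__(self, raw: Optional[List[Symbol]]):
        if raw is None:
            raise ValueError("malformed symbol table")
        self.symbols = raw


def parse_symtab(raw: Optional[List[Symbol]]) -> Optional[SymTab]:
    """Return a SymTab, or None if construction failed for any reason."""
    try:
        return SymTab(raw)
    except Exception as e:
        # Assumption from the discussion: any construction error means a faulty table.
        logger.warning("failed to parse symbol table: %s", e)
        return None


def iter_symbols(raw: Optional[List[Symbol]]) -> Iterator[Symbol]:
    symtab = parse_symtab(raw)
    if symtab is None:
        return  # faulty table: extract nothing rather than crash the whole analysis
    # An empty table needs no special case: this simply yields nothing.
    yield from symtab.symbols
```

Catching the broad `Exception` mirrors the stated assumption; narrowing it to the parser's specific exception types would avoid hiding unrelated bugs.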

    isort --profile black --length-sort --line-width 120 --skip-glob "*_pb2.py" . ; \
    black -l 120 --extend-exclude ".*_pb2.py" . ; \
    ruff check --config .github/ruff.toml . ; \
    pycodestyle --exclude="*_pb2.py" --show-source capa/ scripts/ tests/ ; \
    mypy --config-file .github/mypy/mypy.ini --check-untyped-defs capa/ scripts/ tests/

via here: capa/.github/workflows/tests.yml, lines 36 to 45 in 0cbe461
there's also a helper script here: https://github.com/mandiant/capa/blob/master/scripts/ci.sh
code style passes and tests pass, woohoo!
nice work @yelhamer
This is a draft PR to add support for using the symbol table's entries to extract the names of statically linked APIs. See discussion; also #1445.
I am using the vivisect engine to fetch the symbol names. Please let me know if this is the correct approach, and what your thoughts on it are.
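For context on what the symbol table holds: each ELF64 `.symtab` entry is a fixed 24-byte `Elf64_Sym` record whose `st_name` field is an offset into the linked string table (`.strtab`). The sketch below decodes such entries from raw bytes with only the standard library. It is independent of vivisect, assumes little-endian ELF64, and uses a tiny hand-made table rather than a real binary.

```python
import struct
from typing import Iterator, Tuple

# Elf64_Sym: st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
#            st_value (u64), st_size (u64) -> 24 bytes; little-endian assumed.
ELF64_SYM = struct.Struct("<IBBHQQ")
STT_FUNC = 2


def read_cstring(strtab: bytes, offset: int) -> str:
    """Read a NUL-terminated name starting at offset in the string table."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("utf-8", errors="replace")


def iter_func_symbols(symtab: bytes, strtab: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (name, st_value) for every named STT_FUNC entry."""
    for off in range(0, len(symtab) - ELF64_SYM.size + 1, ELF64_SYM.size):
        st_name, st_info, _other, _shndx, st_value, _size = ELF64_SYM.unpack_from(symtab, off)
        if (st_info & 0xF) == STT_FUNC and st_name != 0:
            yield read_cstring(strtab, st_name), st_value


# Hand-made example; entry 0 is the reserved all-zero null symbol.
strtab = b"\x00connect\x00__libc_connect\x00"
symtab = (
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0)
    + ELF64_SYM.pack(1, 0x12, 0, 1, 0x401000, 32)  # "connect", GLOBAL FUNC
    + ELF64_SYM.pack(9, 0x12, 0, 1, 0x401000, 32)  # "__libc_connect", GLOBAL FUNC
)
print(list(iter_func_symbols(symtab, strtab)))
# [('connect', 4198400), ('__libc_connect', 4198400)]
```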
Tests on sample 2bf18d0403677378adad9001b1243211:
before:
after:
Checklist
No CHANGELOG update needed
No new tests needed
No documentation update needed