Extract api names from ELF debug symbols [vivisect] #1443

yelhamer · 2023-04-14T03:22:44Z

This is a draft PR to add support for using the symbol table's entries to extract the names of statically linked APIs. See discussion; see also #1445.

I am using the vivisect engine to fetch the symbol names. Please let me know whether this is the correct approach and what your thoughts on it are.

Tests on sample 2bf18d0403677378adad9001b1243211:

before:

after:

Checklist

No CHANGELOG update needed
No new tests needed
No documentation update needed

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

CHANGELOG updated or no update needed, thanks! 😄

williballenthin

thanks for asking for input early on. i've provided my vision inline, but i'm open to discussion on any of the topics. don't hesitate to suggest alternatives!

i'd recommend developing a few tests right away so we can agree what features should be extracted - and then we can dig into the implementation more. maybe assert that API features __GI_connect, __libc_connect, and connect are all found at the address of the call instruction (0x40286D in 72f1b91327ffda4cf18a2bf64913b673d39ebbff8cbe50c9cd354b1dcd312bcc).
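A minimal sketch of the kind of test suggested above. It assumes a hypothetical mapping from instruction address to the API names extracted there; the `extracted` dictionary is written by hand as a stand-in and is not real capa output, and `missing_api_names` is an illustrative helper rather than a capa function.

```python
from typing import Dict, Set

# Hypothetical stand-in for extractor output: instruction address -> API names.
# In a real test this would come from running the viv extractor on the sample.
extracted: Dict[int, Set[str]] = {
    0x40286D: {"__GI_connect", "__libc_connect", "connect"},
}


def missing_api_names(features: Dict[int, Set[str]], address: int, expected: Set[str]) -> Set[str]:
    """Return the expected API names that were not extracted at `address`."""
    return expected - features.get(address, set())


# All three aliases should be reported at the address of the call instruction.
missing = missing_api_names(extracted, 0x40286D, {"__GI_connect", "__libc_connect", "connect"})
assert not missing, f"missing API features at 0x40286D: {sorted(missing)}"
```

Agreeing on assertions like these first pins down which names should be extracted before the implementation is reviewed in detail.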

also, (new feature) we should apply function-name features to the functions themselves. this is one thing we do with the FLIRT matches, so you can say things like "look for this pattern, unless it's found within OpenSSL::create_context" or similar. example here:

capa/capa/features/extractors/viv/file.py

Line 85 in 6ba5b2b

yield FunctionName(name), addr

yelhamer · 2023-04-22T00:42:32Z

I could not get api matches when using the FunctionName feature — as opposed to its API counterpart — for some reason. I will look into that and add tests on the next couple of commits.

williballenthin · 2023-04-22T16:01:53Z

Hm, I don’t quite understand. Do you mind explaining what didn’t work a little differently?

yelhamer · 2023-04-22T17:15:26Z

I apologize for not providing enough detail, @williballenthin. I have elaborated below:

To my understanding, your previous recommendation to apply function-name features to the functions themselves meant using FunctionName() to yield the resolved functions, as opposed to API(). So in my case that means something like this on line 150 of my commit to insn.py:

...
if sym_value == target and sym_info & STT_FUNC != 0:
    yield FunctionName(sym_name), ih.address
...

However, doing this resulted in no symtab-derived capabilities being matched by our ruleset, whereas using API to yield the functions produced the intended behavior, with all the expected capabilities extracted.

My questions are:

Could you please confirm whether I understood your recommendation of using function-name correctly?
And if I did, could you please point me to what I should do to get this working as expected?

Thanks!
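A side note on the `sym_info & STT_FUNC != 0` check quoted above: in the ELF specification, the low four bits of `st_info` hold the symbol type as an enumerated value and the high four bits hold the binding. A bitwise test against `STT_FUNC` (2) therefore also accepts other types whose value has that bit set, such as `STT_SECTION` (3). The following is a minimal sketch of an equality-based check; the helper names are illustrative and not part of capa.

```python
STT_FUNC = 2  # ELF symbol type: function
STT_SECTION = 3  # ELF symbol type: section
STB_LOCAL = 0  # ELF symbol binding: local
STB_GLOBAL = 1  # ELF symbol binding: global


def symbol_type(st_info: int) -> int:
    """Low four bits of st_info encode the symbol type."""
    return st_info & 0xF


def symbol_binding(st_info: int) -> int:
    """High four bits of st_info encode the symbol binding."""
    return st_info >> 4


def is_function_symbol(st_info: int) -> bool:
    """Compare the decoded type for equality instead of testing a single bit."""
    return symbol_type(st_info) == STT_FUNC


global_func = (STB_GLOBAL << 4) | STT_FUNC
local_section = (STB_LOCAL << 4) | STT_SECTION

print(is_function_symbol(global_func))  # True
print(is_function_symbol(local_section))  # False
print(local_section & STT_FUNC != 0)  # True: the bitwise test accepts a section symbol
```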

williballenthin · 2023-04-22T18:07:03Z

Thanks for the clarification! I had not explained clearly enough, so it’s my fault. We should emit function-name *in addition* to API, like you were doing before. function-name features are not used to derive API features; they are independent. API features let an author say: find a call to this function and that function together. function-name features let an author say: find this logic when we know the name of the current function isn’t (for example) memcpy. In practice, we might want to use function-name features when we develop the Linux raw syscall rules, because we’ll be able to do things like “look for syscall 3 when the current function isn’t already named `open`”.
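The distinction can be sketched roughly as follows. This is an illustration under stated assumptions, not capa's implementation: `API` and `FunctionName` here are small stand-in classes, the two extractor functions are hypothetical, and the target address 0x410000 for `connect` is invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class API:
    """Stand-in for an API feature: attached to a call site."""
    name: str


@dataclass(frozen=True)
class FunctionName:
    """Stand-in for a function-name feature: attached to the function itself."""
    name: str


def extract_call_features(call_address: int, target: int, symbols: Dict[int, str]) -> Iterator[Tuple[object, int]]:
    """Emit an API feature at the call instruction when the call target has a known name."""
    name = symbols.get(target)
    if name is not None:
        yield API(name), call_address


def extract_function_features(function_address: int, symbols: Dict[int, str]) -> Iterator[Tuple[object, int]]:
    """Emit a function-name feature at the function's own address when it has a known name."""
    name = symbols.get(function_address)
    if name is not None:
        yield FunctionName(name), function_address


# Hypothetical symbol table: address of a statically linked function -> its name.
symbols = {0x410000: "connect"}

print(list(extract_call_features(0x40286D, 0x410000, symbols)))
print(list(extract_function_features(0x410000, symbols)))
```

The two features are produced independently: a rule can match on the call site through the API feature, while another rule can exclude logic located inside a function it already knows by name.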

yelhamer · 2023-04-23T02:47:00Z

Thank you for the explanation! I wrongly assumed that function-name did the same job as API in this context; I take responsibility for that.

I have two more questions:

Regarding tests: should I combine the testing of symtab-based API extraction and the possible identification of common gcc wrappers (as discussed here) into a single test file, or should I test the API extraction using symtab information as part of the viv feature extraction tests here?
In my commit, I define a new function attribute to avoid redoing the symbol table extraction each time the function gets called; however, this causes the mypy code_style check to fail (error message). What do you think of using function attributes here: is it correct, or should I look for an alternative?

Also, please let me know if you think there is anything else that I should be doing differently.

mr-tz · 2023-04-24T10:35:25Z

Use whichever test you find more logical. Ideally, each functionality is tested separately.

For the optimization, the first idea is to implement it similarly to get_imports.
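A minimal sketch of that caching pattern, assuming the accessor stores its result in a metadata dictionary carried by the workspace object. `Workspace`, `load_symbols`, and `get_symbols` are hypothetical stand-ins and not vivisect or capa APIs.

```python
from typing import Any, Dict


class Workspace:
    """Hypothetical stand-in for an analysis workspace that carries a metadata dict."""

    def __init__(self) -> None:
        self.metadata: Dict[str, Any] = {}
        self.parse_count = 0  # counts how often the expensive step runs


def load_symbols(ws: Workspace) -> Dict[int, str]:
    """Stand-in for the expensive symbol table parse."""
    ws.parse_count += 1
    return {0x410000: "connect"}


def get_symbols(ws: Workspace) -> Dict[int, str]:
    """Caching accessor: parse once, then reuse the result stored on the workspace."""
    if "symbols" not in ws.metadata:
        ws.metadata["symbols"] = load_symbols(ws)
    return ws.metadata["symbols"]


ws = Workspace()
get_symbols(ws)
get_symbols(ws)
print(ws.parse_count)  # 1: the second call is served from the cache
```

Keeping the cache in an ordinary dictionary avoids assigning undeclared attributes to a function object, which is the kind of assignment mypy tends to reject.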

williballenthin

good progress, some changes still needed, though.

williballenthin

just some error handling remains and we're good to go. nice job @yelhamer

edit: and need to review code style. reach out if you don't have the linters configured and i can explain that.

yelhamer · 2023-06-03T00:45:13Z

I believe all requested changes have been addressed except for the code style review. I couldn't get the linter working properly and will ask for an explanation on that after the weekend.

Also, please check whether the way I am doing error handling is correct. I am assuming that all exceptions raised during SymTab construction are due to the symbol table being faulty, and I am not handling the case of an empty symtab, since that would just result in an empty symbol generator (for the `for symbol in symtab.get_symbols():` part).
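The approach described above can be sketched as follows. `FakeSymTab` and `extract_symbol_names` are hypothetical stand-ins used only to illustrate the control flow; they are not the classes or functions in this PR.

```python
import logging
from typing import Iterator, List, Optional, Tuple

logger = logging.getLogger(__name__)


class FakeSymTab:
    """Hypothetical stand-in for a parsed symbol table."""

    def __init__(self, symbols: List[Tuple[str, int]]) -> None:
        self._symbols = symbols

    @classmethod
    def from_raw(cls, raw: Optional[List[Tuple[str, int]]]) -> "FakeSymTab":
        # Simulate a parse failure when the table is missing or malformed.
        if raw is None:
            raise ValueError("symbol table is missing or malformed")
        return cls(raw)

    def get_symbols(self) -> Iterator[Tuple[str, int]]:
        yield from self._symbols


def extract_symbol_names(raw: Optional[List[Tuple[str, int]]]) -> Iterator[str]:
    """Yield symbol names; treat any construction error as a faulty symbol table."""
    try:
        symtab = FakeSymTab.from_raw(raw)
    except Exception:
        logger.debug("failed to parse symbol table; skipping symbol-based names")
        return
    # An empty table needs no special case: the loop body simply never runs.
    for name, _address in symtab.get_symbols():
        yield name


print(list(extract_symbol_names(None)))  # []
print(list(extract_symbol_names([])))  # []
print(list(extract_symbol_names([("connect", 0x410000)])))  # ['connect']
```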

williballenthin · 2023-06-05T09:00:03Z

isort --profile black --length-sort --line-width 120 --skip-glob "*_pb2.py" . ; \
black -l 120 --extend-exclude ".*_pb2.py" . ; \
ruff check --config .github/ruff.toml . ; \
pycodestyle --exclude="*_pb2.py" --show-source capa/ scripts/ tests/ ; \
mypy --config-file .github/mypy/mypy.ini --check-untyped-defs capa/ scripts/ tests/

via here:

capa/.github/workflows/tests.yml

Lines 36 to 45 in 0cbe461

    - name: Lint with ruff
      run: ruff check --config .github/ruff.toml .
    - name: Lint with isort
      run: isort --profile black --length-sort --line-width 120 --skip-glob "*_pb2.py" -c .
    - name: Lint with black
      run: black -l 120 --extend-exclude ".*_pb2.py" --check .
    - name: Lint with pycodestyle
      run: pycodestyle --exclude="*_pb2.py" --show-source capa/ scripts/ tests/
    - name: Check types with mypy
      run: mypy --config-file .github/mypy/mypy.ini --check-untyped-defs capa/ scripts/ tests/

there's also a helper script here: https://github.com/mandiant/capa/blob/master/scripts/ci.sh
though to be honest, i use the above command and not the script, so i'm not sure if it's out of date.

williballenthin

code style passes and tests pass, woohoo!

nice work @yelhamer

insn extractor: Add static api extraction using .symtab

c71cb55

github-actions bot previously requested changes Apr 14, 2023

Update CHANGELOG.md

21f2cb6

yelhamer changed the title ~~Add support for api extraction from statically linked libraries.~~ Add support for api extraction from statically linked libraries Apr 15, 2023

williballenthin requested changes Apr 17, 2023

CHANGELOG.md (outdated, resolved)

capa/features/extractors/viv/insn.py (outdated, resolved)

capa/features/extractors/viv/insn.py (outdated, resolved)

Update CHANGELOG.md

44254bf

Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>

mr-tz reviewed Apr 21, 2023

CHANGELOG.md (resolved)

Update CHANGELOG.md

97c8fd0

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

yelhamer force-pushed the feature-static-api-names branch 2 times, most recently from cd39f73 to 97c8fd0 Compare April 21, 2023 23:49

yelhamer added 2 commits April 22, 2023 01:33

Shdr: add a constructor for vivisect's shdr representation

e7ccea4

insn.py: rewire symbol parsing to use SymTab instead of vivisect

b766d95

insn.py: Get the symtab api extractor to yield FunctionName features as well

b32a8ca

yelhamer force-pushed the feature-static-api-names branch 2 times, most recently from 5a4ca9c to b32a8ca Compare April 23, 2023 01:27

code style: Fix the format of the committed code

ee881ab

insn.py: Update extract_insn_api_features() to optimize by means of viv rather than function attributes

695508a

williballenthin reviewed Apr 25, 2023

capa/features/extractors/elf.py (outdated, resolved)

Shdr constructor: Use direct member access to get vstruct's section header information

c7b65cf

Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>

yelhamer changed the title ~~Add support for api extraction from statically linked libraries~~ Extract api names from ELF debug symbols [vivisect] May 31, 2023

yelhamer added 2 commits June 1, 2023 01:50

add tests for vivisect's usage of debug symbols

64ef2c8

fix style issues

f10a43a

yelhamer marked this pull request as ready for review June 1, 2023 01:03

yelhamer added 4 commits June 1, 2023 12:45

return the target's address for the function-name feature

994edf6

fix strtab naming

8d1e1cc

use the function-handle's cache instead of the VivWorkspace file metadata

d85d01e

fix strtab renaming error

1cec768

williballenthin requested changes Jun 2, 2023

capa/features/extractors/viv/insn.py (outdated, resolved)

capa/features/extractors/viv/insn.py (outdated, resolved)

tests/fixtures.py (outdated, resolved)

williballenthin and others added 4 commits June 2, 2023 09:26

Merge branch 'master' into feature-static-api-names

64dca7d

add a method to construct SymTab objects from Elf objects

dde76e3

add FunctionName extraction at the function scope

9467ee6

update symtab-based FunctionName feature extraction

41c5126

williballenthin requested changes Jun 2, 2023

capa/features/extractors/viv/insn.py (outdated, resolved)

capa/features/extractors/viv/insn.py (outdated, resolved)

capa/features/extractors/viv/function.py (outdated, resolved)

yelhamer and others added 8 commits June 2, 2023 15:56

delete functionName extraction at instruction level

0b834a1

Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>

elf.py: fix identation error

4976375

remove usage of vsGetField

151ef95

add missing Shdr.from_viv() method

764fda8

fix broken logic in extract_function_symtab_names()

6b2710a

add error handling to SymTab and its callers

5b903ca

fix code style

be5ada2

Merge branch 'master' into feature-static-api-names

7dff76b

yelhamer requested a review from williballenthin June 3, 2023 00:45

williballenthin reviewed Jun 5, 2023

capa/features/extractors/viv/function.py (resolved)

yelhamer added 5 commits June 5, 2023 12:01

fix codestyle issues

e971bc4

fix mypy typing issues

65f18ae

fix viv/extractor.py codestyle imports

103b384

fix typo: "Elf" to "elf"

9b0fb74

fix symtab FunctionName feature scope address

5b260c0

williballenthin approved these changes Jun 5, 2023

williballenthin merged commit 5709517 into mandiant:master Jun 5, 2023

yelhamer added the enhancement New feature or request label Jun 19, 2023
