Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Binja backend #1343

Merged
merged 24 commits into from
Mar 24, 2023
Merged

Add Binja backend #1343

merged 24 commits into from
Mar 24, 2023

Conversation

xusheng6
Copy link
Contributor

@xusheng6 xusheng6 commented Mar 3, 2023

Current status:

  1. All code written
  2. All unit test passing, except for the sample 3b13b https://github.com/Vector35/capa/blob/8a5082d31305231a58d5ef2d5dbcfbad4459454c/tests/fixtures.py#L691. Which BN takes a really long time to process. I have picked a different test file in dcb385f
  3. Formatting done
  4. Changelog done
  5. Pyinstaller fixed
  6. PyPI package no changes required

Todos:

  • configure BN in GH Actions so we can test the BN backend

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@mike-hunhoff mike-hunhoff marked this pull request as draft March 3, 2023 16:06
@mike-hunhoff
Copy link
Collaborator

This is great @xusheng6 . I've converted the PR to a draft, for now, while we discuss and collaborate 🚀

@github-actions github-actions bot dismissed their stale review March 4, 2023 10:07

CHANGELOG updated or no update needed, thanks! 😄

@xusheng6 xusheng6 marked this pull request as ready for review March 5, 2023 05:16
@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 5, 2023

@mike-hunhoff I think this is ready for review. Please take a look and let me know if there are any issues!

To get started, please first install the Binary Ninja Python API as in https://docs.binary.ninja/dev/batch.html#install-the-api.

To use the binja backend, run capa with -b binja.

To run the unit test (which could not be run in the GitHub CI, since there is no binja installation there), run pytest tests/test_binja_features.py.

@williballenthin williballenthin self-requested a review March 6, 2023 08:50
@williballenthin williballenthin added enhancement New feature or request binary-ninja labels Mar 6, 2023
@williballenthin
Copy link
Collaborator

i haven't had a chance to do a full review yet - just getting familiar :-)

@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 6, 2023 via email

@williballenthin
Copy link
Collaborator

binary ninja emits logging messages via its own infrastructure (not the same as Python's logging module). we should investigate whether these can be unified, and/or if we want to hide the logging messages from BN:

image

@williballenthin
Copy link
Collaborator

williballenthin commented Mar 23, 2023

tests pass for me!

image

viv tests take 118s, for comparison. but we haven't even begun to dig into performance of the BN backend so i'm not worried by this yet.

@williballenthin
Copy link
Collaborator

williballenthin commented Mar 23, 2023

example of Binary Ninja backend running on GH Actions CI/CD: https://github.com/mandiant/capa/actions/runs/4504334985/jobs/7928679764

image

it was really easy to do. see the key part here:

- name: install Binary Ninja
env:
BN_SERIAL: ${{ secrets.BN_SERIAL }}
run: |
mkdir ./.github/binja
curl "https://raw.githubusercontent.com/Vector35/binaryninja-api/6812c97/scripts/download_headless.py" -o ./.github/binja/download_headless.py
python ./.github/binja/download_headless.py --serial $BN_SERIAL --output .github/binja/BinaryNinja-headless.zip
unzip .github/binja/BinaryNinja-headless.zip -d .github/binja/
python .github/binja/binaryninja/scripts/install_api.py --install-on-root --silent
- name: Run tests
env:
BN_LICENSE: ${{ secrets.BN_LICENSE }}
run: pytest -v tests/test_binja_features.py # explicitly refer to the binja tests for performance. other tests run above.

@xusheng6
Copy link
Contributor Author

example of Binary Ninja backend running on GH Actions CI/CD: https://github.com/mandiant/capa/actions/runs/4504334985/jobs/7928679764

image

it was really easy to do. see the key part here:

- name: install Binary Ninja
env:
BN_SERIAL: ${{ secrets.BN_SERIAL }}
run: |
mkdir ./.github/binja
curl "https://raw.githubusercontent.com/Vector35/binaryninja-api/6812c97/scripts/download_headless.py" -o ./.github/binja/download_headless.py
python ./.github/binja/download_headless.py --serial $BN_SERIAL --output .github/binja/BinaryNinja-headless.zip
unzip .github/binja/BinaryNinja-headless.zip -d .github/binja/
python .github/binja/binaryninja/scripts/install_api.py --install-on-root --silent
- name: Run tests
env:
BN_LICENSE: ${{ secrets.BN_LICENSE }}
run: pytest -v tests/test_binja_features.py # explicitly refer to the binja tests for performance. other tests run above.

Excellent!

@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 24, 2023

tests pass for me!

image

viv tests take 118s, for comparison. but we haven't even begun to dig into performance of the BN backend so i'm not worried by this yet.

The Binja backend might be slower than vivisect for several reasons, e.g., it actually does more analysis than viv. On my M1 mbp air, the test typically takes 190 seconds to run, which I think is reasonable. The 360 seconds on your end is a bit longer than my expectation, I am not sure what hardware are you working on. Also I see there is an error message related to failure of PDB download. It could be slowing down the test in two ways: 1. BN waits for the download until it eventually times out, and wasted a lot of time; 2. there is a test binary kernel32 x86, that, when there is no debug symbol, takes quite while to fully analyze.

Oh btw, is your environment connected to the internet and can fetch the PDBs by default? You can test it with the headed version of BN which you bought earlier.

I see that the binja test takes 660 seconds to run on the Github runner. That is also longer than my expectation. I will check if our headless instance only utilizes one core or all available cores by default.

@xusheng6
Copy link
Contributor Author

binary ninja emits logging messages via its own infrastructure (not the same as Python's logging module). we should investigate whether these can be unified, and/or if we want to hide the logging messages from BN:

image

There is a way to redirect the log messages from BN to a file https://api.binary.ninja/binaryninja.log-module.html#binaryninja.log.log_to_file. I suggest we do not simply silent the log messages since they can provide valuable insights to test failures (if there is any). As a result, we may need to set up the CI to collect the log file as well

@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 24, 2023

Cannot wait to share this exciting new feature in latest BN dev build: automatic stack string detection and outlining!

image

I will update the binja backend to also take advantage of it later. Although I will not rush on this since we test capa with the stable build with BN.

@xusheng6
Copy link
Contributor Author

binary ninja emits logging messages via its own infrastructure (not the same as Python's logging module). we should investigate whether these can be unified, and/or if we want to hide the logging messages from BN:

image

@psifertex What is your take on this? Since I am not super familiar with our logger code

@williballenthin
Copy link
Collaborator

re: logging

from the perspective of an application (like capa) that uses BN as a library to do some analysis, it would be nice to register a callback with BN that is invoked with each log message and can handle emitting it. then, we could write a small shim that places the messages into Python's logging subsystem.

however, i recognize that this is not a common use of BN today and that this would introduce more complexity to the BN API. in most setups, BN is the "host application" and naturally wants to control the logs.

in the short term, perhaps we can write the logs into a temporary file and then replay them at the end of capa's execution or the test case. this would fix the issue of the BN logging output breaking our ASCII art animations.

@xusheng6
Copy link
Contributor Author

re: logging

from the perspective of an application (like capa) that uses BN as a library to do some analysis, it would be nice to register a callback with BN that is invoked with each log message and can handle emitting it. then, we could write a small shim that places the messages into Python's logging subsystem.

however, i recognize that this is not a common use of BN today and that this would introduce more complexity to the BN API. in most setups, BN is the "host application" and naturally wants to control the logs.

in the short term, perhaps we can write the logs into a temporary file and then replay them at the end of capa's execution or the test case. this would fix the issue of the BN logging output breaking our ASCII art animations.

It is definitely possible and not very complex to intercept the BN logs -- we just need to register a Logger from the Python API. Then, as you mentioned, we can redirect it to the Python logging system, so pytest will capture all of them and only print them out at the end. Is this what you intended?

@williballenthin
Copy link
Collaborator

williballenthin commented Mar 24, 2023

a couple strategies, sketched live here:

1: register callback with BN

LEVEL_DEBUG = 0
LEVEL_INFO = 1
...

logger = logging.getLogger("my-logger")

def my_logger(level, message):
    if level == LEVEL_DEBUG:
        logger.debug(message)
    elif level == LEVEL_INFO:
    ...

binaryninja.core_register_logging_callback(my_logger)

2: configure BN to use python logger

logger = logging.getLogger("my-logger")
binaryninja.core_register_logger(logger)

3: configure BN to use python logger by name

binaryninja.core_register_logger("my-logger")

1 seems more flexible but lower level. 2 seems more Pythonic. 3 is the shortest and most convenient.
3 would also let you use the provided prefix, so you could do things like log to my-logger.pdb and my-logger.hlil and let the client decide how each of these are handled.


open questions:

im not sure how you'd split logs into different streams/subsystems, like if you want to ignore PDB logs and print HLIL logs or whatever. maybe its up to the client to filter on the logging message content? this is possible with a logging.Handler object, i think.

whenever you register something, you also need to provide a way to unregister the thing. doesn't seem too hard here.

can multiple loggers be registered at the same time? "no" is probably ok.

@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 24, 2023 via email

@williballenthin

This comment was marked as outdated.

@williballenthin williballenthin merged commit ddc52fa into mandiant:master Mar 24, 2023
@williballenthin
Copy link
Collaborator

thank you @xusheng6, merged!

@xusheng6
Copy link
Contributor Author

@williballenthin Thanks for merging!

However, I was not expecting you to directly merge my PR after I merge your PR. I thought we were going to continue working on it. The git history now looks very messy. Is it fine? Or in other words I should have rebass-and-merge your PR rather than generating a merge commit. I was on mobile earlier when I merge it and I was planning to tell you about it.

@williballenthin
Copy link
Collaborator

ah, im sorry!

its true the git history is a little messy, though if you were to look at capa's history, you'd see... lots of messy commits (maybe 100+ with "pep8" as the description). we'll have to improve this with time. sorry i didn't give you a chance to help here.

i merged since i didn't have any immediately outstanding issues and i thought the same with you. lets continue to work together via issues and subsequent PRs when we have new changes (e.g. logging infrastructure, capa UI plugin, etc.).

@williballenthin
Copy link
Collaborator

in the future i'll be more careful and explicit with my plans :-)

@xusheng6
Copy link
Contributor Author

in the future i'll be more careful and explicit with my plans :-)

I will also better communicate my intentions in the future! Let us look forward and make new PRs!

@xusheng6 xusheng6 deleted the binja_backend branch March 24, 2023 11:28
@psifertex
Copy link
Contributor

Excited to see this land! Thanks to both of you for the work seeing it through.

@mr-tz
Copy link
Collaborator

mr-tz commented Mar 28, 2023

Really cool. Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binary-ninja enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants