evaluate profiling results from running capa in Ghidra #1736

Open
mike-hunhoff opened this issue Aug 17, 2023 · 4 comments
Labels
enhancement New feature or request ghidra Related to Ghidra integration gsoc Work related to Google Summer of Code project.

Comments

@mike-hunhoff
Collaborator

Here is a profiling snippet from running capa on mimikatz.exe_ in Ghidra. Let's review and see if there are opportunities to reduce the cumulative times for Ghidra-related functions:

         273564231 function calls (273198325 primitive calls) in 204.308 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006  204.326  204.326 main.py:1346(ghidra_main)
        1    0.140    0.140  202.915  202.915 main.py:248(find_capabilities)
     2115    0.563    0.000  158.903    0.075 main.py:189(find_code_capabilities)
    29154    1.346    0.000  121.782    0.004 main.py:149(find_basic_block_capabilities)
   187775    1.357    0.000   99.137    0.001 __init__.py:1375(match)
   375550    4.601    0.000   95.863    0.000 engine.py:290(match)
  3896087    4.826    0.000   89.836    0.000 __init__.py:768(evaluate)
1299570/1069305    5.880    0.000   54.602    0.000 engine.py:138(evaluate)
   156505    1.819    0.000   51.868    0.000 main.py:122(find_instruction_capabilities)
4208856/4130070    9.631    0.000   51.802    0.000 engine.py:105(evaluate)
        1    0.020    0.020   43.212   43.212 main.py:227(find_file_capabilities)
    11246    0.004    0.000   40.881    0.004 extractor.py:34(extract_file_features)
    11246    0.004    0.000   40.877    0.004 file.py:171(extract_features)
   515436   29.252    0.000   33.562    0.000 jepwrappers.py:102(wrapped)
   405942    0.327    0.000   33.242    0.000 extractor.py:64(extract_insn_features)
   405942    1.173    0.000   32.915    0.000 insn.py:410(extract_features)
        1    0.000    0.000   26.628   26.628 file.py:75(extract_file_embedded_pe)
        1    0.005    0.005   26.628   26.628 file.py:26(check_segment_for_pe)
      513    0.013    0.000   26.617    0.052 helpers.py:30(find_byte_sequence)
  6596961   12.473    0.000   23.514    0.000 common.py:169(evaluate)
 49141101    9.370    0.000   21.505    0.000 {built-in method builtins.isinstance}
  1783263    6.563    0.000   21.033    0.000 common.py:387(evaluate)
     9777    0.077    0.000   13.813    0.001 file.py:121(extract_file_strings)
        7   13.215    1.888   13.620    1.946 helpers.py:60(get_block_bytes)
 40463184    5.810    0.000   12.150    0.000 abc.py:117(__instancecheck__)
   284189    2.918    0.000   11.424    0.000 common.py:302(evaluate)
   185659    0.190    0.000   10.457    0.000 extractor.py:59(get_instructions)
   185659    4.195    0.000   10.267    0.000 helpers.py:92(get_insn_in_range)
    95834    9.859    0.000    9.914    0.000 helpers.py:207(check_addr_for_api)
   162176    0.253    0.000    9.744    0.000 insn.py:82(extract_insn_api_features)
 14778790    8.681    0.000    8.681    0.000 common.py:78(__init__)
    68626    0.968    0.000    7.793    0.000 insn.py:32(check_for_api_call)
 16627263    4.487    0.000    6.359    0.000 common.py:123(__hash__)
 40463184    6.338    0.000    6.340    0.000 {built-in method _abc._abc_instancecheck}
  9803117    2.691    0.000    5.676    0.000 {method 'get' of 'dict' objects}
   156647    0.688    0.000    5.401    0.000 insn.py:269(extract_insn_cross_section_cflow)
   614276    0.257    0.000    5.092    0.000 jepwrappers.py:52(get_script)
   620466    0.552    0.000    4.906    0.000 jepwrappers.py:44(get_state)
    13694    0.011    0.000    4.743    0.000 extractor.py:48(extract_function_features)
    13694    0.019    0.000    4.732    0.000 function.py:52(extract_features)
   139179    1.285    0.000    4.550    0.000 common.py:210(evaluate)
    58594    0.070    0.000    4.359    0.000 extractor.py:56(extract_basic_block_features)
    58594    0.090    0.000    4.289    0.000 basicblock.py:121(extract_features)
     2716    2.815    0.001    3.860    0.001 function.py:28(extract_function_loop)
   620466    3.804    0.000    3.804    0.000 jepwrappers.py:32(get_java_thread_id)
   192765    1.274    0.000    3.732    0.000 insn.py:139(extract_insn_offset_features)
   284189    0.344    0.000    3.699    0.000 common.py:356(__init__)
   801181    0.458    0.000    3.515    0.000 {built-in method builtins.any}
   284189    0.943    0.000    3.355    0.000 common.py:284(__init__)
   626020    0.562    0.000    3.311    0.000 helpers.py:234(is_call_or_jmp)
   161807    0.527    0.000    3.126    0.000 engine.py:188(evaluate)
   189149    1.141    0.000    3.081    0.000 insn.py:99(extract_insn_number_features)
   265558    1.161    0.000    3.080    0.000 common.py:437(evaluate)
    31269    0.042    0.000    2.913    0.000 extractor.py:51(get_basic_blocks)
    31269    2.685    0.000    2.871    0.000 helpers.py:84(get_function_blocks)
   156565    2.019    0.000    2.690    0.000 insn.py:161(extract_insn_bytes_features)
    29159    0.024    0.000    2.429    0.000 basicblock.py:99(extract_bb_stackstring)
    29154    1.296    0.000    2.405    0.000 basicblock.py:73(bb_contains_stackstring)
  1824852    2.390    0.000    2.390    0.000 helpers.py:235(<genexpr>)
   159545    2.101    0.000    2.168    0.000 insn.py:188(extract_insn_string_features)
 18060589    2.021    0.000    2.021    0.000 {built-in method builtins.hash}
   154382    1.755    0.000    1.755    0.000 helpers.py:238(is_sp_modified)
    29435    0.030    0.000    1.672    0.000 basicblock.py:107(extract_bb_tight_loop)
    29154    1.068    0.000    1.641    0.000 basicblock.py:87(_bb_has_tight_loop)
   156905    0.885    0.000    1.480    0.000 insn.py:217(extract_insn_obfs_call_plus_5_characteristic_features)
 14771099    1.427    0.000    1.427    0.000 common.py:96(__bool__)
   160217    0.776    0.000    1.307    0.000 helpers.py:245(is_stack_referenced)
    62991    0.295    0.000    1.190    0.000 jepwrappers.py:310(wrapped_currentProgram)
   758654    0.990    0.000    1.122    0.000 common.py:107(__init__)
[...]
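
The listing above is ordered by cumulative time, so the top rows are mostly callers that contain everything beneath them. Re-ranking by `tottime` (time spent in a function's own body) can help separate the Ghidra-side hot spots from the rule-matching overhead. The following is a minimal sketch that parses a handful of rows copied from the profile; `ROWS`, `parse_rows`, and `rank_by_self_time` are illustrative names, not part of capa.

```python
import re

# A few rows copied verbatim from the profile above:
# ncalls, tottime, percall, cumtime, percall, filename:lineno(function)
ROWS = """\
   515436   29.252    0.000   33.562    0.000 jepwrappers.py:102(wrapped)
        7   13.215    1.888   13.620    1.946 helpers.py:60(get_block_bytes)
  6596961   12.473    0.000   23.514    0.000 common.py:169(evaluate)
    95834    9.859    0.000    9.914    0.000 helpers.py:207(check_addr_for_api)
4208856/4130070    9.631    0.000   51.802    0.000 engine.py:105(evaluate)
        1    0.140    0.140  202.915  202.915 main.py:248(find_capabilities)
"""

ROW_RE = re.compile(r"^\s*(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.+)$")


def parse_rows(text):
    """Parse pstats-style text rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # skip headers, blank lines, and elisions
        ncalls, tottime, _, cumtime, _, location = m.groups()
        rows.append(
            {
                # recursive functions are printed as "total/primitive"
                "ncalls": int(ncalls.split("/")[0]),
                "tottime": float(tottime),
                "cumtime": float(cumtime),
                "location": location.strip(),
            }
        )
    return rows


def rank_by_self_time(rows):
    """Sort rows so that the largest own-body time comes first."""
    return sorted(rows, key=lambda r: r["tottime"], reverse=True)


ranked = rank_by_self_time(parse_rows(ROWS))
for r in ranked:
    print(f'{r["tottime"]:8.3f}s self  {r["cumtime"]:8.3f}s cum  {r["location"]}')
```

If the raw stats file is available, `pstats.Stats(path).sort_stats("tottime")` gives the same ordering without any text parsing.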
@williballenthin
Collaborator

        1    0.000    0.000   26.628   26.628 file.py:75(extract_file_embedded_pe)
        1    0.005    0.005   26.628   26.628 file.py:26(check_segment_for_pe)

these stand out. 26 seconds to scan the file bytes for the MZ header? seems slow to me.
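
For comparison, scanning an in-memory buffer for an embedded PE can be very cheap when the search runs over a single `bytes` object. The sketch below is not capa's implementation; it is a simplified illustration that only looks for a plain (unencoded) `MZ` signature and confirms it by following `e_lfanew` to the `PE\0\0` signature. The name `find_embedded_pe_offsets` is made up for this example.

```python
import struct


def find_embedded_pe_offsets(buf: bytes):
    """Yield offsets in buf where a plausible PE image starts."""
    start = 0
    while True:
        # bytes.find scans in C, so the whole buffer is searched in one call
        off = buf.find(b"MZ", start)
        if off == -1:
            return
        start = off + 1
        # the DOS header must be large enough to hold e_lfanew at offset 0x3C
        if off + 0x40 > len(buf):
            continue
        (e_lfanew,) = struct.unpack_from("<I", buf, off + 0x3C)
        pe_off = off + e_lfanew
        # an out-of-range slice is simply shorter and fails the comparison
        if buf[pe_off : pe_off + 4] == b"PE\x00\x00":
            yield off


# Build a tiny fake image: DOS header with e_lfanew = 0x40, then "PE\0\0".
header = bytearray(0x80)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x40)
header[0x40:0x44] = b"PE\x00\x00"

buf = b"\x00" * 100 + bytes(header) + b"MZ junk"
offsets = list(find_embedded_pe_offsets(buf))
print(offsets)
```

On a buffer of a few megabytes this kind of search should finish in well under a second, which suggests the 26 seconds is going somewhere other than the signature comparison itself, for example into fetching the bytes.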

@williballenthin
Collaborator

        7   13.215    1.888   13.620    1.946 helpers.py:60(get_block_bytes)

this one too
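
Almost all of the 13.6 seconds here is `tottime` (13.215), so the cost sits in the function's own body rather than in its callees. One pattern that produces this shape is converting bytes one at a time in Python. The sketch below assumes, hypothetically, that the bridge hands back Java's signed bytes (-128..127) as a sequence of Python ints; whether that matches what `get_block_bytes` actually does is not confirmed by this thread. It compares a per-byte mask with a bulk conversion through the standard `array` module.

```python
import array


def to_bytes_per_byte(signed_values):
    # One Python-level operation per byte: mask each signed value into 0..255.
    return bytes(v & 0xFF for v in signed_values)


def to_bytes_bulk(signed_values):
    # array("b") stores signed 8-bit ints; the conversion loop and the
    # tobytes() copy both run in C rather than in Python bytecode.
    return array.array("b", signed_values).tobytes()


# Hypothetical signed values as a Java byte[] might expose them.
signed = [77, 90, -112, 0, -1, 127, -128]

per_byte = to_bytes_per_byte(signed)
bulk = to_bytes_bulk(signed)
print(per_byte == bulk, bulk)
```

Both functions return the same bytes; the bulk form avoids executing Python bytecode for every element, which matters when a memory block is several megabytes long.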

@mike-hunhoff
Collaborator Author

>         7   13.215    1.888   13.620    1.946 helpers.py:60(get_block_bytes)
>
> this one too

see #1761

@mike-hunhoff
Collaborator Author

>         1    0.000    0.000   26.628   26.628 file.py:75(extract_file_embedded_pe)
>         1    0.005    0.005   26.628   26.628 file.py:26(check_segment_for_pe)
>
> these stand out. 26 seconds to scan the file bytes for the MZ header? seems slow to me.

see #1767 (comment)
