Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Symbolic stack traces #23

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from
Draft

[WIP] Symbolic stack traces #23

wants to merge 26 commits into from

Conversation

neitsa
Copy link
Contributor

@neitsa neitsa commented Feb 13, 2023

Hi Ely 👋

This is a work in progress PR for symbolic stack trace (should close #4).

First of all, the whole PR is in python 3 as I haven't touched Python 2 since 15 years ago... (I'm also not used to six). Please let me know if it's a showstopper.

Design Decisions

Symbolic resolution relies on dbghelp.dll and symsrv.dll so it's only usable on Windows systems. The dbghelp.dll that ships with Windows is often an old version which may not be really useful as it lacks some functionalities. Windows doesn't ship symsrv.dll - which only comes with Debugging Tools For Windows (part of the Windows SDK) - and which is used to download symbol files (*.pdb or binaries) from the Microsoft Symbol Store.

The stack trace resolution (symbolic solving) goes like this:

    p = pathlib.Path(r"c:\temp\Logfile.PML")

    with p.open("rb") as f:
        plr = ProcmonLogsReader(f)
        sym = SymbolResolver(plr)  # automatically find dbghlp and symsrv if they are available.
        for idx, ev in enumerate(plr):
            frames = list(sym.resolve_stack_trace(ev))
            print(StackTraceInformation.pretty_print(frames))

The SymbolResolver() constructor does it best to find the two aforementioned DLLs by looking for the Debugging Tools for Windows or Windbg Preview installations (see DbgHelpUtils in symbol_resolver.py class which contains a few static functions).

The problem happens when it cannot find the two DLLs, in which case the end user will have to provide the path to a directory containing both of them.

Which brings me to the main design decision of why the resolution doesn't happen within Event.stacktrace (it would have been nice to have Event.stacktrace.resolve()). Since passing the two DLLs down the classes instances (from ProcmonLogsReader to Event.stacktrace) was too complicated, I chose to have an external resolver, to which you pass the stack trace to resolve.

Output

Here's a stack Trace from ProcMon:

image

Here's a stack trace from the PR:

K 0  FLTMGR.SYS            FltpPerformPreCallbacksWorker + 0x36c                                                                                                0xfffff80513d2648c   C:\WINDOWS\System32\drivers\FLTMGR.SYS
K 1  FLTMGR.SYS            FltpPreFsFilterOperation + 0x184                                                                                                     0xfffff80513d22804   C:\WINDOWS\System32\drivers\FLTMGR.SYS
K 2  ntoskrnl.exe          FsFilterPerformCallbacks + 0xe7                                                                                                      0xfffff8050f49d7e7   C:\WINDOWS\system32\ntoskrnl.exe
K 3  ntoskrnl.exe          FsRtlAcquireFileExclusiveCommon + 0x121                                                                                              0xfffff8050f889bb1   C:\WINDOWS\system32\ntoskrnl.exe
K 4  ntoskrnl.exe          FsRtlAcquireFileExclusive + 0x17                                                                                                     0xfffff8050f8898a7   C:\WINDOWS\system32\ntoskrnl.exe
K 5  ntoskrnl.exe          CcZeroEndOfLastPage + 0x45                                                                                                           0xfffff8050f49cfb5   C:\WINDOWS\system32\ntoskrnl.exe
K 6  ntoskrnl.exe          MiCreateSectionCommon + 0x25a                                                                                                        0xfffff8050f888a1a   C:\WINDOWS\system32\ntoskrnl.exe
K 7  ntoskrnl.exe          NtCreateSection + 0x5c                                                                                                               0xfffff8050f8887ac   C:\WINDOWS\system32\ntoskrnl.exe
K 8  ntoskrnl.exe          KiSystemServiceCopyEnd + 0x25                                                                                                        0xfffff8050f60d9f5   C:\WINDOWS\system32\ntoskrnl.exe
U 9  ntdll.dll             NtCreateSection + 0x14                                                                                                               0x7ffe6e44da04       C:\Windows\System32\ntdll.dll
U 10 KernelBase.dll        CreateFileMappingNumaW + 0x111                                                                                                       0x7ffe6baceef1       C:\Windows\System32\KernelBase.dll
U 11 KernelBase.dll        CreateFileMappingW + 0x20                                                                                                            0x7ffe6baced50       C:\Windows\System32\KernelBase.dll
U 12 CreateFileMapping.exe wmain + 0x13d, G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\CreateFileMapping\CreateFileMapping.cpp (44; col: 27) 0x7ff66305153d       G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\x64\Release\CreateFileMapping.exe
U 13 CreateFileMapping.exe __scrt_common_main_seh + 0x10c, D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl (288; col: 34)                   0x7ff663051fe0       G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\x64\Release\CreateFileMapping.exe
U 14 kernel32.dll          BaseThreadInitThunk + 0x14                                                                                                           0x7ffe6e2c7614       C:\Windows\System32\kernel32.dll
U 15 ntdll.dll             RtlUserThreadStart + 0x21                                                                                                            0x7ffe6e4026a1       C:\Windows\System32\ntdll.dll

Process finished with exit code 0

ProcMon's stack trace is kinda wrong

(and I don't know exactly why)

Here's the output - from the PR code - from the first frame (Frame 0) in the stack trace:

K 0  FLTMGR.SYS            FltpPerformPreCallbacksWorker + 0x36c                                                                                                0xfffff80513d2648c   C:\WINDOWS\System32\drivers\FLTMGR.SYS

Here's the output from ProcMon (visible in the above screenshot):

K 0  FLTMGR.SYS            FltDecodeParameters + 0x210c                                                                                                0xfffff80513d2648c   C:\WINDOWS\System32\drivers\FLTMGR.SYS

Kernel module bases (from the PML trace):

>>> m = [m for m in plr.processes()[0].modules if m.path.lower().endswith("ntoskrnl.exe")][0]
>>> hex(m.base_address)
'0xfffff8050f200000'
>>> m = [m for m in plr.processes()[0].modules if m.path.lower().endswith("fltmgr.sys")][0]
>>> hex(m.base_address)
'0xfffff80513d20000'

With a kernel debugger attached to the machine that performed the trace:

0: kd> lmDvm nt
Browse full module list
start             end                 module name
fffff805`0f200000 fffff805`10246000   nt         (pdb symbols)          g:\symbols\ntkrnlmp.pdb\E2BA44E0E506968538EA272B1E5030C51\ntkrnlmp.pdb
    Loaded symbol image file: ntkrnlmp.exe
    Image path: ntkrnlmp.exe
    Image name: ntkrnlmp.exe
    Browse all global symbols  functions  data
    Image was built with /Brepro flag.
    Timestamp:        33C5E414 (This is a reproducible build file hash, not a timestamp)
    CheckSum:         00A678A1
    ImageSize:        01046000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

0: kd> lmDvm fltmgr
Browse full module list
start             end                 module name
fffff805`13d20000 fffff805`13d8d000   FLTMGR     (deferred)
    Image path: \SystemRoot\System32\drivers\FLTMGR.SYS
    Image name: FLTMGR.SYS
    Browse all global symbols  functions  data
    Image was built with /Brepro flag.
    Timestamp:        B57CD7EF (This is a reproducible build file hash, not a timestamp)
    CheckSum:         00078ED2
    ImageSize:        0006D000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

So the address to resolve is 0xfffff80513d2648c:

0: kd> u 0xfffff80513d2648c
FLTMGR!FltpPerformPreCallbacksWorker+0x36c:
fffff805`13d2648c 894597          mov     dword ptr [rbp-69h],eax
fffff805`13d2648f 8bf0            mov     esi,eax
fffff805`13d26491 83f802          cmp     eax,2
fffff805`13d26494 0f8421060000    je      FLTMGR!FltpPerformPreCallbacksWorker+0x99b (fffff805`13d26abb)
fffff805`13d2649a 488b05b7430200  mov     rax,qword ptr [FLTMGR!FltGlobals+0x1258 (fffff805`13d4a858)]
fffff805`13d264a1 4883781800      cmp     qword ptr [rax+18h],0
fffff805`13d264a6 0f85e9a60000    jne     FLTMGR!FltpPerformPreCallbacksWorker+0xaa75 (fffff805`13d30b95)
fffff805`13d264ac 4883782000      cmp     qword ptr [rax+20h],0

and the correct output is FLTMGR!FltpPerformPreCallbacksWorker+0x36c; technically the output from ProcMon is kinda right, but not really right:

0: kd> u fltmgr!FltDecodeParameters + 0x210c
FLTMGR!FltpPerformPreCallbacksWorker+0x36c:
fffff805`13d2648c 894597          mov     dword ptr [rbp-69h],eax

🤷

ProcMon's Symbolic Source Information is kinda wrong

I did a dummy test with an executable that performs a CreateFileMappingW:

image

(Notice the call happens at line 44 in the source).

U 12 CreateFileMapping.exe wmain + 0x13d, G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\CreateFileMapping\CreateFileMapping.cpp (44; col: 27) 0x7ff66305153d       G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\x64\Release\CreateFileMapping.exe

Here's the output from ProcMon (visible in the above screenshot):

U 12 CreateFileMapping.exe wmain + 0x13d, G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\CreateFileMapping\CreateFileMapping.cpp (42) 0x7ff66305153d       G:\Appdata\Python\procmon-parser\tests\src\FileSystemOperations\x64\Release\CreateFileMapping.exe

ProcMon got it wrong because (for a reason that escapes me) uses the previous code line from the source, rather than the actual line...

Stuff Left To Do

The code is pretty much complete at the moment, there are a few TODOs left in the code, but I wanted to get your feedback before cleaning up the code. I'm planning to add a bit of documentation (in markdown) on how to get it working.

I guess there will also be some corner cases about which I didn't think of, but it's working quite OK from what I can tell.

Let me know what you think (and sorry for the wall of text 😄 ).


Just to keep track of things:

  • Remove TODOs.
  • Add documentation on how to use and set up symbol resolution.
  • Remove __main__ from symbol_resolver.py.
  • Add tests.
  • Add Comments to class and functions in symbol_resolver.py.
  • Remove __post_init__ in StackTraceFrameInformation (it was used just for debugging a problem).
  • Rework system modules check for the System pseudo process.
  • Move python 3 typing to be compatible w/ Python 2.7 as hinted by PEP484.
  • Rework symbol resolving if a fully qualified path to the source file is already given by SymGetLineFromAddrW64: there's no point in calling SymGetSourceFileW in this case.
  • Add SYMOPT_DEBUG handling code to debug problems with symbol handling at dbghelp internal level.
  • convert f-strings to string.format. (2 still left)
  • Remove python 3 idioms left in the code (i.e. pathlib.path and probably other things like that).

@eronnen
Copy link
Owner

eronnen commented Feb 14, 2023

Hi @neitsa, really impressive work! this is a great addition to the library.

The bug in procmon that they give the wrong function name with a large offset in is very interesting, I always thought it was because they use a wrong PDB, good to know we do something better than procmon itself here

I went through the code and I have a few comments before I dive deeper (ping me when you're ready after cleaning):

  • I agree with the decision to keep the stacktrace resolver external, also it's nice to have an automatic way to find the dll because most people have it installed in the same paths. I didn't notice any usage of symsrv.dll though, is this dll needed too?
  • the condition if process.process_name in ["System"] and process.company.lower().startswith("microsoft") didn't work for the tests/resources/CompressedLogFileUTC64FilesystemPML file where the system process had empty company, maybe you can change it to require that the user is 'NT AUTHORITY\SYSTEM' for example?
  • I think it's still worth to support python 2 as long as it's not to complex, because I know some people still use this library with Python 2. I can patch it myself once you finish, I believe it's not too difficult with the help of some helper libraries but I might need to remove the type annotations :(.

overall looks good, thank you for contributing!

@neitsa
Copy link
Contributor Author

neitsa commented Feb 15, 2023

Hey Ely ! Thanks a lot for your feedback!

Hi @neitsa, really impressive work! this is a great addition to the library.

Thanks a lot!

The bug in procmon that they give the wrong function name with a large offset in is very interesting, I always thought it was because they use a wrong PDB, good to know we do something better than procmon itself here

Yep strange... I got the basics by looking at ProcMon statically and a bit of debugging, but still don't get why it got it kind of wrong...

I agree with the decision to keep the stacktrace resolver external, also it's nice to have an automatic way to find the dll because most people have it installed in the same paths.

Great! I also looked at retrieving the two DLLs from a Visual studio installation but it proved more complicated than I thought since VS installation has changed over time and the various VS SKUs - Enterprise, Community, etc. - have different installation scheme. I'll put some indications on how to retrieve them manually in the documentation.

I didn't notice any usage of symsrv.dll though, is this dll needed too?

Yes! Basically, dbghelp does symbol resolution while symsrv does the download (binaries or PDBs) from the symbol store. dbghelp uses symsrv silently for this. It's a bit more complicated than that regarding symsrv but that's the gist of it.

the condition if process.process_name in ["System"] and process.company.lower().startswith("microsoft") didn't work for the tests/resources/CompressedLogFileUTC64FilesystemPML file where the system process had empty company, maybe you can change it to require that the user is 'NT AUTHORITY\SYSTEM' for example?

Damn :p Thank you! The check you proposed looks definitely fine - I guess we could go with just "System" as an executable called "system" would have a full path and would be called "system.exe" (with the extension) - but, better be safe than sorry ^^. That's the kind of stuff that should be picked by thorough testing, which I haven't really put in motion yet for this PR...

I think it's still worth to support python 2 as long as it's not to complex, because I know some people still use this library with Python 2. I can patch it myself once you finish, I believe it's not too difficult with the help of some helper libraries but I might need to remove the type annotations :(.

Of course, I totally understand! It's just I'm so used to what Python 3 offers, that I'm not proficient with Python 2 anymore. For the type annotations, I think most of tools understand them if they are placed as comments right below the function declaration as proposed in PEP 484. I can do that for most of the PR code.

Although, I don't know what I can use in replacement of dataclasses and enum modules... I'm sorry for the additional work that this PR will impose on you.

After further debugging, I think I still have a thing to change right now. The last call to SymGetSourceFileW (which is meant to return the fully qualified path of the source file, if any) seems to be superfluous in most conditions since the previous call to SymGetLineFromAddrW64 returns a fully qualified path in my case... Going to add a check about that right now!

@eronnen
Copy link
Owner

eronnen commented Feb 17, 2023

Thanks for the clarifications! let me know when I can start reviewing the code

* Remove most of f-strings format.
* Add a few missing comments.
* Rewrite some typing comments.
@neitsa
Copy link
Contributor Author

neitsa commented Feb 17, 2023

@eronnen : the PR code is "finished" in its current state (by that I mean it's working as intended with all the manual tests I could do).

A few stuff I couldn't really do:

  • There are two f-strings (using justification) left in symbol_resolver.py and I'm not sure how to convert them to format().
  • Pathlib.path() used pretty much everywhere there's a file or dir path...

Currently missing tests. I'll wait for your feedback to start adding them.

Copy link
Owner

@eronnen eronnen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few changes requests :)

by the way I'm not sure if this feature is really testable due to the fact that Procmon is wrong about most of the symbols, maybe only sanity to check that the module names are matching... also small PMLs are needed because the exported XMLs that contain the symbols tend to be huge 😅

procmon_parser/symbol_resolver/symbol_resolver.py Outdated Show resolved Hide resolved
procmon_parser/symbol_resolver/symbol_resolver.py Outdated Show resolved Hide resolved
procmon_parser/symbol_resolver/symbol_resolver.py Outdated Show resolved Hide resolved
procmon_parser/symbol_resolver/symbol_resolver.py Outdated Show resolved Hide resolved
procmon_parser/symbol_resolver/symbol_resolver.py Outdated Show resolved Hide resolved
procmon_parser/symbol_resolver/symbol_resolver.py Outdated Show resolved Hide resolved

if sys.version_info >= (3, 5, 0):
import typing
import pathlib # TODO: to be converted to py 2.7 equivalent.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add pathlib to setup.py requirements (I think it has a package for 2.7 too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. Do you have a stance concerning adding new packages? I find pathlib to be so much easier to use than its os counterpart, but that would require a new dependency, which you might frown upon.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can keep it, it's supported in python 2 so I don't see anything wrong here

Copy link
Contributor Author

@neitsa neitsa Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keeping this one open until everything works correctly on py 2.7 :)

@neitsa
Copy link
Contributor Author

neitsa commented Mar 4, 2023

@eronnen Working on fixing the last bits to support py 2.7. Problem found around supporting the enumeration (in dbghelp.py):

  • enum34 does not support IntFlags.
  • aenum (which support IntFlags) chokes on all of the enums because they have numbers that are greater than the max capacity for a (signed) long.

Haven't found a way to make aenum support unsigned long numbers...

@eronnen
Copy link
Owner

eronnen commented Mar 10, 2023

@neitsa maybe instead of IntFlags you can use the regular IntEnum? a little bit more hacky but should do the job

@mitchcapper
Copy link

Hi All. Love this tool for configuration file editing and noticed this PR. I am guessing part of your symbols off by one issue is related to just my code markers https://github.com/mitchcapper/ProductionStackTrace/blob/enhancements/ProductionStackTrace.Analyze/SymbolLoader.cs#L177-L181

between that and some auto generated state machines getting to a proper looking stack trace can take a bit of unwinding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make stacktrace human readable
3 participants