[WIP] Symbolic stack traces #23

Draft
wants to merge 26 commits into
base: master
Changes from 18 commits
9715544
export maximum user land address, which is needed to resolve symbols.
neitsa Feb 13, 2023
ab25bc1
Add dbghelp wrapper.
neitsa Feb 13, 2023
b4858ae
Add symbol resolver.
neitsa Feb 13, 2023
f8f86a0
Remove __main__; remove __post__init__ in StackTraceFrameInformation.
neitsa Feb 14, 2023
60fd840
Add comments; Rename function:
neitsa Feb 14, 2023
b18d59f
Add conditional import in procmon_parser init module.
neitsa Feb 14, 2023
1e24bd9
Change dbghelp initialization and cleanup.
neitsa Feb 14, 2023
8f3ccb9
Fix _NT_SYMBOL_PATH wrong path.
neitsa Feb 14, 2023
869ba9d
Fix: prevent None deref if no module information is present.
neitsa Feb 14, 2023
b66af92
Change check to retrieve system modules from System process.
neitsa Feb 15, 2023
efbd01c
Don't call SymGetSourceFileW if we already have a fully qualified pat…
neitsa Feb 15, 2023
a47ec21
Remove usage of typing and dataclasses.
neitsa Feb 15, 2023
250dcd7
Fix rookie mistake of a comma in ctor...
neitsa Feb 15, 2023
28357e1
Add a way to override _NT_SYMBOL_PATH.
neitsa Feb 16, 2023
f52fecc
Add documentation on symbol resolving.
neitsa Feb 16, 2023
849dbb1
Add debug callback to debug symbol resolution problems.
neitsa Feb 16, 2023
31d2450
Fixes and Python 2.7 compat:
neitsa Feb 17, 2023
90eeff6
rm kernel32.py since it's not used.
neitsa Feb 18, 2023
bec1c22
Fixes following review:
neitsa Feb 22, 2023
704ccdb
Fix following review: use for/else construct for system modules.
neitsa Feb 22, 2023
8359bef
Fix following review: use MAX_PATH constant rather than hardcoded num…
neitsa Feb 22, 2023
55dc790
Fix following review: Add ERROR_FILE_NOT_FOUND constant.
neitsa Feb 22, 2023
928d261
Fix following review: rework symbol file retrieval.
neitsa Feb 22, 2023
161de34
Fix following review: raise if symbol path is not set and skip_symsrv…
neitsa Feb 23, 2023
cf82b9d
Fix following review: use a context manager to handle symbol engine i…
neitsa Feb 24, 2023
e57aaf9
More fixes for python 2.7.
neitsa Mar 4, 2023
283 changes: 283 additions & 0 deletions docs/StackTraceSymbolicResolution.md
@@ -0,0 +1,283 @@
# Resolving Stack Traces With `procmon-parser`

## Limitations and Constraints

Symbolic stack trace resolution has the following limitations:

* It is based on Windows libraries and is therefore only available on **Windows** systems.
* It requires an Internet connection.
* It connects to a Microsoft server and downloads files from it.
    - By doing so, it requires you to accept the Microsoft license terms.

## Basics

All events in a ProcMon trace have a [stack trace](https://en.wikipedia.org/wiki/Stack_trace).

Below is an example of an event in a ProcMon capture:

![Event](./pictures/event.png)

The event captured is a thread creation in `explorer.exe`.

If you double-click the event, you are brought to a new window with 3 tabs, the interesting one in our case being the
stack trace tab. Below is an example of the aforementioned event with an **unresolved** stack trace:

![Stack Trace No Symbols](./pictures/stack_trace_no_symbols.png)

And the same event with a **resolved** stack trace:

![Stack Trace With Symbols](./pictures/stack_trace_with_symbols.png)

In the pictures above, symbolic resolution turned the raw addresses into function names and offsets and, when source
information is available, into the source code position where the call happens (frames 12 and 13).

A stack trace is composed of frames (there are 25 frames in the above example) and is read from bottom to top: the
oldest call is at the bottom and the most recent call is at the top, with all the intermediate calls in between. Once
the top function returns, the frames are unwound one by one and the code flow eventually goes back to the first frame
(at the bottom).
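
As a loose analogy (plain Python, not `procmon-parser` code), the same ordering can be observed with the standard
`traceback` module. The function names `inner`, `middle` and `outer` are made up for this sketch:

```python
import traceback


def inner():
    # extract_stack() lists the oldest call first and the current function last.
    return [frame.name for frame in traceback.extract_stack()]


def middle():
    return inner()


def outer():
    return middle()


oldest_first = outer()
# ProcMon shows frame 0 (the most recent call) at the top, so reverse the list.
procmon_order = list(reversed(oldest_first))
for index, name in enumerate(procmon_order[:3]):
    print(index, name)
```

Frame 0 is `inner`, the most recent call; reading downwards leads back to the older callers.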

## Resolving a Stack Trace with `procmon-parser`

Resolving a stack trace in `procmon-parser` can be as simple as follows:

```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import pathlib
import sys

from procmon_parser import ProcmonLogsReader, SymbolResolver, StackTraceInformation

def main():
    log_file = pathlib.Path(r"c:\temp\Logfile.PML")

    with log_file.open("rb") as f:
        procmon_reader = ProcmonLogsReader(f)
        symbol_resolver = SymbolResolver(procmon_reader)
        for idx, event in enumerate(procmon_reader):
            if idx == 213:
                frames = list(symbol_resolver.resolve_stack_trace(event))
                print(StackTraceInformation.prettify(frames))

if __name__ == "__main__":
    sys.exit(main())
```

## Setting Up Stack Trace Resolution In `procmon-parser`

### Obtaining Required Windows Libraries

Stack trace resolution uses 2 Windows DLLs:

* `dbghelp.dll` ([official documentation](https://learn.microsoft.com/en-us/windows/win32/debug/debug-help-library))
    - Provides symbol resolution functionality.
* `symsrv.dll` ([official documentation](https://learn.microsoft.com/en-us/windows/win32/debug/using-symsrv))
    - Manages symbol files (mostly downloading symbolic information from a symbol store).

While `dbghelp.dll` ships with Windows (it is located in `%WINDIR%\system32`), this DLL might be out of date
on some systems and thus lack various features (as [explained here](https://learn.microsoft.com/en-us/windows/win32/debug/dbghelp-versions)).
`symsrv.dll`, on the other hand, does not ship with Windows.

Both DLLs can be acquired from various Microsoft products, notably:

* [Debugging Tools for Windows](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/)
* [Windbg Preview](https://apps.microsoft.com/store/detail/windbg-preview/9PGJGD53TN86)
* [Visual Studio](https://visualstudio.microsoft.com/downloads/)

The official and [**recommended way**](https://learn.microsoft.com/en-us/windows/win32/debug/dbghelp-versions) is to
install the *Debugging Tools for Windows* from the Windows SDK (note that the SDK installer lets you install only the
*Debugging Tools for Windows* rather than the whole SDK).

**Important**: `procmon-parser` will try to find the correct path to the Debugging Tools For Windows and Windbg Preview
and then automatically provide the path to the DLLs matching the Python interpreter architecture. It does not, however,
try to find the DLLs from a Visual Studio installation.

Be sure to use the DLLs that match your interpreter architecture. For example, the *Debugging Tools For Windows* ships
binaries for 4 different architectures: x86, x64, arm (32-bit) and arm64:

```
neitsa@lab:c/Program Files (x86)/Windows Kits/10/Debuggers$ tree -L 1
.
├── Redist
├── arm
├── arm64
├── ddk
├── inc
├── lib
├── x64
└── x86
```

You can get your Python interpreter architecture by using the `platform` module for example:

```
>>> import platform
>>> platform.architecture()
('64bit', 'WindowsPE')
```

Thus, the directory in the *Debugging Tools For Windows* would be the `x64` one since the Python interpreter is a 64-bit
one.
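
A minimal sketch of how the matching sub-directory could be chosen programmatically. The helper name
`debugger_subdir` is hypothetical and this is not how `procmon-parser` does it internally; it also assumes that the
interpreter runs natively (an x64 Python emulated on an ARM64 machine would be misdetected):

```python
import platform
import struct


def debugger_subdir(machine=None, pointer_bits=None):
    """Hypothetical helper: map the interpreter architecture to a Debuggers sub-directory."""
    machine = (machine or platform.machine()).lower()
    # The size of a pointer tells whether the *interpreter* is 32-bit or 64-bit.
    pointer_bits = pointer_bits or struct.calcsize("P") * 8
    if machine.startswith("arm") or machine == "aarch64":
        return "arm64" if pointer_bits == 64 else "arm"
    return "x64" if pointer_bits == 64 else "x86"


print(debugger_subdir())
```

Using the pointer size rather than the machine name alone matters for a 32-bit interpreter on 64-bit Windows, which
needs the `x86` DLLs.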

### Symsrv and Microsoft License Terms

Microsoft's symbol server (located at https://msdl.microsoft.com/download/symbols/) provides access to
symbols for the operating system itself. The `symsrv.dll` library requires agreement to Microsoft's
*"Terms of Use for Microsoft Symbols and Binaries."* ([visible here](https://learn.microsoft.com/en-us/legal/windows-sdk/microsoft-symbol-server-license-terms)).

On your first usage of the symbolic resolution, the `symsrv.dll` may display a prompt requiring you to accept the
aforementioned *Terms of Use* if you wish to continue further.

To automatically indicate agreement to the terms, you may create a file called `symsrv.yes` (the file can be empty) in
the same directory as the `symsrv.dll` library. Note that `symsrv.dll` also recognizes a `symsrv.no` file as
indicating that you do not accept the terms; the `.yes` file takes priority over the `.no` file.
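
If you have read and accepted the terms, the marker file can also be created from Python. The helper below is a
hypothetical sketch (it is not part of `procmon-parser`) and assumes `dll_dir` is the directory that holds
`symsrv.dll`:

```python
import os


def accept_symsrv_terms(dll_dir):
    """Hypothetical helper: create an empty symsrv.yes next to symsrv.dll."""
    if not os.path.isfile(os.path.join(dll_dir, "symsrv.dll")):
        raise IOError("symsrv.dll not found in {!r}".format(dll_dir))
    marker = os.path.join(dll_dir, "symsrv.yes")
    # Append mode creates the file if needed and leaves an existing one untouched.
    with open(marker, "a"):
        pass
    return marker
```

Only call such a helper after actually reviewing the *Terms of Use*, since the file signals your agreement.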

It is also possible to view the terms from within the WinDbg debugger (included in the *Debugging Tools for Windows*)
by removing any `symsrv.yes` and `symsrv.no` files from WinDbg's directory, setting the symbol path to include
Microsoft's symbol server (using the `.sympath` command), and attempting to load symbols from their server (`.reload`
command).

## Advanced Usage

### Symbol Download Location

The [_NT_SYMBOL_PATH](https://learn.microsoft.com/en-us/windows/win32/debug/using-symsrv#setting-the-symbol-path)
environment variable is the official way to set the location where the symbols are going to be stored.

Symbol files may need to be downloaded from a symbol store, in which case the `SymbolResolver` class chooses the
download location as follows:

* if the `_NT_SYMBOL_PATH` environment variable is set:
    - Use the `_NT_SYMBOL_PATH` location to store symbol files.
    - if the `symbol_path` constructor argument is set:
        - Do not use `_NT_SYMBOL_PATH` but use the provided symbol path instead.
* else
    - Use the `%TEMP%` directory.
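
The precedence above can be sketched as follows. This is an illustration only, not the actual `SymbolResolver`
code: the helper name `choose_symbol_path` is hypothetical, it assumes that `symbol_path` wins even when
`_NT_SYMBOL_PATH` is not set, and it assumes the `%TEMP%` fallback is expressed in the usual `srv*...*...` form:

```python
import os
import tempfile

MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols/"


def choose_symbol_path(symbol_path=None, environ=None):
    """Hypothetical helper: pick the symbol path following the documented precedence."""
    environ = os.environ if environ is None else environ
    if symbol_path:
        # An explicit constructor argument overrides the environment variable.
        return symbol_path
    env_value = environ.get("_NT_SYMBOL_PATH")
    if env_value:
        return env_value
    # Assumed fallback: store downloaded symbols in the temporary directory.
    return "srv*{}*{}".format(tempfile.gettempdir(), MS_SYMBOL_SERVER)
```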

Note that using the `%TEMP%` directory may require downloading the symbols again after each reboot. This is usually a
lengthy operation, even with a fast Internet connection.

The basic syntax of the `_NT_SYMBOL_PATH` environment variable (and therefore the `symbol_path` constructor argument) is
as follows:

```
srv*<symbol_directory>*https://msdl.microsoft.com/download/symbols/
```

Where `<symbol_directory>` must be an **existing directory** which is **writable** by any user. For example:

```
srv*c:\symbols*https://msdl.microsoft.com/download/symbols/
```

For more information on the various possibilities for setting up the environment variable, please refer to the
[official documentation](https://learn.microsoft.com/en-us/windows/win32/debug/using-symsrv).
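
A small sketch of building such a value while checking the two requirements on `<symbol_directory>`. The helper
`build_symbol_path` is hypothetical (not part of `procmon-parser`), and `os.access` only checks writability for the
current user, not for every user:

```python
import os

MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols/"


def build_symbol_path(symbol_directory, server=MS_SYMBOL_SERVER):
    """Hypothetical helper: return 'srv*<dir>*<server>' after validating the directory."""
    if not os.path.isdir(symbol_directory):
        raise ValueError("not an existing directory: {!r}".format(symbol_directory))
    if not os.access(symbol_directory, os.W_OK):
        raise ValueError("directory is not writable: {!r}".format(symbol_directory))
    return "srv*{}*{}".format(symbol_directory, server)
```

The resulting string could then be exported as `_NT_SYMBOL_PATH` or passed as the `symbol_path` constructor argument.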

### Copying DLLs

If, for any reason, you do not wish to install the *Debugging Tools For Windows* on a particular machine (e.g. a virtual
machine) but already have it installed on another machine, you can copy both DLLs (`dbghelp.dll` and `symsrv.dll`) from
the *Debugging Tools For Windows* onto the target machine. Put them in their own (writable) directory, **not** a system
one (never overwrite the default `dbghelp.dll` in `%WINDIR%\System32`). Both DLLs must reside alongside each other.

To provide a different path for the DLLs, use the `dll_dir_path` parameter of the `SymbolResolver` class:

```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import pathlib
import sys

from procmon_parser import ProcmonLogsReader, SymbolResolver

def main():
    log_file = pathlib.Path(r"c:\temp\Logfile.PML")
    dll_dir_path = r"c:\tmp\my_debug_dll_dir"

    with log_file.open("rb") as f:
        procmon_reader = ProcmonLogsReader(f)
        # disable automatic retrieval of dbghelp.dll and symsrv.dll, and use the provided path instead.
        # It must contain at least both DLLs:
        #   - from the same provider (e.g. Debugging tools for Windows)
        #   - and the same architecture (e.g. Debugging tools for Windows '\x64' directory).
        symbol_resolver = SymbolResolver(procmon_reader, dll_dir_path=dll_dir_path)
        # ...

if __name__ == "__main__":
    sys.exit(main())
```

### Skipping `symsrv.dll` Check

The `SymbolResolver` class in `procmon-parser` checks if both `dbghelp` and `symsrv` DLLs are present in the provided
directory (if you pass it through the `dll_dir_path` parameter as explained above).

If you have offline symbols already available (for example, by having previously used the `symchk`
([documentation](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/using-symchk)) tool from the
*Debugging Tools For Windows*), and do not want to connect your machine to the Internet, you can skip the `symsrv.dll`
automatic check by using the `skip_symsrv` parameter of the `SymbolResolver` class:

```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import pathlib
import sys

from procmon_parser import ProcmonLogsReader, SymbolResolver

def main():
    log_file = pathlib.Path(r"c:\temp\Logfile.PML")
    # does not contain symsrv.dll
    dll_dir_path = r"c:\tmp\my_debug_dll_dir"

    with log_file.open("rb") as f:
        procmon_reader = ProcmonLogsReader(f)
        # disable automatic retrieval of dbghelp.dll and symsrv.dll, and use the provided path instead.
        # skip entirely the check for symsrv.dll.
        # Use **only** if you know that you already have the necessary symbols!
        symbol_resolver = SymbolResolver(procmon_reader, dll_dir_path=dll_dir_path, skip_symsrv=True)
        # ...

if __name__ == "__main__":
    sys.exit(main())
```
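
For illustration, the presence check described in this section could look roughly like the sketch below. The helper
`missing_debug_dlls` is hypothetical and is not the actual `SymbolResolver` implementation; it only mirrors the
documented behavior of the `dll_dir_path` and `skip_symsrv` parameters:

```python
import os


def missing_debug_dlls(dll_dir_path, skip_symsrv=False):
    """Hypothetical helper: list the required DLLs that are absent from dll_dir_path."""
    required = ["dbghelp.dll"]
    if not skip_symsrv:
        # symsrv.dll is only needed when symbols have to be downloaded.
        required.append("symsrv.dll")
    return [name for name in required
            if not os.path.isfile(os.path.join(dll_dir_path, name))]
```

An empty list means the directory is usable; with `skip_symsrv=True` only `dbghelp.dll` is required.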

## Debugging Symbol Resolution Problems

If symbol resolution is not working as expected, you can pass a callback function - using the `debug_callback` named
parameter - to the `SymbolResolver` constructor, as follows:

```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import pathlib
import sys

from procmon_parser import ProcmonLogsReader, SymbolResolver, CBA

def symbol_debug_callback(handle, action_code, callback_data, user_context):
    # handle: int, action_code: CBA or int, callback_data: str, user_context: int
    # (types are given as a comment because the `CBA | int` annotation syntax requires Python 3.10+)
    if action_code == CBA.CBA_DEBUG_INFO:
        print(f"[DEBUG MESSAGE DBGHELP: CBA_DEBUG_INFO] {callback_data}")
        return 1
    return 0

def main():
    log_file = pathlib.Path(r"c:\temp\Logfile.PML")

    with log_file.open("rb") as f:
        procmon_reader = ProcmonLogsReader(f)
        symbol_resolver = SymbolResolver(procmon_reader, debug_callback=symbol_debug_callback)
        # ...

if __name__ == "__main__":
    sys.exit(main())
```

The callback function mimics the [PSYMBOL_REGISTERED_CALLBACK64](https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nc-dbghelp-psymbol_registered_callback64)
callback function of the Windows API. As of now, only the `CBA.CBA_DEBUG_INFO` action code is handled internally.

To indicate success handling the `CBA` code, the function **must** return 1. To indicate failure handling the code,
return 0. If your code does not handle a particular code, you should also return 0. (Returning 1 in this case may have
unintended consequences.)
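
The return-value convention can be illustrated with a callback that collects messages instead of printing them. This
is a sketch under assumptions: `make_collecting_callback` is a hypothetical helper, and the module-level
`CBA_DEBUG_INFO` constant is a stand-in for `CBA.CBA_DEBUG_INFO` so that the snippet runs without `procmon-parser`
(its value is the one defined in `dbghelp.h`):

```python
# Stand-in for CBA.CBA_DEBUG_INFO (value of CBA_DEBUG_INFO in dbghelp.h).
CBA_DEBUG_INFO = 0x10000000


def make_collecting_callback(messages):
    """Hypothetical helper: build a debug callback that appends messages to a list."""
    def callback(handle, action_code, callback_data, user_context):
        if action_code == CBA_DEBUG_INFO:
            messages.append(callback_data)
            return 1  # the action code was handled
        return 0  # any other action code is reported as not handled
    return callback


collected = []
debug_callback = make_collecting_callback(collected)
```

Collecting the messages makes it easier to filter them or write them to a log file after resolution has finished.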

The callback above prints a lot of information that helps when debugging symbol retrieval problems.
Binary file added docs/pictures/event.png
Binary file added docs/pictures/stack_trace_no_symbols.png
Binary file added docs/pictures/stack_trace_with_symbols.png
14 changes: 14 additions & 0 deletions procmon_parser/__init__.py
@@ -1,3 +1,5 @@
+import sys
+
 from six import PY2

 from procmon_parser.configuration import *
@@ -11,6 +13,12 @@
     'Rule', 'Column', 'RuleAction', 'RuleRelation', 'PMLError'
 ]

+if sys.platform == "win32" and sys.version_info >= (3, 5, 0):
+    from procmon_parser.symbol_resolver.symbol_resolver import (
+        SymbolResolver, StackTraceFrameInformation, StackTraceInformation, CBA)
+
+    __all__.extend(['SymbolResolver', 'StackTraceFrameInformation', 'StackTraceInformation', 'CBA'])
+

 class ProcmonLogsReader(object):
     """Reads procmon logs from a stream which in the PML format
@@ -44,6 +52,12 @@ def __getitem__(self, index):
     def __len__(self):
         return self._struct_readear.number_of_events

+    @property
+    def maximum_application_address(self):
+        """Return the highest possible user land address.
+        """
+        return self._struct_readear.maximum_application_address
+
     def processes(self):
         """Return a list of all the known processes in the log file
         """
6 changes: 6 additions & 0 deletions procmon_parser/logs.py
@@ -309,6 +309,12 @@ def get_event_at_offset(self, offset):
     def number_of_events(self):
         return self.header.number_of_events

+    @property
+    def maximum_application_address(self):
+        """Return the highest possible user land address.
+        """
+        return self.header.maximum_application_address
+
     def processes(self):
         """Return a list of all the known processes in the log file
         """
4 changes: 3 additions & 1 deletion procmon_parser/stream_logs_format.py
@@ -36,7 +36,9 @@ def __init__(self, io):
         # Docs of this table's layout are in "docs\PML Format.md"
         self.icon_table_offset = read_u64(stream)

-        stream.seek(12, 1)  # Unknown fields
+        self.maximum_application_address = read_u64(stream)
+
+        self.os_version_info_size = read_u32(stream)
         self.windows_major_number = read_u32(stream)
         self.windows_minor_number = read_u32(stream)
         self.windows_build_number = read_u32(stream)
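
The hunk above replaces the 12 skipped "unknown" bytes with a `u64` followed by a `u32` (8 + 4 = 12 bytes), so the
following fields stay at the same offsets. A self-contained sketch of that arithmetic follows. The `read_u32` and
`read_u64` functions here are stand-ins for the project's helpers and assume little-endian encoding, the sample
values are illustrative, and `is_user_address` is a hypothetical example of how the exported address might be used:

```python
import io
import struct


def read_u32(stream):
    # Stand-in helper: little-endian unsigned 32-bit integer (assumed encoding).
    return struct.unpack("<I", stream.read(4))[0]


def read_u64(stream):
    # Stand-in helper: little-endian unsigned 64-bit integer (assumed encoding).
    return struct.unpack("<Q", stream.read(8))[0]


# Illustrative bytes only: a u64 address followed by a u32 size.
blob = struct.pack("<QI", 0x00007FFFFFFEFFFF, 284)
stream = io.BytesIO(blob)
maximum_application_address = read_u64(stream)
os_version_info_size = read_u32(stream)
consumed = stream.tell()  # same 12 bytes that stream.seek(12, 1) used to skip


def is_user_address(address, maximum_application_address):
    """Hypothetical helper: True if the address lies in user land."""
    return address <= maximum_application_address
```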
Empty file.