Add guidance for capturing ETW traces in Kubernetes pods #2344
Co-authored-by: brianrob <6210322+brianrob@users.noreply.github.com>
brianrob
left a comment
@copilot, please address these comments.
| <!-- *************** --> | ||
| <h5><a id="ProcessIsolationContainers">Capturing ETW Traces with Process-Isolation Windows Containers (Kubernetes)</a></h5> | ||
| <p> | ||
| When running Windows containers in Kubernetes using process-isolation mode (as opposed to Hyper-V isolation), |
Clarify that process-isolation mode is the default, and that in Hyper-V isolation mode these instructions are not required: just capture inside the container.
| workflow to capture and analyze traces for processes running inside these containers. | ||
| </p> | ||
| <p> | ||
| <strong>Important Limitation:</strong> Because ETW is a kernel-level feature and process-isolation containers |
Kernel sessions specifically cannot be started from inside of a container. PerfView almost always captures a kernel session, so we should assume so here.
| must be initiated from the host node. | ||
| </p> | ||
| <h6>Step 1: Capture a Trace on the Host Node (Required)</h6> |
| <h6>Step 1: Capture a Trace on the Host Node (Required)</h6> | |
| <h6>Step 1: Capture a Trace on the Host Node</h6> |
| <strong>What happens if you don't use /EnableEventsInContainers:</strong> You will still capture all kernel | ||
| events (CPU sampling, context switches, etc.) for container processes, but you will miss user-mode events | ||
| like .NET garbage collection events, JIT events, exception events, and any custom EventSource events from | ||
| those processes. |
You will still get user-mode events from processes running directly on the host node (outside of containers).
| Then, inside the container, run the merge command to inject the necessary image identification data: | ||
| </p> | ||
| <ul> | ||
| <li>PerfViewCollect merge /ImageIDsOnly MyContainerTrace.etl.zip</li> |
Note: PerfViewCollect needs to be built from source at https://github.com/microsoft/perfview. It is not currently shipped as a binary.
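For illustration only, here is a minimal Python sketch of how the merge step quoted above could be assembled and wrapped in `kubectl exec` so that it runs inside the container. The helper names, the pod name `my-app-pod`, and the `default` namespace are hypothetical. The sketch also assumes that a PerfViewCollect binary built from source is already on the container's PATH and that the trace file has already been copied into the container.

```python
from typing import List


def merge_command(trace_path: str) -> List[str]:
    """Argument list for the merge step quoted in the diff above."""
    return ["PerfViewCollect", "merge", "/ImageIDsOnly", trace_path]


def kubectl_exec(pod: str, command: List[str], namespace: str = "default") -> List[str]:
    """Wrap a command so that it runs inside a pod via `kubectl exec`."""
    return ["kubectl", "exec", "-n", namespace, pod, "--", *command]


if __name__ == "__main__":
    # "my-app-pod" is a hypothetical pod name used only for this example.
    full = kubectl_exec("my-app-pod", merge_command("MyContainerTrace.etl.zip"))
    print(" ".join(full))
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting issues.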
| <p> | ||
| <strong>What /ImageIDsOnly does:</strong> When you run merge with /ImageIDsOnly, PerfView reads through | ||
| the trace and for each DLL that was loaded by processes in the trace, it looks up the DLL's unique | ||
| identifier (signature/timestamp) and injects that information into the trace. This unique identifier |
It actually looks up the PDB signature.
| <strong>What happens if you don't run merge with /ImageIDsOnly:</strong> If you skip this step and later | ||
| try to analyze the trace on another machine after the container is gone, PerfView will be unable to find | ||
| the symbol files for DLLs that were loaded inside the container. Your stack traces will show addresses | ||
| like "0x7ffe12345678" instead of method names for those DLLs. The managed code (.NET) symbols may still |
The stacks will just show the module name with a question mark. For example: MyAssembly!? instead of MyAssembly!MyClass.MyMethod.
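As a rough illustration of that symptom, the sketch below scans a list of frame strings and reports the modules whose frames are unresolved, meaning they have the form `Module!?`. The function name and the input format (one frame per string) are assumptions made for this example. This is not a PerfView API.

```python
import re
from typing import Iterable, List

# A frame such as "MyAssembly!?" has a known module but an unresolved symbol.
_UNRESOLVED = re.compile(r"^(?P<module>[^!]+)!\?$")


def unresolved_modules(frames: Iterable[str]) -> List[str]:
    """Return module names with unresolved frames, in first-seen order."""
    seen: List[str] = []
    for frame in frames:
        match = _UNRESOLVED.match(frame.strip())
        if match and match.group("module") not in seen:
            seen.append(match.group("module"))
    return seen


if __name__ == "__main__":
    sample = ["MyAssembly!?", "MyAssembly!MyClass.MyMethod", "coreclr!?"]
    print(unresolved_modules(sample))
```

A non-empty result for modules that were loaded inside the container would suggest that the merge step with /ImageIDsOnly was skipped.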
| try to analyze the trace on another machine after the container is gone, PerfView will be unable to find | ||
| the symbol files for DLLs that were loaded inside the container. Your stack traces will show addresses | ||
| like "0x7ffe12345678" instead of method names for those DLLs. The managed code (.NET) symbols may still | ||
| work since they come from NGEN PDBs or crossgen2 metadata, but native code (including the runtime itself) |
Jitted code will still resolve, but nothing else that comes from a binary inside the container will.
| will have missing symbols. | ||
| </p> | ||
| <p> | ||
| <strong>Why run merge inside the container:</strong> The merge command needs access to the actual DLL |
The merger is a shared component and cannot look inside containers, so you must run it inside the container(s).
Co-authored-by: brianrob <6210322+brianrob@users.noreply.github.com>
Copilot Post-Mortem:
EnableEventsInContainers and ImageIDsOnly options
Original prompt