Improve generated dump debugging instructions #46493

Merged
merged 11 commits into from
Jan 5, 2021
178 changes: 98 additions & 80 deletions eng/testing/debug-dump-template.md
@@ -1,127 +1,145 @@
# Debugging a CI dump
# Get the dump

This document describes how to debug a CI/PR test dump by downloading assets from helix, using a dotnet tool called `runfo`.
Click the link to the dump on the `Helix Test Logs` tab in Azure DevOps. This is the same place you got these instructions from.

## What is runfo?
# Get the Helix payload

Runfo is a dotnet global tool that helps get information about helix test runs and azure devops builds. For more information see [this](https://github.com/jaredpar/runfo/tree/master/runfo#runfo)
[Runfo](https://github.com/jaredpar/runfo/tree/master/runfo#runfo) helps get information about helix test runs and azure devops builds. We will use it to download the payload and symbols:
```script
dotnet tool install --global runfo
dotnet tool update --global runfo
```
If prompted, open a new command prompt to pick up the updated PATH.
```script
# On Windows
runfo get-helix-payload -j %JOBID% -w %WORKITEM% -o %WOUTDIR%
# On Linux and macOS
runfo get-helix-payload -j %JOBID% -w %WORKITEM% -o %LOUTDIR%
```

### How do I install it?
> NOTE: if the helix job is an internal job, you need to pass down a [helix authentication token](https://helix.dot.net/Account/Tokens) using the `--helix-token` argument.

You just need to run:
Now extract the files:

```script
dotnet tool install --global runfo
```
# On Windows
for /f %i in ('dir /s/b %WOUTDIR%\*zip') do tar -xf %i -C %WOUTDIR%

If you already have it installed, make sure you have at least version `0.6.1` installed, which contains support to download helix payloads. If you don't have the latest version just run:
# On Linux and macOS
# obtain `unzip` if necessary; eg `sudo apt-get install unzip` or `sudo dnf install unzip`
find %LOUTDIR% -name '*zip' -exec unzip -d %LOUTDIR% {} \;
```
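The extraction step above can also be sketched in Python for machines where neither `tar` nor `unzip` is handy. This is a minimal sketch, not part of the PR: `extract_all_zips` is a hypothetical helper name, and it matches `*.zip` (with the dot) rather than the looser `*zip` pattern used in the shell commands.

```python
import zipfile
from pathlib import Path


def extract_all_zips(out_dir):
    """Extract every *.zip found anywhere under out_dir into out_dir itself.

    Hypothetical helper: mirrors the `for /f ... tar -xf` and
    `find ... -exec unzip` commands, which unpack into the top-level folder.
    """
    root = Path(out_dir)
    # Build the list up front so archives unpacked during the loop are not revisited.
    archives = sorted(root.rglob("*.zip"))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
    return archives
```

Calling `extract_all_zips` with the directory passed to `runfo ... -o` should leave the `shared` folder directly under that directory, assuming the payload archives are laid out the way the instructions expect.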

Now use the [dotnet-sos global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos) to install the SOS debugging extension.
```script
dotnet tool update --global runfo
dotnet tool install --global dotnet-sos
dotnet tool update --global dotnet-sos
```
If prompted, open a new command prompt to pick up the updated PATH.
```script
dotnet sos install --architecture Arm
Member

The last one will override the others, at least on Linux (not sure about windows, haven't seen if it is conditional on the bitness of the extension host, but given the errors I've seen I'd say no). Users should only install the one they are going to use.

Member Author

Oh - I didn't notice that because I was testing with x64, which is last!

Could we imagine installing them all, and picking the right one - eliminate another decision point? cc @mikem8361

Member Author

I'll fix the text meantime

Member Author

It looks like it defaults silently to the bitness of the dotnet.exe you're running it with (?)

Member

Yes it does default to the architecture of the dotnet.exe you are running (the most common case).

As far as installing all of the architectures, there would have to be a rid named subdirectory to separate them so the decision point would be postponed a little to the .load command in windbg.

This manual dotnet-sos install/load on Windows was really meant as a fallback while we wait for the public debugger to include SOS (all architectures) like the internal one does now. The internal Windows debugger automatically loads the correct architecture and latest version of SOS from the extension gallery.

Member Author

Oh perfect then it will go away.

dotnet sos install --architecture Arm64
dotnet sos install --architecture x86
dotnet sos install --architecture x64
```
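Because the review thread notes that each `dotnet sos install` overrides the previous one, only a single architecture should be installed. A minimal Python sketch of that choice follows; `sos_install_command` is a hypothetical helper, and the accepted values are just the four `--architecture` arguments shown in the diff.

```python
# The four values that appear in the template's `dotnet sos install` lines.
KNOWN_ARCHITECTURES = ("Arm", "Arm64", "x86", "x64")


def sos_install_command(architecture):
    """Return the argument list for installing SOS for exactly one architecture.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if architecture not in KNOWN_ARCHITECTURES:
        raise ValueError(f"unexpected architecture: {architecture!r}")
    return ["dotnet", "sos", "install", "--architecture", architecture]
```

The returned list can be handed to `subprocess.run` if desired. Per the thread, omitting `--architecture` entirely falls back to the architecture of the `dotnet` executable being run.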

## Download helix payload containing symbols:
# Now choose a section below based on your OS.

You can just achieve this by running:
## If it's a Windows dump on Windows...

```script
runfo get-helix-payload -j %JOBID% -w %WORKITEM% -o <out-dir>
```
## ... and you want to debug with WinDbg

> NOTE: if the helix job is an internal job, you need to pass down a [helix authentication token](https://helix.dot.net/Account/Tokens) using the `--helix-token` argument.
Install or update WinDbg if necessary ([external](https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools), [internal](https://osgwiki.com/wiki/Installing_WinDbg)). If you don't have a recent WinDbg you may have to do `.update sos`.

This will download the workitem contents under `<out-dir>\workitems\` and the correlation payload under: `<out-dir>\correlation-payload\`.
Open WinDbg and open the dump with `File>Open Dump`.

> The correlation payload is usually the testhost or core root, which contain the runtime and dotnet host that we use to run tests.
```script
!setclrpath %WOUTDIR%\shared\Microsoft.NETCore.App\6.0.0
.sympath+ %WOUTDIR%\shared\Microsoft.NETCore.App\6.0.0
```

Once you have those assets, you will need to extract the testhost or core root. Then extract the workitem assets into the same location where coreclr binary is.
Now you can use regular SOS commands like `!dumpstack`, `!pe`, etc.

## Windows dump on Windows
## ... and you want to debug with Visual Studio

### Debug with WinDbg
Currently this is not possible because mscordbi.dll is not signed.

1. Install [dotnet-sos global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos).
2. Run `dotnet sos install` (This has an architecture flag to install diferent plugin versions for specific arch scenarios).
3. Load the dump with a recent WinDbg version for it to load sos automatically (if not you can run `.update sos`). It is important that bitness of WinDbg matches the bitness of the dump.
4. Then run the following commands:
## ... and you want to debug with dotnet-dump

Install the [dotnet-dump global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump).
```script
!setclrpath <path to core root or testhost where coreclr is>
.sympath+ <directory with symbols (for library tests these are in testhost dir)>
dotnet tool install --global dotnet-dump
dotnet tool update --global dotnet-dump
```
If prompted, open a new command prompt to pick up the updated PATH.
```script
dotnet-dump analyze <path-to-dump>
```
Within dotnet-dump:
```script
setclrpath %WOUTDIR%\shared\Microsoft.NETCore.App\6.0.0
setsymbolserver -directory %WOUTDIR%\shared\Microsoft.NETCore.App\6.0.0
```
### Analyze with dotnet-dump

1. Install [dotnet-dump global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump).
2. Run: `dotnet-dump analyze <path-to-dump>`
3. Then run the following commands:

Now you can use regular SOS commands like `dumpstack`, `pe`, etc.
If you are debugging a 32 bit dump using 64 bit dotnet, you will get an error `SOS does not support the current target architecture`. In that case replace dotnet-dump with the 32 bit version:
```script
setclrpath (To verify an incorrect DAC hasn't been loaded).
setclrpath <path to core root or testhost where coreclr is>
setsymbolserver -directory <directory with symbols (for library tests these are in testhost dir)>
dotnet tool uninstall --global dotnet-dump
"C:\Program Files (x86)\dotnet\dotnet.exe" tool install --global dotnet-dump
```
---
## If it's a Linux dump on Windows...

## Linux dumps on Windows
Download the [Cross DAC Binaries](https://dev.azure.com/dnceng/public/_apis/build/builds/%BUILDID%/artifacts?artifactName=CoreCLRCrossDacArtifacts&api-version=6.0&%24format=zip), open it and choose the flavor that matches the dump you are to debug, and copy those files to `%WOUTDIR%\shared\Microsoft.NETCore.App\6.0.0`.

In order to debug a Linux dump on Windows, you will have to first go to the PR/CI build
that sent the test run and download the cross DAC.
Now you can debug with WinDbg or `dotnet-dump` as if it was a Windows dump. See above.

Download the [`CoreCLRCrossDacArtifacts`](https://dev.azure.com/dnceng/public/_apis/build/builds/%BUILDID%/artifacts?artifactName=CoreCLRCrossDacArtifacts&api-version=6.0&%24format=zip), then extract it, and copy the matching flavor of the DAC with your dump and extract it in the same location where coreclr binary is.
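For reference, the Cross DAC download link is the same URL for every build apart from the `%BUILDID%` placeholder. A small Python sketch of how that link is put together is shown here; `cross_dac_url` is a hypothetical helper, and the query parameters are copied from the link in the template.

```python
from urllib.parse import urlencode


def cross_dac_url(build_id):
    """Build the CoreCLRCrossDacArtifacts download URL for one Azure DevOps build.

    Hypothetical helper: the host, path, and parameters come from the template link.
    """
    query = urlencode({
        "artifactName": "CoreCLRCrossDacArtifacts",
        "api-version": "6.0",
        "$format": "zip",  # urlencode escapes "$" as %24, matching the template
    })
    return (
        "https://dev.azure.com/dnceng/public/_apis/build/builds/"
        f"{build_id}/artifacts?{query}"
    )
```

The build id is whatever the generator script substitutes for `%BUILDID%`.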
---
## If it's a Linux dump on Linux...

### Debug with WinDbg
## ... and you want to debug with LLDB

1. Install [dotnet-sos global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos).
2. Run `dotnet sos install` (This has an architecture flag to install diferent plugin versions for specific arch scenarios).
3. Load the dump with a recent WinDbg version for it to load sos automatically (if not you can run `.update sos`). It is important that bitness of WinDbg matches the bitness of the dump.
4. Then run the following commands:
Install or update LLDB if necessary ([instructions here](https://github.com/dotnet/diagnostics/blob/master/documentation/lldb/linux-instructions.md))

Load the dump:
```script
!setclrpath <path to core root or testhost where coreclr is>
.sympath+ <directory with symbols (for library tests these are in testhost dir)>
lldb --core <path-to-dmp> %LOUTDIR%/shared/Microsoft.NETCore.App/6.0.0/dotnet
```
### Analyze with dotnet-dump

1. Install [dotnet-dump global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump).
2. Run: `dotnet-dump analyze <path-to-dump>`
3. Then run the following commands:

Within lldb:
```script
setclrpath (To verify an incorrect DAC hasn't been loaded).
setclrpath <path to core root or testhost where coreclr is>
setsymbolserver -directory <directory with symbols (for library tests these are in testhost dir)>
setclrpath %LOUTDIR%/shared/Microsoft.NETCore.App/6.0.0
sethostruntime /usr/bin/dotnet
setsymbolserver -directory %LOUTDIR%/shared/Microsoft.NETCore.App/6.0.0
```
If you want to load native symbols
```
loadsymbols
```

## Linux dumps on Linux

### Debug with LLDB

1. Install [dotnet-sos global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos).
2. Run `dotnet sos install` (This has an architecture flag to install diferent plugin versions for specific arch scenarios).
3. Load the dump by running `lldb -c <path-to-dmp> <host binary used (found in testhost)>`
4. Run the following commands:
## ... and you want to debug with dotnet-dump

Install the [dotnet-dump global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump).
```script
setclrpath <path to core root or testhost where coreclr is>
sethostruntime '<path to your local dotnet runtime or testhost where coreclr is>'
setsymbolserver -directory <directory with symbols (for library tests these are in testhost dir)>
loadsymbols (if you want to resolve native symbols)
dotnet tool install --global dotnet-dump
dotnet tool update --global dotnet-dump
```

### Analyze with dotnet-dump

1. Install [dotnet-dump global tool](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump).
2. Run: `dotnet-dump analyze <path-to-dump>`
3. Then run the following commands:

If prompted, open a new command prompt to pick up the updated PATH.
```script
dotnet-dump analyze <path-to-dump>
```
Within dotnet-dump:
```script
setclrpath (To verify an incorrect DAC hasn't been loaded).
setclrpath <path to core root or testhost where coreclr is>
setsymbolserver -directory <directory with symbols (for library tests these are in testhost dir)>
setclrpath %LOUTDIR%/shared/Microsoft.NETCore.App/6.0.0
setsymbolserver -directory %LOUTDIR%/shared/Microsoft.NETCore.App/6.0.0
```

## MacOS dumps
---
## If it's a macOS dump

Instructions for debugging dumps on macOS are essentially the same as [Linux](#If-it's-a-Linux-dump-on-Linux...) with one exception: `dotnet-dump` cannot analyze macOS system dumps: you must use `lldb` for those. `dotnet-dump` can only analyze dumps created by `dotnet-dump` or `createdump` or by the runtime on hangs.
Member

Suggested change
Instructions for debugging dumps on macOS are essentially the same as [Linux](#If-it's-a-Linux-dump-on-Linux...) with one exception: `dotnet-dump` cannot analyze macOS system dumps: you must use `lldb` for those. `dotnet-dump` can only analyze dumps created by `dotnet-dump` or `createdump` or by the runtime on hangs.
Instructions for debugging dumps on macOS are essentially the same as [Linux](#If-it's-a-Linux-dump-on-Linux...) with one exception: `dotnet-dump` cannot analyze macOS system dumps: you must use `lldb` for those. `dotnet-dump` can only analyze dumps created by `dotnet-dump` or `createdump`, by the runtime on crashes when the appropriate environment variables are set, or the [`blame-hang` setting of `dotnet test`](https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-test).


Instructions for debugging dumps on MacOS the same as [Linux](#linux-dumps-on-linux); however there are a couple of caveats.
---
# Other Helpful Information

1. It's only supported to debug them in `dotnet-dump` if it's a runtime generated dump. This includes hang dumps and dumps generated by `createdump`, `dotnet-dump` and the runtime itself.
2. If it's a system dump, then only `lldb` works.
* [How to debug a Linux core dump with SOS](https://github.com/dotnet/diagnostics/blob/master/documentation/debugging-coredump.md)
4 changes: 4 additions & 0 deletions eng/testing/gen-debug-dump-docs.py
@@ -86,6 +86,8 @@

replace_string = ''
dir_separator = '/' if platform.system() != 'Windows' else '\\'
unix_user_folder = '~/dumps/'
windows_user_folder = 'c:\\dumps\\'
source_file = template_dir + dir_separator + 'debug-dump-template.md'
with open(source_file, 'r+') as f:
    file_text = f.read()
@@ -95,6 +97,8 @@
replace_string = file_text.replace('%JOBID%', job_id)
replace_string = replace_string.replace('%WORKITEM%', workitem)
replace_string = replace_string.replace('%BUILDID%', build_id)
replace_string = replace_string.replace('%LOUTDIR%', unix_user_folder + workitem)
replace_string = replace_string.replace('%WOUTDIR%', windows_user_folder + workitem)

output_file = out_dir + dir_separator + 'how-to-debug-dump.md'
with open(output_file, 'w+') as output:
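The substitution logic in this hunk can be read as a single table of placeholders. The following is a minimal standalone sketch of it, assuming the same placeholder names and default folders as the diff; `fill_placeholders` is a hypothetical function name and is not how the script itself is organized.

```python
def fill_placeholders(template_text, job_id, workitem, build_id,
                      unix_user_folder="~/dumps/",
                      windows_user_folder="c:\\dumps\\"):
    """Replace the %...% placeholders in the template with concrete values.

    Hypothetical helper: the output directories are the per-OS user folder
    followed by the work item name, as in the two lines added by the PR.
    """
    replacements = {
        "%JOBID%": job_id,
        "%WORKITEM%": workitem,
        "%BUILDID%": build_id,
        "%LOUTDIR%": unix_user_folder + workitem,
        "%WOUTDIR%": windows_user_folder + workitem,
    }
    for placeholder, value in replacements.items():
        template_text = template_text.replace(placeholder, value)
    return template_text
```

With a work item named `My.Tests`, every `%LOUTDIR%` in the template becomes `~/dumps/My.Tests` and every `%WOUTDIR%` becomes `c:\dumps\My.Tests`, so the generated commands can be pasted without editing.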