Add NT_SIGINFO NOTE to ELF dumps #83059
Merged
Customer Impact
Linux Watson needs this to better triage ELF dumps. 1st party teams have asked for this.
Issue: #40958
This change updates createdump so that windbg/Watson can determine which thread actually crashed (via the .lastevent command). The NT_SIGINFO record has been missing from Linux core dumps, causing the wrong thread (the startup thread) to be blamed for the crash. This breaks Watson bucketing.
The underlying issue here is that we do not put enough data in Linux core dumps: the “crashing thread” isn’t marked as the one of interest. Without this information, the debugger assumes that the 0th thread (usually the startup thread) is the guilty party.
When an automated debugging service like Watson/!analyze comes along, it cannot properly triage the bug. Instead of blaming the correct thread (with the correct exception), it tries to blame the non-crashing “crashing thread” (usually just the main function doing nothing, sitting in a wait call). In effect, this renders all of our Azure Watson bucketing for all of our partner teams and customers useless.
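For readers unfamiliar with the NT_SIGINFO record, here is a minimal Python sketch of how such a note is laid out in an ELF core file's PT_NOTE segment. It is an illustration only, not createdump's actual code: the helper names `build_siginfo_note` and `parse_note` are hypothetical, and it assumes a little-endian Linux x86-64 layout (a 128-byte `siginfo_t` whose first three ints are `si_signo`, `si_errno`, `si_code`).

```python
import struct

NT_SIGINFO = 0x53494749   # "SIGI", the note type value defined in <elf.h>
SIGINFO_SIZE = 128        # assumed sizeof(siginfo_t) on Linux x86-64


def _pad4(data: bytes) -> bytes:
    """ELF note name and descriptor fields are padded to 4-byte alignment."""
    return data + b"\0" * (-len(data) % 4)


def build_siginfo_note(signo: int, code: int, errno: int = 0) -> bytes:
    """Hypothetical helper: serialize one NT_SIGINFO note."""
    name = b"CORE\0"  # owner name used for core-file notes
    # Only the leading si_signo, si_errno, si_code fields are filled in;
    # the rest of the siginfo_t payload is left zeroed in this sketch.
    desc = struct.pack("<iii", signo, errno, code).ljust(SIGINFO_SIZE, b"\0")
    # Note header: n_namesz, n_descsz, n_type (each a 32-bit word).
    header = struct.pack("<III", len(name), len(desc), NT_SIGINFO)
    return header + _pad4(name) + _pad4(desc)


def parse_note(buf: bytes) -> dict:
    """Hypothetical helper: decode the note produced above."""
    namesz, descsz, ntype = struct.unpack_from("<III", buf, 0)
    name_off = 12
    desc_off = name_off + namesz + (-namesz % 4)
    name = buf[name_off:name_off + namesz].rstrip(b"\0")
    signo, errno, code = struct.unpack_from("<iii", buf, desc_off)
    return {"name": name, "type": ntype, "descsz": descsz,
            "si_signo": signo, "si_errno": errno, "si_code": code}


if __name__ == "__main__":
    # 11 is SIGSEGV and 1 is SEGV_MAPERR on Linux x86-64.
    note = build_siginfo_note(signo=11, code=1)
    print(parse_note(note))
```

A debugger that finds this note can report the signal that terminated the process instead of guessing from thread order alone.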
Also added an "ExceptionType" field to the "Parameters" section of the Linux crash report JSON.
Testing
All the SOS diagnostics tests pass with these changes.
Risk
Low. Createdump/core generation only.