Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement crash-tracking handler #282

Merged
merged 87 commits into from
Jan 19, 2024
Merged

Implement crash-tracking handler #282

merged 87 commits into from
Jan 19, 2024

Conversation

danielsn
Copy link
Contributor

@danielsn danielsn commented Nov 28, 2023

What does this PR do?

This module implements a crash tracker that enables libdatadog using profilers to send crash-reports to the datadog backend.

It has three related parts:

  1. A CrashInfo struct which stores information about a crash in a structured format, and has built-in functions to upload itself to the datadog backend.
    This struct can be generated by the crash-handlers described below (e.g. for python or Ruby), or by language runtime tools from a crash dump (e.g. .Net).

  2. A UNIX crash-handler, which registers signal handlers to detect crashes.
    The legal operations within a crash-handler are limited, so the handler does as much out of process as possible.
    When a crash occurs, this handler collects relevant information, and puts it on a pipe to the receiver, which processes it out of process.

  3. A binary application libdatadog-crashtracking-receiver, which receives the crash-report over a pipe and formats it for transmission to the backend.

Motivation

Currently, if crashes occur in profiling code, there is no notification until we get a support ticket. Learning early will enable us to close the loop to faster customer resolution.

Additional Notes

There are a number of open TODOs: see the README.md

How to test the change?

Run the test in profiling/crashtracker/api

For Reviewers

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

danielsn and others added 30 commits October 12, 2023 14:49
Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.37.23 to 0.37.26.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](bytecodealliance/rustix@v0.37.23...v0.37.26)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
* Bump webpki from 0.22.1 to 0.22.4

Bumps [webpki](https://github.com/briansmith/webpki) from 0.22.1 to 0.22.4.
- [Commits](https://github.com/briansmith/webpki/commits)

---
updated-dependencies:
- dependency-name: webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update LICENSE-3rdparty.yml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
Copy link
Contributor

@sanchda sanchda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good! Thank you!

I'm not 100% sure how the CMake will work out, but I'm happy to accept things as-is. I think there might be value in revealing the source file through a variable; that way I can include this cmake in order to find the sourcefile, then add my own linker stuff (for instance, setting a runpath and giving it a wrapper library with suitably exposed symbols).

But again, I don't know and I'll have to play around with it to see how everything fits together. I can easily infer the source location based on other hints, so my comment here is probably non-actionable.

The rest still looks good.

@github-actions github-actions bot removed the common label Jan 18, 2024
@danielsn danielsn force-pushed the dsn/crash-handler-api branch from 19d5608 to 2486394 Compare January 18, 2024 02:49
@danielsn danielsn force-pushed the dsn/crash-handler-api branch from 69e8759 to 0894ac8 Compare January 18, 2024 18:34
@danielsn danielsn force-pushed the dsn/crash-handler-api branch from c781d77 to 471fae9 Compare January 18, 2024 19:00
@danielsn danielsn requested a review from a team as a code owner January 18, 2024 21:12
Copy link

Some tests with 'continue-on-error: true' have failed:

  • Verify trace-protobuf stats_proto

Created by continue-on-error-comment

@danielsn danielsn merged commit 6a97344 into main Jan 19, 2024
19 checks passed
@danielsn danielsn deleted the dsn/crash-handler-api branch January 19, 2024 21:58
ivoanjo added a commit to DataDog/dd-trace-rb that referenced this pull request Mar 11, 2024
**What does this PR do?**

This PR adds support for the libdatadog crash tracker feature (off
by default).

The crash tracker works by detecting when the Ruby VM reaches a
segmentation fault, reporting the crash information as a last profile
before the VM dies.

All of the interesting work is in
<DataDog/libdatadog#282>, this PR basically
just wires things up.

**Motivation:**

This will be a useful tool when debugging VM crashes.

**Additional Notes:**

I'm opening this PR as a draft as the libdatadog support has not
yet landed/been released. Also, there's a few open questions on:

* fork handling
* when to shut down

**How to test the change?**

(You'll need to build <<DataDog/libdatadog#282>
until the crash tracker gets included in a libdatadog release)

To test the crash tracker with an actual crash, try running the
following on Ruby 2.6:

```bash
$ DD_PROFILING_ENABLED=true DD_PROFILING_EXPERIMENTAL_CRASH_TRACKING_ENABLED=true DD_TRACE_DEBUG=true bundle exec ddtracerb exec ruby -e 'Process.detach(fork { exit! }).instance_variable_get(:@foo)'
```

This should also work in the future but right now it doesn't work
correctly; still looking into why:

```bash
$ DD_PROFILING_ENABLED=true DD_PROFILING_EXPERIMENTAL_CRASH_TRACKING_ENABLED=true DD_TRACE_DEBUG=true bundle exec ddtracerb exec ruby -e 'Process.kill("SEGV", Process.pid)'
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-build profiling Relates to the profiling* modules.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants