Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-7421 Expose Crash Plugin #2126

Merged
merged 15 commits into from
Dec 11, 2024
Merged

Conversation

maxep
Copy link
Member

@maxep maxep commented Nov 29, 2024

What and why?

Follow-up #2116

Expose CrashReportingPlugin publicly and allow reporting crashes asynchronously with additional attributes.
The plugin can also provide an optional backtrace reporter.

How?

Crash Reporting Plugin

The plugin interface is now public with @escaping reporting closure, allowing reporting crashes at any time.
The interface also define an optional backtraceReporter property for reading backtraces.
This new plugin interface allows integration to fully replace the PLCrashReporter implementation.

Crash Report Additional Attributes

The DDCrashReport now defines additionalAttributes property to propagate atrributes.
These attributes are merged with logs' user attributes and RUM Error's context.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes
  • Add Objective-C interface for public APIs (see our guidelines [internal]) and run make api-surface)

@mobile-app-ci mobile-app-ci force-pushed the RUM-7421/expose-crash-plugin branch from 8e89ba5 to cb6d250 Compare November 29, 2024 16:30
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Nov 29, 2024

Datadog Report

Branch report: RUM-7421/expose-crash-plugin
Commit report: 47e7673
Test service: dd-sdk-ios

✅ 0 Failed, 3583 Passed, 0 Skipped, 2m 32.38s Total Time
🔻 Test Sessions change in coverage: 7 decreased, 3 increased, 4 no change

🔻 Code Coverage Decreases vs Default Branch (7)

This report shows up to 5 code coverage decreases.

  • test DatadogInternalTests iOS 80.46% (-0.06%) - Details
  • test DatadogLogsTests tvOS 47.69% (-0.04%) - Details
  • test DatadogLogsTests iOS 47.63% (-0.04%) - Details
  • test DatadogRUMTests iOS 81.11% (-0.02%) - Details
  • test DatadogSessionReplayTests iOS 34.25% (-0.02%) - Details

@maxep maxep marked this pull request as ready for review December 2, 2024 10:40
@maxep maxep requested review from a team as code owners December 2, 2024 10:40
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌 Everything looks great except the removal of queue that will cause performance regression (see my other comment).

DatadogCrashReporting/Sources/CrashReporting.swift Outdated Show resolved Hide resolved
DatadogCrashReporting/Sources/CrashReportingFeature.swift Outdated Show resolved Hide resolved
Comment on lines -84 to +90
DDAssertJSONEqual(rumEvent.context!.contextInfo, crashContext.lastRUMAttributes!)
let contextAttributes = try XCTUnwrap(rumEvent.context?.contextInfo)
let lastRUMAttributes = try XCTUnwrap(crashContext.lastRUMAttributes?.attributes)
DDAssertJSONEqual(contextAttributes, lastRUMAttributes.merging(crashReportAttributes) { $1 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏅

@@ -97,5 +100,6 @@ public struct DDCrashReport: Codable, PassthroughAnyCodable {
self.meta = meta
self.wasTruncated = wasTruncated
self.context = context
self.additionalAttributes = AnyCodable(additionalAttributes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/ Am I correct by saying that Codable requirement for DDCrashReport only comes from the message bus interface? We're using PassthroughAnyCodable so there's no actual coding and perf hit, but things like passing attribtues ☝️ are far from easy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, it is only for conformance.
The DDCrashReport and CrashContext are now both in DatadogInternal, so it's no longer required to pass them as FeatureBaggage. We could pass them directly as a .crash(report: DDCrashReport, context: CrashContext) message on the bus, but such change has wider impact and should be treated separately IMO.

@maxep maxep requested a review from ncreated December 3, 2024 13:10
mariedm
mariedm previously approved these changes Dec 4, 2024
ncreated
ncreated previously approved these changes Dec 10, 2024
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 💪

@maxep
Copy link
Member Author

maxep commented Dec 10, 2024

Hello @naftaly 👋

Here is the follow up of your contribution. The 2 main changes are:

  1. The added attributes are now part of the DDCrashReport object. They are merged with either the RUM Error context-info or with the error log's user attributes.
  2. We have removed the public interface to send a report. Instead, we have made the CrashReportingPlugin.readPendingCrashReport closure escaping so you can report the crash at any time. In practice, you will store the closure in your plugin implementation to report the crash later.

I hope this will answer your needs!

@naftaly
Copy link
Contributor

naftaly commented Dec 10, 2024

Hello @naftaly 👋

Here is the follow up of your contribution. The 2 main changes are:

  1. The added attributes are now part of the DDCrashReport object. They are merged with either the RUM Error context-info or with the error log's user attributes.
  2. We have removed the public interface to send a report. Instead, we have made the CrashReportingPlugin.readPendingCrashReport closure escaping so you can report the crash at any time. In practice, you will store the closure in your plugin implementation to report the crash later.

I hope this will answer your needs!

Thank you for your response. I'm curious why you chose to have the user store a block. Approaches like this can easily lead to retain cycles and unexpected behavior. In contrast, a simple send call is more explicit and easier to understand.

@maxep
Copy link
Member Author

maxep commented Dec 10, 2024

Thank you for your response. I'm curious why you chose to have the user store a block. Approaches like this can easily lead to retain cycles and unexpected behavior. In contrast, a simple send call is more explicit and easier to understand.

Internally, we make sure that the closure does not retain anything, if the Datadog SDK instance is deallocated, the closure will have no effect. Additionally, the crash-reporting feature is immutable and the closure is only about transmitting the crash report by jumping on a dedicated queue (we've called the message-bus), so the closure is also thread-safe.

The reason we went with this approach is that we want a single entry-point for interacting with the crash reporter lib, and that entry-point is the CrashReportingPlugin. By solely relying on the closure, we also control the lifecycle: The SDK will provide the closure when it's ready to make a report.
Providing a custom plugin is an advanced usage that is far from standard integration, one that require data injection and report formatting. Having a send method is indeed simple, but could also lead to mis-use.

I hope this make sense!

@mobile-app-ci mobile-app-ci force-pushed the RUM-7421/expose-crash-plugin branch from 2fd49df to 5005800 Compare December 10, 2024 17:18
@maxep
Copy link
Member Author

maxep commented Dec 10, 2024

/merge

@mobile-app-ci mobile-app-ci dismissed stale reviews from ncreated and mariedm via 47e7673 December 11, 2024 09:11
@maxep
Copy link
Member Author

maxep commented Dec 11, 2024

/merge

@dd-devflow
Copy link

dd-devflow bot commented Dec 11, 2024

Devflow running: /merge

View all feedbacks in Devflow UI.


2024-12-11 09:16:15 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2024-12-11 09:46:52 UTC ℹ️ MergeQueue: merge request added to the queue

The median merge time in develop is 27m.


2024-12-11 10:03:37 UTCMergeQueue: The build pipeline failed for this merge request

Build pipeline has failing jobs for d0acd15:

⚠️ Do NOT retry failed jobs directly (why?).

What to do next?

  • Investigate the failures and when ready, re-add your pull request to the queue!
  • Any question, go check the FAQ.

@maxep
Copy link
Member Author

maxep commented Dec 11, 2024

/merge

@dd-devflow
Copy link

dd-devflow bot commented Dec 11, 2024

Devflow running: /merge

View all feedbacks in Devflow UI.


2024-12-11 10:55:13 UTC ℹ️ MergeQueue: pull request added to the queue

The median merge time in develop is 27m.


2024-12-11 11:23:02 UTC ℹ️ MergeQueue: This merge request was merged

@naftaly
Copy link
Contributor

naftaly commented Dec 11, 2024

Thank you for your response. I'm curious why you chose to have the user store a block. Approaches like this can easily lead to retain cycles and unexpected behavior. In contrast, a simple send call is more explicit and easier to understand.

Internally, we make sure that the closure does not retain anything, if the Datadog SDK instance is deallocated, the closure will have no effect. Additionally, the crash-reporting feature is immutable and the closure is only about transmitting the crash report by jumping on a dedicated queue (we've called the message-bus), so the closure is also thread-safe.

The reason we went with this approach is that we want a single entry-point for interacting with the crash reporter lib, and that entry-point is the CrashReportingPlugin. By solely relying on the closure, we also control the lifecycle: The SDK will provide the closure when it's ready to make a report. Providing a custom plugin is an advanced usage that is far from standard integration, one that require data injection and report formatting. Having a send method is indeed simple, but could also lead to mis-use.

I hope this make sense!

Hi @maxep ,

I just wanted to validate that you don't expect the completion block to be called on any specific if called in the future by a 3rd party. For example, if I hold onto the block and call it at a later point, it won't be on the internal queue as if it was called in a non-escaping manner in the crash plugin.

thanks.

@maxep
Copy link
Member Author

maxep commented Dec 12, 2024

Hey @naftaly 👋

I just wanted to validate that you don't expect the completion block to be called on any specific if called in the future by a 3rd party. For example, if I hold onto the block and call it at a later point, it won't be on the internal queue as if it was called in a non-escaping manner in the crash plugin.

Exactly, you can hold the completion block and call it later from any thread 👍

@maxep maxep deleted the RUM-7421/expose-crash-plugin branch December 12, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants