Report connector-caused sync job failures to Sentry #13336

Closed
Phlair opened this issue May 31, 2022 · 2 comments · Fixed by #13899

Phlair commented May 31, 2022

More details in the spec: https://docs.google.com/document/d/1grrkxdvgAzjYiwG02gwo1JdAEc6pGHJLtJFsfRexR4I/edit#
Demo PR: #13727

The goal of this ticket is to set up the foundation for sending events to Sentry and start reporting connector failures encountered during connection sync jobs.

Implementation

  1. JobErrorReporter:
    Following existing patterns set by JobTracker and JobNotifier, a new JobErrorReporter class can be created to handle reporting an error.

    • JobErrorReporter: gather related metadata and process the FailureSummary.
    • ErrorReportingClient: interface for reporting a FailureReason+metadata to an error tracking service.
    • SentryErrorReportingClient: uses the Sentry SDK to build a SentryEvent based on the FailureReason and sends it to Sentry.
  2. Send source and destination failures to Sentry
    Using the above, report sync job failures for FailureReasons that originate from source or destination. This can be called from the existing JobCreationAndStatusUpdateActivity.jobFailure method that's currently being used to report these to Segment:

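public void jobFailure(final JobFailureInput input) {
  try {
    final var jobId = input.getJobId();
    // Mark the job as failed, then reload it so reporting sees the updated state.
    jobPersistence.failJob(jobId);
    final Job job = jobPersistence.getJob(jobId);
    // Existing failure reporting (notification, metric, tracking);
    // the proposed error report could be triggered alongside these calls.
    jobNotifier.failJob(input.getReason(), job);
    emitJobIdToReleaseStagesMetric(OssMetricsRegistry.JOB_FAILED_BY_RELEASE_STAGE, jobId);
    trackCompletion(job, JobStatus.FAILED);
  } catch (final IOException e) {
    throw new RetryableException(e);
  }
}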
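A rough sketch of how the three pieces from step 1 could fit together, assuming a simplified `FailureReason` and a made-up method signature on `ErrorReportingClient`. Only the class and interface names come from this issue; the `Origin` values, method names, parameters, and the in-memory `RecordingErrorReportingClient` are hypothetical and only there so the example can run without the Sentry SDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the platform's FailureReason; only the fields used here.
final class FailureReason {
  enum Origin { SOURCE, DESTINATION, OTHER }

  final Origin origin;
  final String internalMessage;
  final String stackTrace;

  FailureReason(Origin origin, String internalMessage, String stackTrace) {
    this.origin = origin;
    this.internalMessage = internalMessage;
    this.stackTrace = stackTrace;
  }
}

// Interface named in the issue; this method signature is an assumption.
interface ErrorReportingClient {
  void reportJobFailureReason(FailureReason reason, String dockerImage, Map<String, String> metadata);
}

// Gathers metadata and forwards only connector-caused failures to the client.
final class JobErrorReporter {
  private final ErrorReportingClient client;

  JobErrorReporter(ErrorReportingClient client) {
    this.client = client;
  }

  void reportSyncJobFailure(String connectionId, String sourceImage, String destinationImage,
                            List<FailureReason> failures) {
    for (FailureReason reason : failures) {
      final String image;
      if (reason.origin == FailureReason.Origin.SOURCE) {
        image = sourceImage;
      } else if (reason.origin == FailureReason.Origin.DESTINATION) {
        image = destinationImage;
      } else {
        continue; // failures that do not originate from a connector are out of scope
      }
      Map<String, String> metadata = new LinkedHashMap<>();
      metadata.put("connection_id", connectionId);
      metadata.put("failure_origin", reason.origin.name().toLowerCase(Locale.ROOT));
      client.reportJobFailureReason(reason, image, metadata);
    }
  }
}

// In-memory client for local experiments; a Sentry-backed client would implement the same interface.
final class RecordingErrorReportingClient implements ErrorReportingClient {
  final List<String> reported = new ArrayList<>();

  @Override
  public void reportJobFailureReason(FailureReason reason, String dockerImage, Map<String, String> metadata) {
    reported.add(dockerImage + ": " + reason.internalMessage);
  }
}
```

Keeping the client behind an interface means `jobFailure` only needs a `JobErrorReporter`, and the error tracking backend can be swapped or disabled without touching the activity.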

Building Sentry Events

  • Level -> "error"
  • Message -> FailureReason.internalMessage
  • Release -> connector_docker_image@tag
  • User -> (Id: workspace ID, Name: workspace name)
  • Tags -> Metadata (connection ID, Airbyte platform version, connector definition ID, release stage, etc.)
  • Stack trace -> Parsed FailureReason stack trace
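To make the mapping above concrete, here is a small sketch that models the event fields with plain Java types instead of the Sentry SDK, so it runs on its own. `ErrorEventDraft`, `ErrorEventBuilder`, and `toRelease` are hypothetical names, and the handling of images without a tag is an assumption; a real client would copy these values onto the SDK's event object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, SDK-independent model of the event fields listed above.
final class ErrorEventDraft {
  String level;
  String message;
  String release;
  String userId;
  String userName;
  final Map<String, String> tags = new LinkedHashMap<>();
}

final class ErrorEventBuilder {

  static ErrorEventDraft build(String internalMessage, String dockerImage,
                               String workspaceId, String workspaceName,
                               Map<String, String> metadata) {
    ErrorEventDraft event = new ErrorEventDraft();
    event.level = "error";                   // Level -> "error"
    event.message = internalMessage;         // Message -> FailureReason.internalMessage
    event.release = toRelease(dockerImage);  // Release -> connector_docker_image@tag
    event.userId = workspaceId;              // User -> workspace ID / name
    event.userName = workspaceName;
    event.tags.putAll(metadata);             // Tags -> metadata
    return event;
  }

  // Converts "image:tag" into "image@tag"; images without a tag are returned unchanged.
  static String toRelease(String dockerImage) {
    int lastColon = dockerImage.lastIndexOf(':');
    int lastSlash = dockerImage.lastIndexOf('/');
    // A colon before the last slash belongs to a registry port, not a tag.
    if (lastColon <= lastSlash) {
      return dockerImage;
    }
    return dockerImage.substring(0, lastColon) + "@" + dockerImage.substring(lastColon + 1);
  }
}
```

For example, `ErrorEventBuilder.toRelease("airbyte/source-foo:0.1.2")` yields `airbyte/source-foo@0.1.2`, which lets errors be compared per connector version.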

To improve Sentry's ability to group errors, we can process stack trace strings to pull out frame attributes and feed them into Sentry's grouping algorithm. Because stack trace formats are language-dependent, parsing should be done on a "best-effort" basis. If we hit a trace we can't parse or an unsupported language, that's OK: we fall back to message-based grouping.
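A minimal sketch of what best-effort parsing could look like, assuming we only try to recognize Java and Python frame lines. `ParsedFrame`, `StackTraceParser`, and both regular expressions are illustrative and not taken from the spec. An empty result is the signal to fall back to message-based grouping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical frame model: the attributes we try to pull out of each line.
final class ParsedFrame {
  final String module;   // null when the language does not expose one per frame
  final String function;
  final String file;
  final int line;

  ParsedFrame(String module, String function, String file, int line) {
    this.module = module;
    this.function = function;
    this.file = file;
    this.line = line;
  }
}

final class StackTraceParser {
  // Java frame, e.g. "at com.example.Foo.bar(Foo.java:42)"
  private static final Pattern JAVA_FRAME =
      Pattern.compile("^\\s*at\\s+([\\w.$]+)\\.([\\w$<>]+)\\(([^:()]+):(\\d+)\\)\\s*$");
  // Python frame, e.g. 'File "/app/main.py", line 10, in run'
  private static final Pattern PYTHON_FRAME =
      Pattern.compile("^\\s*File \"([^\"]+)\", line (\\d+), in (.+?)\\s*$");

  // Returns the recognized frames, or empty if nothing could be parsed.
  static Optional<List<ParsedFrame>> parse(String stackTrace) {
    if (stackTrace == null || stackTrace.trim().isEmpty()) {
      return Optional.empty();
    }
    List<ParsedFrame> frames = new ArrayList<>();
    try {
      for (String line : stackTrace.split("\\r?\\n")) {
        Matcher javaMatch = JAVA_FRAME.matcher(line);
        if (javaMatch.matches()) {
          frames.add(new ParsedFrame(javaMatch.group(1), javaMatch.group(2),
              javaMatch.group(3), Integer.parseInt(javaMatch.group(4))));
          continue;
        }
        Matcher pythonMatch = PYTHON_FRAME.matcher(line);
        if (pythonMatch.matches()) {
          frames.add(new ParsedFrame(null, pythonMatch.group(3),
              pythonMatch.group(1), Integer.parseInt(pythonMatch.group(2))));
        }
      }
    } catch (NumberFormatException e) {
      // Best effort: an oversized line number should never break error reporting.
      return Optional.empty();
    }
    return frames.isEmpty() ? Optional.empty() : Optional.of(frames);
  }
}
```

Lines that match neither pattern (exception headers, source snippets, "Caused by" lines) are skipped, so a partially recognized trace still produces frames.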

Phlair commented May 31, 2022

TODO: details depend on 1511

pedroslopez changed the title from "Build instrumentation for pushing errors into Sentry" to "Report connector-caused sync job failures to Sentry" on Jun 14, 2022
@pedroslopez

grooming notes:

  • can we alert on errors while parsing stack traces? (e.g. via setting a tag on events)
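One possible shape for this, assuming tags are carried as a string map until the event is built: mark events whose stack trace was present but could not be parsed, so an alert rule could filter on that tag. The tag key and the `StackTraceTags` helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StackTraceTags {
  // Hypothetical tag key; any stable name would work.
  static final String PARSE_ERROR_TAG = "stacktrace_parse_error";

  // Returns a copy of the tags, flagged when a stack trace existed but parsing failed.
  static Map<String, String> withParseStatus(Map<String, String> tags,
                                             boolean hadStackTrace, boolean parsed) {
    Map<String, String> result = new LinkedHashMap<>(tags);
    if (hadStackTrace && !parsed) {
      result.put(PARSE_ERROR_TAG, "true");
    }
    return result;
  }
}
```

Events without any stack trace are left unflagged, so the tag only counts traces we tried and failed to parse.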
