diff --git a/src/components/sidebar.tsx b/src/components/sidebar.tsx index fc253334ae..6a54ce8854 100644 --- a/src/components/sidebar.tsx +++ b/src/components/sidebar.tsx @@ -226,6 +226,7 @@ export default () => { Span Operations Dynamic Sampling Context + OpenTelemetry Support diff --git a/src/docs/sdk/performance/opentelemetry/index.mdx b/src/docs/sdk/performance/opentelemetry/index.mdx new file mode 100644 index 0000000000..78394ea2a2 --- /dev/null +++ b/src/docs/sdk/performance/opentelemetry/index.mdx @@ -0,0 +1,89 @@ +--- +title: "OpenTelemetry Support" +--- + + + +This page is under active development. Specifications are not final and subject to change. + +"Proof is in the progress" - Drake + + + +This document details Sentry's work in integrating and supporting [OpenTelemetry](https://opentelemetry.io/), the open standard for metrics, traces and logs. In particular, it focuses on the integration between [Sentry's performance monitoring product](https://docs.sentry.io/product/performance/) and [OpenTelemetry's tracing spec](https://opentelemetry.io/docs/concepts/signals/traces/). + +## Background + +When Sentry performance monitoring was initially introduced, OpenTelemetry was in early stages. This lead to us adopt a slightly different model from OpenTelemetry, notably we have this concept of transactions that OpenTelemetry does not have. We've described this, and some more historical background, in our performance monitoring research document. + +## Approach + +TODO: Talk about the approach we are using, based on Matt's hackweek project - https://github.com/getsentry/sentry-ruby/pull/1876 + +## Span Protocol + +import "./span-protocol.mdx" + +## Transaction Protocol + +There is no concept of a transaction within OpenTelemetry, so we rely on promoting spans to become transactions. The span `description` becomes the transaction `name`, and the span `op` becomes the transaction `op`. Therefore, OpenTelemetry spans must be mapped to Sentry spans before they can be promoted to become a transaction. + +**We should not set tags on transactions. Instead, we should populate the `attributes` field on the `otel` context, see below** + +## OpenTelemetry Context + +Aside from information from Spans and Transactions, OpenTelemetry has meta-level information about the SDK, resource, and service that generated spans. To track this information, we generate a new OpenTelemetry Event Context. + +The existence of this context on an event (transaction or error) is how the Sentry backend will know that the incoming event is an OpenTelemetry event. + +It uses the following schema: + +```ts +type Primitive = number | string | boolean | bigint; + +interface Attributes { + [key: string]: T | Array | Record; +} + +interface OpenTelemetryContext { + type?: "otel", + + // https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/trace/v1/trace.proto#L174-L186 + attributes?: Attributes; + + // https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#service + service?: { + // from: service.name + name: string; + // from: service.namespace + namespace?: string; + // from: service.instance.id + instance_id?: string; + // from: service.version + version?: string; + }; + + // https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#telemetry-sdk + otel_sdk?: { + // from: telemetry.sdk.name + name?: string; + // from: telemetry.sdk.language + language?: SDKLanguage; + // from: telemetry.sdk.version + version?: string; + // from: telemetry.auto.version + auto_version?: string; + }; + + // Rest of attribute keys excluding service and sdk keys + // https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/resource/v1/resource.proto#L32 + [key: string]?: string, +} +``` + +The reason sdk and service are split are so they can be indexed as top level fields in the future for easier usage within Sentry. + +## SDK Spec + +- SpanProcessor: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#span-processor +- Propogator: https://opentelemetry.io/docs/reference/specification/context/api-propagators/ diff --git a/src/docs/sdk/performance/opentelemetry/span-protocol.mdx b/src/docs/sdk/performance/opentelemetry/span-protocol.mdx new file mode 100644 index 0000000000..9af9f5b63c --- /dev/null +++ b/src/docs/sdk/performance/opentelemetry/span-protocol.mdx @@ -0,0 +1,341 @@ +Below describe the transformations between an OpenTelemetry span and a Sentry Span. Related: [the interface for a Sentry Span](https://develop.sentry.dev/sdk/event-payloads/span/), [the Relay spec for a Sentry Span](https://github.com/getsentry/relay/blob/master/relay-general/src/protocol/span.rs) and the spec for an [OpenTelemetry span](https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/trace/v1/trace.proto#L80-L256). + +This is based on a mapping done as part of work on the [OpenTelemetry Sentry Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/sentryexporter/docs/transformation.md). + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
OpenTelemetry SpanSentry SpanNotes
+ + trace_id + + + + trace_id + +
+ + span_id + + + + span_id + +
+ + parent_span_id + + + + parent_span_id + + + If a span does not have a parent span ID, it is a root span. For + a root span: +
  • + If there is an active Sentry transaction, add it to the + transaction +
  • +
  • + If there is no active Sentry transaction, construct a new + transaction from that span +
  • +
    + + name + + + + description + +
    + + name + + , + + attributes + + , + + kind + + + + op + +
    + + attributes + + , + + kind + + , + + status + + + + tags + + + The OpenTelemetry Span Status message and span kind are set as tags on the Sentry span. Note, if you are constructing a transaction, *DO NOT ADD TO TAGS*. +
    + + attributes + + , + + status + + + + status + + + See Span Status for more details +
    + + start_time_unix_nano + + + + start_timestamp + +
    + + end_time_unix_nano + + + + timestamp + +
    + + event + + + + See Span Events for more details +
    + +Currently there is no spec for how [Span.link in OpenTelemetry](https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/trace/v1/trace.proto#L220-L247) should appear in Sentry. + +### Span Status + +In OpenTelemetry, [Span Status is an enum of 3 values](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-status), while [Sentry's Span Status is an enum of 17 values](https://github.com/getsentry/relay/blob/ed3a521f773ecd4bf9a2aefd7af80080d56d0841/relay-common/src/constants.rs#L200-L286) that map to the [GRPC status codes](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md). Each of the Sentry Span Status codes also map to HTTP codes. Sentry adopted it's Span Status spec from OpenTelemetry, [who used the GRPC status code spec](https://github.com/open-telemetry/opentelemetry-specification/blob/8fb6c14e4709e75a9aaa64b0dbbdf02a6067682a/specification/api-tracing.md#status), but later on changed to the current spec it uses today. + +To map from OpenTelemetry Span Status to, you need to rely on both OpenTelemetry [Span Status](https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/trace/v1/trace.proto#L253-L255) and [Span attributes](https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/trace/v1/trace.proto#L174-L186). This approach was adapted from a PR by GH user [@anguisa](https://github.com/anguisa) to the [OpenTelemetry Sentry Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/13407). + +```go +// OpenTelemetry span status can be Unset, Ok, Error. HTTP and Grpc codes contained in tags can make it more detailed. + +// canonicalCodesHTTPMap maps some HTTP codes to Sentry's span statuses. See possible mapping in https://develop.sentry.dev/sdk/event-payloads/span/ +var canonicalCodesHTTPMap = map[string]sentry.SpanStatus{ + "400": sentry.SpanStatusFailedPrecondition, // SpanStatusInvalidArgument, SpanStatusOutOfRange + "401": sentry.SpanStatusUnauthenticated, + "403": sentry.SpanStatusPermissionDenied, + "404": sentry.SpanStatusNotFound, + "409": sentry.SpanStatusAborted, // SpanStatusAlreadyExists + "429": sentry.SpanStatusResourceExhausted, + "499": sentry.SpanStatusCanceled, + "500": sentry.SpanStatusInternalError, // SpanStatusDataLoss, SpanStatusUnknown + "501": sentry.SpanStatusUnimplemented, + "503": sentry.SpanStatusUnavailable, + "504": sentry.SpanStatusDeadlineExceeded, +} + +// canonicalCodesGrpcMap maps some GRPC codes to Sentry's span statuses. See description in grpc documentation. +var canonicalCodesGrpcMap = map[string]sentry.SpanStatus{ + "1": sentry.SpanStatusCanceled, + "2": sentry.SpanStatusUnknown, + "3": sentry.SpanStatusInvalidArgument, + "4": sentry.SpanStatusDeadlineExceeded, + "5": sentry.SpanStatusNotFound, + "6": sentry.SpanStatusAlreadyExists, + "7": sentry.SpanStatusPermissionDenied, + "8": sentry.SpanStatusResourceExhausted, + "9": sentry.SpanStatusFailedPrecondition, + "10": sentry.SpanStatusAborted, + "11": sentry.SpanStatusOutOfRange, + "12": sentry.SpanStatusUnimplemented, + "13": sentry.SpanStatusInternalError, + "14": sentry.SpanStatusUnavailable, + "15": sentry.SpanStatusDataLoss, + "16": sentry.SpanStatusUnauthenticated, +} + +code := spanStatus.Code() +if code < 0 || int(code) > 2 { + return sentry.SpanStatusUnknown, fmt.Sprintf("error code %d", code) +} +httpCode, foundHTTPCode := tags["http.status_code"] +grpcCode, foundGrpcCode := tags["rpc.grpc.status_code"] +var sentryStatus sentry.SpanStatus +switch { +case code == 1 || code == 0: + sentryStatus = sentry.SpanStatusOK +case foundHTTPCode: + httpStatus, foundHTTPStatus := canonicalCodesHTTPMap[httpCode] + switch { + case foundHTTPStatus: + sentryStatus = httpStatus + default: + sentryStatus = sentry.SpanStatusUnknown + } +case foundGrpcCode: + grpcStatus, foundGrpcStatus := canonicalCodesGrpcMap[grpcCode] + switch { + case foundGrpcStatus: + sentryStatus = grpcStatus + default: + sentryStatus = sentry.SpanStatusUnknown + } +default: + sentryStatus = sentry.SpanStatusUnknown +} +return sentryStatus +``` + +### Span Events + +OpenTelemetry, has the concept of [Span Events](https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/trace/v1/trace.proto#L193-L214). As per the spec: + +> An event is a human-readable message on a span that represents “something happening” during it’s lifetime + +In Sentry, we have two options for how to treat span events. First, we can add them as breadcrumbs to the transaction the span belongs to. Second, we can create an artificial "point-in-time" span (a span with 0 duration), and add it to the span tree. TODO on what approach we take here. + +In the special case that the span event is an exception span, [where the `name` of the span event is `exception`](https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/), we also have the possibility of generating a Sentry error from an exception. In this case, we can create this [exception based on the attributes of an event](https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/#attributes), which include the error message and stacktrace. This exception can also inherit all other attributes of the span event + span as tags on the event. + +In the OpenTelemetry Sentry exporter, we've used this [strategy to generate Sentry errors](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/8eda2f80b6dbd5aea03ca699c3ad1454714156d0/exporter/sentryexporter/sentry_exporter.go#L169-L196).