Add formal Azure SDK tracing conventions #3260

Merged
merged 5 commits into from
Aug 17, 2021
5 changes: 3 additions & 2 deletions docs/general/azurecore.md
@@ -126,7 +126,7 @@ The response downloader is required for most (but not all) operations to change

### Distributed tracing policy

Distributed tracing allows the consumer to trace their code from frontend to backend. The distributed tracing library creates spans (units of unique work) to facilitate tracing. Each span is in a parent-child relationship. As you go deeper into the hierarchy of code, you create more spans. These spans can then be exported to a suitable receiver as needed. To keep track of the spans, a _distributed tracing context_ (called a context within the rest of this section) is passed into each successive layer. For more information on this topic, visit the [OpenTelemetry]topic on tracing.
Distributed tracing allows the consumer to trace their code from frontend to backend. The distributed tracing library creates spans (units of unique work) to facilitate tracing. Each span is in a parent-child relationship. As you go deeper into the hierarchy of code, you create more spans. These spans can then be exported to a suitable receiver as needed. Spans must follow [Tracing Conventions]. To keep track of the spans, a _distributed tracing context_ (called a context within the rest of this section) is passed into each successive layer. For more information on this topic, visit the [OpenTelemetry] topic on tracing.
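
The parent-child relationship and the context hand-off described above can be sketched in plain Python. This is only an illustration under stated assumptions: `Span`, `Context`, `start_span`, and the span names are hypothetical stand-ins, not the API of any Azure SDK or OpenTelemetry library.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Optional, Tuple

_ids = itertools.count(1)


@dataclass
class Span:
    """Hypothetical stand-in for a tracing library's span."""
    name: str
    parent: Optional[Span] = None
    span_id: int = field(default_factory=lambda: next(_ids))


@dataclass(frozen=True)
class Context:
    """Hypothetical tracing context: carries the currently active span."""
    current_span: Optional[Span] = None

    def start_span(self, name: str) -> Tuple[Span, Context]:
        # The new span becomes a child of whatever span the context carries.
        span = Span(name=name, parent=self.current_span)
        # A new context is returned so deeper layers parent onto this span.
        return span, Context(current_span=span)


def send_http_request(ctx: Context) -> Span:
    # Deeper layer: receives the context and creates a child span from it.
    span, _ = ctx.start_span("HTTP GET")
    return span


def get_secret(ctx: Context) -> Tuple[Span, Span]:
    # Public API layer (hypothetical name): creates a span, then passes the
    # updated context down to the HTTP layer.
    api_span, child_ctx = ctx.start_span("SecretClient.get_secret")
    http_span = send_http_request(child_ctx)
    return api_span, http_span


api_span, http_span = get_secret(Context())
```

Each layer only needs the context it was given. The hierarchy follows from the order in which layers call one another.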

The Distributed Tracing policy is responsible for:

@@ -270,4 +270,5 @@ OAuth token authentication, obtained via Managed Security Identities (MSI) or Az
[Transient fault handling]: https://docs.microsoft.com/azure/architecture/best-practices/transient-faults
[OpenTelemetry]: https://opentelemetry.io/
[Azure Monitor]: https://azure.microsoft.com/services/monitor/
[CIDR notation]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing
[CIDR notation]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing
[Tracing Conventions]: {{ site.baseurl }}{% link docs/tracing/distributed-tracing-conventions.md %}
25 changes: 25 additions & 0 deletions docs/tracing/distributed-tracing-conventions.md
@@ -0,0 +1,25 @@
---
title: "Distributed Tracing Conventions"
permalink: distributed_tracing_conventions.html
keywords: opentelemetry conventions
folder: general
sidebar: general_sidebar
---

Conventions are the contract between the Azure SDK and tracing providers such as Azure Monitor, Jaeger, and others. They describe and standardize attributes, events, and relationships for common span types: HTTP, DB, messaging, and others. Observability vendors use conventions to build visualizations and can be sensitive to deviations from them. Custom Azure SDK conventions are described in [distributed-tracing-conventions.yml](./distributed-tracing-conventions.yml) and include:

- Custom Azure SDK attributes
- Azure SDK HTTP conventions compatible with OpenTelemetry
- Custom messaging and DB conventions. They don't follow OpenTelemetry conventions and are used until the OpenTelemetry conventions evolve and stabilize

When writing instrumentation in Azure SDK or Core:

{% include requirement/MUST id="general-tracing-convention-use-otel" %} use [OpenTelemetry conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/main/semantic_conventions/trace) whenever possible.

{% include requirement/MUST id="general-tracing-convention-describe-attributes" %} update [distributed-tracing-conventions.yml](./distributed-tracing-conventions.yml) when adding new attributes.

{% include requirement/SHOULD id="general-tracing-convention-new-otel" %} contribute new conventions (or patch existing ones) to OpenTelemetry when there is no suitable one or some scenarios are missing.

{% include requirement/MAY id="general-tracing-convention-add-attributes" %} extend the list of attributes on top of OpenTelemetry conventions with Azure-specific ones.

{% include requirement/MUSTNOT id="general-tracing-convention-new-custom" %} add new custom Azure SDK conventions.
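
As an illustration of these requirements, the sketch below assembles the attributes for an HTTP client span: OpenTelemetry names where they exist, plus the Azure-specific additions. The attribute keys are taken from `distributed-tracing-conventions.yml` in this change. The function name and its parameters are hypothetical and not part of any Azure SDK.

```python
from typing import Dict, Optional, Union

AttributeValue = Union[str, int]


def build_http_span_attributes(
    namespace: str,
    method: str,
    url: str,
    user_agent: str,
    client_request_id: str,
    service_request_id: str,
    status_code: Optional[int] = None,
) -> Dict[str, AttributeValue]:
    """Hypothetical helper: collect attributes for an `azure-sdk.http` span."""
    attributes: Dict[str, AttributeValue] = {
        # Azure-specific attribute shared by all Azure SDK spans.
        "az.namespace": namespace,
        # OpenTelemetry HTTP attributes.
        "http.method": method,
        "http.url": url,
        "http.user_agent": user_agent,
        # Azure-specific request-id attributes.
        "requestId": client_request_id,
        "serviceRequestId": service_request_id,
    }
    # `http.status_code` is conditional: set it only if a response arrived.
    if status_code is not None:
        attributes["http.status_code"] = status_code
    return attributes
```

Keeping the attribute names in one helper makes it easier to switch to an OpenTelemetry equivalent later without touching call sites.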
126 changes: 126 additions & 0 deletions docs/tracing/distributed-tracing-conventions.yml
@@ -0,0 +1,126 @@
# This document describes Azure SDK semantic conventions for tracing in [OpenTelemetry format](https://github.com/open-telemetry/build-tools/blob/main/semantic-conventions/syntax.md).
# DO NOT add new conventions - use [OpenTelemetry conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/main/semantic_conventions), but it's ok to extend existing ones.
# DO remove conventions when moving to the OpenTelemetry equivalent - this is not a breaking change.
# Version: 0.0.0

groups:
  # common
  - id: azure-sdk
    brief: 'Describes Azure SDK spans.'
    attributes:
      - id: az.namespace
        required: always
        type: string
        brief: '[Namespace](https://docs.microsoft.com/azure/azure-resource-manager/management/azure-services-resource-providers) of the Azure service the request is made against.'
        examples: ['Microsoft.Storage', 'Microsoft.KeyVault', 'Microsoft.ServiceBus']

  # public API
  - id: azure-sdk.api
    span_kind: internal
    extends: azure-sdk
    brief: 'Describes Azure SDK API call spans.'
    note: 'Represents public surface API calls that wrap an Azure service call.'

  # http
  - id: azure-sdk.http
    extends: azure-sdk
    span_kind: client
    brief: 'Describes HTTP client spans created per HTTP request (try).'
    note: >
      This convention follows [OpenTelemetry HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md)
      but omits all optional attributes, providing only `http.url` to describe the destination. It adds request-id attributes supported by Azure services.
    attributes:
      - id: http.method
        type: string
        required: always
        brief: 'HTTP request method.'
        examples: ["GET", "POST", "HEAD"]
      - id: http.url
        type: string
        required: always
        brief: Full HTTP request URL in the form `scheme://host[:port]/path?query[#fragment]`
        examples: ['https://www.foo.bar/search?q=OpenTelemetry#SemConv']
      - id: http.status_code
        type: int
        required:
          conditional: If and only if one was received/sent.
        brief: '[HTTP response status code](https://tools.ietf.org/html/rfc7231#section-6).'
        examples: [200]
      - id: http.user_agent
        type: string
        required: always
        brief: 'Value of the [HTTP User-Agent](https://tools.ietf.org/html/rfc7231#section-5.5.3) header sent by the client.'
        examples: ['CERN-LineMode/2.15 libwww/2.17b3']
      - id: requestId
        type: string
        required: always
        brief: 'Value of the [x-ms-client-request-id] header (or other request-id header, depending on the service) sent by the client.'
        examples: ['eb178587-c05a-418c-a695-ae9466c5303c']
      - id: serviceRequestId
        type: string
        required: always
        brief: 'Value of the [x-ms-request-id] header (or other request-id header, depending on the service) sent by the server in response.'
        examples: ['3f828ae5-ecb9-40ab-88d9-db0420af30c6']

  # messaging
  - id: azure-sdk.messaging
    brief: 'Describes Azure messaging SDK spans.'
    extends: azure-sdk
    attributes:
      - id: message_bus.destination
        type: string
        required: always
        brief: 'Name of the messaging entity within the namespace, e.g. an EventHubs name or a ServiceBus queue or topic name.'
        examples: ['myqueue', 'myhub']
      - id: peer.address
        type: string
        brief: 'Fully qualified messaging service name.'
        required: always
        examples: ['myEventHubNamespace.servicebus.windows.net']
  - id: azure-sdk.messaging.producer
    span_kind: producer
    extends: azure-sdk.messaging
    brief: 'Describes producer span created per message.'

  - id: azure-sdk.messaging.send
    span_kind: client
    extends: azure-sdk.messaging
    brief: 'Describes send (transport call) span.'
    note: 'Contains links to the contexts of all messages being sent.'

  - id: azure-sdk.messaging.process
    span_kind: consumer
    extends: azure-sdk.messaging
    brief: 'Describes consumption span.'
    note: >
      Contains links to the contexts of all messages being consumed. Each link has an `enqueuedTime` attribute (of type `long`)
      holding the Unix epoch time, with millisecond precision, at which the message was enqueued.

  # db
  - id: azure-sdk.cosmos
    span_kind: client
    brief: 'Describes Azure CosmosDB spans.'
    note: >
      Events with additional debug info are added for long running operations.
    extends: azure-sdk
    attributes:
      - id: db.url
        type: string
        required: always
        brief: 'Cosmos DB URI'
        examples: ['https://my-cosmos.documents.azure.com:443/']
      - id: db.statement
        type: string
        required: always
        brief: 'Database statement'
        examples: ['createContainerIfNotExists.myContainer']
      - id: db.instance
        type: string
        required: always
        brief: 'Database name'
        examples: ['mydb']
      - id: db.type
        type: string
        required: always
        brief: 'Database type'
        examples: ['Cosmos']
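
One way to read the `extends` and `required: always` fields together is sketched below. It assumes that a group inherits the attributes of the group it extends, which is how the OpenTelemetry semantic-convention syntax is understood here. `GROUPS` is a hand-transcribed subset of the file above, and the function names are hypothetical, not part of any build tool.

```python
from typing import Dict, List, Optional, Set

# Hand-transcribed subset of the groups above: each entry records its parent
# group and the attributes marked `required: always`.
GROUPS: Dict[str, dict] = {
    "azure-sdk": {"extends": None, "required": ["az.namespace"]},
    "azure-sdk.messaging": {
        "extends": "azure-sdk",
        "required": ["message_bus.destination", "peer.address"],
    },
    "azure-sdk.messaging.process": {
        "extends": "azure-sdk.messaging",
        "required": [],
    },
}


def required_attributes(group_id: str) -> Set[str]:
    """Collect `required: always` attributes along the `extends` chain."""
    required: Set[str] = set()
    current: Optional[str] = group_id
    while current is not None:
        group = GROUPS[current]
        required.update(group["required"])
        current = group["extends"]
    return required


def missing_attributes(group_id: str, span_attributes: Dict[str, object]) -> List[str]:
    """Return the required attribute names absent from a span, sorted."""
    return sorted(required_attributes(group_id) - set(span_attributes))
```

A check like this could run in unit tests for instrumentation code, so that a span missing a required attribute is caught before it reaches a tracing backend.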