Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve semantic inconsistencies for non traditional messaging #1027

Merged
merged 16 commits into from
Oct 21, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions semantic_conventions/trace/messaging.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ groups:
type: string
brief: 'Connection string.'
examples: ['tibjmsnaming://localhost:7222', 'https://queue.amazonaws.com/80398EXAMPLE/MyQueue']
- id: service
type: string
brief: 'Name of the external broker, or name of the service being interacted with. See note below for a definition.'
arminru marked this conversation as resolved.
Show resolved Hide resolved
examples: ['kafkaService']
- id: message_id
type: string
brief: 'A value used by the messaging system as an identifier for the message, represented as a string.'
Expand Down
77 changes: 63 additions & 14 deletions specification/trace/semantic_conventions/messaging.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,16 +27,18 @@

Although messaging systems are not as standardized as, e.g., HTTP, it is assumed that the following definitions are applicable to most of them that have similar concepts at all (names borrowed mostly from JMS):

A *message* usually consists of headers (or properties, or meta information) and an optional body. It is sent by a single message *producer* to:

* Physically: some message *broker* (which can be e.g., a single server, or a cluster, or a local process reached via IPC). The broker handles the actual routing, delivery, re-delivery, persistence, etc. In some messaging systems the broker may be identical or co-located with (some) message consumers.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
* Logically: some particular message *destination*.
A *message* is an envelope around a potentially empty payload.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
This envelope may offer the possibility to convey additional metadata, often under the key/value form.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
Messages can be delivered to 0, 1, or multiple consumers depending on the dispatching semantic of the protocol.
Traditional messaging brokers, such as JMS, use the concept of topics when a message is dispatched to potentially multiple consumers and queues when a message is dispatched to a single consumer.
arminru marked this conversation as resolved.
Show resolved Hide resolved
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
In a messaging system such as Apache Kafka, consumer groups are used. Each record, or message, is sent to a single consumer per consumer group.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
Whether a specific message is processed as if it was sent to a topic or queue entirely depends on the consumer groups and their composition.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved

### Destinations

A destination is usually identified by some name unique within the messaging system instance, which might look like an URL or a simple one-word identifier.
Two kinds of destinations are distinguished: *topic*s and *queue*s.
A message that is sent (the send-operation is often called "*publish*" in this context) to a *topic* is broadcasted to all *subscribers* of the topic.
A destination is usually identified by some name unique within the messaging system instance, which might look like a URL or a simple one-word identifier.
Traditional messaging involves two kinds of destinations: *topic*s and *queue*s.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
A message that is sent (the send-operation is often called "*publish*" in this context) to a *topic* is broadcasted to all consumers that have *subscribed* to the topic.
A message submitted to a queue is processed by a message *consumer* (usually exactly once although some message systems support a more performant at-least-once mode for messages with [idempotent][] processing).

[idempotent]: https://en.wikipedia.org/wiki/Idempotence
Expand All @@ -47,11 +49,10 @@ The consumption of a message can happen in multiple steps.
First, the lower-level receiving of a message at a consumer, and then the logical processing of the message.
Often, the waiting for a message is not particularly interesting and hidden away in a framework that only invokes some handler function to process a message once one is received
(in the same way that the listening on a TCP port for an incoming HTTP message is not particularly interesting).
However, in a synchronous conversation, the wait time for a message is important.

### Conversations

In some messaging systems, a message can receive a reply message that answers a particular other message that was sent earlier. All messages that are grouped together by such a reply-relationship are called a *conversation*.
In some messaging systems, a message can receive a reply message, or possibly multiple, that answers a particular other message that was sent earlier. All messages that are grouped together by such a reply-relationship are called a *conversation*.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
The grouping usually happens through some sort of "In-Reply-To:" meta information or an explicit *conversation ID* (sometimes called *correlation ID*).
Sometimes a conversation can span multiple message destinations (e.g. initiated via a topic, continued on a temporary one-to-one queue).

Expand All @@ -74,6 +75,7 @@ The span name SHOULD be set to the message destination name and the operation be

The destination name SHOULD only be used for the span name if it is known to be of low cardinality (cf. [general span name guidelines](../api.md#span)).
This can be assumed if it is statically derived from application code or configuration.
Wherever possible, the preference is to use real destination names over logical or aliased names.
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved
If the destination name is dynamic, such as a [conversation ID](#conversations) or a value obtained from a `Reply-To` header, it SHOULD NOT be used for the span name.
In these cases, an artificial destination name that best expresses the destination, or a generic, static fallback like `"(temporary)"` for [temporary destinations](#temporary-destinations) SHOULD be used instead.

Expand Down Expand Up @@ -118,14 +120,15 @@ The following operations related to messages are defined for these semantic conv
| `messaging.protocol` | string | The name of the transport protocol. | `AMQP`<br>`MQTT` | No |
| `messaging.protocol_version` | string | The version of the transport protocol. | `0.9.1` | No |
| `messaging.url` | string | Connection string. | `tibjmsnaming://localhost:7222`<br>`https://queue.amazonaws.com/80398EXAMPLE/MyQueue` | No |
| `messaging.service` | string | Name of the external broker, or name of the service being interacted with. See note below for a definition. | `kafkaService` | No |
| `messaging.message_id` | string | A value used by the messaging system as an identifier for the message, represented as a string. | `452a7c7c7c7048c2f887f61572b18fc2` | No |
| `messaging.conversation_id` | string | The [conversation ID](#conversations) identifying the conversation to which the message belongs, represented as a string. Sometimes called "Correlation ID". | `MyConversationId` | No |
| `messaging.message_payload_size_bytes` | number | The (uncompressed) size of the message payload in bytes. Also use this attribute if it is unknown whether the compressed or uncompressed payload size is reported. | `2738` | No |
| `messaging.message_payload_compressed_size_bytes` | number | The compressed size of the message payload in bytes. | `2048` | No |

**[1]:** Required only if the message destination is either a `queue` or `topic`.

**Additional attribute requirements:** At least one of the following sets of attributes is required:
**Additional attribute recommendations:** At least one of the following sets of attributes is recommended:

* [`net.peer.name`](span-general.md)
* [`net.peer.ip`](span-general.md)
Expand All @@ -140,6 +143,7 @@ The following operations related to messages are defined for these semantic conv

Additionally `net.peer.port` from the [network attributes][] is recommended.
Furthermore, it is strongly recommended to add the [`net.transport`][] attribute and follow its guidelines, especially for in-process queueing systems (like [Hangfire][], for example).
`messaging.service` refers to the logical name of the external broker or messaging system where a message was sent to, or received from. In an environment such as Kubernetes, it would be the Kubernetes Service Name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please clarify how this differs from the peer.service general span attribute?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression of peer.service is a service running on a specific peer, or host, of a network.

In an environment like Kubernetes a single "peer", or node, can have many different services exposed on it. Meaning that sending a message to a Kafka service named "my-kafka" means more than a peer.service of "mycluster.kube", or some permutation thereof.

It's possible I'm missing something, but that's my impression, which is why I felt a separate message.service was needed. Granted there might be situations where they are the same, but in Kubernetes that would be rare

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would interpret peer.service to be the particular service this span is directed at, which also works if a given peer or host exposes multiple different services.

@anuraaga do you think we could clarify this further in the spec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arminru am I correct in saying your interpretation of peer.service is actually the name of the service?

From my perspective, all the other peer.* or net.peer.* information relates to specific host or network node information, which is why I see peer.service as referring to a service on a specific host, as opposed to a virtual service name that could be clustered across many hosts/nodes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. net.* is about lower-level networking information where net.peer.* is about the host on the other end. peer.service on the other hand is about the higher-level service. I don't think there's any other peer.* attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about that. The peer.service attribute is something that is expected to be manually configured by the user, not automatically determined by the instrumentation.

See https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/span-general.md#general-remote-service-attributes

Users can define what the name of a service is based on their particular semantics in their distributed system. Instrumentations SHOULD provide a way for users to configure this name.
...
Examples of peer.service that users may specify:

CC @anuraaga who introduced peer.service in #652

Still it sounds like this concept is not specific to messaging, so maybe we should have peer.detected_service if peer.service is otherwise fitting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iNikem did you mean messaging.service duplicates peer.service or Resource's service.name? Personally, it maybe overlaps with the latter and not the former.

I don't see peer.service being a great name for what it is, but if that's the preferred approach I will remove messaging.service

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They actually belong together.
Given, for example, a client A sending a request to a server process B exposing a service called X:
A creates a span for the request and sets peer.service="X"
B, on the other end, sets service.name="X" on its resource, which will be added to a span it creates for tracking the processing of A's request

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arminru, it hasn't been clear where Resource fits, that helps.

I will drop message.service and use peer.service

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please add a note in the Kafka section saying it is recommended to set this and how to determine the appropriate value to make sure instrumentation writers are aware of it.

@anuraaga please verify if you think it's fine that instrumentations set this automatically (while still allowing users to override it, although I'm not really sure how this will work in practice).

These attributes should be set to the broker to which the message is sent/from which it is received.

[network attributes]: span-general.md#general-network-connection-attributes
Expand Down Expand Up @@ -176,6 +180,17 @@ In RabbitMQ, the destination is defined by an _exchange_ and a _routing key_.
`messaging.destination` MUST be set to the name of the exchange. This will be an empty string if the default exchange is used.
The routing key MUST be provided to the attribute `messaging.rabbitmq.routing_key`, unless it is empty.

#### Apache Kafka
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a tombstone filled in the instrumentation currently, is it useful to have here?

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/f23ad29187ecea4e742e1dd6a5baeb84020298d4/instrumentation/kafka-clients-0.11/src/main/java/io/opentelemetry/javaagent/instrumentation/kafkaclients/KafkaProducerTracer.java#L52

If not it's ok and we can remove it from the Instrumentation. But this seems like the right time to get out instrumentation synced up with the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might also consider separating out a subfolder for messaging implementations as per #968 - there are so many of them that I worry a single doc just gets unwieldy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problems with the delay, I appreciate everyone is super busy right now and context switching IS difficult.

I can see it being beneficial to identify if a particular trace is related to a tombstone record. I can add it to the Kafka section.

Can the splitting be done in a subsequent PR, or you'd prefer to include it here as well? I can certainly follow up with such a change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting in a separate PR sounds great, thanks


For Apache Kafka, the following additional attributes are defined:

| Attribute name | Notes and examples |
| -------------- | ---------------------------------------------------------------------- |
| `messaging.kafka.message_key` | Differs from `messaging.message_id` in that it's not unique, and can be `null`. The type is a String representation of the type of the actual value. |
| `messaging.kafka.consumer_group` | Name of the Kafka Consumer Group that is handling the message. Only applies to consumers, not producers. |
| `messaging.kafka.client_id` | Client Id for the Consumer or Producer that is handling the message. |
| `messaging.kafka.partition` | Partition the message is sent to. |

## Examples

### Topic with multiple consumers
Expand All @@ -199,11 +214,43 @@ Process CB: | Span CB1 |
| Status | `Ok` | `Ok` | `Ok` |
| `net.peer.name` | `"ms"` | `"ms"` | `"ms"` |
| `net.peer.port` | `1234` | `1234` | `1234` |
| `messaging.system` | `"kafka"` | `"kafka"` | `"kafka"` |
| `messaging.system` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` |
| `messaging.destination` | `"T"` | `"T"` | `"T"` |
| `messaging.destination_kind` | `"topic"` | `"topic"` | `"topic"` |
| `messaging.service` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` |
arminru marked this conversation as resolved.
Show resolved Hide resolved
| `messaging.operation` | | `"process"` | `"process"` |
| `messaging.message_id` | `"a1"` | `"a1"`| `"a1"` |
arminru marked this conversation as resolved.
Show resolved Hide resolved

### Apache Kafka
arminru marked this conversation as resolved.
Show resolved Hide resolved

Given is a process P, that publishes a message to a topic T1 on Apache Kafka.
One process, CA, receives the message and publishes a new message to a topic T2 that is then received and processed by CB.

```
Process P: | Span Prod1 |
--
Process CA: | Span Rcv1 |
| Span Proc1 |
| Span Prod2 |
--
Process CB: | Span Rcv2 |
```

| Field or Attribute | Span Prod1 | Span Rcv1 | Span Proc1 | Span Prod2 | Span Rcv2
|-|-|-|-|-|-|
| Span name | `"T1 send"` | `"T1 receive"` | `"T1 process"` | `"T2 send"` | `"T2 receive`" |
| Parent | | Span Prod1 | Span Prod1 | | Span Prod2 |
| Links | | | Span Rcv1 | Span Prod1 | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Parent | | Span Prod1 | Span Prod1 | | Span Prod2 |
| Links | | | Span Rcv1 | Span Prod1 | |
| Parent | | Span Prod1 | Span Rcv1 | Span Proc1 | Span Prod2 |
| Links | | | Span Prod1 | | |

Proc1's parent would be Rcv1 in this case as it's Proc1's direct predecessor and Prod1 would be added as a link.
Same for Prod2 being a child of Proc1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the reasoning for Proc1's parent being Rcv1, but Prod2 is not a child of Proc1.

The reasoning is that in Kafka the producing of a message is disconnected in time from when, or how often, that message is processed.

In this situation, Prod2 may be processed quickly, or it could be days/weeks/months before it's processed. From a span perspective, I don't think it should have a parent of Proc1 due to that timing disconnect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your timing diagram and description above it looks like as part of processing of the first message, i.e., in Proc1, the second message would be produced, i.e., Prod2. Is this not the case? For me it looks like Prod2 would be a direct consequence of Proc1. Please note that parents don't have to necessarily enclose their children from a timing perspective but also allow for async relationships.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is performed as part of processing the first message. However, in reactive messaging and event-driven architectures there isn't always a "parent-child" connection, it's more like a correlation or link relationship. They may be "parent-child", but not always.

If it is added as a parent, and it hasn't had any further span, doesn't that leave the parent span/trace unclosed? Does that make sense for reporting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related #958 / CC @anuraaga wanted kinda the opposite there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Oberon00, I knew there was a better term but couldn't think of it. Reading the issue made me realize it was "follows-from".

So for reactive messaging and event-driven, producing a new message while processing an existing message is more a "follows-from" relationship as opposed to parent-child, which is why I had used Links

| SpanKind | `PRODUCER` | `CONSUMER` | `CONSUMER` | `PRODUCER` | `CONSUMER` |
| Status | `Ok` | `Ok` | `Ok` | `Ok` | `Ok` |
| `messaging.system` | `"kafka"` | `"kafka"` | `"kafka"` | `"kafka"` | `"kafka"` |
| `messaging.destination` | `"T1"` | `"T1"` | `"T1"` | `"T2"` | `"T2"` |
| `messaging.destination_kind` | `"topic"` | `"topic"` | `"topic"` | `"topic"` | `"topic"` |
| `messaging.service` | `"myKafka"` | `"myKafka"` | `"myKafka"` | `"myKafka"` | `"myKafka"` |
| `messaging.operation` | | `"receive"` | `"process"` | | `"receive"` |
| `messaging.kafka.message_key` | `"myKey"` | `"myKey"` | `"myKey"` | `"anotherKey"` | `"anotherKey"` |
| `messaging.kafka.consumer_group` | | `"my-group"` | `"my-group"` | | `"another-group"` |
| `messaging.kafka.client_id` | | `"5"` | `"5"` | `"5"` | `"8"` |
| `messaging.kafka.partition` | | `"1"` | `"1"` | | `"3"` |

### Batch receiving

Expand All @@ -228,9 +275,10 @@ Process C: | Span Recv1 |
| Status | `Ok` | `Ok` | `Ok` | `Ok` | `Ok` |
| `net.peer.name` | `"ms"` | `"ms"` | `"ms"` | `"ms"` | `"ms"` |
| `net.peer.port` | `1234` | `1234` | `1234` | `1234` | `1234` |
| `messaging.system` | `"kafka"` | `"kafka"` | `"kafka"` | `"kafka"` | `"kafka"` |
| `messaging.system` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` |
| `messaging.destination` | `"Q"` | `"Q"` | `"Q"` | `"Q"` | `"Q"` |
| `messaging.destination_kind` | `"queue"` | `"queue"` | `"queue"` | `"queue"` | `"queue"` |
| `messaging.service` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` |
| `messaging.operation` | | | `"receive"` | `"process"` | `"process"` |
| `messaging.message_id` | `"a1"` | `"a2"` | | `"a1"` | `"a2"` |

Expand Down Expand Up @@ -261,8 +309,9 @@ Process C: | Span Recv1 | Span Recv2 |
| Status | `Ok` | `Ok` | `Ok` | `Ok` | `Ok` |
| `net.peer.name` | `"ms"` | `"ms"` | `"ms"` | `"ms"` | `"ms"` |
| `net.peer.port` | `1234` | `1234` | `1234` | `1234` | `1234` |
| `messaging.system` | `"kafka"` | `"kafka"` | `"kafka"` | `"kafka"` | `"kafka"` |
| `messaging.system` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` | `"rabbitmq"` |
| `messaging.destination` | `"Q"` | `"Q"` | `"Q"` | `"Q"` | `"Q"` |
| `messaging.destination_kind` | `"queue"` | `"queue"` | `"queue"` | `"queue"` | `"queue"` |
| `messaging.service` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` | `"myRabbitMQ"` |
| `messaging.operation` | | | `"receive"` | `"receive"` | `"process"` |
| `messaging.message_id` | `"a1"` | `"a2"` | `"a1"` | `"a2"` | |