This is a proposal to add an HTTP transport extension for OTLP (OpenTelemetry Protocol).
OTLP can currently be communicated via only one transport: gRPC. While using gRPC has certain benefits, there are also drawbacks:
- Some users have infrastructure limitations that make gRPC-based protocol usage impossible. For example, AWS ALB does not support gRPC connections.
- gRPC is a relatively large dependency, which some clients are not willing to take. Plain HTTP is a smaller dependency and is built into the standard libraries of many programming languages.
This proposal keeps the existing specification of OTLP over gRPC transport (OTLP/gRPC for short) and defines an additional way to use the OTLP protocol over HTTP transport (OTLP/HTTP for short). OTLP/HTTP uses the same ProtoBuf payload that is used by OTLP/gRPC and defines how this payload is communicated over HTTP transport.
OTLP/HTTP uses HTTP POST requests to send telemetry data from clients to servers. Implementations MAY use HTTP/1.1 or HTTP/2 transports. Implementations that use HTTP/2 transport SHOULD fall back to HTTP/1.1 transport if an HTTP/2 connection cannot be established.
Telemetry data is sent via HTTP POST request.
The default URL path for requests that carry trace data is /v1/traces (for example, the full URL when connecting to the "example.com" server will be https://example.com/v1/traces). The request body is a ProtoBuf-encoded ExportTraceServiceRequest message.
The default URL path for requests that carry metric data is /v1/metrics and the request body is a ProtoBuf-encoded ExportMetricsServiceRequest message.
The client MUST set the "Content-Type: application/x-protobuf" request header. The client MAY gzip the content, in which case it SHOULD include the "Content-Encoding: gzip" request header. The client MAY include the "Accept-Encoding: gzip" request header if it can receive gzip-encoded responses.
Non-default URL paths for requests MAY be configured on the client and server sides.
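As an illustration, here is a minimal client-side sketch in Go using only the standard library. The endpoint URL and the nil payload are placeholders; in a real exporter the payload would be the serialized ProtoBuf bytes of an ExportTraceServiceRequest produced by generated code.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"net/http"
)

// exportTraces sends one ProtoBuf-encoded ExportTraceServiceRequest payload to
// the server. payload is assumed to already contain the serialized ProtoBuf
// bytes; producing them is out of scope for this sketch.
func exportTraces(client *http.Client, endpoint string, payload []byte) (*http.Response, error) {
	// Gzip the body (optional per the spec); since we do, announce it.
	var body bytes.Buffer
	gz := gzip.NewWriter(&body)
	if _, err := gz.Write(payload); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, endpoint, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("Content-Encoding", "gzip")
	// Signal that a gzip-encoded response is acceptable. Note: setting this
	// header manually disables Go's automatic response decompression, so the
	// caller must gunzip the response body itself.
	req.Header.Set("Accept-Encoding", "gzip")

	return client.Do(req)
}

func main() {
	// The default client negotiates HTTP/2 via TLS ALPN where available and
	// falls back to HTTP/1.1 automatically.
	resp, err := exportTraces(http.DefaultClient, "https://example.com/v1/traces", nil)
	if err != nil {
		fmt.Println("export failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("server responded with", resp.Status)
}
```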
On success the server MUST respond with HTTP 200 OK. The response body MUST be a ProtoBuf-encoded ExportTraceServiceResponse message for traces and an ExportMetricsServiceResponse message for metrics.
The server MUST set the "Content-Type: application/x-protobuf" response header. If the "Accept-Encoding: gzip" request header is present, the server MAY gzip-encode the response and set the "Content-Encoding: gzip" response header.
The server SHOULD respond with success only after successfully decoding and validating the request.
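A server-side sketch of a trace endpoint handler following these rules might look like the following. The decodeAndStore and marshalTraceResponse helpers are hypothetical stand-ins for the generated ProtoBuf code and the processing pipeline, and the error paths are simplified (a spec-compliant server would return a ProtoBuf-encoded Status body, as described below).

```go
package otlphttp

import (
	"compress/gzip"
	"io"
	"net/http"
)

// decodeAndStore and marshalTraceResponse are hypothetical stand-ins for the
// generated ProtoBuf unmarshal/marshal calls and the processing pipeline.
func decodeAndStore(payload []byte) error { return nil }
func marshalTraceResponse() []byte        { return nil }

// handleTraces sketches an OTLP/HTTP trace endpoint handler.
func handleTraces(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Content-Type") != "application/x-protobuf" {
		http.Error(w, "unsupported content type", http.StatusUnsupportedMediaType)
		return
	}

	// Transparently handle gzip-compressed request bodies.
	var reader io.Reader = r.Body
	if r.Header.Get("Content-Encoding") == "gzip" {
		gz, err := gzip.NewReader(r.Body)
		if err != nil {
			http.Error(w, "invalid gzip body", http.StatusBadRequest)
			return
		}
		defer gz.Close()
		reader = gz
	}

	payload, err := io.ReadAll(reader)
	if err != nil {
		http.Error(w, "failed to read body", http.StatusBadRequest)
		return
	}
	if err := decodeAndStore(payload); err != nil {
		http.Error(w, "invalid payload", http.StatusBadRequest)
		return
	}

	// Success is reported only after decoding and validation succeeded.
	w.Header().Set("Content-Type", "application/x-protobuf")
	w.WriteHeader(http.StatusOK)
	w.Write(marshalTraceResponse()) // serialized ExportTraceServiceResponse
}
```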
If the processing of the request fails, the server MUST respond with an appropriate HTTP 4xx or HTTP 5xx status code. See the sections below for more details about specific failure cases and the HTTP status codes that should be used.
The response body for all HTTP 4xx and HTTP 5xx responses MUST be a ProtoBuf-encoded Status message that describes the problem.
This specification does not use the Status.code field and the server MAY omit it. Clients are not expected to alter their behavior based on the Status.code field but MAY record it for troubleshooting purposes.
The Status.message field SHOULD contain a developer-facing error message as defined in the Status message schema.
The server MAY include the Status.details field with additional details. Read below about what this field can contain in each specific failure case.
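Assuming Status here refers to google.rpc.Status (the same message gRPC uses for error reporting), a server-side helper for producing such error responses could look roughly like this; the genproto and protobuf modules provide the generated types. The handler sketched earlier could call this instead of http.Error.

```go
package otlphttp

import (
	"net/http"

	spb "google.golang.org/genproto/googleapis/rpc/status"
	"google.golang.org/protobuf/proto"
)

// writeStatusError writes an HTTP 4xx/5xx response whose body is a
// ProtoBuf-encoded Status message describing the problem.
func writeStatusError(w http.ResponseWriter, httpCode int, message string) {
	st := &spb.Status{
		// Status.code is not used by this specification and is omitted here.
		Message: message, // developer-facing error message
	}
	body, err := proto.Marshal(st)
	if err != nil {
		body = nil // fall back to an empty body if marshaling itself fails
	}
	w.Header().Set("Content-Type", "application/x-protobuf")
	w.WriteHeader(httpCode)
	w.Write(body)
}
```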
If the processing of the request fails because the request contains data that cannot be decoded or is otherwise invalid, and such a failure is permanent, then the server MUST respond with HTTP 400 Bad Request. The Status.details field in the response SHOULD contain a BadRequest that describes the bad data.
The client MUST NOT retry the request when it receives an HTTP 400 Bad Request response.
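Building on the previous sketch, and assuming BadRequest refers to google.rpc.BadRequest from the standard error-details definitions, attaching it to Status.details could look like this; the field and description values are illustrative.

```go
package otlphttp

import (
	"google.golang.org/genproto/googleapis/rpc/errdetails"
	spb "google.golang.org/genproto/googleapis/rpc/status"
	"google.golang.org/protobuf/types/known/anypb"
)

// badRequestStatus builds a Status whose details describe which part of the
// request was invalid, for use as the body of an HTTP 400 Bad Request response.
func badRequestStatus(field, description string) (*spb.Status, error) {
	detail, err := anypb.New(&errdetails.BadRequest{
		FieldViolations: []*errdetails.BadRequest_FieldViolation{
			{Field: field, Description: description},
		},
	})
	if err != nil {
		return nil, err
	}
	return &spb.Status{
		Message: "invalid request: " + description,
		Details: []*anypb.Any{detail},
	}, nil
}
```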
If the server receives more requests than the client is allowed to send, or the server is overloaded, the server SHOULD respond with HTTP 429 Too Many Requests or HTTP 503 Service Unavailable and MAY include a "Retry-After" header with a recommended time interval in seconds to wait before retrying.
The client SHOULD honour the waiting interval specified in the "Retry-After" header if it is present. If the client receives an HTTP 429 or HTTP 503 response and the "Retry-After" header is not present in the response, then the client SHOULD implement an exponential backoff strategy between retries.
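One way a client could implement this decision is sketched below: honour Retry-After when the server sends it, otherwise fall back to exponential backoff, and never retry permanent failures. The backoff base and cap are arbitrary choices for the sketch.

```go
package otlphttp

import (
	"net/http"
	"strconv"
	"time"
)

// retryDelay decides whether a response is retryable and how long to wait
// before the next attempt. attempt is the zero-based retry count.
func retryDelay(resp *http.Response, attempt int) (time.Duration, bool) {
	switch resp.StatusCode {
	case http.StatusTooManyRequests, http.StatusServiceUnavailable:
		// Honour Retry-After (in seconds) when the server provides it.
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if seconds, err := strconv.Atoi(ra); err == nil {
				return time.Duration(seconds) * time.Second, true
			}
		}
		// Otherwise fall back to exponential backoff: 1s, 2s, 4s, ... capped at 60s.
		delay := time.Second << uint(attempt)
		if delay <= 0 || delay > 60*time.Second {
			delay = 60 * time.Second
		}
		return delay, true
	case http.StatusBadRequest:
		return 0, false // permanent failure, MUST NOT retry
	default:
		return 0, false
	}
}
```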
All other HTTP responses that are not explicitly listed in this document should be treated according to the HTTP specification.
If the server disconnects without returning a response, the client SHOULD retry and send the same request. The client SHOULD implement an exponential backoff strategy between retries to avoid overwhelming the server.
If the client is unable to connect to the server, the client SHOULD retry the connection using an exponential backoff strategy between retries. The interval between retries must have a random jitter.
The client SHOULD keep the connection alive between requests.
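A jittered exponential backoff along these lines is one way to satisfy the retry requirements above; the base and cap intervals are arbitrary values chosen for the sketch ("full jitter" variant).

```go
package otlphttp

import (
	"math/rand"
	"time"
)

// backoffWithJitter returns the wait interval before the given zero-based
// retry attempt: exponential growth with random jitter so that many clients
// do not reconnect in lockstep.
func backoffWithJitter(attempt int) time.Duration {
	const (
		baseDelay = 500 * time.Millisecond // initial interval (arbitrary for this sketch)
		maxDelay  = 30 * time.Second       // upper bound on the interval
	)
	d := baseDelay << uint(attempt)
	if d <= 0 || d > maxDelay { // the first check also guards against shift overflow
		d = maxDelay
	}
	// Pick a random point in [0, d) so independent clients spread out.
	return time.Duration(rand.Int63n(int64(d)))
}
```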
Server implementations MAY handle OTLP/gRPC and OTLP/HTTP requests on the same port and multiplex the connections to the corresponding transport handler based on the "Content-Type" request header.
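A common way to do this in Go, assuming the listener serves HTTP/2 (for example over TLS, so that gRPC traffic can arrive at all), is to route requests whose Content-Type starts with application/grpc to the gRPC server and everything else to the plain HTTP handlers:

```go
package otlphttp

import (
	"net/http"
	"strings"

	"google.golang.org/grpc"
)

// mixedHandler dispatches each request either to the gRPC server or to the
// plain HTTP mux, based on the Content-Type request header.
func mixedHandler(grpcServer *grpc.Server, httpMux http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
			grpcServer.ServeHTTP(w, r) // OTLP/gRPC traffic
			return
		}
		httpMux.ServeHTTP(w, r) // OTLP/HTTP traffic (application/x-protobuf)
	})
}
```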
To achieve higher total throughput, the client MAY send requests using several parallel HTTP connections. In that case, the maximum number of parallel connections SHOULD be configurable.
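In Go's net/http, for example, such a cap can be wired through the transport; the maxConnections parameter below is illustrative:

```go
package otlphttp

import (
	"net/http"
	"time"
)

// newExportClient builds an HTTP client whose number of parallel connections
// to the OTLP endpoint is capped at a configurable value.
func newExportClient(maxConnections int) *http.Client {
	transport := &http.Transport{
		MaxConnsPerHost:     maxConnections, // hard cap on parallel connections per host
		MaxIdleConnsPerHost: maxConnections, // keep connections alive between requests
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}
}
```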
I have also considered an HTTP/1.1+WebSocket transport. An experimental implementation of OTLP over WebSocket transport has shown that it typically performs better than a plain HTTP transport implementation (WebSocket uses less CPU and achieves higher throughput over high-latency connections). However, WebSocket transport requires a slightly more complicated implementation, and WebSocket libraries are less ubiquitous than plain HTTP, which may make implementation in certain languages difficult or impossible.
HTTP/1.1+WebSocket transport may be considered as a future transport for high-performance use cases, as it exhibits better performance than OTLP/gRPC and OTLP/HTTP.