PIP-264, which can also be viewed here, describes at a high level a plan to greatly enhance Pulsar's metrics system by replacing it with OpenTelemetry. The PIP describes the numerous existing problems it solves. Among them are:
- Control which metrics to export per topic/group/namespace via the introduction of a metric filter configuration. This configuration is planned to be dynamic, as outlined in PIP-264.
- Reduce the immense metrics cardinality caused by high topic counts (one of Pulsar's great features) by introducing the concept of a Metric Group: a group of topics for metric purposes. Metric reporting will also be done at group granularity, so 100k topics can be downsized to 1k groups. The dynamic metric filter configuration will allow the user to control which metric groups to un-filter.
- Proper histogram exporting
- Clean up codebase clutter by relying on a single industry-standard API, SDK, and metrics protocol (OTLP) instead of the existing mix of home-brewed libraries and a hard-coded Prometheus exporter.
- and many more
You can read here why OpenTelemetry was chosen.
Since OpenTelemetry (a.k.a. OTel) is an emerging industry standard, there are plenty of good articles, videos, and documentation about it. In this short paragraph I'll describe what you need to know about OTel from this PIP's perspective.
OpenTelemetry is a project that aims to standardize the way we instrument, collect, and ship metrics from applications to telemetry backends, be it databases (e.g. Prometheus, Cortex, Thanos) or vendors (e.g. Datadog, Logz.io). It is divided into an API, an SDK, and a Collector:
- API: the interfaces used to instrument: define a counter, record values to a histogram, etc.
- SDK: a library, available in many languages, implementing the API, plus other important features such as reading the metrics and exporting them to a telemetry backend or an OTel Collector.
- Collector: a lightweight process (application) which can receive or retrieve telemetry, transform it (e.g. filter, drop, aggregate), and export it (e.g. send it to various backends). The SDK supports, out of the box, exporting metrics as a Prometheus HTTP endpoint or sending them out using the OTLP protocol. Companies often choose to ship to the Collector and from there to their preferred vendors, since each vendor has already published an exporter plugin for the OTel Collector. This keeps the SDK exporters very lightweight, as they don't need to support any vendor. It's also easier for the DevOps team, as they can make the OTel Collector their responsibility and have application developers focus only on shipping metrics to that collector.
Just to have some context: the Pulsar codebase will use the OTel API to create counters / histograms and record values to them, and so will Pulsar plugins and Pulsar Function authors. Pulsar itself will be the one creating the SDK, using it to hand over an implementation of the API wherever needed in Pulsar. The Collector is up to the choice of the user, as OTel provides a way to expose the metrics as a /metrics endpoint on a configured port, so Prometheus-compatible scrapers can grab them from it directly. Users can also send the metrics via OTLP to an OTel Collector.
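To make this concrete, here is a minimal sketch of what instrumentation through the OTel API looks like from a component's point of view. The class and instrument names here are hypothetical, for illustration only; the OpenTelemetry instance is assumed to be handed in by Pulsar as described above.

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.Meter;

class PublishMetrics {
    private final DoubleHistogram publishLatency;

    // Pulsar hands over the OpenTelemetry (API) instance it created via the SDK.
    PublishMetrics(OpenTelemetry openTelemetry) {
        Meter meter = openTelemetry.getMeter("org.apache.pulsar.broker");
        // Hypothetical instrument name, for illustration only.
        this.publishLatency = meter
                .histogramBuilder("pulsar.broker.publish.latency")
                .setUnit("ms")
                .build();
    }

    void onPublish(double latencyMs) {
        publishLatency.record(latencyMs);
    }
}
```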
PIP-264 clearly outlined that there will be two layers of metrics, collected and exported side by side: OpenTelemetry and the existing metrics system, which currently exports in Prometheus format. This PIP explains in detail how that will work. The basic premise is that you will be able to enable or disable OTel metrics alongside the existing Prometheus metrics exporting.
As specified in PIP-264, the OpenTelemetry Java SDK needs several fixes, which the Pulsar community must complete before it can be used in production. They are documented in PIP-264. The most important one is reducing memory allocations to a negligible level. The OTel SDK is built upon immutability, hence it allocates memory in O(#topics), which is a performance killer for a low-latency application like Pulsar.
You can track the proposal and progress the Pulsar and OTel communities are making in this issue.
Today, the Pulsar metrics endpoint /metrics has an option to be protected by the configured AuthenticationProvider. The configuration option is named authenticateMetricsEndpoint, both in the broker and in the proxy.
Implementing PIP-264 consists of a long list of steps, which are detailed in this issue. The first step is to add all the bare-bones infrastructure needed to use OpenTelemetry in Pulsar, so that subsequent PRs can use it to start translating existing metrics to their OTel form. This means the same metrics will co-exist in the codebase, and also at runtime if OTel is enabled.
- Ability to add metrics using OpenTelemetry to Pulsar components: Broker, Function Worker, and Proxy.
- Users can enable or disable OpenTelemetry metrics; they will be disabled by default
- OpenTelemetry metrics will be configured via the native OTel Java SDK configuration options
- All the necessary information to use OTel with Pulsar will be documented on the Pulsar documentation site
- The OpenTelemetry metrics layer is defined as experimental, not GA
- Ability to add metrics using OpenTelemetry as a Pulsar Function author
- Only authenticated sessions can access the OTel Prometheus endpoint, using Pulsar authentication
- Metrics in Pulsar clients (as defined in PIP-264)
OpenTelemetry, like any good telemetry library (e.g. log4j, logback), has its own configuration mechanisms:
- System properties
- Environment variables
- Experimental file-based configuration
Pulsar doesn't need to introduce any additional configuration. Using OTel configuration, the user can decide things like (a small example follows this list):
- How to export the metrics: Prometheus? On which port will the Prometheus endpoint be exposed?
- Changing histogram buckets using Views
- and more
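For illustration, enabling the Prometheus exporter on a specific port via the standard OTel Java autoconfiguration property names could look like this; the same keys can also be supplied as environment variables:

```java
// Set before the SDK is initialized; the equivalent environment variables
// are OTEL_METRICS_EXPORTER and OTEL_EXPORTER_PROMETHEUS_PORT.
System.setProperty("otel.metrics.exporter", "prometheus");
System.setProperty("otel.exporter.prometheus.port", "9464");
```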
Pulsar will use AutoConfiguredOpenTelemetrySdk, which uses all the above configuration mechanisms (documented here). This class builds an OpenTelemetrySdk based on those configurations. This is the entry point to the OpenTelemetry API, as it implements the OpenTelemetry API class.
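A minimal sketch of that entry point, using only the documented autoconfigure API:

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

// Builds an SDK from system properties, environment variables and the
// (experimental) file-based configuration.
OpenTelemetrySdk sdk = AutoConfiguredOpenTelemetrySdk.initialize()
        .getOpenTelemetrySdk();

// OpenTelemetrySdk implements the OpenTelemetry API interface, so it can be
// handed to any code that needs to create meters and instruments.
OpenTelemetry api = sdk;
```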
There are some configuration options whose defaults we wish to change, while still allowing users to override them if they wish. We think these default values will make for a much easier user experience.
otel.experimental.metrics.cardinality.limit - value: 10,000. This property sets an upper bound on the number of unique Attributes an instrument can have. Take Pulsar for example: for an instrument like pulsar.broker.messaging.topic.received.size, the number of unique Attributes would be on the order of the number of active topics in the broker. Since Pulsar can handle up to 1M topics, it makes more sense to set the default value to 10k, which translates to 10k topics.
AutoConfiguredOpenTelemetrySdkBuilder allows adding properties using the method addPropertiesSupplier. System properties and environment variables override the supplied properties. The file-based configuration doesn't yet take those supplied properties into account, but it will.
We would like the ability to toggle OpenTelemetry-based metrics, as they are still new. We won't need any special Pulsar configuration, as the OpenTelemetry SDK comes with a configuration key for that. Since OTel support is still experimental, it will have to be opt-in, hence we will add the following property as a default using the mechanism described above:
otel.sdk.disabled - value: true. This property disables OpenTelemetry.
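Combining this with the cardinality default above, a minimal sketch of how Pulsar could supply both defaults through addPropertiesSupplier; user-provided system properties and environment variables still win:

```java
import java.util.Map;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

OpenTelemetrySdk sdk = AutoConfiguredOpenTelemetrySdk.builder()
        // Defaults only: overridden by system properties / env variables.
        .addPropertiesSupplier(() -> Map.of(
                "otel.experimental.metrics.cardinality.limit", "10000",
                "otel.sdk.disabled", "true"))
        .build()
        .getOpenTelemetrySdk();
```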
With OTel disabled, the user remains with the existing metrics system. OTel in a disabled state operates in no-op mode. This means instruments do get built, but the instrument builders return the same instance of a no-op instrument, which does nothing in its value-recording methods (e.g. add(number), record(number)). The no-op MeterProvider has no registered MetricReader, hence no metric collection will be made. The memory impact is almost zero, and the same goes for the CPU impact.
The current metrics system doesn't have a toggle that causes all existing data structures to stop collecting data. Introducing one would require changes in many places, since there is no single place through which all metric instruments are created (one of the motivations for PIP-264).
The current system does have a toggle: exposeTopicLevelMetricsInPrometheus. It enables toggling off topic-level metrics, which means the highest-cardinality metrics will be at the namespace level.
Once that toggle is set to false, the number of data structures consuming memory would be in the range of a few thousand, which shouldn't pose a memory burden. If the user refrains from calling /metrics, this also reduces the CPU and memory cost associated with collecting metrics.
When the user enables OTel, there will be a memory increase; but if the user disabled topic-level metrics in the existing system, as specified above, the majority of the memory increase will come from topic-level metrics in OTel, at the expense of not having them in the existing metrics system.
A broker is part of a cluster, configured via the Pulsar configuration key clusterName. When the broker is part of a cluster, it shares the topics defined in that cluster (persisted in the metadata service, e.g. ZooKeeper) with the other brokers of that cluster.
Today, each unique time series emitted in the Prometheus metrics contains the cluster label (almost all of them, as it is done manually). We wish for the same with OTel: to have that attribute in each exported unique time series.
OTel has the perfect location for attributes shared across all time series: the Resource. An application can have multiple Resources, each with one or more attributes. You define it once, in OTel initialization or configuration. It can contain attributes like the hostname, AWS region, etc. The default contains the service name and some info on the SDK version.
Attributes can be added dynamically through addResourceCustomizer() in AutoConfiguredOpenTelemetrySdkBuilder. We will use that to inject the cluster attribute, taken from the configuration.
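A sketch of injecting the cluster attribute this way; the attribute name pulsar.cluster matches this proposal, and in practice the cluster name would come from the broker or proxy configuration:

```java
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;

String clusterName = "my-cluster"; // in practice: taken from clusterName in the configuration

AutoConfiguredOpenTelemetrySdk.builder()
        .addResourceCustomizer((resource, configProperties) ->
                // Merge the cluster attribute into the auto-detected Resource.
                resource.merge(Resource.builder()
                        .put("pulsar.cluster", clusterName)
                        .build()))
        .build();
```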
For Prometheus, we submitted a proposal to the OpenTelemetry specification, which was merged, allowing resource attributes to be copied into each unique time series exported by the Prometheus exporter. We plan to contribute its implementation to the OTel Java SDK.
In the Prometheus exporter, the Resource is exported as target_info{} 1, and the resource attributes are added to this time series. Getting them would require joins, making them extremely difficult to use.
The other alternative was to introduce our own PulsarAttributesBuilder class on top of OTel's AttributesBuilder. Getting every contributor to know this class and use it is hard; getting it across to Pulsar Functions or plugin authors would be immensely hard. Also, when exporting via OTLP, it is very inefficient to repeat the attribute across all unique time series instead of stating it once using the Resource. Hence, this needed to be solved in the Prometheus exporter, as we did in the proposal.
The attribute will be named pulsar.cluster, as both the proxy and the broker are part of this cluster.
- We shall prefix each attribute with "pulsar.". Examples: pulsar.topic, pulsar.cluster.
We should have a clear hierarchy, hence we'll use the following prefixes:
- pulsar.broker
- pulsar.proxy
- pulsar.function_worker
It's customary to use reverse domain names for meter names. Hence, we'll use:
- org.apache.pulsar.broker
- org.apache.pulsar.proxy
- org.apache.pulsar.function_worker
The OTel meter name is converted to the attribute name otel_scope_name and added by the Prometheus exporter to the attributes of each unique time series.
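For example, broker code would obtain its meter by the name above and attach pulsar.-prefixed attributes when recording. The instrument name reuses the example from earlier in this PIP, and an openTelemetry API instance is assumed to be in scope:

```java
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

Meter meter = openTelemetry.getMeter("org.apache.pulsar.broker");
LongCounter receivedSize = meter
        .counterBuilder("pulsar.broker.messaging.topic.received.size")
        .setUnit("By") // bytes, in UCUM notation
        .build();
receivedSize.add(1024, Attributes.builder()
        .put("pulsar.topic", "persistent://public/default/my-topic")
        .build());
```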
We won't specify a meter version, as it is used solely to signify the version of the instrumentation; since we are currently the first version, we will not use it.
- OpenTelemetryService class (see the sketch after this list)
  - Parameters:
    - Cluster name
  - What it will do:
    - Override the default max cardinality to 10k
    - Register a resource with the cluster name
    - Set defaults instructing the Prometheus exporter to copy resource attributes
    - In the future: set defaults for the memory mode to be REUSABLE_DATA
- PulsarBrokerOpenTelemetry class
  - Initialization
    - Constructs an OpenTelemetryService using the cluster name taken from the broker configuration
    - Constructs a Meter for the broker metrics
  - Methods
    - getMeter() returns the Meter for the broker
  - Notes
    - This is the class that will be passed along to other Pulsar service classes that need to define telemetry such as metrics (in the future: traces).
- PulsarProxyOpenTelemetry class
  - Same as PulsarBrokerOpenTelemetry, but for the Pulsar Proxy
- PulsarWorkerOpenTelemetry class
  - Same as PulsarBrokerOpenTelemetry, but for the Pulsar function worker
- The OTel Prometheus exporter adds a /metrics endpoint on a user-defined port, if the user chose to use it
  - OTel configurations are used
  - OTel currently does not support setting a custom Authenticator for the Prometheus exporter. An issue has been raised here.
    - Once it does, we can secure the Prometheus exporter metrics endpoint using AuthenticationProvider
  - Any user can access the metrics, and they are not protected per tenant, like today's implementation
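To tie the pieces together, here is a hedged sketch of the two central classes described above. The class names match the proposal; the bodies are illustrative only and omit details such as the Prometheus resource-attribute copying defaults and the future memory mode.

```java
import java.util.Map;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;

// Sketch only: wires the defaults and the cluster Resource attribute together.
class OpenTelemetryService implements AutoCloseable {
    private final OpenTelemetrySdk sdk;

    OpenTelemetryService(String clusterName) {
        this.sdk = AutoConfiguredOpenTelemetrySdk.builder()
                // Defaults the user may override via system properties
                // or environment variables.
                .addPropertiesSupplier(() -> Map.of(
                        "otel.experimental.metrics.cardinality.limit", "10000",
                        "otel.sdk.disabled", "true"))
                // Attach the cluster name to every exported time series.
                .addResourceCustomizer((resource, config) ->
                        resource.merge(Resource.builder()
                                .put("pulsar.cluster", clusterName)
                                .build()))
                .build()
                .getOpenTelemetrySdk();
    }

    OpenTelemetrySdk getOpenTelemetry() {
        return sdk;
    }

    @Override
    public void close() {
        sdk.close();
    }
}

// Sketch only: the broker-facing wrapper handed to Pulsar service classes.
class PulsarBrokerOpenTelemetry {
    private final OpenTelemetryService service;
    private final Meter meter;

    PulsarBrokerOpenTelemetry(String clusterName) {
        this.service = new OpenTelemetryService(clusterName);
        this.meter = service.getOpenTelemetry()
                .getMeter("org.apache.pulsar.broker");
    }

    Meter getMeter() {
        return meter;
    }
}
```

PulsarProxyOpenTelemetry and PulsarWorkerOpenTelemetry would follow the same shape with their respective meter names.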
- Mailing List discussion thread: https://lists.apache.org/thread/xcn9rm551tyf4vxrpb0th0wj0kktnrr2
- Mailing List voting thread: https://lists.apache.org/thread/zp6vl9z9dhwbvwbplm60no13t8fvlqs2