
Metric names between OpenCensus & Prometheus exporters are different #2174

Open
dprotaso opened this issue Jun 28, 2021 · 14 comments
Assignees
Labels
area/monitoring kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@dprotaso
Member

dprotaso commented Jun 28, 2021

/area monitoring
/kind cleanup

We currently prepend a 'domain' and the 'component' name to the metrics. This is done for our OpenCensus exporter, which might have been cargo-culted from our Stackdriver exporter implementation. The Prometheus exporter only prefixes the 'component' name to the metric.

```go
metricsPrefix := path.Join(config.domain, config.component)
```

We should determine whether these should be consistent, whether there should be a prefix at all, and what its format should be.
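
To make the difference concrete, here is a minimal, self-contained Go sketch of the two behaviours described above (the config fields, helper functions, and example domain are illustrative, not the actual knative.dev/pkg code):

```go
// Illustrative only – not the actual knative.dev/pkg code. Shows how the two
// exporters end up with different names for the same metric.
package main

import (
	"fmt"
	"path"
)

type metricsConfig struct {
	domain    string // e.g. "knative.dev/internal/serving" (example value)
	component string // e.g. "activator"
}

// openCensusName mimics the OpenCensus exporter behaviour described above:
// both the domain and the component are joined into the metric name.
func openCensusName(c metricsConfig, metric string) string {
	return path.Join(c.domain, c.component, metric)
}

// prometheusName mimics the Prometheus exporter behaviour: only the component
// is used as the metric namespace.
func prometheusName(c metricsConfig, metric string) string {
	return c.component + "_" + metric
}

func main() {
	c := metricsConfig{domain: "knative.dev/internal/serving", component: "activator"}
	fmt.Println(openCensusName(c, "request_count")) // knative.dev/internal/serving/activator/request_count
	fmt.Println(prometheusName(c, "request_count")) // activator_request_count
}
```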

cc @upodroid @skonto @evankanderson @MontyCarter @pianohacker

@knative-prow-robot knative-prow-robot added area/monitoring kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 28, 2021
@dprotaso
Member Author

From @pianohacker re: prepending a domain prefix for Prometheus: https://knative.slack.com/archives/CEPGNPHP1/p1624558123099100

This level of namespacing on the metric name isn't typically done for Prometheus metrics; see https://prometheus.io/docs/practices/naming/
The source (as opposed to the kind) of a Prometheus-exposed metric is intended to be distinguished by its labels (in this case, namespace).
In that vein, could we expose the ability to configure one or more labels on the exposed metrics?
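
As a rough sketch of that label-based approach, here is what carrying the source as an OpenCensus tag (which the Prometheus exporter emits as a label) could look like. The `namespace` key and the metric definition are purely illustrative, not existing knative.dev/pkg configuration:

```go
// Sketch only: distinguish the source of a metric with a tag/label instead of
// a name prefix. Not existing knative.dev/pkg API.
package main

import (
	"context"
	"log"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

var (
	// A configurable label, e.g. the suggested "namespace".
	namespaceKey = tag.MustNewKey("namespace")

	requestCount = stats.Int64("request_count", "Number of requests", stats.UnitDimensionless)
)

func main() {
	if err := view.Register(&view.View{
		Name:        "request_count", // no domain/component prefix in the name
		Measure:     requestCount,
		Aggregation: view.Count(),
		TagKeys:     []tag.Key{namespaceKey}, // source carried as a label instead
	}); err != nil {
		log.Fatal(err)
	}

	// Record one request attributed to the "default" namespace.
	ctx, _ := tag.New(context.Background(), tag.Insert(namespaceKey, "default"))
	stats.Record(ctx, requestCount.M(1))
}
```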

@dprotaso dprotaso changed the title Metric names between OpenCensus & Prometheus are different Metric names between OpenCensus & Prometheus exporters are different Jun 28, 2021
@dprotaso
Member Author

cc @evankanderson who reviewed the OpenCensus exporter changes

@skonto
Contributor

skonto commented Jun 29, 2021

Regarding Prometheus, this is what I read, and what we have does not seem to violate the recommendations unless I misread something:

...should have a (single-word) application prefix relevant to the domain the metric belongs to. The prefix is sometimes referred to as namespace by client libraries. For metrics specific to an application, the prefix is usually the application name itself. Sometimes, however, metrics are more generic, like standardized metrics exported by client libraries. Examples:
prometheus_notifications_total (specific to the Prometheus server)
process_cpu_seconds_total (exported by many client libraries)
http_request_duration_seconds (for all HTTP requests)

I am not sure what value the domain adds here; we already use the component name as the prefix, and we know the domain for each component. It is straightforward for the control-plane ones, e.g. Serving, Eventing.
Maybe there is some value for user services in being able to correlate queue-proxy metrics with some domain, e.g. my_business_domain, but then Prometheus queries would have to detect that with some regex, which does not make life easier (especially if we have arbitrary domains as prefixes), e.g. for building generic dashboards. Also note, though, that having queue_proxy as a metric prefix has worked well so far.
However, there is a valid need to be able to configure existing labels or add custom ones, e.g. tag metrics with special labels to avoid joining metric info from kube-state-metrics or other sources. I am facing this downstream, as I would like to tag metrics with the func tag for Boson funcs, since not all services are functions.
What issues do we see with the current status? Are there users of the OpenCensus exporter that benefit from the extra domain info?

@dprotaso
Member Author

@skonto @evankanderson during the operations WG meeting it was mentioned that you wrote a doc about stable metric names and a potential deprecation policy. Do you have a link to that?

Whatever we decide with this change we'll want to roll it out gradually with a flag.

@evankanderson
Member

evankanderson commented Jun 29, 2021

The domain name is used today by at least Stackdriver, though they could set up a mapping to enrich the content:

https://cloud.google.com/monitoring/api/metrics_knative

While Prometheus tends to prefer short, non-namespaced metric names, some other products like Stackdriver and Wavefront tend to prefer somewhat more nested names (deeply nested for Stackdriver, moderately for Wavefront). Based on the principle that it's easier to remove data than to add it, I incorporated the domain and component labels in OpenCensus, though I'd be happy with an alternate mechanism that didn't discard these labels.

Note that there are some metrics-exporting components from both Serving and Eventing that run in a user's namespace, so having some mechanism (metric name or labels) to distinguish metrics between the two would be useful. At the same time, common metrics like Go GC information should have the same metric name for both.

Maybe a few examples from our existing documentation:

| Metric (OpenCensus name) | Type | Prefix reason |
| --- | --- | --- |
| serving/activator/request_count | HTTP request count | Prefix with serving component, because activator request count may be double-counted with e.g. queue-proxy counts |
| serving/revision/request_count | HTTP request count | Prefix with revision; counted at the queue-proxy. May double-count with activator. |
| go_alloc | Go memory stats | No prefix; common Go instrumentation across components. |
| eventing/broker/event_count | Events received count | Prefix with eventing component, because events are counted at several places in the system. |
| eventing/broker/request_count | HTTP request count | Prefix with eventing component; not all HTTP requests may result in an event receipt. |
| eventing/trigger/event_count | Post-filter event delivery counts | Prefix with eventing component, because events are counted at several places in the system. |
| eventing/trigger/http_requests | Post-filter HTTP request counts | Prefix with eventing component; not all HTTP requests may |

@MontyCarter
Contributor

Yes, Stackdriver is using the domain, and the domain is different based on the component. However, the metric domains are pretty static, so it would be manageable to keep a map in an OTel collector config and prepend the domain to the metric name based on the sender.

With that said, I think it would be much nicer to pass the domain as a label rather than simply removing the domain from the metric name. This would make it easy to rewrite the name from the domain label in an OTel collector config.

@evankanderson
Member

Summary of discussion from 6 July:

@dprotaso
Member Author

dprotaso commented Jul 6, 2021

Direct link to the meeting notes (2021/07/06)

@dprotaso
Member Author

dprotaso commented Jul 6, 2021

We should also look at the standardization going on in https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/metrics/semantic_conventions, particularly http-metrics.md.

Probably would be part of #855
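
For reference, a minimal sketch of recording an HTTP server metric under those conventions, using the current opentelemetry-go metric API (the instrument name and attribute keys are taken from the linked http-metrics.md and may change as the spec evolves; the meter name is illustrative):

```go
// Sketch only: an OTel-convention HTTP server duration metric.
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	// Without an SDK MeterProvider registered this is a no-op meter,
	// which is fine for illustrating the naming.
	meter := otel.Meter("knative.dev/example")

	duration, err := meter.Float64Histogram(
		"http.server.duration", // convention name from http-metrics.md
		metric.WithUnit("ms"),
		metric.WithDescription("Duration of inbound HTTP requests"),
	)
	if err != nil {
		log.Fatal(err)
	}

	start := time.Now()
	// ... handle the request ...
	duration.Record(context.Background(), float64(time.Since(start).Milliseconds()),
		metric.WithAttributes(
			attribute.String("http.method", "GET"),
			attribute.Int("http.status_code", 200),
		),
	)
}
```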

@dprotaso
Member Author

/assign @dprotaso

@github-actions
Contributor

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2021
@upodroid
Member

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2021
@github-actions
Contributor

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2022
@dprotaso
Member Author

/reopen
/lifecycle frozen

@dprotaso dprotaso reopened this Feb 18, 2022
@knative-prow-robot knative-prow-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 18, 2022