Metric names are very unusual and inconvenient. #732

Closed
Monnoroch opened this issue Mar 8, 2018 · 13 comments

@Monnoroch

Metrics exported from the "production" images (not the all-in-one image) are not "namespaced". If you have one Prometheus instance that monitors dozens of apps, you can have thousands of metrics, and names like IndexCreate:attempts are not very friendly in that situation. Normally I would expect a binary named jaeger-collector to export metrics starting with jaeger_collector_. This is the case with the all-in-one binary, but not with the other ones.

Other less important issues:

  • It is common to export metrics with underscores, not CamelCase. This is also not an issue in the all-in-one binary, afaik.
  • The : in metric names is unique to Jaeger.
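
For illustration, a minimal sketch (assuming the Go Prometheus client library and a made-up metric name, not Jaeger's actual code) of how the Namespace and Subsystem options produce the kind of prefixed, underscore-separated names requested above:

    package main

    import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        // Namespace and Subsystem are joined to the name with "_", so the
        // exported metric is "jaeger_collector_index_create_attempts_total".
        indexCreateAttempts := prometheus.NewCounter(prometheus.CounterOpts{
            Namespace: "jaeger",
            Subsystem: "collector",
            Name:      "index_create_attempts_total",
            Help:      "Attempts to create a storage index (hypothetical name).",
        })
        prometheus.MustRegister(indexCreateAttempts)

        indexCreateAttempts.Inc()
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8080", nil))
    }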
@yurishkuro
Member

yurishkuro commented Mar 8, 2018

+1 to add service name as a prefix

The CamelCase should be easy to fix - it just comes from the table names.

The : is used as a namespace separator, e.g. cassandra:Read:ReadTraces:latency_err_count. With your first suggestion it would become jaeger_query:cassandra:Read:ReadTraces:latency_err_count. However, this is because we are still using hierarchical names (#395) instead of tags for dimensions. I am not a fan of going all the way to tags, e.g.

latency{service="jaeger_query",storage="cassandra",path="read",action="ReadTraces",result="err"}

because the metric name can easily clash with something else that uses different dimensions.

Suggestions are welcome.

@Monnoroch
Author

because the metric name can easily clash with something else that uses different dimensions.

That was my original point: metrics should be "namespaced". I would expect the collector to export jaeger_latency{service="collector",storage="cassandra",path="read",action="ReadTraces",result="err"} and the agent to export jaeger_latency{service="agent",storage="cassandra",path="read",action="ReadTraces",result="err"} (btw I can't find the URL to get agent metrics on 1.2.0, but that's a whole other story).

That way metrics don't clash with other monitored software. For example, you can look at the metrics exported by official Prometheus products: Prometheus itself, node_exporter, and a few other "official" exporters from this page. My suggestion is to follow the standard Prometheus metric naming style, because most software does, and this will allow Jaeger to coexist with it very nicely. Also, the all-in-one binary already has nice metrics (except for the use of : instead of the standard _).

@yurishkuro
Member

These two examples, jaeger_latency{service="collector"... and jaeger_latency{service="agent"..., are insufficient to dedupe the metrics. The collector can very easily have another jaeger_latency metric with completely different dimensions (e.g. something about HTTP endpoints, nothing to do with storage), which will blow up the Prometheus client. So we'd instead do something like jaeger_collector:storage:latency{...} and jaeger_collector:http_server:latency{...}.
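
To make the clash concrete, here is a minimal sketch (assumed metric and label names, not Jaeger's actual instrumentation code) of what happens in the Go Prometheus client library when two collectors reuse the same metric name with different label sets:

    package main

    import (
        "fmt"

        "github.com/prometheus/client_golang/prometheus"
    )

    func main() {
        reg := prometheus.NewRegistry()

        // Storage latency, labeled by backend and operation.
        storageLatency := prometheus.NewHistogramVec(prometheus.HistogramOpts{
            Name: "jaeger_latency",
            Help: "Latency of operations.",
        }, []string{"service", "storage", "action", "result"})

        // HTTP handler latency: same name, different label set.
        httpLatency := prometheus.NewHistogramVec(prometheus.HistogramOpts{
            Name: "jaeger_latency",
            Help: "Latency of operations.",
        }, []string{"service", "handler", "status_code"})

        fmt.Println(reg.Register(storageLatency)) // <nil>: first registration succeeds
        fmt.Println(reg.Register(httpLatency))    // error: same name with different label names
    }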

@Monnoroch
Author

That's totally fine; it's easy to ensure metrics are named differently within the same project. Right now the bigger issue is ensuring they're unique between projects, so that different software can be scraped by a single Prometheus instance. I literally have a metric called IndexCreate:attempts right now, and this is not very user-friendly.

@yurishkuro
Member

I literally have a metric called IndexCreate:attempts now and this is not very user-friendly.

doh, that's just a bug - #733

@Monnoroch
Author

Nice bug-fixing latency, thanks! But still, you might want to think about aligning with the standard Prometheus metric naming conventions.

yurishkuro pushed a commit that referenced this issue Mar 9, 2018
yurishkuro added a commit that referenced this issue Mar 9, 2018
@ghost ghost removed the review label Mar 9, 2018
@yurishkuro yurishkuro reopened this Mar 9, 2018
@jpkrohling
Contributor

I really should learn more about Prometheus, but in the meantime, I asked @simonpasquier to share his thoughts, as he's a Prometheus developer.

I'm copying here his answer to me, hoping that it will bring some light to this issue:

Replacing ':' with '_' and not using CamelCase: I agree with the OP that this is the most common pattern. It isn't crucial, but it is still nice for consistency.

Taking the "cassandra:Read:ReadTraces:latency_err_count" example, let's look at the different options:
  • "jaeger_query:cassandra:Read:ReadTraces:latency_err_count" doesn't work well because it can't be aggregated. Parts of the metric name should definitely be labels.
  • latency{service="jaeger_query",storage="cassandra",path="read",action="ReadTraces",result="err"} isn't correct either. You will probably never want to aggregate latency metrics irrespective of the labels (e.g. "sum(rate(latency[5m]))"). As Yuri points out, you can have latency metrics for Cassandra and for HTTP endpoints, and it doesn't make sense to combine them.
  • A middle ground would be jaeger_query_cassandra_latency_err_count{path="read",action="ReadTraces"}. But as I understand it, Jaeger has a wrapper around the Prometheus instrumentation library, so it may not be easy to change that.

Also regarding these metrics in your blog post:
jaeger:baggage_updates_total{result="err",} 0.0
jaeger:baggage_updates_total{result="ok",} 0.0

It is usually more convenient to have one counter for the errors and another one for all the events including errors (see [1]):

jaeger:baggage_updates_error_total 0.0
jaeger:baggage_updates_total 0.0

[1] https://docs.google.com/presentation/d/1aRtot7Zv_7MGS2nB96MLtzVRICEVFQCd5H_Y3TzR_-4/edit?usp=sharing
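
As a rough sketch of the "separate error counter plus overall counter" convention described above (the metric names and the helper function here are made up for illustration, not what Jaeger emits):

    package metrics

    import "github.com/prometheus/client_golang/prometheus"

    var (
        // All baggage update attempts, including the failed ones.
        baggageUpdatesTotal = prometheus.NewCounter(prometheus.CounterOpts{
            Name: "jaeger_baggage_updates_total",
            Help: "Baggage update attempts, including failures (hypothetical name).",
        })
        // Failed baggage updates only.
        baggageUpdateErrors = prometheus.NewCounter(prometheus.CounterOpts{
            Name: "jaeger_baggage_update_errors_total",
            Help: "Failed baggage updates (hypothetical name).",
        })
    )

    func init() {
        prometheus.MustRegister(baggageUpdatesTotal, baggageUpdateErrors)
    }

    func recordBaggageUpdate(err error) {
        baggageUpdatesTotal.Inc()
        if err != nil {
            baggageUpdateErrors.Inc()
        }
    }

The error ratio then stays a simple PromQL expression, e.g. rate(jaeger_baggage_update_errors_total[5m]) / rate(jaeger_baggage_updates_total[5m]).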

@jpkrohling
Contributor

@yurishkuro , could you please assign this to me?

@svenwltr

Talking about aggregation: it is also very hard to use the jaeger_collector:jaeger:spans_by_svc:my_service metrics, because it is not easy to aggregate them or to display all (or a subset) of them in Grafana. IMO the metrics should be named something like jaeger_collector:jaeger:spans and the service should just be a separate label (e.g. service), as in the sketch below.

Not sure if this belongs in a separate issue.
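
A sketch of what that could look like with the Go Prometheus client (a hypothetical example; the metric, label, and function names are assumptions, not the names Jaeger ended up using):

    package metrics

    import "github.com/prometheus/client_golang/prometheus"

    // One counter for all services, with the service name as a label,
    // instead of one metric name per service (...:spans_by_svc:my_service).
    var spansReceived = prometheus.NewCounterVec(prometheus.CounterOpts{
        Namespace: "jaeger",
        Subsystem: "collector",
        Name:      "spans_received_total",
        Help:      "Spans received by the collector, partitioned by service (hypothetical name).",
    }, []string{"svc"})

    func init() {
        prometheus.MustRegister(spansReceived)
    }

    func countSpan(serviceName string) {
        spansReceived.WithLabelValues(serviceName).Inc()
    }

With a single label-partitioned metric, Grafana queries can aggregate or filter freely, e.g. sum by (svc) (rate(jaeger_collector_spans_received_total[5m])).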

@jpkrohling
Contributor

@svenwltr, could you please check the Gist with the new metric names as per #776? Is the new scheme acceptable?

@svenwltr

Yes it is. This is exactly what I meant. Thank you!

@yurishkuro
Member

can this be closed post 1.5 release, or is there something remaining?

@jpkrohling
Contributor

I'm closing this one, as there's been no response since end of May. If you believe this issue is still happening, feel free to reopen.
