Change metrics naming scheme #776
@@ -33,7 +33,7 @@ type Table struct {
 // NewTable takes a metrics scope and creates a table metrics struct
 func NewTable(factory metrics.Factory, tableName string) *Table {
 	t := storageMetrics.WriteMetrics{}
-	metrics.Init(&t, factory.Namespace(tableName, nil), nil)
+	metrics.Init(&t, factory.Namespace("", map[string]string{"table": tableName}), nil)
 	return &Table{t}
 }

Not sure about this one: it allows users to aggregate metrics across different C* tables. For attempts and inserts (total C* throughput) that makes sense, but allowing users to aggregate latencies across different tables doesn't seem right (the workload will be different per table). Then again, giving the ability to aggregate doesn't imply people will aggregate, and technically people using non-tag-based metrics systems could already aggregate in the previous rendition. TL;DR: I've convinced myself this is fine.

Glad to hear :D
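To make the effect of the change concrete, here is a rough sketch of before vs. after. This is not code from the PR: the WriteMetrics fields and the resulting metric names are illustrative, and the Factory/Init calls are simply the ones used in the diff above.

```go
package storagemetrics

import "github.com/uber/jaeger-lib/metrics"

// WriteMetrics stands in for the struct initialized in the diff: metrics.Init
// creates one metric per field, named by its `metric` struct tag.
// (Illustrative only, not the actual Jaeger struct.)
type WriteMetrics struct {
	Attempts metrics.Counter `metric:"attempts"`
	Inserts  metrics.Counter `metric:"inserts"`
	Errors   metrics.Counter `metric:"errors"`
}

// Old scheme: the table name is folded into the metric name, giving a
// separate name per table, e.g. jaeger_cassandra_service_names_inserts.
func newTableMetricsOld(factory metrics.Factory, tableName string) *WriteMetrics {
	t := &WriteMetrics{}
	metrics.Init(t, factory.Namespace(tableName, nil), nil)
	return t
}

// New scheme (this PR): the table becomes a tag, so all tables share one
// metric name, e.g. jaeger_cassandra_inserts{table="service_names"}.
func newTableMetricsNew(factory metrics.Factory, tableName string) *WriteMetrics {
	t := &WriteMetrics{}
	metrics.Init(t, factory.Namespace("", map[string]string{"table": tableName}), nil)
	return t
}
```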
Why are we removing the `-collector` suffix here? Isn't it better to be able to tell apart which binary emitted the metric? The storage metrics won't be grouped by collector name anymore.
It's just so that the `jaeger` prefix is used for the storage metrics without the component's name. `collector` is added later as a namespace to the collector-specific metrics. I think the agreement was to have per-component metrics, instead of per-binary, so that metrics generated by the collector via `standalone-linux` would be consistent with the metrics generated by the collector via `collector-linux`:

(before)
:(before)
jaeger_standalone_jaeger_collector_in_queue_latency_bucket{host="caju",le="0.005"} 0
vs
(after, across all binaries)
jaeger_collector_in_queue_latency_bucket{host="caju",le="0.005"} 0
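Concretely, the layering described here could be sketched as below. This is hedged: the actual factory wiring in Jaeger's main packages may differ, and the metric names in the comments are the ones from the examples above.

```go
package metricsnaming

import "github.com/uber/jaeger-lib/metrics"

// newFactories sketches the per-component namespacing described above:
// every binary shares the "jaeger" prefix, and only component-specific
// metrics add the component name as a further namespace.
func newFactories(base metrics.Factory) (storage, collector metrics.Factory) {
	root := base.Namespace("jaeger", nil)

	// Storage metrics hang directly off the shared prefix, e.g.
	// jaeger_cassandra_inserts{table="service_names"} (the cassandra segment
	// coming from the storage implementation's own namespace), no matter
	// which binary emits them.
	storage = root

	// Collector metrics add the component namespace, e.g.
	// jaeger_collector_in_queue_latency_bucket{...}, whether they are
	// emitted by collector-linux or by standalone-linux.
	collector = root.Namespace("collector", nil)

	return storage, collector
}
```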
You mean, `jaeger_cassandra_inserts{table="service_names"}` would be used by `collector`, `query` and `standalone`? This would potentially create three different metric names for the same thing: `jaeger_standalone_cassandra_inserts`, `jaeger_collector_cassandra_inserts`, `jaeger_query_cassandra_inserts`. I personally prefer to have one single metric, but I do see the point of having one per component (not binary). Just not sure how we'd tell the components apart when using the standalone.
. I personally prefer to have one single metrics, but I do see the point of having one per component (not binary). Just not sure how we'd tell the components apart when using the standalone.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If knowing the source binary is important, then it could be added as a label instead?
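Purely as a hypothetical sketch of that suggestion (not something this PR does), the binary name could be attached as a tag on the root factory; binaryName here is an assumed parameter that each main() would supply:

```go
package metricsnaming

import "github.com/uber/jaeger-lib/metrics"

// newRootFactory sketches the label idea: keep per-component metric names,
// and attach the emitting binary as a tag at the root of the factory tree,
// e.g. jaeger_collector_in_queue_latency_bucket{binary="standalone", ...}.
func newRootFactory(base metrics.Factory, binaryName string) metrics.Factory {
	return base.Namespace("jaeger", map[string]string{"binary": binaryName})
}
```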
I am really surprised the Prometheus community has no guidelines around this. Can't imagine any minimally complex system not running into the same exact issues with multiple namespaces and hierarchies.
The Prometheus scraper can add arbitrary labels, so this kind of information would arguably be added there, along with metadata not known to the "microservice", like the geo, DC, rack, and so on. Our "binary" name could also be added at scrape time, for cases where it has more value than a consistent set of metrics across all binaries.
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cstatic_config%3E
How about we merge it in its current state, publish a blog post about this change plus how to add arbitrary labels, and ask for community feedback? If we hear that people want this as part of the metrics we emit, we add it as a label.
Maybe we should discuss on Friday. We are accumulating a set of breaking changes for 1.5: metrics, docker commands, and Cassandra schema.