Add metrics and latency dashboard per service and namespace #130
Signed-off-by: Ross Kukulinski <ross@kukulinski.com>
Some of the graphs' legends display "namespace/service" instead of the actual namespace and service. The Upstream RPS graph is an example. I did some digging and it looks like the metrics for I am actually blanking on how the envoy metrics end up with
@alexbrand I added some regex in the statsd-exporter config (e.g. https://github.com/heptio/gimbal/pull/130/files#diff-5d97ec68ef0a6b5e22a5e67fa697e23bR18) that splits out to the different labels. I don't know what to do about the
Oh, and there's corresponding regex in the Prometheus config: https://github.com/heptio/gimbal/pull/130/files#diff-63627bfdd1800e3898caa1eb81a4dd58R301
Aaaah that makes more sense now. I was looking at the existing statsd-exporter config to try to figure it out. We'll have to be careful if we ever change the Envoy cluster naming scheme in Contour, as it will affect these stats. I am also unsure about what we can do for the clusters that are statically configured in Envoy.
Signed-off-by: Ross Kukulinski <ross@kukulinski.com>
@alexbrand I realized that I had been overzealous with my Grafana dashboard variables. I had added a custom "All" selector of
@rosskukulinski Are your Namespace and Service filters getting populated with values? I found that I need to update the Variable query to get it to work. For example, I had to update the Namespace variable's query from
The reason I was not getting anything in the dropdown was that I had not sent an initial request to the backend. Once I did, the metric that drives the Grafana Variables became available, and the Namespace & Service filters got populated as expected. This LGTM.
This pull request improves the default envoy-metrics Grafana dashboard to provide visualizations of the following:
To support the additional breakdown by service and namespace, this PR modifies the prometheus-statsd exporter running in Envoy AND the Prometheus pods-job to split the `cluster_name` field into its subcomponents: `namespace`, `service`, and `port`.

Note: the `service` field here is actually the `backend-name` + `servicename` from the discovery system. In future work, we should look for an efficient way to isolate the backend-name from servicename so we can provide latency/RPS metrics per-backend cluster.
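
For illustration only, here is a minimal Python sketch of the kind of split described above. It is not the regex used in this PR's statsd-exporter or Prometheus config. It assumes a hypothetical cluster name layout of `<namespace>/<service>/<port>`; the delimiter, the pattern, and the `split_cluster_name` helper are all assumptions made for the example.

```python
import re
from typing import Dict, Optional

# Assumption: cluster names look like "<namespace>/<service>/<port>".
# The real naming scheme is defined by Contour and may differ.
CLUSTER_NAME_RE = re.compile(
    r"^(?P<namespace>[^/]+)/(?P<service>[^/]+)/(?P<port>\d+)$"
)


def split_cluster_name(cluster_name: str) -> Optional[Dict[str, str]]:
    """Split a cluster name into namespace, service, and port labels.

    Returns None when the name does not follow the assumed layout,
    for example a cluster that is statically configured in Envoy.
    """
    match = CLUSTER_NAME_RE.match(cluster_name)
    if match is None:
        return None
    return match.groupdict()
```

With this layout, a name such as `default/cluster1-kuard/80` would yield the labels `namespace=default`, `service=cluster1-kuard`, and `port=80`, where `service` still carries the backend-name prefix mentioned in the note. Names that do not match return `None`, so they can be left unlabeled rather than mislabeled.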