Ingress and terminating gateway stats are misnamed since 1.9 #9887

Closed
chrisboulton opened this issue Mar 17, 2021 · 2 comments · Fixed by #10404
Labels
theme/connect: Anything related to Consul Connect, Service Mesh, Side Car Proxies
theme/telemetry: Anything related to telemetry or observability
type/bug: Feature does not function as expected

Comments

@chrisboulton
Contributor

chrisboulton commented Mar 17, 2021

Overview of the Issue

As of Consul 1.9.0, Prometheus metrics for ingress gateways (and possibly other gateway types, based on statsPrefix usage in agent/xds/listeners.go) are misnamed: they include the upstream name/port in the metric name instead of in the envoy_http_conn_manager_prefix label. This makes the metrics difficult to use on standardised Envoy dashboards.

Note, I think this was introduced in #9207:

connect: Update Envoy metrics names and labels for proxy listeners so that attributes like datacenter and namespace can be extracted. [GH-9207]

The specific change that caused this was the transition from statPrefix ending with an underscore (_) to a period (.) - fe72885#diff-3f5a22a74e450c254308e27dccb905bdce627f0a29ec602ca66df792d1dbf5ceL1010. This means the built-in Envoy stats formatters/matchers no longer extract the correct part of the name for the label. While updated matchers handled this for normal connect proxy upstreams (https://github.com/hashicorp/consul/pull/9207/files#diff-3d65fe35c13bd0d74e9cabcee963eea26abcafb433b75454c7bfd697f1f451f4R367), they don't account for the naming conventions of gateways.

Right now I'm not sure if this is an intended change, or just a side effect of the metric tag updates in that pull request.

The Envoy metrics we're talking about here are those such as: http.ingress_upstream.14145.downstream_cx_length_ms

In 1.8.x:

# envoy stat:  http.ingress_upstream_14146_http.no_route: 0
envoy_http_no_route{
  local_cluster="ingress-service",
  envoy_http_conn_manager_prefix="ingress_upstream_14146_http"
} 0

In 1.9.x:

# envoy stat: http.ingress_upstream.14146.no_route: 0
envoy_http_14146_no_route{
  local_cluster="ingress-service",
  consul_source_service="ingress-service",
  consul_source_namespace="default",
  consul_source_datacenter="dc1",
  envoy_http_conn_manager_prefix="ingress_upstream"
} 0
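
The difference between the two outputs above can be reproduced with a small sketch. It assumes that Envoy's built-in extractor for envoy_http_conn_manager_prefix behaves roughly like the regex below (capture the first dot-delimited segment after "http." and strip it from the name), and that the Prometheus name is the remaining stat with an "envoy_" prefix and other characters replaced by underscores. The function name to_prometheus is hypothetical, and the consul_* labels added by Consul's own extractors are left out.

```python
import re

# Assumption: an approximation of Envoy's built-in tag extractor for
# envoy_http_conn_manager_prefix. It captures the first dot-delimited
# segment after "http." and removes it (with its trailing dot) from the name.
HCM_PREFIX = re.compile(r"^http\.(([^.]+)\.)")


def to_prometheus(stat):
    """Return (metric_name, labels) for a raw Envoy stat name (hypothetical helper)."""
    labels = {}
    match = HCM_PREFIX.match(stat)
    if match:
        labels["envoy_http_conn_manager_prefix"] = match.group(2)
        # Drop the extracted segment from the stat name.
        stat = stat[:match.start(1)] + stat[match.end(1):]
    # Prefix with "envoy_" and replace characters that are not valid in a metric name.
    name = "envoy_" + re.sub(r"[^a-zA-Z0-9_]", "_", stat)
    return name, labels


# 1.8.x style: the port is part of the first segment, so it lands in the label.
print(to_prometheus("http.ingress_upstream_14146_http.no_route"))
# 1.9.x style: the port is a separate segment, so it stays in the metric name.
print(to_prometheus("http.ingress_upstream.14146.no_route"))
```

Under these assumptions, the period-separated prefix is what pushes the port out of the label and into the metric name.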

Reproduction Steps

Grab yourself a Consul, and get it up and running:

./consul agent -data-dir /tmp/consul-tmp -dev

Create a dummy service and ingress gateway:

echo '{
    "Kind": "service-defaults",
    "Name": "svc1",
    "Protocol": "http"
}'| ./consul config write -

echo '{
    "Kind": "ingress-gateway",
    "Name": "ingress-service",
    "TLS": {
        "Enabled": false
    },
    "Listeners": [
        {
            "Port": 14146,
            "Protocol": "http",
            "Services": [
                {
                    "Name": "svc1",
                    "Hosts": [
                        "*"
                    ]
                }
            ]
        }
    ]
}'| ./consul config write -

Start an ingress gateway:

$  ./consul connect envoy -gateway=ingress -register -service ingress-service -address '127.0.0.1:8888'

Observe Envoy + Prometheus stats:

$ curl -s localhost:19000/stats | grep '^http\.ingress_upstream' | head -n 2
http.ingress_upstream.14146.downstream_cx_active: 0
http.ingress_upstream.14146.downstream_cx_delayed_close_timeout: 0
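
For checking this from a script rather than by eye, here is a minimal sketch that parses the "name: value" lines of the admin /stats output and keeps only the ingress listener stats. The helper name parse_envoy_stats is hypothetical; the admin address is the one used in the curl command above.

```python
import urllib.request


def parse_envoy_stats(text, prefix="http.ingress_upstream"):
    """Parse 'name: value' lines from /stats text, keeping names that start with prefix."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.startswith(prefix):
            # Values are kept as strings, since histogram lines are not plain integers.
            stats[name] = value.strip()
    return stats


if __name__ == "__main__":
    # Assumes the Envoy admin interface is listening on localhost:19000, as above.
    with urllib.request.urlopen("http://localhost:19000/stats") as resp:
        body = resp.read().decode("utf-8")
    for name, value in sorted(parse_envoy_stats(body).items()):
        print(name, value)
```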

dnephin added the theme/telemetry, type/bug, and theme/connect labels on Mar 17, 2021

@mikemorris
Contributor

@chrisboulton Could you add a link to the standardized Envoy dashboards you're referring to if they're publicly available?

@chrisboulton
Contributor Author

@mikemorris Sorry, I didn't mean to imply that they're public dashboards -- these are just internal dashboards we've built that allow for target service, downstream service, and upstream service selection & visualisation. They work great for everything in the mesh, with the exception of the ingress gateways because of the non-standard/non-conforming metric names.
