From 84f0cba3be2c01006cbb121b878e35fef43d259f Mon Sep 17 00:00:00 2001
From: Mikko Karjalainen
Date: Mon, 7 Oct 2019 13:59:05 +0100
Subject: [PATCH] Update metrics documentation.

---
 docs/user-guide/metrics-reference.md | 124 ++++++++++----
 docs/user-guide/metrics.md           | 234 ---------------------------
 2 files changed, 91 insertions(+), 267 deletions(-)

diff --git a/docs/user-guide/metrics-reference.md b/docs/user-guide/metrics-reference.md
index 25b50403aa..306cf949c2 100644
--- a/docs/user-guide/metrics-reference.md
+++ b/docs/user-guide/metrics-reference.md
@@ -1,28 +1,53 @@
 # Metrics Reference
 
+## Metric Categories
+
+Styx collects performance metrics from the following functional areas:
+
+ - Server Side Metrics
+   - HTTP level metrics (`requests` scope)
+   - TCP connection level metrics (`connections` scope)
+   - OpenSSL metrics (when the `OPENSSL` provider is configured)
+   - Server metrics (`styx` scope)
+
+ - Origin Metrics
+   - Request metrics aggregated to back-end service
+   - Request metrics per origin
+   - Connection pool metrics
+   - Health-check metrics
+
+ - JVM Metrics
+
+ - Operating System Metrics
+
+The server side metrics are collected from the Styx ingress interface.
+The origin metrics are collected on the application side where
+the requests are forwarded to the backend services.
+
+
 ## Server Side Metrics
 
+The server side metric scopes are illustrated in the diagram below:
+
+![Styx Server Metrics](../assets/styx-server-metrics.png "Styx server metrics")
+
 ### HTTP level metrics (`requests` scope)
 
 **requests.cancelled.``**
 
-* Counter
 * Requests cancelled due to an error.
 
 **requests.outstanding**
 
-* Counter
 * Number of requests currently being served (in flight).
 
 **requests.response.sent**
 
-* Counter
 * Total number of responses sent downstream
 
 **requests.response.status.``**
 
-* Counter
 * Total number or responses for each status code class (1xx, 2xx, ...)
 * Total number of responses for each error status code (code >= 400)
 * Total number of unrecognised status codes (`` is `unrecognised`)
@@ -30,18 +55,15 @@
 
 **requests.received**
 
-* Counter
 * Total number of requests received
 
 **requests.error-rate.500**
 
-* Meter
 * The rate of 500 Internal Server Error
 * This metric combines statuses from origins with statuses from Styx-generated responses.
 
 **requests.latency**
 
-* Timer
 * Request latency, measured on Styx server interface.
 * Measured as a time to last byte written back to downstream.
 * Timer starts when request arrives, timer stops when the response
@@ -52,65 +74,58 @@
 
 **connections.eventloop.``.registered-channel-count**
 
-* Counter
 * Number of TCP connections registered against the Styx server IO thread, where `` is the IO thread name.
 
 **connections.total-connections**
 
-* Counter
 * Total number of TCP connections active on Styx server side.
 * Does not count client side TCP connections.
 
 **connections.eventloop.``.channels**
 
-* Histogram
 * Measures the distribution of number of channels for a named IO thread. There is a counter for each thread.
 
 **connections.bytes-received**
 
-* Counter
 * Total number of bytes received.
 
 **connections.bytes-sent**
 
-* Counter
 * Total number of bytes sent.
 
+**connections.idleClosed**
+
+* Number of server side connections closed due to idleness.
+
 ### Styx Server metrics (`styx` scope)
 
 **styx.exception.``**
 
-* Count
 * Number of exceptions, for each `` exception name.
 
 **styx.server.http.requests**
 
-* Count
 * Number of requests received from http connector (port).
 
 **styx.server.http.responses.``**
 
-* Count
 * Number of responses sent out via http connector.
 
 **styx.server.https.requests**
 
-* Count
 * Number of requests received from https connector (port).
 
 **styx.server.https.responses.``**
 
-* Count
 * Number of responses sent out via https connector.
 
 **styx.version.buildnumber**
 
-* Gauge
 * Styx version number.
@@ -123,25 +138,28 @@ TBD:
     connections.openssl.session.acceptRenegotiate
     connections.openssl.session.cacheFull
     connections.openssl.session.cbHits
+    connections.openssl.session.hits
     connections.openssl.session.misses
     connections.openssl.session.number
+    connections.openssl.session.timeouts
 
 
+## Origin Side Metrics
 
+The origin side metric scopes are illustrated in the diagram below:
+
+![Styx Client Metrics](../assets/styx-origin-metrics.png "Styx client metrics")
 
-## Client Side Metrics
 
 ### Per Back-End Request Metrics
 
 **origins.``.requests.cancelled**
 **origins.``.``.requests.cancelled**
 
-* Count
 * Number of requests cancelled due to an error.
 
 **origins.``.requests.success-rate**
 **origins.``.``.requests.success-rate**
 
-* Meter
 * Rate of successful requests to the origin.
 * A request is considered a success when it returns a non-5xx class status code.
@@ -149,14 +167,12 @@ TBD:
 **origins.``.requests.error-rate**
 **origins.``.``.requests.error-rate**
 
-* Meter
 * Number of failed requests to the origin.
 * A request is considered a failure when origin responds with a 5xx class status code.
 
 **origins.``.requests.response.status.``**
 **origins.``.``.requests.response.status.``**
 
-* Meter
 * Number of responses from origin with a status codee of ``.
 * Unrecognised status codes are collapsed to a value of -1. A status code is unrecognised when `code < 100` or `code >= 600`.
@@ -165,13 +181,11 @@ TBD:
 **origins.``.requests.response.status.5xx**
 **origins.``.``.requests.response.status.5xx**
 
-* Meter
 * A rate of 5xx responses from an origin.
 
 **origins.``.requests.latency**
 **origins.``.``.requests.latency**
 
-* Timer
 * A latency distribution of requests to origin.
 * Measured as time to last byte.
 * Timer started when request is sent, and stopped when the last content
* Timer started when request is sent, and stopped when the last content @@ -179,48 +193,92 @@ TBD: **origins.``.``.status** -* Gauge * Current [origin state](configure-health-checks.md), as follows: * 1 - ACTIVE, * 0 - INACTIVE * -1 - DISABLED +**origins.``.healthcheck.failure** + +* Number of health check failure rate + ### Connection pool metrics **origins.``.``.connectionspool.available-connections** -* Gauge * Number of primed TCP connections readily available in the pool. **origins.``.``.connectionspool.busy-connections** -* Gauge * Number of connections borrowed at the moment. **origins.``.``.connectionspool.connection-attempts** -* Gauge * Number of TCP connection establishment attempts. **origins.``.``.connectionspool.connection-failures** -* Gauge * Number of failed TCP connection attempts. **origins.``.``.connectionspool.connections-closed** -* Gauge * Number of TCP connection closures. * Counts the connections closed by *Styx*, not an origin. **origins.``.``.connectionspool.connections-terminated** -* Gauge * Number of times TCP connection has terminated, either because it was closed by styx, or by an origin, or otherwise disconnected. **origins.``.``.connectionspool.pending-connections** -* Gauge * Size of the [pending connections queue](configure-connection-pooling.md) at the moment. + +**origins.``.``.connectionspool.connections-in-establishment** + +* Number of connections performing a TCP handshake or an SSL/TLS handshake procedure. + +## Operating System Metrics + +Styx also measures metrics from the underlying operating system: + + os.process.cpu.load + os.process.cpu.time + os.system.cpu.load + os.memory.physical.free + os.memory.physical.total + os.memory.virtual.committed + os.swapSpace.free + os.swapSpace.total + +These ones are only available on a Unix-based system: + + os.fileDescriptors.max + os.fileDescriptors.open + +## Plugin Metrics + +Custom extension plugins expose their metrics under `styx.plugins.` +hierarchy. 
The `name` is a plugin name as it is configured in the +`plugins` section. Consider the following: + +``` + plugins: + all: + guidFixer: + factory: + ... factories ... + config: + ... config ... +``` + +All metrics from this plugin would go under `styx.plugins.guidFixer` prefix. + + +## Undocumented or unstable metrics + + +Following metrics are subject to change their names: + + origins.response.status. diff --git a/docs/user-guide/metrics.md b/docs/user-guide/metrics.md index acf4d2cd3b..c3d088d89c 100644 --- a/docs/user-guide/metrics.md +++ b/docs/user-guide/metrics.md @@ -33,240 +33,6 @@ Examples where `term` is the string you want to filter for: `http:///admin/metrics/?filter=` -## Metrics Grouping - -Styx metrics are roughly grouped into *server* and *client* metrics. -The server metrics are measured at the server port where the HTTP traffic -comes in. The client metrics are measured on the application side where -the requests are forwarded to the backend services. - -### Server Side Metrics - - - HTTP level metrics (`requests` scope) - -``` - requests.cancelled.x - requests.outstanding - requests.response.sent - requests.response.status.1xx - requests.response.status.2xx - requests.response.status.3xx - requests.response.status.4xx - requests.response.status.5xx - requests.response.status.= 400> - requests.response.status.unrecognised - - requests.received - requests.error-rate.500 - requests.latency -``` - - - TCP connection level metrics (`connections` scope) - -``` - connections.eventloop..registered-channel-count - connections.total-connections - connections.eventloop..channels -histogram - connections.bytes-received - connections.bytes-sent -``` - -Following metrics are only available when `OPENSSL` provider is used: - -``` - connections.openssl.session.accept - connections.openssl.session.acceptGood - connections.openssl.session.acceptRenegotiate - connections.openssl.session.cacheFull - connections.openssl.session.cbHits - 
connections.openssl.session.misses - connections.openssl.session.number - connections.openssl.session.timeouts -``` - - - Server metrics (`styx` scope) - -``` - styx.exception. - styx.server.http.requests - styx.server.http.responses. - styx.server.https.requests - styx.server.https.responses. - styx.version.buildnumber -``` - -The server side metrics scopes are illustrated in a diagram below: - -![Styx Server Metrics](../assets/styx-server-metrics.png "Styx server metrics") - -### Client Side Metrics - - - Request metrics aggregated to back-end service - -``` - origins..requests.cancelled - origins..requests.success-rate - origins..requests.error-rate - origins..requests.response.status. - origins..requests.response.status.5xx - origins..requests.response.status.-1 - origins..requests.latency -``` - - - Request metrics per origin origin - -``` - origins...requests.cancelled - origins...requests.success-rate - origins...requests.error-rate - origins...requests.response.status. - origins...requests.response.status.5xx - origins...requests.response.status.-1 - origins...requests.latency -``` - - - Connection pool metrics - -``` - origins...connectionspool.available-connections - origins...connectionspool.busy-connections - origins...connectionspool.connection-attempts - origins...connectionspool.connection-failures - origins...connectionspool.connections-closed - origins...connectionspool.connections-terminated - origins...connectionspool.pending-connections - origins...status -``` - - - Health-check metrics -``` - origins..healthcheck.failure -``` - -The client side metrics scopes are illustrated in a diagram below: - -![Styx Client Metrics](../assets/styx-origin-metrics.png "Styx client metrics") - - - -### JVM Metrics - -Styx also measures metrics from the underlying JVM: - - jvm.bufferpool.direct.capacity - jvm.bufferpool.direct.count - jvm.bufferpool.direct.used - jvm.bufferpool.mapped.capacity - jvm.bufferpool.mapped.count - jvm.bufferpool.mapped.used - 
jvm.gc.PS-MarkSweep.count - jvm.gc.PS-MarkSweep.time - jvm.gc.PS-Scavenge.count - jvm.gc.PS-Scavenge.time - jvm.memory.heap.committed - jvm.memory.heap.init - jvm.memory.heap.max - jvm.memory.heap.usage - jvm.memory.heap.used - jvm.memory.non-heap.committed - jvm.memory.non-heap.init - jvm.memory.non-heap.max - jvm.memory.non-heap.usage - jvm.memory.non-heap.used - jvm.memory.pools.Code-Cache.committed - jvm.memory.pools.Code-Cache.init - jvm.memory.pools.Code-Cache.max - jvm.memory.pools.Code-Cache.usage - jvm.memory.pools.Code-Cache.used - jvm.memory.pools.Compressed-Class-Space.committed - jvm.memory.pools.Compressed-Class-Space.init - jvm.memory.pools.Compressed-Class-Space.max - jvm.memory.pools.Compressed-Class-Space.usage - jvm.memory.pools.Compressed-Class-Space.used - jvm.memory.pools.Metaspace.committed - jvm.memory.pools.Metaspace.init - jvm.memory.pools.Metaspace.max - jvm.memory.pools.Metaspace.usage - jvm.memory.pools.Metaspace.used - jvm.memory.pools.PS-Eden-Space.committed - jvm.memory.pools.PS-Eden-Space.init - jvm.memory.pools.PS-Eden-Space.max - jvm.memory.pools.PS-Eden-Space.usage - jvm.memory.pools.PS-Eden-Space.used - jvm.memory.pools.PS-Old-Gen.committed - jvm.memory.pools.PS-Old-Gen.init - jvm.memory.pools.PS-Old-Gen.max - jvm.memory.pools.PS-Old-Gen.usage - jvm.memory.pools.PS-Old-Gen.used - jvm.memory.pools.PS-Survivor-Space.committed - jvm.memory.pools.PS-Survivor-Space.init - jvm.memory.pools.PS-Survivor-Space.max - jvm.memory.pools.PS-Survivor-Space.usage - jvm.memory.pools.PS-Survivor-Space.used - jvm.memory.total.committed - jvm.memory.total.init - jvm.memory.total.max - jvm.memory.total.used - jvm.thread.blocked.count - jvm.thread.count - jvm.thread.daemon.count - jvm.thread.deadlock.count - jvm.thread.deadlocks - jvm.thread.new.count - jvm.thread.runnable.count - jvm.thread.terminated.count - jvm.thread.timed_waiting.count - jvm.thread.waiting.count - jvm.uptime - jvm.uptime.formatted - - -### Operating System Metrics - -Styx also 
measures metrics from the underlying operating system: - - os.process.cpu.load - os.process.cpu.time - os.system.cpu.load - os.memory.physical.free - os.memory.physical.total - os.memory.virtual.committed - os.swapSpace.free - os.swapSpace.total - -These ones are only available on a Unix-based system: - - os.fileDescriptors.max - os.fileDescriptors.open - - -### Undocumented or unstable metrics - - -Following metrics are subject to change their names: - - origins.response.status. - - - -### Plugin Metrics - -Custom extension plugins expose their metrics under `styx.plugins.` -hierarchy. The `name` is a plugin name as it is configured in the -`plugins` section. Consider the following: - -``` - plugins: - all: - guidFixer: - factory: - ... factories ... - config: - ... config ... -``` - -All metrics from this plugin would go under `styx.plugins.guidFixer` prefix. - # Metrics Reporter Configuration
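The naming rules described in the reference (collapsing unrecognised status codes to -1, the 5xx success/failure split, the origin `status` gauge values and the `styx.plugins.` prefix) can be illustrated with a small Python sketch. This is not Styx code: the helper names, and the `backend` and `origin` arguments that stand in for the elided name segments after `origins.`, are hypothetical. The sketch only encodes the rules as the documentation states them.

```python
from typing import List

# Origin state gauge values, as listed for the per-origin `status` metric.
ORIGIN_STATES = {1: "ACTIVE", 0: "INACTIVE", -1: "DISABLED"}


def origin_state_name(gauge_value: int) -> str:
    """Translate an origin status gauge value into its state name."""
    return ORIGIN_STATES[gauge_value]


def status_code_segment(code: int) -> str:
    """Metric name segment for a response status code.

    Documented rule: a code is unrecognised when code < 100 or code >= 600,
    and unrecognised codes are collapsed to -1.
    """
    return str(code) if 100 <= code < 600 else "-1"


def origin_response_metrics(backend: str, origin: str, code: int) -> List[str]:
    """Per-origin metric names a single response would update (hypothetical helper).

    `backend` and `origin` are illustrative stand-ins for the two name
    segments that follow `origins.` in the reference.
    """
    prefix = f"origins.{backend}.{origin}.requests"
    names = [f"{prefix}.response.status.{status_code_segment(code)}"]
    if 500 <= code < 600:
        # 5xx responses count as failures and also feed the 5xx aggregate.
        names.append(f"{prefix}.response.status.5xx")
        names.append(f"{prefix}.error-rate")
    else:
        # Assumption: read literally, any non-5xx code (including
        # unrecognised ones) counts as a success.
        names.append(f"{prefix}.success-rate")
    return names


def plugin_metric_prefix(plugin_name: str) -> str:
    """Prefix for metrics of a plugin configured in the `plugins` section."""
    return f"styx.plugins.{plugin_name}"


if __name__ == "__main__":
    print(origin_response_metrics("shopping", "shopping-01", 503))
    print(origin_state_name(-1))
    print(plugin_metric_prefix("guidFixer"))
```

The unrecognised-code check and the 5xx check are kept separate because the reference defines them as two independent rules.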