update machine-api-usage-telemetry
During implementation of this enhancement we have determined that there
will be a need for at least one label on the MachineHealthCheck related
metrics. Although this label has unbounded cardinality, it will only
need to be exported through telemetry as a sum without the need for the
label.
elmiko committed Nov 18, 2020
1 parent 0ae0bb4 commit 9f09377
Showing 1 changed file with 8 additions and 5 deletions.
13 changes: 8 additions & 5 deletions enhancements/machine-api/machine-api-usage-telemetry.md
@@ -259,11 +259,14 @@ will need to be exposed through telemetry. These metrics are:
* This should be exported through the `cluster:usage:resources:sum` series with
a resource type of `machinehealthchecks.machine.openshift.io`.
* MachineHealthCheck total nodes covered count
- * `mapi_machinehealthcheck_nodes_covered` - This metric has no labels.
+ * `mapi_machinehealthcheck_nodes_covered` - This metric has two labels representing
+ the name and namespace of the machine health check.
* MachineHealthCheck successful remediations count
- * `mapi_machinehealthcheck_remediation_success_total` - This metric has no labels.
+ * `mapi_machinehealthcheck_remediation_success_total` - This metric has two labels
+ representing the name and namespace of the machine health check.
* MachineHealthCheck short circuit state
- * `mapi_machinehealthcheck_short_circuit` - This metric has no labels.
+ * `mapi_machinehealthcheck_short_circuit` - This metric has two labels representing
+ the name and namespace of the machine health check.

**Metric series to be exported**

@@ -275,8 +278,8 @@ will need to be exposed through telemetry. These metrics are:
with no labels.
* Total MachineAutoscaler resource count, using `cluster:usage:resources:sum{resource="machineautoscalers.autoscaling.openshift.io"}`.
* Total MachineHealthCheck resource count, using `cluster:usage:resources:sum{resource="machinehealthchecks.machine.openshift.io"}`.
- * Total nodes covered by MachineHealthChecks count, using `mapi_machinehealthcheck_nodes_covered` with no labels.
- * Total remediations completed by MachineHealthChecks count, using `mapi_machinehealthcheck_remediation_success_total` with no labels.
+ * Total nodes covered by MachineHealthChecks count, using a sum of all `mapi_machinehealthcheck_nodes_covered` with no labels on the series.
+ * Total remediations completed by MachineHealthChecks count, using a sum of all `mapi_machinehealthcheck_remediation_success_total` with no labels on the series.

In addition to the metrics defined above, the alerts generated by the Machine
API components will be used to augment this data. The listings below are