Enhancement proposal for monitoring Windows Nodes

Enhancement proposal for enabling monitoring on Windows nodes created by Windows Machine Config Operator(WMCO).
openshift · Mar 10, 2021 · fdda97f
1 parent 9eb5f69
commit fdda97f
Showing 1 changed file with 239 additions and 0 deletions.
diff --git a/enhancements/windows-containers/monitoring-windows-nodes.md b/enhancements/windows-containers/monitoring-windows-nodes.md
@@ -0,0 +1,239 @@
+---
+title: monitoring-windows-nodes
+authors:
+  - "@VaishnaviHire"
+  - "@PratikMahajan"
+reviewers:
+  - "@openshift/openshift-team-windows-containers"
+  - "@simonpasquier"
+  - "@spadgett"
+approvers:
+  - "@aravindhp"
+  - "@simonpasquier"
+creation-date: 2021-02-08
+last-updated: 2021-03-04
+status: implementable
+---
+
+# Monitoring Windows Nodes
+
+## Release Signoff Checklist
+
+- [x] Enhancement is `implementable`
+- [x] Design details are appropriately documented from clear requirements
+- [x] Test plan is defined
+- [ ] Operational readiness criteria is defined
+- [x] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Summary
+
+The intent of this enhancement is to enable performance monitoring on Windows
+nodes created by the Windows Machine Config Operator (WMCO) in an OpenShift cluster.
+
+## Motivation
+
+Monitoring is critical for identifying issues with nodes and the containers
+running on them. The main motivation behind this enhancement is to enable
+monitoring on the Windows nodes.
+
+### Goals
+
+As part of this enhancement, we plan to do the following:
+* Run [windows_exporter](https://github.com/prometheus-community/windows_exporter)
+  as a service on Windows nodes
+* Upgrade the windows_exporter on the Windows Nodes
+* Leverage the cluster monitoring operator, which sets up Prometheus,
+  Alertmanager and other components
+
+### Non-Goals
+
+As part of this enhancement, we do not plan to do the following:
+* Integrate windows_exporter with the cluster monitoring operator
+* Ship Grafana dashboards for Windows Nodes
+
+## Proposal
+
+The main idea here is to run windows_exporter as a Windows service and let the
+Prometheus instance provisioned as part of the OpenShift install collect data
+from windows_exporter. The metrics exposed by windows_exporter will be used to
+display console graphs for Windows nodes.
+
+## Justification
+
+Unlike [Node exporter](https://github.com/prometheus/node_exporter) on Linux
+nodes, windows_exporter cannot run as a container on the Windows nodes, since
+Windows container images contain a Windows kernel and Red Hat has a policy not
+to ship third-party kernels for support reasons. Please refer to the [WMCO
+ enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/windows-containers/windows-machine-config-operator.md#justification)
+for more details.
+
+### Risks and Mitigations
+
+* Running `windows_exporter` as a Windows service poses a risk of having
+  inadequate resources to run the service if the Windows node is overwhelmed
+  with workload containers. This can be mitigated by leveraging [priority
+   classes](https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities) for
+  Windows processes. This is similar to what is being done for other [Windows
+  services](https://issues.redhat.com/browse/WINC-534).
+
+* One of the risks with the current approach is renaming Windows metrics to
+  display pod graphs. The pod metrics for Linux come from cAdvisor. However, we
+  do not get the same metrics from cAdvisor for Windows nodes. This prevents us
+  from displaying pod graphs by creating custom recording rules that reuse the
+  console queries for Linux workloads. To mitigate this, we need to use metrics
+  exposed by the windows_exporter to display pod graphs, as described in [Future
+   Plans](#future-plans). This also requires changing the console queries to
+  support OS-specific metrics.
+
+## Design Details
+
+As we are not able to run windows_exporter as a [container](#justification)
+on the Windows node, to capture data from windows_exporter, WMCO creates a
+`windows-machine-config-operator-metrics` Service without selectors and
+manually defines an Endpoints object for that Service. The Endpoints object has
+entries for the endpoints `<internal-ip>:9182/metrics`, exposed by
+windows_exporter for every Windows node. Once the Service and Endpoints
+objects are created, WMCO ensures that a Service Monitor for the `windows-machine-config-operator-metrics`
+Service exists so that the Prometheus operator can discover the targets
+created above to scrape Windows metrics. The following design details reflect
+the current approach and future plans to enable monitoring support for Windows.
+
+### Current State
+
+To enable basic monitoring support for Windows nodes, WMCO has done the
+following:
+
+* Build and add the windows_exporter binary to the WMCO payload.
+* Install windows_exporter on the Windows nodes and ensure
+  that it runs as a Windows service.
+* Add the `openshift.io/cluster-monitoring=true` label to the
+  `openshift-windows-machine-config-operator` namespace so that the cluster
+  monitoring stack will pick up the Service Monitor created by WMCO.
+* Add privileges to WMCO to create Services, Endpoints, Service Monitor in
+  the `openshift-windows-machine-config-operator` namespace.
+* Create a Service and Endpoints object in the
+  `openshift-windows-machine-config-operator` namespace that point to the
+  windows_exporter endpoint. WMCO uses default values to define the metrics
+  endpoint, `<internal-ip>:9182/metrics`, exposed by windows_exporter for every
+  Windows node. The Endpoints object created in the namespace consists of
+  subsets of endpoints from all the Windows nodes.
+* Create a Service Monitor in the `openshift-windows-machine-config-operator`
+  namespace for the Service created above.
+
+To display node graphs, WMCO has done the following:
+
+* Add custom Prometheus rules in the `openshift-windows-machine-config-operator`
+  namespace. The custom recording rules are created using Windows metrics
+  exposed by the windows_exporter and have the same names as the Linux
+  recording rules. This allows the console to reuse the same queries as for Linux.
+* Note that WMCO is unable to display pod graphs for the Windows Nodes
+  with the current implementation. See [Risks and Mitigations](#risks-and-mitigations)
+  for details.
+
+### Future Plans
+
+#### Displaying Console Graphs
+
+* As we move forward, our plan to display monitoring graphs is to create a
+  [common interface](https://issues.redhat.com/browse/WINC-530) for Windows
+  and Linux recording rules. The monitoring team will define recording rules for
+  the metrics that have different `metric labels` for Linux and Windows. The
+  differences in `metric labels` for metrics used for node graphs and pod graphs
+  are displayed in the tables below.
+  The Windows team will align the Windows recording rules with these new
+  recording rules. The recording rules for Windows will be managed by
+  WMCO. This set of common recording rules for monitoring will return results
+  for both Linux and Windows nodes for a single query. The console queries
+  currently use some raw metrics such as `node_filesystem_size_bytes` and
+  `node_filesystem_free_bytes`. They would need to be updated to use
+  the new recording rules in place of the raw metrics. This will ensure that
+  we have a consistent user experience for monitoring across Linux and Windows.
+* In the cases where `metric labels` are equivalent, we plan to relabel the
+  Windows metrics to align with the Linux metrics.
+
+##### Node Metrics
+
+| Node Exporter                  | Windows Exporter                 | Label Difference                                                         |
+|--------------------------------|----------------------------------|--------------------------------------------------------------------------|
+| node_memory_MemTotal_bytes     | windows_cs_physical_memory_bytes | -                                                                        |
+| node_memory_MemAvailable_bytes | windows_memory_available_bytes   | -                                                                        |
+| node_filesystem_size_bytes     | windows_logical_disk_size_bytes  | Missing Labels: (device, mountpoint, fstype) Additional Label: (volume)  |
+| node_filesystem_free_bytes     | windows_logical_disk_free_bytes  | Missing Labels: (device, mountpoint, fstype) Additional Label: (volume)  |
+| node_cpu_seconds_total         | windows_cpu_time_total           | Missing Label: (cpu) Additional Label: (core)                            |
+
+##### Pod Metrics
+
+| Kubelet metrics                        | Windows Kubelet      | Windows Exporter                                         | Label Difference                                                                                            |
+|----------------------------------------|----------------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
+| kubelet_running_pods                   | kubelet_running_pods | windows_container_available                              | -                                                                                                           |
+| container_memory_working_set_bytes     | -                    | windows_container_memory_usage_private_working_set_bytes | Missing Label: (image) Additional Label: (container_id) which is equivalent of (id) for Linux               |
+| container_cpu_usage_seconds_total      | -                    | windows_container_cpu_usage_seconds_total                | Missing Label: (image, metrics_path) Additional Label: (container_id) which is equivalent of (id) for Linux |
+| container_fs_usage_bytes               | -                    | -                                                        |                                                                                                             |
+| container_network_receive_bytes_total  | -                    | windows_container_network_receive_bytes_total            | Missing Label: (image, metrics_path) Additional Label: (container_id) which is equivalent of (id) for Linux |
+| container_network_transmit_bytes_total | -                    | windows_container_network_transmit_bytes_total           | Missing Label: (image, metrics_path) Additional Label: (container_id) which is equivalent of (id) for Linux |
+
+#### Moving towards EndpointSlices
+
+* Since the metrics Endpoints object is managed by WMCO, we plan to replace
+  the Endpoints object with [EndpointSlices](https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#motivation)
+  to improve performance. This can be done once the `prometheus-operator` has
+  [support](https://github.com/prometheus-operator/prometheus-operator/issues/3862)
+  for EndpointSlice objects.
+
+#### Securing windows_exporter endpoint
+
+* Since windows_exporter is not running as a [pod](#justification), the
+  endpoint is not secure. When running inside a pod, we can use a CA signer to
+  provide a TLS cert/key to the service for authentication, an option that is
+  not available to a Windows service. We plan to leverage windows_exporter's
+  support for `https` configuration. WMCO will be responsible for adding [web config](https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md)
+  for TLS. This will ensure that the metrics endpoint will be able to
+  authenticate the requests.
+
+#### Telemetry Rules
+
+* We plan to ensure that [telemetry rules](https://docs.openshift.com/container-platform/4.7/support/remote_health_monitoring/showing-data-collected-by-remote-health-monitoring.html#showing-data-collected-from-the-cluster_showing-data-collected-by-remote-health-monitoring)
+  also use metrics from Windows. This can be done by renaming the Windows
+  metrics to align with metrics used in telemetry rules. For example, the
+  `memory_usage_bytes:sum` rule uses `node_memory_MemTotal_bytes`, which is
+  defined in the Windows rules. We also need to test whether the existing
+  telemetry rules need to be updated similar to console queries, if they have
+  Linux-specific queries, for example rules with the `job=node-exporter` attribute.
+
+### Test Plan
+
+The current tests check that:
+* The operator namespace, `openshift-windows-machine-config-operator`, uses
+  the `openshift.io/cluster-monitoring=true` label.
+* Service, Endpoints and Service Monitor objects are created as expected.
+* Prometheus is able to collect data from windows_exporter.
+* Custom Prometheus rules return Windows data.
+
+The test plan for [future implementation](#future-plans)
+will use existing tests to test creation of the windows_exporter service and
+the metrics Service, Endpoints and Service Monitor objects. WMCO will also be
+responsible for testing Prometheus rules created for Windows. We also
+plan to add tests in the console repo that test the common recording rules and
+ensure that they return results for Windows.
+
+### Graduation Criteria
+
+This enhancement will start as GA.
+
+### Upgrade / Downgrade Strategy
+
+* WMCO is responsible for upgrading the [windows_exporter](https://github.com/prometheus-community/windows_exporter/tags)
+  binary to the latest release. Downgrades are [not supported](https://github.com/operator-framework/operator-lifecycle-manager/issues/1177)
+  by OLM.
+
+## Implementation History
+
+v1: Initial Proposal
+
+### Drawbacks
+
+Running windows_exporter as a Windows service instead of running as a DaemonSet
+pod makes it harder for Prometheus to monitor Windows nodes. The
+limitation of not being able to run windows_exporter on Windows nodes as a pod
+exists for support reasons, as mentioned in the [WMCO enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/windows-containers/windows-machine-config-operator.md#justification).
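
The practical cost of this drawback is the manually maintained Endpoints object described under Design Details. The following is a minimal sketch of how such an object could be assembled from the Windows node IPs. It is written in Python purely for illustration and is not WMCO's actual code. The Service name, namespace and port 9182 come from the proposal; the function name `build_metrics_endpoints`, the port name `metrics` and the example IPs are assumptions made for this sketch.

```python
import json

# Values taken from the proposal text.
SERVICE_NAME = "windows-machine-config-operator-metrics"
NAMESPACE = "openshift-windows-machine-config-operator"
EXPORTER_PORT = 9182  # default windows_exporter port


def build_metrics_endpoints(node_ips):
    """Build an Endpoints manifest (as a dict) covering every Windows node.

    Kubernetes associates an Endpoints object with a selector-less Service
    when both share the same name and namespace.
    """
    # De-duplicate and sort so the generated object is stable across runs.
    addresses = [{"ip": ip} for ip in sorted(set(node_ips))]
    subsets = []
    if addresses:
        subsets.append({
            "addresses": addresses,
            # "metrics" is a hypothetical port name chosen for this sketch.
            "ports": [{"name": "metrics", "port": EXPORTER_PORT, "protocol": "TCP"}],
        })
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": SERVICE_NAME, "namespace": NAMESPACE},
        "subsets": subsets,
    }


if __name__ == "__main__":
    # Example internal IPs are made up for illustration.
    print(json.dumps(build_metrics_endpoints(["10.0.1.7", "10.0.1.5"]), indent=2))
```

Because nothing selects pods here, the object has to be regenerated whenever a Windows node is added or removed, which is the extra bookkeeping a DaemonSet-based exporter would not need.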