3.1.3.2 Publishing system and cluster metrics using Netdata

ipatini edited this page Jul 10, 2024 · 19 revisions

This method of acquiring system and K8S metric values involves deploying one Netdata agent on every K8S cluster node. Netdata is open source software that collects metrics, displays them as charts, and also exposes them through a REST API. The default Nebulous application deployment scenario installs Netdata agents along with EMS on application clusters. EPAs periodically contact the REST API server of each Netdata agent and scrape the required metrics. To enable EPAs to scrape the Netdata agents, the application metric model must provide the needed configuration. For each raw metric that obtains its values using this method, it is necessary to define a sensor of “netdata” type and provide the corresponding configuration (including the scraping period).

To define a raw metric that takes its values from Netdata agents, the netdata type must be entered in the Sensor field in the Nebulous GUI. This instructs EPA to use its K8S Netdata collector plugin for retrieving the values. Under the hood, the K8S Netdata collector plugin builds a URL of the form http://<NODE_IP_ADDRESS>:<PORT>/<PATH>?<QUERY_PARAMS>&format=ssv and attempts to retrieve the relevant JSON response. It then extracts the value(s) of the metric of interest (see below) and publishes them as the raw metric's value(s) in the EPA broker. If needed, it also aggregates multiple values into a single one. To build the URL, the collector plugin uses the provided configuration settings, or the corresponding defaults. If the metric of interest is a K8S metric (its name starts with k8s.), the collector plugin can take the pod name and namespace into consideration. The metric(s) of interest must be provided in the configuration.
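As an illustration of the URL form described above, the following minimal Python sketch assembles a scrape URL from a node address and a set of query parameters. The function name `build_netdata_url`, the example IP address and the example context are hypothetical; the actual collector plugin is part of EMS and may build the URL differently.

```python
from urllib.parse import urlencode


def build_netdata_url(node_ip, query_params, port=19999, path="/api/v2/data"):
    """Assemble http://<NODE_IP_ADDRESS>:<PORT>/<PATH>?<QUERY_PARAMS>&format=ssv.

    Hypothetical helper for illustration only, not the plugin's own code.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Copy the parameters and append the fixed format=ssv suffix last.
    params = dict(query_params)
    params["format"] = "ssv"
    # Keep '*' and ',' unescaped so wildcards and lists stay readable.
    query = urlencode(params, safe="*,")
    return f"http://{node_ip}:{port}/{path.lstrip('/')}?{query}"


if __name__ == "__main__":
    print(build_netdata_url(
        "10.0.0.5",
        {"scope_contexts": "system.cpu", "dimension": "*", "after": -1},
    ))
```

The defaults for `port` and `path` mirror the plugin defaults listed in the settings table (19999 and /api/v2/data).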

The configuration comprises a few settings used to guide the collector plugin, while the remaining are used to build the QUERY_PARAMS list of the URL. The plugin-specific configuration settings, along with their respective defaults, are:

| Plugin Setting | Type | Default value | Comments |
|---|---|---|---|
| endpoint | String | /api/v2/data | The <PATH> part of the URL. Only the v2 version has been tested. |
| port | Port | 19999 | The <PORT> part of the URL. Allowed values: 1..65535. |
| components | String | component name | For a K8S metric of interest, specifies which pod(s) to pick. If left empty, all pods in the namespace are picked. If omitted, the name(s) of the component(s) the raw metric applies to are used. |
| namespace | String | default | For a K8S metric of interest, specifies the pod namespace to use. If left empty, pods from all namespaces are picked. If omitted, it defaults to the default namespace. |
| results-aggregation | Enum | no default | Allowed values: SUM, AVERAGE, COUNT, MIN, MAX, NONE. If omitted or set to NONE, an individual event is published for each metric value. |
| intervalPeriod | Positive Integer | 60 | How often to scrape the Netdata endpoint. |
| intervalUnit | Enum | SECONDS | The time unit of intervalPeriod. Allowed values: SECONDS, MINUTES, HOURS, DAYS. |

If intervalPeriod is omitted, the scraping period is taken from the metric's Output interval and unit fields. If they are not specified either, it is assumed to be 60 seconds.
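The fallback order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scrape_period_seconds` and its arguments are hypothetical, and treating a missing unit as SECONDS is an assumption rather than documented behaviour.

```python
# Seconds per allowed intervalUnit value.
UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def scrape_period_seconds(sensor_config, output_interval=None, output_unit=None):
    """Resolve the scraping period in seconds (hypothetical helper).

    Order: sensor intervalPeriod, then the metric's Output interval,
    then the 60-second fallback.
    """
    period = sensor_config.get("intervalPeriod")
    unit = sensor_config.get("intervalUnit")
    if period is None:
        # Fall back to the metric's Output interval and unit fields.
        period, unit = output_interval, output_unit
    if period is None:
        # Neither is specified: assume 60 seconds.
        return 60
    period = int(period)
    if period <= 0:
        raise ValueError("the interval must be a positive integer")
    # Assumption: a missing unit means SECONDS.
    return period * UNIT_SECONDS[(unit or "SECONDS").upper()]


if __name__ == "__main__":
    print(scrape_period_seconds({"intervalPeriod": 2, "intervalUnit": "MINUTES"}))
```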

The settings used to build the Netdata URL, along with their respective defaults, are:

| Netdata Setting | Type | Default value | Comments |
|---|---|---|---|
| scope_contexts | String | no default | REQUIRED: The metric(s) of interest to extract. Can be a comma-separated list. |
| context | String | no default | Can be used instead of scope_contexts. Check the Netdata documentation for details. |
| dimension | String | * | The scope_contexts dimensions to use. |
| after | Long | -1 | Selects the measurement of the last second. |
| time_group | Enum | average | Defines the method used to group multiple measurements. |

The settings in the table above are always added in the URL (either the provided value or the default). Any additional settings (not listed in tables above) will also be included in query parameters list.
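To make the split between plugin settings and URL settings concrete, here is a minimal Python sketch that turns a sensor configuration into the query parameter list. The function name `netdata_query_params` and the dictionary representation of the configuration are assumptions made for illustration; `points` merely stands in for any additional Netdata query parameter.

```python
# Settings consumed by the collector plugin itself (first table).
PLUGIN_SETTINGS = {
    "endpoint", "port", "components", "namespace",
    "results-aggregation", "intervalPeriod", "intervalUnit",
}

# Netdata settings that are always sent, with their defaults (second table).
NETDATA_DEFAULTS = {"dimension": "*", "after": -1, "time_group": "average"}


def netdata_query_params(config):
    """Return the query parameters derived from a sensor configuration.

    Hypothetical helper: defaults first, then every non-plugin setting,
    so provided values override defaults and extra settings pass through.
    """
    if not config.get("scope_contexts") and not config.get("context"):
        raise ValueError("scope_contexts (or context) must be provided")
    params = dict(NETDATA_DEFAULTS)
    for key, value in config.items():
        if key not in PLUGIN_SETTINGS:
            params[key] = value
    return params


if __name__ == "__main__":
    print(netdata_query_params(
        {"scope_contexts": "k8s.cgroup.cpu", "port": 19999, "points": 1}
    ))
```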

For a complete list of the supported query parameters, and their semantics, please consult the official Netdata documentation on the topic.

Kubernetes metrics

Netdata metrics pertaining to Kubernetes (pods, nodes, etc.) are named using the k8s. prefix, for instance k8s.cgroup.cpu. When a metric can be measured per pod, every Netdata response contains one measurement (value) for each pod (e.g. the CPU consumed by each pod). The K8S Netdata collector plugin filters these values in order to retain only those complying with the raw metric specification.

NOTE:
The K8S Netdata collector plugin will attempt to extract metric values from the response sections view.dimensions.ids and view.dimensions.values. If the metric of interest is a K8S metric, the ids represent different pods (running in various namespaces). The plugin will filter the ids (i.e. pods) based on the provided component name(s) and namespace.
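The extraction step in the note above could look roughly like the following Python sketch. It assumes the ids and values are parallel lists; the function name `extract_dimension_values` and the trimmed sample payload are hypothetical, and real Netdata ids for K8S metrics encode the pod and namespace in a Netdata-specific form that is not reproduced here.

```python
import json


def extract_dimension_values(response_text):
    """Pair each id under view.dimensions.ids with its value.

    Hypothetical helper; assumes ids and values are parallel lists.
    """
    document = json.loads(response_text)
    dimensions = document.get("view", {}).get("dimensions", {})
    ids = dimensions.get("ids", [])
    values = dimensions.get("values", [])
    return dict(zip(ids, values))


if __name__ == "__main__":
    # Made-up, heavily trimmed payload used only to exercise the function.
    sample = '{"view": {"dimensions": {"ids": ["pod-a", "pod-b"], "values": [1.5, 0.25]}}}'
    print(extract_dimension_values(sample))
```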

IMPORTANT:
Pod names may differ from component names (as they appear in the metric model). For instance, Helm prepends the deployment name to each component name to generate pod names. In this case it is essential to set the components setting to a value that includes the Helm deployment prefix (or any other deviation).

The following table details the outcome of each possible combination of components and namespace configuration settings. For each pod selected, the corresponding measurement (metric value) will be kept for further processing. The rest will be filtered out.

| components | namespace | Pods selected |
|---|---|---|
| provided | provided | Pods whose name is included in the components list, running in the namespace specified in the namespace setting |
| blank | provided | All pods running in the namespace specified in the namespace setting |
| omitted | provided | Pods named exactly as one of the metric model components the raw metric applies to, running in the namespace specified in the namespace setting |
| provided | blank | Pods whose name is included in the components list, running in any namespace |
| blank | blank | All pods, running in any namespace |
| omitted | blank | Pods named exactly as one of the metric model components the raw metric applies to, running in any namespace |
| provided | omitted | Pods whose name is included in the components list, running in the namespace named default |
| blank | omitted | All pods, running in the namespace named default |
| omitted | omitted | Pods named exactly as one of the metric model components the raw metric applies to, running in the namespace named default |
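The nine combinations in the table reduce to two independent three-way choices, which the following Python sketch tries to capture. It is an illustration only: the function name `select_pods`, the representation of pods as (name, namespace) tuples, and the assumption that components is a comma-separated string are not taken from the plugin.

```python
def select_pods(pods, config, model_components):
    """Filter (pod_name, namespace) tuples per the components/namespace rules.

    Hypothetical helper. A missing key means "omitted"; an empty string
    means "blank".
    """
    # components: omitted -> metric model names; blank -> no name filter.
    if "components" not in config:
        names = set(model_components)
    elif not config["components"].strip():
        names = None
    else:
        # Assumption: components is given as a comma-separated string.
        names = {c.strip() for c in config["components"].split(",") if c.strip()}

    # namespace: omitted -> "default"; blank -> any namespace.
    namespace = config.get("namespace", "default").strip() or None

    return [
        (name, ns)
        for name, ns in pods
        if (names is None or name in names)
        and (namespace is None or ns == namespace)
    ]


if __name__ == "__main__":
    pods = [("web", "default"), ("db", "default"), ("web", "staging")]
    print(select_pods(pods, {"components": "web", "namespace": ""}, ["web"]))
```

Names are matched exactly here, which is why a Helm-generated prefix has to be included in the components setting.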

The retained values (each one pertaining to one pod) can either be:

  • aggregated according to the results-aggregation value, and then published as a single raw metric value, or
  • immediately published, each as a separate raw metric value.

In the latter case, the event conveying the raw metric value will also have the property destination-key, carrying the respective pod name and namespace.
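The two publishing options can be sketched in Python as follows. The function name `values_to_publish` is hypothetical, and returning nothing for an empty input is an assumption about edge-case behaviour, not something the plugin is known to do.

```python
from statistics import fmean

# Allowed aggregation modes other than NONE, mapped to Python callables.
AGGREGATORS = {
    "SUM": sum,
    "AVERAGE": fmean,
    "COUNT": len,
    "MIN": min,
    "MAX": max,
}


def values_to_publish(values, aggregation=None):
    """Return the list of raw metric values that would be published.

    Hypothetical helper: omitted or NONE publishes every value as is,
    any other mode collapses the values into a single one.
    """
    mode = (aggregation or "NONE").upper()
    if mode == "NONE":
        return list(values)
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation: {aggregation}")
    if not values:
        # Assumption: nothing is published when no pod matched.
        return []
    return [AGGREGATORS[mode](values)]


if __name__ == "__main__":
    per_pod = [1.0, 2.0, 3.0]
    print(values_to_publish(per_pod, "SUM"))
    print(values_to_publish(per_pod))
```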


Appendix

Sample list of Netdata metrics. The list may vary between devices depending on the hardware, architecture, OS, and installed software.

..........TBD


Additional reading

[1] Netdata site
[2] Netdata Queries/Lookup