
3.1.3.2 Publishing system and cluster metrics using Netdata


This method of acquiring system and K8S metric values involves the deployment of one Netdata agent at every K8S cluster node. Netdata is open source software for collecting metrics, displaying them as charts, and also providing them through a REST API. The default Nebulous application deployment scenario installs Netdata agents along with EMS at application clusters. EPAs periodically contact the REST API server of each Netdata agent and scrape the required metrics. To enable EPAs to scrape the Netdata agents, the application metric model must provide the needed configuration. For each raw metric whose values will be collected using this method, it is necessary to define a sensor of “netdata” type and provide the corresponding configuration (including the scraping period).

NOTE:
At each Kubernetes cluster node, exactly one instance of EPA and one instance of the Netdata agent are deployed, as DaemonSets. At runtime, each EPA queries its collocated Netdata agent. The <NETDATA_IP_ADDRESS> is the node's IP address and is provided by Kubernetes through the Downward API.

In order to define a raw metric that takes its values from Netdata agents, the netdata type must be entered in the Sensor field of the Nebulous GUI. This instructs EPA to use its K8S Netdata collector plugin for retrieving the values. Under the hood, the K8S Netdata collector plugin builds a URL of the form:
http://<NETDATA_IP_ADDRESS>:<PORT>/<PATH>?<QUERY_PARAMS>&format=ssv
and attempts to retrieve the relevant JSON response from there. It then extracts the value(s) of the metric of interest (see next) and publishes them as the raw metric's value(s) in the EPA broker. If configured, it will also aggregate multiple values into a single one.
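
For illustration, with the defaults described in the Configuration section below and a hypothetical metric of interest of system.cpu, the built URL would look similar to:
http://<NETDATA_IP_ADDRESS>:19999/api/v2/data?scope_contexts=system.cpu&dimension=*&after=-1&time_group=average&format=ssv
(The exact query parameters depend on the sensor configuration, as explained next.)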

The following screenshot explains how a raw metric collecting values from Netdata can be set up in the Nebulous GUI (Metric model editor).

netdata-config-for-node-1-with-comments

If no configuration is provided (i.e. defaults apply), the metric of interest can be provided in the Sensor field, after the sensor type netdata.

netdata-simple-with-comments
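
For instance, a Sensor field entry along the lines of the following (the metric name is illustrative) would collect system.cpu using all default settings:
netdata system.cpu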

Configuration

In order to build the URL, the collector plugin will use the provided configuration settings, or the corresponding defaults. If the metric of interest is a Kubernetes-related metric (its name starts with k8s.), the collector plugin can take the pod name and namespace into consideration. The metric of interest must be provided in the configuration.

The configuration comprises a few settings used to guide the collector plugin, while the remaining ones are used to build the QUERY_PARAMS part of the URL.

The plugin-specific configuration settings, along with their respective defaults, are:

| Plugin Setting | Type | Default value | Comments |
|---|---|---|---|
| endpoint | String | /api/v2/data | The <PATH> part of the URL. Only the v2 version has been tested. |
| port | Port | 19999 | The <PORT> part of the URL. Allowed values: 1..65535. |
| components | String | component name | In case of a K8S metric of interest, specifies which pod(s) to pick. If left empty it will pick all pods in the namespace. If omitted it will use the name of the component(s) the raw metric applies to. |
| namespace | String | default | In case of a K8S metric of interest, specifies the pod namespace to use. If left empty it will pick matching pods from all namespaces. If omitted it defaults to the default namespace. |
| results-aggregation | Enum | no default | Allowed values: SUM, AVERAGE, COUNT, MIN, MAX, NONE. If omitted or set to NONE, individual events will be published for each metric value. |
| intervalPeriod | Positive Integer | 60 | How often to query the Netdata API. |
| intervalUnit | Enum | SECONDS | The time unit of intervalPeriod. Allowed values: SECONDS, MINUTES, HOURS, DAYS. |

If intervalPeriod is omitted, the querying period is taken from the raw metric's Output interval and unit fields. If these are not specified either, it is assumed to be 60 seconds.

The settings used to build the Netdata URL, along with their respective defaults, are:

| Netdata Setting | Type | Default value | Comments |
|---|---|---|---|
| scope_contexts | String | no default | REQUIRED: The metric(s) of interest to extract. Can be a comma-separated list. |
| context | String | no default | Can be used instead of scope_contexts. Check the Netdata documentation for details. |
| dimension | String | * | The scope_contexts dimensions to use. |
| after | Long | -1 | Selects the measurements taken in the last second (before now). |
| time_group | Enum | average | Defines the method of grouping multiple measurements. |

The settings in the table above are always added to the URL (either with the provided value or the default). Any additional settings provided (not listed in the tables above) will also be included in the query parameters list.

For a complete list of the supported query parameters and their semantics, please consult the official Netdata documentation on the topic. You can also check the Netdata API.
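
To make the URL construction concrete, the following is a minimal sketch (in Python, for illustration only; it is not the actual collector plugin, and all function and variable names are hypothetical) of how the settings above could be combined into a request URL, and how the querying period could be resolved.

```python
from urllib.parse import urlencode

# Settings consumed by the plugin itself; everything else becomes a query parameter.
PLUGIN_SETTINGS = {"endpoint", "port", "components", "namespace",
                   "results-aggregation", "intervalPeriod", "intervalUnit"}

# Defaults for the Netdata query parameters (see the table above).
NETDATA_DEFAULTS = {"dimension": "*", "after": "-1", "time_group": "average"}

def build_netdata_url(node_ip, config):
    """Build the Netdata data API URL from a 'netdata' sensor configuration."""
    endpoint = config.get("endpoint", "/api/v2/data")   # the <PATH> part
    port = int(config.get("port", 19999))                # the <PORT> part

    # Start from the defaults, then overlay any non-plugin setting from the
    # configuration (e.g. scope_contexts, context, or extra Netdata parameters).
    params = dict(NETDATA_DEFAULTS)
    params.update({k: v for k, v in config.items() if k not in PLUGIN_SETTINGS})
    params["format"] = "ssv"                              # always appended

    return f"http://{node_ip}:{port}{endpoint}?{urlencode(params)}"

def resolve_period_seconds(config, output_interval_seconds=None):
    """How often to query Netdata: intervalPeriod/intervalUnit, otherwise the
    raw metric's Output interval, otherwise 60 seconds."""
    unit = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600, "DAYS": 86400}
    if "intervalPeriod" in config:
        return int(config["intervalPeriod"]) * unit[config.get("intervalUnit", "SECONDS")]
    return output_interval_seconds or 60

# Example: a raw metric asking for k8s.cgroup.cpu with all defaults.
print(build_netdata_url("10.0.1.5", {"scope_contexts": "k8s.cgroup.cpu"}))
```

Note that the components, namespace and results-aggregation settings never appear in the URL; they only control how the plugin filters and aggregates the values found in the response, as described in the following sections.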


Kubernetes metrics

Netdata metrics pertaining to Kubernetes are named using the k8s. prefix, for instance k8s.cgroup.cpu. When a metric can be measured per pod, there will be one measurement (value) for each pod in every Netdata response (e.g. the CPU consumed by each pod). The K8S Netdata collector plugin will filter these values in order to retain only those complying with the raw metric specification.

NOTE:
The K8S Netdata collector plugin will attempt to extract metric values from the JSON response sections under view.dimensions.ids and view.dimensions.values. If the metric of interest is a K8S metric, the ids represent different pods (running in various namespaces). The plugin will filter the ids (i.e. pods) based on the provided component name(s) and namespace.

IMPORTANT:
Pod names may differ from component names (as they appear in the metric model). For instance, Helm will prepend the deployment name to each component name when generating pod names. In this case it is essential to set the components setting to a value that includes the Helm deployment prefix (or accounts for any other naming deviation).

The following table details the outcome of each possible combination of the components and namespace configuration settings. For each pod selected, the corresponding measurement (metric value) will be kept for further processing; the rest will be filtered out. (A sketch of this selection logic is given after the table.)

| components | namespace | Pods selected |
|---|---|---|
| provided | provided | Pods with a name included in the components list, running in the namespace specified in the namespace setting |
| blank | provided | All pods running in the namespace specified in the namespace setting |
| omitted | provided | Pods named exactly as one of the metric model components the raw metric applies to, running in the namespace specified in the namespace setting |
| provided | blank | Pods with a name included in the components list, running in any namespace |
| blank | blank | All pods, running in any namespace |
| omitted | blank | Pods named exactly as one of the metric model components the raw metric applies to, running in any namespace |
| provided | omitted | Pods with a name included in the components list, running in the namespace named default |
| blank | omitted | All pods running in the namespace named default |
| omitted | omitted | Pods named exactly as one of the metric model components the raw metric applies to, running in the namespace named default |
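
The selection rules above can be summarised with the following illustrative sketch (Python, hypothetical names; the real plugin operates on the ids found in the Netdata response, and its exact name-matching rules may differ, e.g. for pod names that carry replica-set suffixes).

```python
OMITTED = None   # setting not present in the configuration
BLANK = ""       # setting present but left empty

def select_pods(pods, components, namespace, model_components):
    """pods: iterable of (namespace, pod_name) pairs reported by Netdata.
    model_components: the component(s) the raw metric applies to."""
    if components is OMITTED:
        wanted = list(model_components)
    elif components == BLANK:
        wanted = None                          # no name filtering
    else:
        wanted = [c.strip() for c in components.split(",")]

    ns = "default" if namespace is OMITTED else namespace   # BLANK -> any namespace

    selected = []
    for pod_ns, pod_name in pods:
        if ns and pod_ns != ns:
            continue
        # Prefix matching is an assumption for illustration; see the
        # IMPORTANT note above about Helm-generated pod names.
        if wanted is not None and not any(pod_name.startswith(w) for w in wanted):
            continue
        selected.append((pod_ns, pod_name))
    return selected

# Example based on the 'omitted / omitted' row of the table:
pods = [("default", "testm1-server-6789487c9b-76pdz"), ("kube-system", "coredns-0")]
print(select_pods(pods, OMITTED, OMITTED, ["testm1-server"]))
# -> [('default', 'testm1-server-6789487c9b-76pdz')]
```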

The values retained (each pertaining to one pod) can either:

  • be aggregated according to the results-aggregation value and then published as the raw metric value (one event), or
  • be immediately published as the raw metric values (one or more events).

In the latter case, each event conveying a raw metric value will also have the destination-key property set to the respective pod name and namespace.
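
The two publication paths can be illustrated as follows (a Python sketch with hypothetical event structures and illustrative measurement values; the events published in the EPA broker are simplified here).

```python
AGGREGATORS = {
    "SUM": sum,
    "AVERAGE": lambda v: sum(v) / len(v),
    "COUNT": len,
    "MIN": min,
    "MAX": max,
}

def to_events(values, results_aggregation=None):
    """values: {(namespace, pod_name): measurement} retained after filtering."""
    if results_aggregation and results_aggregation != "NONE":
        # One event carrying the aggregated value; destination-key is not set.
        return [{"value": AGGREGATORS[results_aggregation](list(values.values()))}]
    # One event per retained pod, tagged with '<namespace>,<pod name>'.
    return [{"value": v, "destination-key": f"{ns},{pod}"}
            for (ns, pod), v in values.items()]

measurements = {("default", "testm1-server-6789487c9b-76pdz"): 1.8,
                ("default", "testm1-server-6789487c9b-abcde"): 2.2}
print(to_events(measurements))              # two events, each with destination-key
print(to_events(measurements, "SUM"))       # [{'value': 4.0}]
```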


Examples

The following examples demonstrate how to configure metric collection from Netdata for a few typical use cases.

Node metric

Node metrics refer to the system (VM or computer) hosting the Kubernetes node.

The following screenshot of the Nebulous GUI gives the definition of the current_cpu raw metric, which collects the measurements of the system.cpu Netdata metric. This metric encompasses several dimensions (e.g. system, user, iowait, etc.), each with an individual measurement, hence resulting in multiple metric values.

netdata-example1--metric-model

Based on the raw metric definition, the K8S Netdata collector plugin will query the Netdata agent using the following settings.

netdata-example1--rest-call

Since no results aggregation is set in the raw metric definition, each dimension measurement will be published as a separate event in the EPA event broker. The destination-key property of each event gives the name of the respective dimension.

netdata-example1--events

Node metric with aggregation

This is a variation of the previous example where results aggregation is used. In order to obtain the total system CPU consumption, we must redefine the current_cpu raw metric to sum up the individual dimension measurements into a single value.

netdata-example2--metric-model

The REST call to the Netdata agent will use the following settings.

netdata-example2--rest-call

In this case the output will be a single event conveying the total (sum) CPU, and the destination-key property is not set.

netdata-example2--events

Single pod metric (K8S)

Kubernetes metrics refer to the pods running in a Kubernetes cluster, or to the cluster itself. Note that Kubernetes cluster metrics are different from the (corresponding) host metrics.

The following screenshot of the Nebulous GUI gives the definition of the current_cpu raw metric, which collects the measurements of the k8s.cgroup.cpu Netdata metric and retains only those pertaining to the pods of the component the raw metric applies to, i.e. testm1-server. This metric encompasses a different measurement for each testm1-server pod, hence resulting in multiple metric values.

netdata-example3--metric-model

NOTE: Each Netdata agent will report measurements only for pods running on the same node. Therefore, pods of the same component type running on different nodes will be reported by different Netdata agents, queried by different EPAs.

The REST call to the Netdata agent will use the following settings.

netdata-example3--rest-call
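
For reference, with the default settings the request of this example would resemble:
http://<NETDATA_IP_ADDRESS>:19999/api/v2/data?scope_contexts=k8s.cgroup.cpu&dimension=*&after=-1&time_group=average&format=ssv
The components and namespace settings do not appear in the URL; the plugin applies them to the view.dimensions entries of the response, as described above.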

The next screenshot gives a list of all running pods in the cluster. There is only one pod matching the testm1-server component and default namespace.

netdata-example3--pod-list

Since no results aggregation is set in the raw metric definition, each pod measurement will be published as a separate event in the EPA event broker. The destination-key property of each event gives the namespace and name of the respective pod. In this example it is destination-key=default,testm1-server-6789487c9b-76pdz.

netdata-example3--events

Multiple pod metrics with aggregation (K8S)

This is a variation of the previous example, where more pods are involved. The raw metric of the previous example would report an individual event for each pod (i.e. the CPU consumed by each pod). In this case we will define a raw metric for collecting the total (sum) CPU consumed by all testm1-server pods.

The following screenshot of the Nebulous GUI gives the definition of the current_cpu raw metric, which collects the measurements of the k8s.cgroup.cpu Netdata metric and retains those pertaining to the testm1-server pods (to which the raw metric applies). Since results-aggregation is set to SUM, there will be a single metric value conveying the sum of the individual pod CPU measurements.

netdata-example4--metric-model

The REST call to the Netdata agent will use the following settings.

netdata-example4--rest-call

The next screenshot gives a list of all running pods in the cluster. There are two pods matching the testm1-server component and default namespace per worker node.

netdata-example4--pod-list

Since results aggregation is set to SUM in the raw metric definition, all pod measurements will be aggregated per node, and a single value will be published in the EPA event broker. In this case the destination-key property is omitted.

netdata-example4--events

Node-wide metric with aggregation (K8S)

This is a variation of the previous examples, where all pods in a cluster node are involved. In this case we will define a raw metric for summing the CPU consumed by all of the node's pods.

The following screenshot of the Nebulous GUI gives the definition of the current_cpu raw metric, which collects the measurements of the k8s.cgroup.cpu Netdata metric and includes all pods. Again, results-aggregation is set to SUM, therefore a single metric value event is published, conveying the K8S cluster node's CPU.

netdata-example5--metric-model

The REST call to the Netdata agent will use the following settings.

netdata-example5--rest-call

Since results aggregation is set to SUM in the raw metric definition, all pod measurements will be aggregated per node, and a single value will be published in the EPA event broker. In this case the destination-key property is omitted.

netdata-example5--events

NOTE:
Each EPA at each Kubernetes node will report that node's total CPU, not a cluster-wide value. An overall cluster value (including all nodes) would require a composite metric that aggregates the EPA raw metric values.


Appendix

Sample list of Netdata metrics. The available metrics may vary between devices depending on the hardware, architecture, OS, and installed software.

Scroll horizontally to view all columns.

app.cpu_context_switches ipv4.sockstat_udp_mem k8s_kubelet.kubelet_pleg_relist_interval_microseconds system.clock_status
app.cpu_utilization ipv4.sockstat_udp_sockets k8s_kubelet.kubelet_pleg_relist_latency_microseconds system.clock_sync_offset
app.disk_logical_io ipv4.sockstat_udplite_sockets k8s_kubelet.kubelet_pods_log_filesystem_used_bytes system.clock_sync_state
app.disk_physical_io ipv4.udperrors k8s_kubelet.kubelet_pods_running system.cpu
app.fds_open ipv4.udplite k8s_kubelet.kubelet_runtime_operations system.ctxt
app.fds_open_limit ipv4.udplite_errors k8s_kubelet.kubelet_token_requests system.entropy
app.mem_page_faults ipv4.udppackets k8s_kubelet.rest_client_requests_by_code system.file_nr_used
app.mem_private_usage ipv6.bcast k8s_kubelet.rest_client_requests_by_method system.file_nr_utilization
app.mem_usage ipv6.ect k8s_kubelet.volume_manager_total_volumes system.forks
app.processes ipv6.errors k8s_kubeproxy.http_request_duration system.idlejitter
app.swap_usage ipv6.fragsin k8s_kubeproxy.kubeproxy_sync_proxy_rules system.interrupts
app.threads ipv6.fragsout k8s_kubeproxy.kubeproxy_sync_proxy_rules_latency system.intr
app.uptime ipv6.groupmemb k8s_kubeproxy.kubeproxy_sync_proxy_rules_latency_microseconds system.io
app.vmem_usage ipv6.icmp k8s_kubeproxy.rest_client_requests_by_code system.ip
disk.avgsz ipv6.icmpechos k8s_kubeproxy.rest_client_requests_by_method system.ipc_semaphore_arrays
disk.await ipv6.icmperrors mem.available system.ipc_semaphores
disk.backlog ipv6.icmpmldv2 mem.balloon system.ipv6
disk.busy ipv6.icmpneighbor mem.cma system.load
disk.inodes ipv6.icmpredir mem.committed system.net
disk.io ipv6.icmprouter mem.directmaps system.pgpgio
disk.iotime ipv6.icmptypes mem.fragmentation_index_dma system.processes
disk.mops ipv6.mcast mem.fragmentation_index_dma32 system.processes_state
disk.ops ipv6.mcastpkts mem.fragmentation_index_normal system.ram
disk.qops ipv6.packets mem.kernel system.shared_memory_bytes
disk.space ipv6.sockstat6_frag_sockets mem.ksm_cow system.shared_memory_segments
disk.svctm ipv6.sockstat6_raw_sockets mem.oom_kill system.softirqs
disk.util ipv6.sockstat6_tcp_sockets mem.pgfaults system.softnet_stat
disk_ext.avgsz ipv6.sockstat6_udp_sockets mem.reclaiming system.uptime
disk_ext.await ipv6.sockstat6_udplite_sockets mem.slab systemd.service.memory.failcnt
disk_ext.io ipv6.udperrors mem.swap systemd.service.memory.paging.faults
disk_ext.iotime ipv6.udpliteerrors mem.swap_cached systemd.service.memory.paging.io
disk_ext.mops ipv6.udplitepackets mem.swapio systemd.service.memory.ram.usage
disk_ext.ops ipv6.udppackets mem.thp systemd.service.memory.usage
ip.sockstat_sockets ipvs.net mem.thp_collapse systemd.service.memory.writeback
ip.tcp_accept_queue ipvs.packets mem.thp_compact systemd.service.pids.current
ip.tcp_syn_queue ipvs.sockets mem.thp_details user.cpu_context_switches
ip.tcpconnaborts k8s.cgroup.cpu mem.thp_faults user.cpu_utilization
ip.tcperrors k8s.cgroup.cpu_limit mem.thp_file user.disk_logical_io
ip.tcphandshake k8s.cgroup.mem mem.thp_split user.disk_physical_io
ip.tcpmemorypressures k8s.cgroup.mem_activity mem.thp_swapout user.fds_open
ip.tcpofo k8s.cgroup.mem_failcnt mem.thp_zero user.fds_open_limit
ip.tcpopens k8s.cgroup.mem_usage mem.writeback user.mem_page_faults
ip.tcppackets k8s.cgroup.mem_usage_limit mem.zswapio user.mem_private_usage
ip.tcpreorders k8s.cgroup.mem_utilization net.carrier user.mem_usage
ip.tcpsock k8s.cgroup.net_carrier net.drops user.processes
ip.tcpsyncookies k8s.cgroup.net_drops net.errors user.swap_usage
ipv4.bcast k8s.cgroup.net_errors net.events user.threads
ipv4.bcastpkts k8s.cgroup.net_events net.fifo user.uptime
ipv4.ecnpkts k8s.cgroup.net_fifo net.mtu user.vmem_usage
ipv4.errors k8s.cgroup.net_mtu net.net usergroup.cpu_context_switches
ipv4.fragsin k8s.cgroup.net_net net.operstate usergroup.cpu_utilization
ipv4.fragsout k8s.cgroup.net_operstate net.packets usergroup.disk_logical_io
ipv4.icmp k8s.cgroup.net_packets netfilter.conntrack_sockets usergroup.disk_physical_io
ipv4.icmp_errors k8s.cgroup.pgfaults netfilter.synproxy_conn_reopened usergroup.fds_open
ipv4.icmpmsg k8s.cgroup.pids_current netfilter.synproxy_cookies usergroup.fds_open_limit
ipv4.mcast k8s.cgroup.writeback netfilter.synproxy_syn_received usergroup.mem_page_faults
ipv4.mcastpkts k8s_kubelet.apiserver_audit_requests_rejected sctp.chunks usergroup.mem_private_usage
ipv4.packets k8s_kubelet.apiserver_storage_data_key_generation_failures sctp.established usergroup.mem_usage
ipv4.sockstat_frag_mem k8s_kubelet.apiserver_storage_data_key_generation_latencies sctp.fragmentation usergroup.processes
ipv4.sockstat_frag_sockets k8s_kubelet.apiserver_storage_data_key_generation_latencies_percent sctp.packet_errors usergroup.swap_usage
ipv4.sockstat_raw_sockets k8s_kubelet.apiserver_storage_envelope_transformation_cache_misses sctp.packets usergroup.threads
ipv4.sockstat_tcp_mem k8s_kubelet.kubelet_containers_running sctp.transitions usergroup.uptime
ipv4.sockstat_tcp_sockets k8s_kubelet.kubelet_node_config_error system.active_processes usergroup.vmem_usage

Additional reading

[1] Netdata site
[2] Netdata Queries/Lookup
[3] Netdata API
