Add pct calculated fields for Pod and container CPU and memory usages (#6158)

This PR adds the following set of calculated metrics for Pods and containers:

- `kubernetes.container.cpu.usage.node.pct`
- `kubernetes.container.cpu.usage.limit.pct`
- `kubernetes.container.memory.usage.node.pct`
- `kubernetes.container.memory.usage.limit.pct`
- `kubernetes.pod.cpu.usage.nanocores`
- `kubernetes.pod.cpu.usage.node.pct`
- `kubernetes.pod.cpu.usage.limit.pct`
- `kubernetes.pod.memory.usage.bytes`
- `kubernetes.pod.memory.usage.node.pct`
- `kubernetes.pod.memory.usage.limit.pct`

Because the data needed to calculate these values comes from different sources (kubelet and kube-state-metrics), I added a mechanism to share some performance metrics through an in-memory cache. For that reason, these new metrics are only available when the `state_*` metricsets are enabled.

Closes #6125
Closes #6124
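
The in-memory cache described above can be pictured with a minimal sketch. The type and method bodies below are hypothetical: the diff only shows that the real cache exposes `Set`, `Get`, and `GetWithDefault`, and that callers guard on values greater than zero, so the sketch assumes a lookup miss returns 0. Any expiry or cleanup the real cache may perform is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// valueCache is a hypothetical stand-in for one of the shared caches:
// a mutex-guarded map from a string key to a float64 value.
type valueCache struct {
	mu     sync.RWMutex
	values map[string]float64
}

func newValueCache() *valueCache {
	return &valueCache{values: map[string]float64{}}
}

// Set stores a value under key, overwriting any previous one.
func (c *valueCache) Set(key string, value float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] = value
}

// Get returns the stored value, or 0 when the key is unknown (assumed behavior).
func (c *valueCache) Get(key string) float64 {
	return c.GetWithDefault(key, 0)
}

// GetWithDefault returns the stored value, or def when the key is unknown.
func (c *valueCache) GetWithDefault(key string, def float64) float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if v, ok := c.values[key]; ok {
		return v
	}
	return def
}

func main() {
	// Writer side: a state_* metricset records what it learns from kube-state-metrics.
	nodeCores := newValueCache()
	nodeCores.Set("node-a", 2)

	// Reader side: a kubelet-based metricset looks the value up later.
	fmt.Println(nodeCores.Get("node-a"))
	fmt.Println(nodeCores.GetWithDefault("node-b", 2))
}
```

Because the reader only sees values the writer has already stored, a metricset that never runs leaves the cache empty, which is consistent with the note that the new fields require the `state_*` metricsets.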
exekias authored and ruflin committed Jan 28, 2018
1 parent 11835ac commit 8bcddff
Showing 15 changed files with 438 additions and 9 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.asciidoc
@@ -194,6 +194,7 @@ https://github.com/elastic/beats/compare/v6.0.0-beta2...master[Check the HEAD di
- Update the MySQL dashboard to use the Time Series Visual Builder. {pull}5996[5996]
- Add experimental uwsgi module. {pull}6006[6006]
- Docker and Kubernetes modules are now GA, instead of Beta. {pull}6105[6105]
- Add pct calculated fields for Pod and container CPU and memory usages. {pull}6158[6158]

*Packetbeat*

108 changes: 108 additions & 0 deletions metricbeat/docs/fields.asciidoc
@@ -5110,6 +5110,26 @@ type: long
CPU used nanocores
[float]
=== `kubernetes.container.cpu.usage.node.pct`
type: scaled_float
format: percentage
CPU usage as a percentage of the total node allocatable CPU
[float]
=== `kubernetes.container.cpu.usage.limit.pct`
type: scaled_float
format: percentage
CPU usage as a percentage of the defined limit for the container (or total node allocatable CPU if unlimited)
[float]
== logs fields
@@ -5198,6 +5218,26 @@ format: bytes
Total memory usage
[float]
=== `kubernetes.container.memory.usage.node.pct`
type: scaled_float
format: percentage
Memory usage as a percentage of the total node allocatable memory
[float]
=== `kubernetes.container.memory.usage.limit.pct`
type: scaled_float
format: percentage
Memory usage as a percentage of the defined limit for the container (or total node allocatable memory if unlimited)
[float]
=== `kubernetes.container.memory.rss.bytes`
@@ -5709,6 +5749,74 @@ type: long
Tx errors
[float]
== cpu fields
CPU usage metrics
[float]
=== `kubernetes.pod.cpu.usage.nanocores`
type: long
CPU used nanocores
[float]
=== `kubernetes.pod.cpu.usage.node.pct`
type: scaled_float
format: percentage
CPU usage as a percentage of the total node CPU
[float]
=== `kubernetes.pod.cpu.usage.limit.pct`
type: scaled_float
format: percentage
CPU usage as a percentage of the defined limit for the pod containers (or total node CPU if unlimited)
[float]
=== `kubernetes.pod.memory.usage.bytes`
type: long
format: bytes
Total memory usage
[float]
=== `kubernetes.pod.memory.usage.node.pct`
type: scaled_float
format: percentage
Memory usage as a percentage of the total node allocatable memory
[float]
=== `kubernetes.pod.memory.usage.limit.pct`
type: scaled_float
format: percentage
Memory usage as a percentage of the defined limit for the pod containers (or total node allocatable memory if unlimited)
[float]
== container fields
2 changes: 1 addition & 1 deletion metricbeat/module/kubernetes/_meta/test/stats_summary.json
@@ -76,7 +76,7 @@
"startTime": "2017-04-18T16:47:44Z",
"cpu": {
"time": "2017-04-20T08:06:34Z",
- "usageNanoCores": 0,
+ "usageNanoCores": 11263994,
"usageCoreNanoSeconds": 43959424
},
"memory": {
20 changes: 20 additions & 0 deletions metricbeat/module/kubernetes/container/_meta/fields.yml
@@ -27,6 +27,16 @@
type: long
description: >
CPU used nanocores
- name: node.pct
type: scaled_float
format: percentage
description: >
CPU usage as a percentage of the total node allocatable CPU
- name: limit.pct
type: scaled_float
format: percentage
description: >
CPU usage as a percentage of the defined limit for the container (or total node allocatable CPU if unlimited)
- name: logs
type: group
description: >
@@ -90,6 +100,16 @@
format: bytes
description: >
Total memory usage
- name: node.pct
type: scaled_float
format: percentage
description: >
Memory usage as a percentage of the total node allocatable memory
- name: limit.pct
type: scaled_float
format: percentage
description: >
Memory usage as a percentage of the defined limit for the container (or total node allocatable memory if unlimited)
- name: rss
type: group
fields:
3 changes: 2 additions & 1 deletion metricbeat/module/kubernetes/container/container.go
@@ -5,6 +5,7 @@ import (
"github.com/elastic/beats/metricbeat/helper"
"github.com/elastic/beats/metricbeat/mb"
"github.com/elastic/beats/metricbeat/mb/parse"
"github.com/elastic/beats/metricbeat/module/kubernetes/util"
)

const (
@@ -55,7 +56,7 @@ func (m *MetricSet) Fetch() ([]common.MapStr, error) {
return nil, err
}

- events, err := eventMapping(body)
+ events, err := eventMapping(body, util.PerfMetrics)
if err != nil {
return nil, err
}
16 changes: 14 additions & 2 deletions metricbeat/module/kubernetes/container/container_test.go
@@ -10,6 +10,7 @@ import (
"github.com/stretchr/testify/assert"

"github.com/elastic/beats/libbeat/common"
"github.com/elastic/beats/metricbeat/module/kubernetes/util"
)

const testFile = "../_meta/test/stats_summary.json"
@@ -21,14 +22,19 @@ func TestEventMapping(t *testing.T) {
body, err := ioutil.ReadAll(f)
assert.NoError(t, err, "cannot read test file "+testFile)

- events, err := eventMapping(body)
+ cache := util.NewPerfMetricsCache()
+ cache.NodeCoresAllocatable.Set("gke-beats-default-pool-a5b33e2e-hdww", 2)
+ cache.NodeMemAllocatable.Set("gke-beats-default-pool-a5b33e2e-hdww", 146227200)
+ cache.ContainerMemLimit.Set(util.ContainerUID("default", "nginx-deployment-2303442956-pcqfc", "nginx"), 14622720)
+
+ events, err := eventMapping(body, cache)
assert.NoError(t, err, "error mapping "+testFile)

assert.Len(t, events, 1, "got wrong number of events")

testCases := map[string]interface{}{
"cpu.usage.core.ns": 43959424,
- "cpu.usage.nanocores": 0,
+ "cpu.usage.nanocores": 11263994,

"logs.available.bytes": 98727014400,
"logs.capacity.bytes": 101258067968,
@@ -44,6 +50,12 @@ func TestEventMapping(t *testing.T) {
"memory.pagefaults": 841,
"memory.majorpagefaults": 0,

// calculated pct fields:
"cpu.usage.node.pct": 0.005631997,
"cpu.usage.limit.pct": 0.005631997,
"memory.usage.node.pct": 0.01,
"memory.usage.limit.pct": 0.1,

"name": "nginx",

"rootfs.available.bytes": 98727014400,
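
The expected values in the test above can be checked by hand. The sketch below reproduces the arithmetic with the numbers the test puts in the cache and the `usageNanoCores` value from the fixture. The helper names are hypothetical, and the container's memory usage (1462272 bytes) is an assumption: it is not visible in this diff view and is inferred from the expected 0.01 and 0.1 ratios.

```go
package main

import "fmt"

// cpuPct converts nanocores to cores and divides by the available cores.
func cpuPct(usageNanoCores uint64, cores float64) float64 {
	return float64(usageNanoCores) / 1e9 / cores
}

// memPct divides used bytes by the total bytes available.
func memPct(usageBytes uint64, totalBytes float64) float64 {
	return float64(usageBytes) / totalBytes
}

func main() {
	const (
		usageNanoCores = 11263994    // from stats_summary.json in this commit
		nodeCores      = 2.0         // NodeCoresAllocatable set by the test
		nodeMemBytes   = 146227200.0 // NodeMemAllocatable set by the test
		memLimitBytes  = 14622720.0  // ContainerMemLimit set by the test
		memUsageBytes  = 1462272     // assumption: inferred, not shown in the diff
	)

	// The test sets no CPU limit, so cpu.usage.limit.pct falls back to the
	// node's allocatable cores and matches cpu.usage.node.pct.
	fmt.Printf("cpu.usage.node.pct     = %.9f\n", cpuPct(usageNanoCores, nodeCores))
	fmt.Printf("cpu.usage.limit.pct    = %.9f\n", cpuPct(usageNanoCores, nodeCores))
	fmt.Printf("memory.usage.node.pct  = %.2f\n", memPct(memUsageBytes, nodeMemBytes))
	fmt.Printf("memory.usage.limit.pct = %.2f\n", memPct(memUsageBytes, memLimitBytes))
}
```

Note that the fields hold ratios between 0 and 1; `format: percentage` in the field definitions only affects how they are displayed.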
26 changes: 25 additions & 1 deletion metricbeat/module/kubernetes/container/data.go
@@ -7,9 +7,10 @@ import (
"github.com/elastic/beats/libbeat/common"
"github.com/elastic/beats/metricbeat/mb"
"github.com/elastic/beats/metricbeat/module/kubernetes"
"github.com/elastic/beats/metricbeat/module/kubernetes/util"
)

- func eventMapping(content []byte) ([]common.MapStr, error) {
+ func eventMapping(content []byte, perfMetrics *util.PerfMetricsCache) ([]common.MapStr, error) {
events := []common.MapStr{}
var summary kubernetes.Summary

@@ -19,6 +20,8 @@ func eventMapping(content []byte) ([]common.MapStr, error) {
}

node := summary.Node
nodeCores := perfMetrics.NodeCoresAllocatable.Get(node.NodeName)
nodeMem := perfMetrics.NodeMemAllocatable.Get(node.NodeName)
for _, pod := range summary.Pods {
for _, container := range pod.Containers {
containerEvent := common.MapStr{
@@ -93,6 +96,27 @@ func eventMapping(content []byte) ([]common.MapStr, error) {
},
},
}

if nodeCores > 0 {
containerEvent.Put("cpu.usage.node.pct", float64(container.CPU.UsageNanoCores)/1e9/nodeCores)
}

if nodeMem > 0 {
containerEvent.Put("memory.usage.node.pct", float64(container.Memory.UsageBytes)/nodeMem)
}

cuid := util.ContainerUID(pod.PodRef.Namespace, pod.PodRef.Name, container.Name)
coresLimit := perfMetrics.ContainerCoresLimit.GetWithDefault(cuid, nodeCores)
memLimit := perfMetrics.ContainerMemLimit.GetWithDefault(cuid, nodeMem)

if coresLimit > 0 {
containerEvent.Put("cpu.usage.limit.pct", float64(container.CPU.UsageNanoCores)/1e9/coresLimit)
}

if memLimit > 0 {
containerEvent.Put("memory.usage.limit.pct", float64(container.Memory.UsageBytes)/memLimit)
}

events = append(events, containerEvent)
}

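
The fallback in the hunk above (use the container's limit when one is known, otherwise the node's allocatable total, and skip the field when the denominator is not positive) can be isolated in a small standalone sketch. The function names, the plain map, and the key strings are hypothetical; the format that `util.ContainerUID` actually produces is not shown in this diff, so the keys here are treated as opaque.

```go
package main

import "fmt"

// limitOrNode returns the recorded limit for uid, or nodeTotal when the
// container has no recorded limit.
func limitOrNode(limits map[string]float64, uid string, nodeTotal float64) float64 {
	if limit, ok := limits[uid]; ok {
		return limit
	}
	return nodeTotal
}

// pct returns usage/total. The boolean is false when total is not positive,
// in which case the caller should leave the field out of the event.
func pct(usage, total float64) (float64, bool) {
	if total <= 0 {
		return 0, false
	}
	return usage / total, true
}

func main() {
	nodeCores := 2.0
	// Hypothetical keys; only the first container has a CPU limit.
	coresLimits := map[string]float64{"limited-container": 0.5}

	for _, uid := range []string{"limited-container", "unlimited-container"} {
		usageCores := 0.25
		if v, ok := pct(usageCores, limitOrNode(coresLimits, uid, nodeCores)); ok {
			fmt.Printf("%s cpu.usage.limit.pct = %.3f\n", uid, v)
		}
	}

	// With no node data cached (total 0), no value is reported.
	if _, ok := pct(0.25, 0); !ok {
		fmt.Println("node total unknown: field omitted")
	}
}
```

Omitting the field rather than reporting 0 keeps missing cache data distinguishable from a genuinely idle container.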
43 changes: 43 additions & 0 deletions metricbeat/module/kubernetes/pod/_meta/fields.yml
@@ -35,3 +35,46 @@
type: long
description: >
Tx errors
- name: cpu
type: group
description: >
CPU usage metrics
fields:
- name: usage
type: group
fields:
- name: nanocores
type: long
description: >
CPU used nanocores
- name: node.pct
type: scaled_float
format: percentage
description: >
CPU usage as a percentage of the total node CPU
- name: limit.pct
type: scaled_float
format: percentage
description: >
CPU usage as a percentage of the defined limit for the pod containers (or total node CPU if unlimited)
- name: memory
type: group
fields:
- name: usage
type: group
fields:
- name: bytes
type: long
format: bytes
description: >
Total memory usage
- name: node.pct
type: scaled_float
format: percentage
description: >
Memory usage as a percentage of the total node allocatable memory
- name: limit.pct
type: scaled_float
format: percentage
description: >
Memory usage as a percentage of the defined limit for the pod containers (or total node allocatable memory if unlimited)
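
The pod-level implementation is not visible in this view of the commit, so the following is only one plausible reading of the field descriptions above, not the actual code. It assumes that pod usage is the sum of its containers' usage, that the pod limit is the sum of the container limits, and that a pod with any unlimited container is measured against the node total. All type and function names are hypothetical.

```go
package main

import "fmt"

// containerSample is a hypothetical per-container reading.
type containerSample struct {
	UsageNanoCores uint64
	CoresLimit     float64 // 0 means no limit is recorded for the container
}

// podCPU aggregates container readings into pod-level CPU values.
// Assumption: a single unlimited container makes the whole pod unlimited,
// so the node total is used as the denominator for limitPct.
func podCPU(containers []containerSample, nodeCores float64) (nanocores uint64, nodePct, limitPct float64) {
	limit := 0.0
	unlimited := false
	for _, c := range containers {
		nanocores += c.UsageNanoCores
		if c.CoresLimit > 0 {
			limit += c.CoresLimit
		} else {
			unlimited = true
		}
	}
	if unlimited || limit == 0 {
		limit = nodeCores
	}

	usedCores := float64(nanocores) / 1e9
	if nodeCores > 0 {
		nodePct = usedCores / nodeCores
	}
	if limit > 0 {
		limitPct = usedCores / limit
	}
	return nanocores, nodePct, limitPct
}

func main() {
	containers := []containerSample{
		{UsageNanoCores: 500000000, CoresLimit: 1},
		{UsageNanoCores: 250000000, CoresLimit: 0.5},
	}
	nanocores, nodePct, limitPct := podCPU(containers, 4)
	fmt.Println(nanocores, nodePct, limitPct)
}
```

The same shape would apply to memory, with bytes in place of nanocores and no unit conversion.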