Azure Monitor: adjust grouping logic and avoid duplicating documents to make the metricset TSDB-friendly (#36823)

## Overview

(WHAT) Here is a summary of the changes introduced with this PR.

- Update the metric grouping logic
- Track metrics collection info
- Adjust collection interval

### Update the metric grouping logic

Streamlines the metric grouping logic to include all the fields the TSDB team identified as dimensions for Azure Metrics events.

Here are the current components of the grouping key:

- timestamp
- namespace
- resource ID
- resource Sub ID
- dimensions
- time grain

The refactoring also makes the grouping logic easier to read.

(WHY)

When TSDB is enabled, Elasticsearch drops documents that share the same timestamp and dimensions. To avoid data loss, the metricset must group all metric values by timestamp and dimensions and create one event for each group.
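
As a rough illustration of this grouping, here is a minimal, self-contained sketch. The `metricValue` struct, its field names, and the `groupingKey` function are hypothetical stand-ins for the metricset's actual types, not the code from this PR:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// metricValue holds the fields identified as TSDB dimensions
// (hypothetical type for illustration only).
type metricValue struct {
	Timestamp     time.Time
	Namespace     string
	ResourceID    string
	ResourceSubID string
	TimeGrain     string
	Dimensions    map[string]string
}

// groupingKey builds a composite key from timestamp, namespace,
// resource ID, resource sub ID, time grain, and dimensions.
// Values that produce the same key belong in the same event.
func groupingKey(v metricValue) string {
	// Sort dimension names so the key is deterministic.
	names := make([]string, 0, len(v.Dimensions))
	for name := range v.Dimensions {
		names = append(names, name)
	}
	sort.Strings(names)

	dims := make([]string, 0, len(names))
	for _, name := range names {
		dims = append(dims, name+"="+v.Dimensions[name])
	}

	return strings.Join([]string{
		v.Timestamp.UTC().Format(time.RFC3339),
		v.Namespace,
		v.ResourceID,
		v.ResourceSubID,
		v.TimeGrain,
		strings.Join(dims, ","),
	}, "|")
}

func main() {
	v := metricValue{
		Timestamp:  time.Date(2023, 11, 22, 12, 0, 0, 0, time.UTC),
		Namespace:  "Microsoft.Compute/virtualMachines",
		ResourceID: "/subscriptions/1234/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm-1",
		TimeGrain:  "PT5M",
		Dimensions: map[string]string{"lun": "0"},
	}
	fmt.Println(groupingKey(v))
}
```

Two metric values that produce the same key are folded into a single event, so TSDB has nothing to deduplicate.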

### Track metrics collection info

The metricset tracks the timestamp and time grain of each metrics collection. At the beginning of each iteration, it skips collecting a value if it has already collected one for the current time grain period.

(WHY)

The metricset usually collects one data point for each collection period. When the time grain is larger than the collection period, the metricset collects the same data point multiple times.

For example, consider a `PT1H` (one hour) time grain and a collection period of five minutes: without tracking, the metricset would collect the same `PT1H` data point 12 times.
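
A minimal sketch of this kind of tracking; the `collectionTracker` type and its API are hypothetical, not the metricset's actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

// collectionTracker remembers when a value was last collected for each
// (metric, time grain) pair. Hypothetical sketch for illustration.
type collectionTracker struct {
	lastCollected map[string]time.Time
}

func newCollectionTracker() *collectionTracker {
	return &collectionTracker{lastCollected: make(map[string]time.Time)}
}

// shouldCollect reports whether a new value is due: it returns false while
// the previous collection is still inside the time grain period.
func (t *collectionTracker) shouldCollect(metricKey string, timeGrain time.Duration, now time.Time) bool {
	last, ok := t.lastCollected[metricKey]
	if ok && now.Sub(last) < timeGrain {
		return false // already collected a value for this time grain period
	}
	t.lastCollected[metricKey] = now
	return true
}

func main() {
	tracker := newCollectionTracker()
	grain := time.Hour // PT1H
	start := time.Now()

	// With a 5-minute collection period, only the first of the 12
	// iterations inside one PT1H time grain collects a value.
	for i := 0; i < 12; i++ {
		now := start.Add(time.Duration(i) * 5 * time.Minute)
		fmt.Println(i, tracker.shouldCollect("vm-1|percentage_cpu|PT1H", grain, now))
	}
}
```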

### Adjust collection interval

Change the collection interval to `[{-2 x INTERVAL}, {-1 x INTERVAL})` with a delay of `{INTERVAL}`.

(WHY)

The collection interval used to be [2x the collection period or time grain](https://github.com/elastic/beats/blob/ed34c37f59c7bc0cf9e8051f7b5327c861b59467/x-pack/metricbeat/module/azure/client.go#L110-L116). This interval was too large, so the metricset collected multiple data points for the same metric. Some code existed to drop the extra data points, but it did not work in all cases.
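
A sketch of the new interval arithmetic under these assumptions; `collectionInterval` is a hypothetical helper, and the real metricset derives the interval from its configuration rather than a bare duration:

```go
package main

import (
	"fmt"
	"time"
)

// collectionInterval computes the [{-2 x INTERVAL}, {-1 x INTERVAL}) time
// range described above, i.e. one interval wide and delayed by one interval.
func collectionInterval(now time.Time, interval time.Duration) (start, end time.Time) {
	end = now.Add(-1 * interval) // delayed by one interval
	start = now.Add(-2 * interval)
	return start, end
}

func main() {
	now := time.Date(2023, 11, 22, 12, 0, 0, 0, time.UTC)
	start, end := collectionInterval(now, 5*time.Minute)

	// With a 5-minute period at 12:00, the metricset fetches values
	// in [11:50, 11:55), one interval behind real time.
	fmt.Printf("[%s, %s)\n", start.Format("15:04"), end.Format("15:04"))
}
```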

Glossary:
- collection interval: the time range used to fetch metric values.
- collection period: the time between metric collections (for example, with a 5-minute period, the metricset collects new metrics every 5 minutes).

(cherry picked from commit 886d078)
zmoog authored and mergify[bot] committed Nov 22, 2023
1 parent a590ba4 commit 6af9c01
Showing 9 changed files with 561 additions and 323 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.next.asciidoc
@@ -92,6 +92,13 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
 - Fix EC2 host.cpu.usage {pull}35717[35717]
 - Add option in SQL module to execute queries for all dbs. {pull}35688[35688]
 - Add remaining dimensions for azure storage account to make them available for tsdb enablement. {pull}36331[36331]
+- Add missing 'TransactionType' dimension for Azure Storage Account. {pull}36413[36413]
+- Add log error when statsd server fails to start {pull}36477[36477]
+- Fix CassandraConnectionClosures metric configuration {pull}34742[34742]
+- Fix event mapping implementation for statsd module {pull}36925[36925]
+- The region and availability_zone ecs fields nested within the cloud field. {pull}37015[37015]
+- Fix CPU and memory metrics collection from privileged process on Windows {issue}17314[17314]{pull}37027[37027]
+- Enhanced Azure Metrics metricset with refined grouping logic and resolved duplication issues for TSDB compatibility {pull}36823[36823]

*Osquerybeat*

19 changes: 12 additions & 7 deletions x-pack/metricbeat/module/azure/add_metadata.go
@@ -9,6 +9,7 @@ import (
"github.com/elastic/elastic-agent-libs/mapstr"
)

// addHostMetadata enriches the event with host metadata.
func addHostMetadata(event *mb.Event, metricList mapstr.M) {
hostFieldTable := map[string]string{
"percentage_cpu.avg": "host.cpu.usage",
@@ -30,24 +31,28 @@ func addHostMetadata(event *mb.Event, metricList mapstr.M) {
 			if metricName == "percentage_cpu.avg" {
 				value = value / 100
 			}
-			event.RootFields.Put(hostName, value)
+			_, _ = event.RootFields.Put(hostName, value)
 		}
 	}
 }
 
+// addCloudVMMetadata enriches the event with cloud VM metadata.
 func addCloudVMMetadata(event *mb.Event, vm VmResource, subscriptionId string) {
 	if vm.Name != "" {
-		event.RootFields.Put("cloud.instance.name", vm.Name)
-		event.RootFields.Put("host.name", vm.Name)
+		_, _ = event.RootFields.Put("cloud.instance.name", vm.Name)
+		_, _ = event.RootFields.Put("host.name", vm.Name)
 	}
+
 	if vm.Id != "" {
-		event.RootFields.Put("cloud.instance.id", vm.Id)
-		event.RootFields.Put("host.id", vm.Id)
+		_, _ = event.RootFields.Put("cloud.instance.id", vm.Id)
+		_, _ = event.RootFields.Put("host.id", vm.Id)
 	}
+
 	if vm.Size != "" {
-		event.RootFields.Put("cloud.machine.type", vm.Size)
+		_, _ = event.RootFields.Put("cloud.machine.type", vm.Size)
 	}
+
 	if subscriptionId != "" {
-		event.RootFields.Put("cloud.account.id", subscriptionId)
+		_, _ = event.RootFields.Put("cloud.account.id", subscriptionId)
 	}
 }
35 changes: 27 additions & 8 deletions x-pack/metricbeat/module/azure/azure.go
@@ -87,24 +87,43 @@ func NewMetricSet(base mb.BaseMetricSet) (*MetricSet, error) {
 // It publishes the event which is then forwarded to the output. In case
 // of an error set the Error field of mb.Event or simply call report.Error().
 func (m *MetricSet) Fetch(report mb.ReporterV2) error {
+	// Initialize cloud resources and monitor metrics
+	// information.
+	//
+	// The client collects and stores:
+	// - existing cloud resource definitions (e.g. VMs, DBs, etc.)
+	// - metric definitions for the resources (e.g. CPU, memory, etc.)
+	//
+	// The metricset periodically refreshes the information
+	// after `RefreshListInterval` (default 600s for
+	// most metricsets).
 	err := m.Client.InitResources(m.MapMetrics)
 	if err != nil {
 		return err
 	}
 
 	if len(m.Client.ResourceConfigurations.Metrics) == 0 {
-		// error message is previously logged in the InitResources, no error event should be created
+		// error message is previously logged in the InitResources,
+		// no error event should be created
 		return nil
 	}
 
-	// retrieve metrics
-	groupedMetrics := groupMetricsByResource(m.Client.ResourceConfigurations.Metrics)
+	// Group metric definitions by cloud resource ID.
+	//
+	// We group the metric definitions by resource ID to fetch
+	// metric values for each cloud resource in one API call.
+	metricsByResourceId := groupMetricsDefinitionsByResourceId(m.Client.ResourceConfigurations.Metrics)
 
-	for _, metrics := range groupedMetrics {
-		results := m.Client.GetMetricValues(metrics, report)
-		err := EventsMapping(results, m.Client, report)
-		if err != nil {
-			return fmt.Errorf("error running EventsMapping: %w", err)
+	for _, metricsDefinition := range metricsByResourceId {
+		// Fetch metric values for each resource.
+		metricValues := m.Client.GetMetricValues(metricsDefinition, report)
+
+		// Turns metric values into events and sends them to Elasticsearch.
+		if err := mapToEvents(metricValues, m.Client, report); err != nil {
+			return fmt.Errorf("error mapping metrics to events: %w", err)
 		}
 	}
+
 	return nil
 }
