spanmetrics: Add 10m, 60m aggregation intervals #9926

Merged
merged 8 commits into from
Jan 2, 2023
3 changes: 3 additions & 0 deletions apmpackage/apm/changelog.yml
@@ -1,5 +1,8 @@
- version: generated
changes:
- description: Introduce `metrics-apm.service_destination-${interval}` data stream for service_destination metrics (`1m`, `10m` and `60m`).
type: enhancement
link: https://github.com/elastic/apm-server/pull/9926
- description: Introduce `metrics-apm.transaction-${interval}` data stream for transaction metrics (`1m`, `10m` and `60m`).
type: enhancement
link: https://github.com/elastic/apm-server/pull/9846
@@ -0,0 +1,23 @@
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_age": "14d",
"max_size": "50gb"
},
"set_priority": {
"priority": 100
}
}
},
"delete": {
"min_age": "120d",
"actions": {
"delete": {}
}
}
}
}
}
@@ -0,0 +1,23 @@
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_age": "7d",
"max_size": "50gb"
},
"set_priority": {
"priority": 100
}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
@@ -0,0 +1,23 @@
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_age": "30d",
"max_size": "50gb"
},
"set_priority": {
"priority": 100
}
}
},
"delete": {
"min_age": "240d",
"actions": {
"delete": {}
}
}
}
}
}
@@ -0,0 +1,9 @@
---
description: Pipeline for ingesting APM transaction metrics.
processors:
- pipeline:
name: observer_version
- pipeline:
name: observer_ids
- pipeline:
name: ecs_version
@@ -0,0 +1,8 @@
- name: '@timestamp'
external: ecs
- name: data_stream.type
external: ecs
- name: data_stream.dataset
external: ecs
- name: data_stream.namespace
external: ecs
@@ -0,0 +1,18 @@
- external: ecs
name: agent.name
- external: ecs
name: ecs.version
- external: ecs
name: event.outcome
- external: ecs
name: observer.hostname
- external: ecs
name: observer.name
- external: ecs
name: observer.type
- external: ecs
name: observer.version
- external: ecs
name: service.environment
- external: ecs
name: service.name
@@ -0,0 +1,32 @@
- name: metricset.name
type: constant_keyword
description: Name of the set of metrics.
- name: metricset.interval
type: constant_keyword
description: Metricset aggregation interval.
- name: processor.event
type: constant_keyword
description: Processor event.
- name: processor.name
type: constant_keyword
description: Processor name.
- name: service.target.name
type: keyword
description: Target service for which data is collected.
- name: service.target.type
type: keyword
description: Type of the target service for which data is collected.
- name: span.name
type: keyword
description: Generic designation of a span in the scope of a transaction.
- name: span.destination.service.resource
type: keyword
description: |
Identifier for the destination service resource being operated on (e.g. 'http://elastic.co:80', 'elasticsearch', 'rabbitmq/queue_name')
- name: span.destination.service.response_time.count
type: long
description: Number of aggregated outgoing requests.
- name: span.destination.service.response_time.sum.us
type: long
description: Aggregated duration of outgoing requests, in microseconds.
unit: micros
@@ -0,0 +1,18 @@
title: APM service destination metrics {{ .Interval }}
type: metrics
dataset: apm.service_destination.{{ .Interval }}
ilm_policy: metrics-apm.service_destination_interval_metrics-default_policy.{{ .Interval }}
elasticsearch:
index_template:
mappings:
# Service destination metrics should have all fields strictly mapped;
# we are in full control of the field names.
dynamic: strict
# Individual measurements are typically uninteresting, so
# use synthetic source to reduce storage size.
_source:
mode: synthetic
settings:
index:
sort.field: "@timestamp"
sort.order: desc
5 changes: 4 additions & 1 deletion changelogs/head.asciidoc
@@ -7,7 +7,8 @@ https://github.com/elastic/apm-server/compare/8.6\...main[View commits]
==== Breaking Changes
- `transaction.failure_count` has been removed. `transaction.success_count` type has changed to `aggregated_metric_double` {pull}9791[9791]
- `transaction.success_count` has been moved to `event.success_count` {pull}9819[9819]
- Improve APM UI query performance by introducing `10m` and `60m` aggregation intervals for transaction metrics. Store them into dedicated data streams `metrics-apm.transaction.${interval}`, and remove transaction metrics from `metrics-apm.internal` {pull}9846[9846]
- Stop indexing transaction metrics to `metrics-apm.internal` {pull}9846[9846]
- Stop indexing span destination metrics to `metrics-apm.internal` {pull}9926[9926]

[float]
==== Deprecations
@@ -28,3 +29,5 @@ https://github.com/elastic/apm-server/compare/8.6\...main[View commits]
- Dedicated overflow buckets for transaction and service aggregation to limit cardinality {pull}9856[9856]
- Automatically scale `MaxGroups` and `MaxTransactionGroups` based on available memory {pull}9856[9856]
- Set `_doc_count` for service destination metrics {pull}9931[9931]
- Improve APM UI query performance by producing `1m`, `10m` and `60m` aggregation intervals for transaction metrics. Store them into dedicated data streams `metrics-apm.transaction.${interval}` {pull}9846[9846]
- Improve APM UI query performance by producing `1m`, `10m` and `60m` aggregation intervals for span destination metrics. Store them into dedicated data streams `metrics-apm.service_destination.${interval}` {pull}9926[9926]
3 changes: 2 additions & 1 deletion dev_docs/trace_metrics.md
@@ -31,7 +31,8 @@ aggregation: span events describing an operation that involves another service
are grouped by the originating and target services, and the span latency is
accumulated. For these metrics we record only a count and sum, enabling calculation
of throughput and average latency. Once again, a default limit of 10000 groups is
imposed.
imposed. From 8.7.0 onwards, the Service destination aggregator publishes metrics
for 3 different periods: `1m`, `10m`, and `60m`.

## Dealing with sampling

1 change: 1 addition & 0 deletions docs/data-model.asciidoc
@@ -362,6 +362,7 @@ You can filter and group by these dimensions:
* `service.environment`: The environment of the service that made the request
* `service.target.name`: The target service name, for example `customer_db`
* `service.target.type`: The target service type, for example `mysql`
* `metricset.interval`: A string with the aggregation interval the metricset represents.
--

The `@timestamp` field of these documents holds the start of the aggregation interval.
1 change: 1 addition & 0 deletions docs/data-streams.asciidoc
@@ -49,6 +49,7 @@ Metrics are stored in the following data streams:
// tag::metrics-data-streams[]
- APM internal metrics: `metrics-apm.internal-<namespace>`
- APM transaction metrics: `metrics-apm.transaction.<metricset.interval>-<namespace>`
- APM service destination metrics: `metrics-apm.service_destination.<metricset.interval>-<namespace>`
- Application metrics: `metrics-apm.app.<service.name>-<namespace>`
// end::metrics-data-streams[]
+
5 changes: 1 addition & 4 deletions internal/beater/config/aggregation.go
@@ -24,7 +24,6 @@ import (
const (
defaultTransactionAggregationHDRHistogramSignificantFigures = 2

defaultServiceDestinationAggregationInterval = time.Minute
defaultServiceDestinationAggregationMaxGroups = 10000

defaultServiceAggregationInterval = time.Minute
@@ -46,8 +45,7 @@ type TransactionAggregationConfig struct {

// ServiceDestinationAggregationConfig holds configuration related to span metrics aggregation for service maps.
type ServiceDestinationAggregationConfig struct {
Interval time.Duration `config:"interval" validate:"min=1"`
MaxGroups int `config:"max_groups" validate:"min=1"`
MaxGroups int `config:"max_groups" validate:"min=1"`
}

// ServiceAggregationConfig holds configuration related to service metrics aggregation.
@@ -64,7 +62,6 @@ func defaultAggregationConfig() AggregationConfig {
HDRHistogramSignificantFigures: defaultTransactionAggregationHDRHistogramSignificantFigures,
},
ServiceDestinations: ServiceDestinationAggregationConfig{
Interval: defaultServiceDestinationAggregationInterval,
MaxGroups: defaultServiceDestinationAggregationMaxGroups,
},
Service: ServiceAggregationConfig{
2 changes: 0 additions & 2 deletions internal/beater/config/config_test.go
@@ -257,7 +257,6 @@ func TestUnpackConfig(t *testing.T) {
HDRHistogramSignificantFigures: 1,
},
ServiceDestinations: ServiceDestinationAggregationConfig{
Interval: time.Minute,
MaxGroups: 456,
},
Service: ServiceAggregationConfig{
@@ -424,7 +423,6 @@ func TestUnpackConfig(t *testing.T) {
HDRHistogramSignificantFigures: 2,
},
ServiceDestinations: ServiceDestinationAggregationConfig{
Interval: time.Minute,
MaxGroups: 10000,
},
Service: ServiceAggregationConfig{
17 changes: 8 additions & 9 deletions systemtest/aggregation_test.go
@@ -151,14 +151,7 @@ func TestTransactionAggregationShutdown(t *testing.T) {

func TestServiceDestinationAggregation(t *testing.T) {
systemtest.CleanupElasticsearch(t)
srv := apmservertest.NewUnstartedServerTB(t)
srv.Config.Aggregation = &apmservertest.AggregationConfig{
ServiceDestinations: &apmservertest.ServiceDestinationAggregationConfig{
Interval: time.Second,
},
}
err := srv.Start()
require.NoError(t, err)
srv := apmservertest.NewServerTB(t)

// Send spans to the server to be aggregated.
tracer := srv.Tracer()
@@ -176,7 +169,13 @@ func TestServiceDestinationAggregation(t *testing.T) {
tx.End()
tracer.Flush(nil)

result := systemtest.Elasticsearch.ExpectDocs(t, "metrics-apm.internal-*",
// Wait for the transaction to be indexed, indicating that Elasticsearch
// indices have been set up and we should not risk triggering the shutdown
// timeout while waiting for the aggregated metrics to be indexed.
systemtest.Elasticsearch.ExpectMinDocs(t, 6, "traces-apm*", nil)
require.NoError(t, srv.Close())

result := systemtest.Elasticsearch.ExpectDocs(t, "metrics-apm.service_destination*",
estest.ExistsQuery{Field: "span.destination.service.response_time.count"},
)
systemtest.ApproveEvents(t, t.Name(), result.Hits.Hits)
19 changes: 1 addition & 18 deletions systemtest/apmservertest/config.go
@@ -296,8 +296,7 @@ func (m *MonitoringConfig) MarshalJSON() ([]byte, error) {

// AggregationConfig holds APM Server metrics aggregation configuration.
type AggregationConfig struct {
Service *ServiceAggregationConfig `json:"service,omitempty"`
ServiceDestinations *ServiceDestinationAggregationConfig `json:"service_destinations,omitempty"`
Service *ServiceAggregationConfig `json:"service,omitempty"`
}

// ServiceAggregationConfig holds APM Server service metrics aggregation configuration.
@@ -323,22 +322,6 @@ func (s *ServiceAggregationConfig) MarshalJSON() ([]byte, error) {
})
}

// ServiceDestinationAggregationConfig holds APM Server service destination metrics aggregation configuration.
type ServiceDestinationAggregationConfig struct {
Interval time.Duration
}

func (s *ServiceDestinationAggregationConfig) MarshalJSON() ([]byte, error) {
// time.Duration is encoded as int64.
// Convert time.Durations to durations, to encode as duration strings.
type config struct {
Interval string `json:"interval,omitempty"`
}
return json.Marshal(config{
Interval: durationString(s.Interval),
})
}

func durationString(d time.Duration) string {
if d == 0 {
return ""