Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make intrinsic dimensions configurable and disable status_message by default #1960

Merged
merged 13 commits into from
Dec 21, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions .golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@ output:

linters:
enable:
- deadcode
- errcheck
- goconst
- gofmt
Expand All @@ -53,12 +52,10 @@ linters:
- megacheck
- misspell
- revive
- structcheck
- typecheck
- unconvert
- unparam
- unused
- varcheck

linters-settings:
errcheck:
Expand Down
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,8 @@ Internal types are updated to use `scope` instead of `instrumentation_library`.
* [BUGFIX] New wal file separator '+' for the NTFS filesystem and backward compatibility with the old separator ':' [#1700](https://github.com/grafana/tempo/pull/1700) (@kilian-kier)
* [ENHANCEMENT] metrics-generator: extract `status_message` field from spans [#1786](https://github.com/grafana/tempo/pull/1786), [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
* [ENHANCEMENT] metrics-generator: handle collisions between user defined and default dimensions [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
**BREAKING CHANGE** Custom dimensions colliding with intrinsic dimensions will be prefixed with `__`.
* [ENHANCEMENT] metrics-generator: make intrinsic dimensions configurable and disable `status_message` by default [#1960](https://github.com/grafana/tempo/pull/1960) (@stoewer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe there's a breaking change here (or perhaps one in #1786) that should be mentioned. If the user has a configured normal attribute dimension called status_message then it's name will change to __status_message.

See below for comments on the renaming behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think the actual breaking change was in #1794. I marked the change as breaking in the changelog.

* [ENHANCEMENT] distributor: Log span names when `distributor.log_received_spans.include_all_attributes` is on [#1790](https://github.com/grafana/tempo/pull/1790) (@suraciii)
* [ENHANCEMENT] metrics-generator: truncate label names and values exceeding a configurable length [#1897](https://github.com/grafana/tempo/pull/1897) (@kvrhdn)
* [ENHANCEMENT] Add parquet WAL [#1878](https://github.com/grafana/tempo/pull/1878) (@joe-elliott, @mdisibio)
Expand Down
24 changes: 21 additions & 3 deletions docs/tempo/website/configuration/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -284,10 +284,26 @@ metrics_generator:
# Buckets for the latency histogram in seconds.
[histogram_buckets: <list of float> | default = 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.02, 2.05, 4.10]

# Additional dimensions to add to the metrics along with the default dimensions
# (service, span_name, span_kind, status_code, and status_message). Dimensions are searched
# for in the resource and span attributes and are added to the metrics if present.
# Configure intrinsic dimensions to add to the metrics. Intrinsic dimensions are taken
# directly from the respective resource and span properties.
intrinsic_dimensions:
# Whether to add the name of the service the span is associated with.
[service: <bool> | default = true]
# Whether to add the name of the span.
[span_name: <bool> | default = true]
# Whether to add the span kind describing the relationship between spans.
[span_kind: <bool> | default = true]
# Whether to add the span status code.
[status_code: <bool> | default = true]
# Whether to add a status message. Important note: The span status message may
# contain arbitrary strings and thus have a very high cardinality.
[status_message: <bool> | default = false]

# Additional dimensions to add to the metrics along with the intrinsic dimensions.
# Dimensions are searched for in the resource and span attributes and are added to
# the metrics if present.
[dimensions: <list of string>]


# Registry configuration
registry:
Expand Down Expand Up @@ -1152,6 +1168,8 @@ overrides:
[metrics_generator_processor_service_graphs_histogram_buckets: <list of float>]
[metrics_generator_processor_service_graphs_dimensions: <list of string>]
[metrics_generator_processor_span_metrics_histogram_buckets: <list of float>]
# Allowed keys for intrinsic dimensions are: service, span_name, span_kind, status_code, and status_message.
[metrics_generator_processor_span_metrics_intrinsic_dimensions: <map string to bool>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to add a quick note about what strings are legal as keys

[metrics_generator_processor_span_metrics_dimensions: <list of string>]

# Maximum number of active series in the registry, per instance of the metrics-generator. A
Expand Down
64 changes: 33 additions & 31 deletions docs/tempo/website/configuration/manifest.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ go run ./cmd/tempo --storage.trace.backend=local --storage.trace.local.path=/tmp

## Complete configuration

> **Note**: This manifest was generated on 2022-09-13.
> **Note**: This manifest was generated on 2022-12-19.

```yaml
target: all
Expand All @@ -31,6 +31,8 @@ server:
grpc_listen_address: ""
grpc_listen_port: 9095
grpc_listen_conn_limit: 0
tls_cipher_suites: ""
tls_min_version: ""
http_tls_config:
cert_file: ""
key_file: ""
Expand Down Expand Up @@ -86,6 +88,8 @@ distributor:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand All @@ -104,6 +108,7 @@ distributor:
receivers: {}
override_ring_key: distributor
log_received_traces: false
forwarders: []
extend_writes: true
search_tags_deny_list: []
ingester_client:
Expand All @@ -129,6 +134,8 @@ ingester_client:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
metrics_generator_client:
pool_config:
checkinterval: 15s
Expand All @@ -152,6 +159,8 @@ metrics_generator_client:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
querier:
search:
query_timeout: 30s
Expand All @@ -163,7 +172,6 @@ querier:
max_concurrent_queries: 5
frontend_worker:
frontend_address: 127.0.0.1:9095
scheduler_address: ""
dns_lookup_duration: 10s
parallelism: 2
match_max_concurrent: true
Expand All @@ -185,39 +193,13 @@ querier:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
query_relevant_ingesters: false
query_frontend:
log_queries_longer_than: 0s
max_body_size: 0
query_stats_enabled: false
max_outstanding_per_tenant: 100
querier_forget_delay: 0s
scheduler_address: ""
scheduler_dns_lookup_period: 0s
scheduler_worker_concurrency: 0
grpc_client_config:
max_recv_msg_size: 0
max_send_msg_size: 0
grpc_compression: ""
rate_limit: 0
rate_limit_burst: 0
backoff_on_ratelimits: false
backoff_config:
min_period: 0s
max_period: 0s
max_retries: 0
tls_enabled: false
tls_cert_path: ""
tls_key_path: ""
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
instance_interface_names: []
address: ""
port: 0
downstream_url: ""
max_retries: 2
query_shards: 20
search:
concurrent_jobs: 50
target_bytes_per_job: 10485760
Expand All @@ -227,6 +209,7 @@ query_frontend:
query_backend_after: 15m0s
query_ingesters_until: 1h0m0s
trace_by_id:
query_shards: 20
hedge_requests_at: 2s
hedge_requests_up_to: 2
compactor:
Expand All @@ -252,6 +235,8 @@ compactor:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand Down Expand Up @@ -307,6 +292,8 @@ ingester:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand Down Expand Up @@ -342,6 +329,7 @@ ingester:
max_block_bytes: 1073741824
complete_block_timeout: 15m0s
override_ring_key: ring
use_flatbuffer_search: false
metrics_generator:
ring:
kvstore:
Expand All @@ -365,6 +353,8 @@ metrics_generator:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand Down Expand Up @@ -410,10 +400,17 @@ metrics_generator:
- 4.096
- 8.192
- 16.384
intrinsic_dimensions:
service: true
span_name: true
span_kind: true
status_code: true
dimensions: []
registry:
collection_interval: 15s
stale_duration: 15m0s
max_label_name_length: 1024
max_label_value_length: 2048
storage:
path: ""
wal:
Expand All @@ -437,13 +434,14 @@ storage:
blocksfilepath: /tmp/tempo/wal/blocks
encoding: snappy
search_encoding: none
version: v2
ingestion_time_range_slack: 2m0s
block:
index_downsample_bytes: 1048576
index_page_size_bytes: 256000
bloom_filter_false_positive: 0.01
bloom_filter_shard_size_bytes: 102400
version: v2
version: vParquet
encoding: zstd
search_encoding: snappy
search_page_size_bytes: 1048576
Expand Down Expand Up @@ -518,6 +516,7 @@ overrides:
max_traces_per_user: 10000
max_global_traces_per_user: 0
max_search_bytes_per_trace: 5000
forwarders: []
metrics_generator_ring_size: 0
metrics_generator_processors: null
metrics_generator_max_active_series: 0
Expand All @@ -529,6 +528,7 @@ overrides:
metrics_generator_processor_service_graphs_dimensions: []
metrics_generator_processor_span_metrics_histogram_buckets: []
metrics_generator_processor_span_metrics_dimensions: []
metrics_generator_processor_span_metrics_intrinsic_dimensions: {}
block_retention: 0s
max_bytes_per_tag_values_query: 5000000
max_search_duration: 0s
Expand Down Expand Up @@ -569,6 +569,8 @@ memberlist:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
usage_report:
reporting_enabled: true
backoff:
Expand Down
12 changes: 10 additions & 2 deletions modules/generator/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ import (
"flag"
"time"

"github.com/pkg/errors"

"github.com/grafana/tempo/modules/generator/processor/servicegraphs"
"github.com/grafana/tempo/modules/generator/processor/spanmetrics"
"github.com/grafana/tempo/modules/generator/registry"
Expand Down Expand Up @@ -50,7 +52,7 @@ func (cfg *ProcessorConfig) RegisterFlagsAndApplyDefaults(prefix string, f *flag
}

// copyWithOverrides creates a copy of the config using values set in the overrides.
func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userID string) ProcessorConfig {
func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userID string) (ProcessorConfig, error) {
copyCfg := *cfg

if buckets := o.MetricsGeneratorProcessorServiceGraphsHistogramBuckets(userID); buckets != nil {
Expand All @@ -65,6 +67,12 @@ func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userI
if dimensions := o.MetricsGeneratorProcessorSpanMetricsDimensions(userID); dimensions != nil {
copyCfg.SpanMetrics.Dimensions = dimensions
}
if dimensions := o.MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID); dimensions != nil {
err := copyCfg.SpanMetrics.IntrinsicDimensions.ApplyFromMap(dimensions)
if err != nil {
return ProcessorConfig{}, errors.Wrap(err, "fail to apply overrides")
}
}

return copyCfg
return copyCfg, nil
}
32 changes: 24 additions & 8 deletions modules/generator/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import (
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"github.com/grafana/tempo/modules/generator/processor/servicegraphs"
"github.com/grafana/tempo/modules/generator/processor/spanmetrics"
Expand All @@ -16,20 +17,23 @@ func TestProcessorConfig_copyWithOverrides(t *testing.T) {
Dimensions: []string{},
},
SpanMetrics: spanmetrics.Config{
HistogramBuckets: []float64{1, 2},
Dimensions: []string{"namespace"},
HistogramBuckets: []float64{1, 2},
Dimensions: []string{"namespace"},
IntrinsicDimensions: spanmetrics.IntrinsicDimensions{Service: true},
},
}

t.Run("overrides buckets and dimension", func(t *testing.T) {
o := &mockOverrides{
serviceGraphsHistogramBuckets: []float64{1, 2},
serviceGraphsDimensions: []string{"namespace"},
spanMetricsHistogramBuckets: []float64{1, 2, 3},
spanMetricsDimensions: []string{"cluster", "namespace"},
serviceGraphsHistogramBuckets: []float64{1, 2},
serviceGraphsDimensions: []string{"namespace"},
spanMetricsHistogramBuckets: []float64{1, 2, 3},
spanMetricsDimensions: []string{"cluster", "namespace"},
spanMetricsIntrinsicDimensions: map[string]bool{"status_code": true},
}

copied := original.copyWithOverrides(o, "tenant")
copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.NotEqual(t, *original, copied)

Expand All @@ -38,19 +42,31 @@ func TestProcessorConfig_copyWithOverrides(t *testing.T) {
assert.Equal(t, []string{}, original.ServiceGraphs.Dimensions)
assert.Equal(t, []float64{1, 2}, original.SpanMetrics.HistogramBuckets)
assert.Equal(t, []string{"namespace"}, original.SpanMetrics.Dimensions)
assert.Equal(t, spanmetrics.IntrinsicDimensions{Service: true}, original.SpanMetrics.IntrinsicDimensions)

// assert overrides were applied
assert.Equal(t, []float64{1, 2}, copied.ServiceGraphs.HistogramBuckets)
assert.Equal(t, []string{"namespace"}, copied.ServiceGraphs.Dimensions)
assert.Equal(t, []float64{1, 2, 3}, copied.SpanMetrics.HistogramBuckets)
assert.Equal(t, []string{"cluster", "namespace"}, copied.SpanMetrics.Dimensions)
assert.Equal(t, spanmetrics.IntrinsicDimensions{Service: true, StatusCode: true}, copied.SpanMetrics.IntrinsicDimensions)
})

t.Run("empty overrides", func(t *testing.T) {
o := &mockOverrides{}

copied := original.copyWithOverrides(o, "tenant")
copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.Equal(t, *original, copied)
})

t.Run("invalid overrides", func(t *testing.T) {
o := &mockOverrides{
spanMetricsIntrinsicDimensions: map[string]bool{"invalid": true},
}

_, err := original.copyWithOverrides(o, "tenant")
require.Error(t, err)
})
}
5 changes: 4 additions & 1 deletion modules/generator/instance.go
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,10 @@ func (i *instance) watchOverrides() {

func (i *instance) updateProcessors() error {
desiredProcessors := i.overrides.MetricsGeneratorProcessors(i.instanceID)
desiredCfg := i.cfg.Processor.copyWithOverrides(i.overrides, i.instanceID)
desiredCfg, err := i.cfg.Processor.copyWithOverrides(i.overrides, i.instanceID)
if err != nil {
return err
}

i.processorsMtx.RLock()
toAdd, toRemove, toReplace, err := i.diffProcessors(desiredProcessors, desiredCfg)
Expand Down
Loading