Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make intrinsic dimensions configurable and disable status_message by default #1960

Merged
merged 13 commits into from
Dec 21, 2022

Conversation

stoewer
Copy link
Contributor

@stoewer stoewer commented Dec 16, 2022

What this PR does:

The metrics generator always adds the intrinsic dimensions service, span_name, span_kind, status_code, and status_message. The status_message may contain arbitrary strings and can have a very high cardinality (see #1932). Therefore it is safer not to add this dimension by default.

With this PR:

  • All above mentioned intrinsic dimensions are configurable via configuration file and overrides.
  • The status_message is not added by default.

Which issue(s) this PR fixes:
Fixes #1932

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stoewer stoewer changed the title Optional status message Make intrinsic dimensions configurable and disable status_message by default Dec 16, 2022
Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If you can re-generate docs/tempo/website/configuration/manifest.md, that'd be great. It's due an update.

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts.

@@ -1152,6 +1168,7 @@ overrides:
[metrics_generator_processor_service_graphs_histogram_buckets: <list of float>]
[metrics_generator_processor_service_graphs_dimensions: <list of string>]
[metrics_generator_processor_span_metrics_histogram_buckets: <list of float>]
[metrics_generator_processor_span_metrics_intrinsic_dimensions: <map string to bool>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to add a quick note about what strings are legal as keys

@@ -43,3 +44,7 @@ func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsHistogramBuckets(use
func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsDimensions(userID string) []string {
return m.spanMetricsDimensions
}

func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID string) map[string]bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extend TestProcessorConfig_copyWithOverrides to include the intrinsic dimensions?

Also might be worth adding a case to Test_instance_updateProcessors to make sure the diff works with the intrinsics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound good. I added both tests.

if cfg.IntrinsicDimensions.StatusMessage {
labels = append(labels, dimStatusMessage)
}
intrinsicDimensions := labels[:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why copy the slice here? it's pointing to the same underlying array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed in order to detect collisions with configured intrinsic dimensions. This is no longer needed as dimensions are now also treated as collisions when the colliding intrinsic dimension is absent.

@@ -108,7 +128,7 @@ func (p *Processor) aggregateMetricsForSpan(svcName string, rs *v1.Resource, spa
p.spanMetricsDurationSeconds.ObserveWithExemplar(registryLabelValues, latencySeconds, tempo_util.TraceIDToHexString(span.TraceId))
}

func sanitizeLabelNameWithCollisions(name string) string {
func sanitizeLabelNameWithCollisions(name string, intrinsicDimensions []string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizing attribute names only if they conflict with configured intrinsics has some weird properties. if someone configures status_message as a normal attribute and then later adds the matching intrinsic then the name of the normal attribute label changes. this may break dashboards or queries b/c it changes the meaning of their metric labels. similar situations can occur when disabling intrinsic dimensions.

my instinct is to sanitize against all intrinsics even if they are not configured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -97,6 +97,7 @@ Internal types are updated to use `scope` instead of `instrumentation_library`.
* [BUGFIX] New wal file separator '+' for the NTFS filesystem and backward compatibility with the old separator ':' [#1700](https://github.com/grafana/tempo/pull/1700) (@kilian-kier)
* [ENHANCEMENT] metrics-generator: extract `status_message` field from spans [#1786](https://github.com/grafana/tempo/pull/1786), [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
* [ENHANCEMENT] metrics-generator: handle collisions between user defined and default dimensions [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
* [ENHANCEMENT] metrics-generator: make intrinsic dimensions configurable and disable `status_message` by default [#1960](https://github.com/grafana/tempo/pull/1960) (@stoewer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe there's a breaking change here (or perhaps one in #1786) that should be mentioned. If the user has a configured normal attribute dimension called status_message then it's name will change to __status_message.

See below for comments on the renaming behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think the actual breaking change was in #1794. I marked the change as breaking in the changelog.

MetricsGeneratorProcessorServiceGraphsDimensions []string `yaml:"metrics_generator_processor_service_graphs_dimensions" json:"metrics_generator_processor_service_graphs_dimensions"`
MetricsGeneratorProcessorSpanMetricsHistogramBuckets []float64 `yaml:"metrics_generator_processor_span_metrics_histogram_buckets" json:"metrics_generator_processor_span_metrics_histogram_buckets"`
MetricsGeneratorProcessorSpanMetricsDimensions []string `yaml:"metrics_generator_processor_span_metrics_dimensions" json:"metrics_generator_processor_span_metrics_dimensions"`
MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions map[string]bool `yaml:"metrics_generator_processor_span_metrics_intrinsic_dimensions" json:"metrics_generator_processor_span_metrics_intrinsic_dimensions"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fail to load overrides if a value is specified here that is not an intrinsic dimension?

metrics_generator_processor_span_metrics_intrinsic_dimensions:
  status_message: true
  foo: false

Copy link
Contributor Author

@stoewer stoewer Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not think of a good way to validate they intrinsic dimensions while loading the overrides without running into import cycles or repeating dimension definitions in the overrides package.

I solved it now by checking for valid keys when the overrides are applied.

@stoewer stoewer force-pushed the optional-status-message branch from 6db8e29 to eec42ea Compare December 19, 2022 07:15
@stoewer
Copy link
Contributor Author

stoewer commented Dec 19, 2022

If you can re-generate docs/tempo/website/configuration/manifest.md, that'd be great. It's due an update.

Good idea. I regenerated it.

The linters deadcode, structcheck, and varcheck are deprecated since
golangci-lint v1.49.0. They are replace by unused which is already enabled
in our config
Custom dimensions from span attributs are also treated as conflicts
when the conflicting intrinsic dimension is not actually present. This
prevents those dimensions from being renamed when the intrinsic dimension
configuration changes
@stoewer stoewer force-pushed the optional-status-message branch from ba5aedd to 6ac650a Compare December 19, 2022 23:49
@stoewer stoewer force-pushed the optional-status-message branch from 47e95d3 to 9088882 Compare December 20, 2022 01:44
@stoewer stoewer force-pushed the optional-status-message branch from 416ffd8 to 027a59a Compare December 20, 2022 01:55
@stoewer stoewer marked this pull request as ready for review December 20, 2022 02:10
Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work

@mapno mapno merged commit 0ed9ddc into grafana:main Dec 21, 2022
@stoewer stoewer deleted the optional-status-message branch March 20, 2023 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to remove dimension from intrinsicDimensions
3 participants