Metrics generator: make status_message a default dimension #1794

Merged 5 commits on Oct 13, 2022
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -39,7 +39,8 @@ query_frontend:
* [ENHANCEMENT] Vulture now has improved distribution of the random traces it searches. [#1763](https://github.com/grafana/tempo/pull/1763) (@rfratto)
* [ENHANCEMENT] Upgrade opentelemetry-proto submodule to v0.18.0 [#1754](https://github.com/grafana/tempo/pull/1754) (@mapno)
Internal types are updated to use `scope` instead of `instrumentation_library`. This is a breaking change in trace by ID queries if JSON is requested.
- * [ENHANCEMENT] Metrics generator: extract `status_message` field from spans [#1786](https://github.com/grafana/tempo/pull/1786) (@stoewer)
+ * [ENHANCEMENT] metrics-generator: extract `status_message` field from spans [#1786](https://github.com/grafana/tempo/pull/1786), [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
+ * [ENHANCEMENT] metrics-generator: handle collisions between user defined and default dimensions [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
* [ENHANCEMENT] distributor: Log span names when `distributor.log_received_spans.include_all_attributes` is on [#1790](https://github.com/grafana/tempo/pull/1790) (@suraciii)
* [BUGFIX] Honor caching and buffering settings when finding traces by id [#1697](https://github.com/grafana/tempo/pull/1697) (@joe-elliott)
* [BUGFIX] Correctly propagate errors from the iterator layer up through the queriers [#1723](https://github.com/grafana/tempo/pull/1723) (@joe-elliott)
5 changes: 2 additions & 3 deletions docs/tempo/website/configuration/_index.md
@@ -248,9 +248,8 @@ metrics_generator:
[histogram_buckets: <list of float> | default = 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.02, 2.05, 4.10]

# Additional dimensions to add to the metrics along with the default dimensions
- # (service, span_name, span_kind and span_status). Dimensions are searched for in the
- # resource and span attributes as well as `status.message`. Listed dimensions are
- # added to the metrics if present.
+ # (service, span_name, span_kind, status_code, and status_message). Dimensions are searched
+ # for in the resource and span attributes and are added to the metrics if present.
[dimensions: <list of string>]

# Registry configuration
4 changes: 4 additions & 0 deletions docs/tempo/website/metrics-generator/span_metrics.md
@@ -50,6 +50,10 @@ The following metrics are exported:

> **Note:** In Tempo 1.4 and 1.4.1 the histogram metric was called `traces_spanmetrics_duration_seconds`. This was changed later to be consistent with the metrics generated by the Grafana Agent and the OpenTelemetry Collector.

+ By default, the metrics processor adds the following labels to each metric: `service`, `span_name`, `span_kind`, `status_code`, `status_message`.
+ Additional user defined labels can be created using the [`dimensions` configuration option]({{< relref "../configuration/#metrics-generator" >}}).
+ When a configured dimension collides with one of the default labels (e.g. `status_code`), the label for the respective dimension is prefixed with double underscore (i.e. `__status_code`).
+
## Example

<p align="center"><img src="../span-metrics-example.png" alt="Span metrics overview"></p>
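
The label handling described in the documentation change above can be sketched as a standalone program. This is a minimal sketch under stated assumptions, not the code from this PR: `sanitize` is a hypothetical stand-in for Prometheus's `strutil.SanitizeLabelName` (assumed here to replace every character outside `[a-zA-Z0-9_]` with `_`), and `buildLabels` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// defaultLabels mirrors the default labels listed in the documentation.
var defaultLabels = []string{"service", "span_name", "span_kind", "status_code", "status_message"}

// invalidChars matches characters assumed to be invalid in a label name.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitize is a hypothetical stand-in for strutil.SanitizeLabelName.
func sanitize(name string) string {
	return invalidChars.ReplaceAllString(name, "_")
}

// buildLabels returns the default labels followed by one label per configured
// dimension. A dimension whose sanitized name equals a default label is
// prefixed with "__" so both labels can coexist.
func buildLabels(dimensions []string) []string {
	labels := make([]string, 0, len(defaultLabels)+len(dimensions))
	labels = append(labels, defaultLabels...)
	for _, d := range dimensions {
		name := sanitize(d)
		for _, def := range defaultLabels {
			if name == def {
				name = "__" + name
				break
			}
		}
		labels = append(labels, name)
	}
	return labels
}

func main() {
	// "span.kind" sanitizes to "span_kind", which collides with a default label.
	fmt.Println(buildLabels([]string{"span.kind", "http.method"}))
}
```

Note that the collision check runs on the sanitized name, so `span.kind` collides with `span_kind` even though the configured strings differ.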
38 changes: 28 additions & 10 deletions modules/generator/processor/spanmetrics/spanmetrics.go
@@ -22,6 +22,10 @@ const (
metricSizeTotal = "traces_spanmetrics_size_total"
)

+ var (
+ intrinsicDimensions = []string{"service", "span_name", "span_kind", "status_code", "status_message"}
+ )
+
type Processor struct {
Cfg Config

@@ -34,9 +38,10 @@ type Processor struct {
}

func New(cfg Config, registry registry.Registry) gen.Processor {
- labels := []string{"service", "span_name", "span_kind", "status_code"}
+ labels := make([]string, 0, len(intrinsicDimensions)+len(cfg.Dimensions))
+ labels = append(labels, intrinsicDimensions...)
for _, d := range cfg.Dimensions {
- labels = append(labels, strutil.SanitizeLabelName(d))
+ labels = append(labels, sanitizeLabelNameWithCollisions(d))
}

return &Processor{
@@ -79,16 +84,17 @@ func (p *Processor) aggregateMetricsForSpan(svcName string, rs *v1.Resource, spa
latencySeconds := float64(span.GetEndTimeUnixNano()-span.GetStartTimeUnixNano()) / float64(time.Second.Nanoseconds())

labelValues := make([]string, 0, 4+len(p.Cfg.Dimensions))
- labelValues = append(labelValues, svcName, span.GetName(), span.GetKind().String(), span.GetStatus().GetCode().String())
+ // important: the order of labelValues must correspond to the order of labels / intrinsicDimensions
+ labelValues = append(
+ labelValues,
+ svcName,
+ span.GetName(),
+ span.GetKind().String(),
+ span.GetStatus().GetCode().String(),
+ span.GetStatus().GetMessage())

for _, d := range p.Cfg.Dimensions {
- var value string
- switch d {
- case "status.message", "status_message":
- value = span.Status.GetMessage()
- default:
- value, _ = processor_util.FindAttributeValue(d, rs.Attributes, span.Attributes)
- }
+ value, _ := processor_util.FindAttributeValue(d, rs.Attributes, span.Attributes)
labelValues = append(labelValues, value)
}

@@ -98,3 +104,15 @@ func (p *Processor) aggregateMetricsForSpan(svcName string, rs *v1.Resource, spa
p.spanMetricsSizeTotal.Inc(registryLabelValues, float64(span.Size()))
p.spanMetricsDurationSeconds.ObserveWithExemplar(registryLabelValues, latencySeconds, tempo_util.TraceIDToHexString(span.TraceId))
}

+ func sanitizeLabelNameWithCollisions(name string) string {
+ sanitized := strutil.SanitizeLabelName(name)
+
+ for _, dim := range intrinsicDimensions {
+ if sanitized == dim {
+ return "__" + sanitized
+ }
+ }
+
+ return sanitized
+ }
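
The comment in `aggregateMetricsForSpan` notes that label values must be appended in the same order as the label names built in `New`. The following is a minimal sketch of why that positional contract matters. It uses plain string slices instead of Tempo's registry types, `pairLabels` is a hypothetical helper, and an attribute that is not found on the span is assumed to yield an empty string.

```go
package main

import (
	"errors"
	"fmt"
)

// pairLabels zips label names with label values by position. It fails when
// the two slices differ in length, since the pairing would be ambiguous.
func pairLabels(names, values []string) (map[string]string, error) {
	if len(names) != len(values) {
		return nil, errors.New("label names and values differ in length")
	}
	out := make(map[string]string, len(names))
	for i, n := range names {
		out[n] = values[i]
	}
	return out, nil
}

func main() {
	// Five default labels followed by one user-defined dimension.
	names := []string{"service", "span_name", "span_kind", "status_code", "status_message", "foo"}
	// Values in the same order as names; "foo" is assumed missing, so it is empty.
	values := []string{"test-service", "test", "SPAN_KIND_CLIENT", "STATUS_CODE_OK", "OK", ""}

	m, err := pairLabels(names, values)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["status_message"], m["span_kind"])
}
```

If the status message were appended before the status code, for example, the lengths would still match but the `status_code` label would silently carry the message text. That is the failure mode the ordering comment guards against.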
60 changes: 54 additions & 6 deletions modules/generator/processor/spanmetrics/spanmetrics_test.go
@@ -35,10 +35,11 @@ func TestSpanMetrics(t *testing.T) {
fmt.Println(testRegistry)

lbls := labels.FromMap(map[string]string{
- "service": "test-service",
- "span_name": "test",
- "span_kind": "SPAN_KIND_CLIENT",
- "status_code": "STATUS_CODE_OK",
+ "service": "test-service",
+ "span_name": "test",
+ "span_kind": "SPAN_KIND_CLIENT",
+ "status_code": "STATUS_CODE_OK",
+ "status_message": "OK",
})

assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_calls_total", lbls))
@@ -56,7 +57,7 @@ func TestSpanMetrics_dimensions(t *testing.T) {
cfg := Config{}
cfg.RegisterFlagsAndApplyDefaults("", nil)
cfg.HistogramBuckets = []float64{0.5, 1}
- cfg.Dimensions = []string{"status.message", "foo", "bar", "does-not-exist"}
+ cfg.Dimensions = []string{"foo", "bar", "does-not-exist"}

p := New(cfg, testRegistry)
defer p.Shutdown(context.Background())
@@ -67,7 +68,6 @@ func TestSpanMetrics_dimensions(t *testing.T) {
// Add some attributes
for _, rs := range batch.ScopeSpans {
for _, s := range rs.Spans {
- s.Status.Message = "OK"
s.Attributes = append(s.Attributes, &common_v1.KeyValue{
Key: "foo",
Value: &common_v1.AnyValue{Value: &common_v1.AnyValue_StringValue{StringValue: "foo-value"}},
@@ -103,6 +103,54 @@ func TestSpanMetrics_dimensions(t *testing.T) {
assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_latency_sum", lbls))
}

+ func TestSpanMetrics_collisions(t *testing.T) {
+ testRegistry := registry.NewTestRegistry()
+
+ cfg := Config{}
+ cfg.RegisterFlagsAndApplyDefaults("", nil)
+ cfg.HistogramBuckets = []float64{0.5, 1}
+ cfg.Dimensions = []string{"span.kind", "status_message"}
+
+ p := New(cfg, testRegistry)
+ defer p.Shutdown(context.Background())
+
+ batch := test.MakeBatch(10, nil)
+ for _, rs := range batch.ScopeSpans {
+ for _, s := range rs.Spans {
+ s.Attributes = append(s.Attributes, &common_v1.KeyValue{
+ Key: "span.kind",
+ Value: &common_v1.AnyValue{Value: &common_v1.AnyValue_StringValue{StringValue: "colliding_kind"}},
+ })
+ s.Attributes = append(s.Attributes, &common_v1.KeyValue{
+ Key: "status_message",
+ Value: &common_v1.AnyValue{Value: &common_v1.AnyValue_StringValue{StringValue: "colliding_message"}},
+ })
+ }
+ }
+
+ p.PushSpans(context.Background(), &tempopb.PushSpansRequest{Batches: []*trace_v1.ResourceSpans{batch}})
+
+ fmt.Println(testRegistry)
+
+ lbls := labels.FromMap(map[string]string{
+ "service": "test-service",
+ "span_name": "test",
+ "span_kind": "SPAN_KIND_CLIENT",
+ "status_code": "STATUS_CODE_OK",
+ "status_message": "OK",
+ "__span_kind": "colliding_kind",
+ "__status_message": "colliding_message",
+ })
+
+ assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_calls_total", lbls))
+
+ assert.Equal(t, 0.0, testRegistry.Query("traces_spanmetrics_latency_bucket", withLe(lbls, 0.5)))
+ assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_latency_bucket", withLe(lbls, 1)))
+ assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_latency_bucket", withLe(lbls, math.Inf(1))))
+ assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_latency_count", lbls))
+ assert.Equal(t, 10.0, testRegistry.Query("traces_spanmetrics_latency_sum", lbls))
+ }

func withLe(lbls labels.Labels, le float64) labels.Labels {
lb := labels.NewBuilder(lbls)
lb = lb.Set(labels.BucketLabel, strconv.FormatFloat(le, 'f', -1, 64))
3 changes: 2 additions & 1 deletion pkg/util/test/req.go
@@ -19,7 +19,8 @@ func MakeSpan(traceID []byte) *v1_trace.Span {
SpanId: make([]byte, 8),
Kind: v1_trace.Span_SPAN_KIND_CLIENT,
Status: &v1_trace.Status{
- Code: 1,
+ Code: 1,
+ Message: "OK",
},
StartTimeUnixNano: uint64(now.UnixNano()),
EndTimeUnixNano: uint64(now.Add(time.Second).UnixNano()),