Add data quality metric to measure traces without a root #3812

Merged
merged 2 commits into from
Jun 27, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@
* [ENHANCEMENT] Protect ingesters from panics by adding defer/recover to all read path methods. [#3790](https://github.com/grafana/tempo/pull/3790) (@joe-elliott)
* [ENHANCEMENT] Added a boolean flag to enable or disable dualstack mode on Storage block config for S3 [#3721](https://github.com/grafana/tempo/pull/3721) (@sid-jar, @mapno)
* [ENHANCEMENT] Add caching to query range queries [#3796](https://github.com/grafana/tempo/pull/3796) (@mapno)
* [ENHANCEMENT] Add data quality metric to measure traces without a root [#3812](https://github.com/grafana/tempo/pull/3812) (@mapno)
* [BUGFIX] Fix metrics queries when grouping by attributes that may not exist [#3734](https://github.com/grafana/tempo/pull/3734) (@mdisibio)
* [BUGFIX] Fix frontend parsing error on cached responses [#3759](https://github.com/grafana/tempo/pull/3759) (@mdisibio)
* [BUGFIX] max_global_traces_per_user: take into account ingestion.tenant_shard_size when converting to local limit [#3618](https://github.com/grafana/tempo/pull/3618) (@kvrhdn)
5 changes: 5 additions & 0 deletions pkg/dataquality/warnings.go
@@ -8,6 +8,7 @@ import (
const (
	reasonOutsideIngestionSlack = "outside_ingestion_time_slack"
	reasonDisconnectedTrace = "disconnected_trace"
	reasonRootlessTrace = "rootless_trace"

	PhaseTraceFlushedToWal = "_flushed_to_wal"
	PhaseTraceWalToComplete = "_wal_to_complete"
@@ -27,3 +28,7 @@ func WarnOutsideIngestionSlack(tenant string) {
func WarnDisconnectedTrace(tenant string, phase string) {
	metric.WithLabelValues(tenant, reasonDisconnectedTrace+phase).Inc()
}

func WarnRootlessTrace(tenant string, phase string) {
Member

Doesn't disconnected_trace_compactor_combine already cover this?

Does the generator publish disconnected_trace_flushed_to_wal and disconnected_trace_wal_to_complete? That would tell us a lot about the quality of traces produced by the generator.

Member Author

Yes, rootless traces would be a subset of disconnected traces. Maybe it makes more sense as an extension of that warning?

Member

I'm good on this PR and will approve, but I am struggling to think what it would mean to a Tempo operator to have the majority of their disconnected traces rootless or rooted. Would they investigate data quality issues differently?

That doesn't mean it doesn't have a use and we might be quite happy we added this at some point in the future.

Member Author

I see it as a particularly bad type of disconnected trace: spans higher in the trace are usually more important. I think it just gives the operator more information to investigate with.

	metric.WithLabelValues(tenant, reasonRootlessTrace+phase).Inc()
}
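
As a rough illustration of how the reason label is built, the sketch below concatenates a reason constant with a phase suffix, the same way `WarnRootlessTrace` does. The `warningCounter` type is a hypothetical stand-in for the Prometheus counter vector behind `metric`, and the lower-case names are invented for this example; only the two string values come from the diff.

```go
package main

import "fmt"

// warningCounter is a hypothetical stand-in for the counter vector in the diff.
// It counts increments per (tenant, reason) pair.
type warningCounter map[[2]string]int

func (c warningCounter) inc(tenant, reason string) {
	c[[2]string{tenant, reason}]++
}

const (
	// Values copied from the diff; the identifiers are local to this sketch.
	reasonRootlessTrace    = "rootless_trace"
	phaseTraceFlushedToWal = "_flushed_to_wal"
)

// warnRootlessTrace mirrors the shape of WarnRootlessTrace: the reason label
// is the reason constant followed by the phase suffix.
func warnRootlessTrace(c warningCounter, tenant, phase string) {
	c.inc(tenant, reasonRootlessTrace+phase)
}

func main() {
	c := warningCounter{}
	warnRootlessTrace(c, "tenant-a", phaseTraceFlushedToWal)
	warnRootlessTrace(c, "tenant-a", phaseTraceFlushedToWal)

	for key, count := range c {
		// prints: tenant=tenant-a reason=rootless_trace_flushed_to_wal count=2
		fmt.Printf("tenant=%s reason=%s count=%d\n", key[0], key[1], count)
	}
}
```

Because the phase is folded into the reason label, each phase produces its own series per tenant, which is what lets an operator compare the WAL and compactor stages.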
3 changes: 3 additions & 0 deletions tempodb/compactor.go
@@ -236,6 +236,9 @@ func (rw *readerWriter) compact(ctx context.Context, blockMetas []*backend.Block
	DisconnectedTrace: func() {
		dataquality.WarnDisconnectedTrace(tenantID, dataquality.PhaseTraceCompactorCombine)
	},
	RootlessTrace: func() {
		dataquality.WarnRootlessTrace(tenantID, dataquality.PhaseTraceCompactorCombine)
	},
}

compactor := enc.NewCompactor(opts)
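
The callback wiring above can be sketched in isolation. This is a minimal example under stated assumptions: `compactionOptions`, `withDefaults`, and `report` are hypothetical names, not Tempo APIs. Calling a nil function value panics in Go, so the sketch fills unset callbacks with no-ops; whether Tempo guards against unset callbacks is not shown in this diff.

```go
package main

import "fmt"

// compactionOptions is a hypothetical, trimmed-down analogue of CompactionOptions.
type compactionOptions struct {
	DisconnectedTrace func()
	RootlessTrace     func()
}

// withDefaults replaces unset callbacks with no-ops so that callers can
// invoke them unconditionally.
func (o compactionOptions) withDefaults() compactionOptions {
	if o.DisconnectedTrace == nil {
		o.DisconnectedTrace = func() {}
	}
	if o.RootlessTrace == nil {
		o.RootlessTrace = func() {}
	}
	return o
}

// report invokes the callbacks that match the observed trace properties.
func report(o compactionOptions, connected bool, rootSpanName string) {
	if !connected {
		o.DisconnectedTrace()
	}
	if rootSpanName == "" {
		o.RootlessTrace()
	}
}

func main() {
	counts := map[string]int{}
	opts := compactionOptions{
		RootlessTrace: func() { counts["rootless_trace"]++ },
	}.withDefaults()

	report(opts, false, "")        // disconnected and rootless
	report(opts, true, "GET /api") // connected, has a root span name
	fmt.Println(counts["rootless_trace"]) // prints 1
}
```

Passing closures keeps the encoding package free of any dependency on the metrics package: the caller decides which tenant and phase each event is recorded under.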
1 change: 1 addition & 0 deletions tempodb/encoding/common/interfaces.go
@@ -80,6 +80,7 @@ type CompactionOptions struct {
	BytesWritten func(compactionLevel, bytes int)
	SpansDiscarded func(traceID string, rootSpanName string, rootServiceName string, spans int)
	DisconnectedTrace func()
	RootlessTrace func()
}

type Iterator interface {
3 changes: 3 additions & 0 deletions tempodb/encoding/vparquet3/compactor.go
@@ -124,6 +124,9 @@ func (c *Compactor) Compact(ctx context.Context, l log.Logger, r backend.Reader,
if !connected {
	c.opts.DisconnectedTrace()
}
if tr != nil && tr.RootSpanName == "" {
	c.opts.RootlessTrace()
}

c.opts.ObjectsCombined(int(compactionLevel), 1)
return sch.Deconstruct(pool.Get(), tr), nil
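
The check above treats an empty `RootSpanName` as "no root", and the review discussion describes rootless traces as a subset of disconnected ones. The sketch below illustrates that relationship with a deliberately simplified span model. The `span` type and `classify` function are hypothetical, and how Tempo actually computes `connected` is not shown in this diff.

```go
package main

import "fmt"

// span is a hypothetical, minimal span: an ID and the ID of its parent.
// An empty ParentID marks a root span.
type span struct {
	ID       string
	ParentID string
}

// classify reports whether a trace is connected and whether it is rootless.
// Simplification: a trace counts as connected when it has a root and every
// non-root span's parent is present. Cycles are not detected.
func classify(spans []span) (connected, rootless bool) {
	ids := make(map[string]bool, len(spans))
	for _, s := range spans {
		ids[s.ID] = true
	}

	connected, rootless = true, true
	for _, s := range spans {
		if s.ParentID == "" {
			rootless = false
			continue
		}
		if !ids[s.ParentID] {
			connected = false
		}
	}
	if rootless {
		// Without a root nothing anchors the tree, so a rootless trace
		// is always reported as disconnected as well.
		connected = false
	}
	return connected, rootless
}

func main() {
	healthy := []span{{ID: "a"}, {ID: "b", ParentID: "a"}}
	orphan := []span{{ID: "a"}, {ID: "c", ParentID: "missing"}}
	noRoot := []span{{ID: "b", ParentID: "missing"}}

	fmt.Println(classify(healthy)) // true false
	fmt.Println(classify(orphan))  // false false
	fmt.Println(classify(noRoot))  // false true
}
```

In this model every rootless trace is also disconnected, but not the reverse, which matches the "subset" framing in the review thread. Note that a check based on an empty root span name would also flag a root span whose name is empty.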
3 changes: 3 additions & 0 deletions tempodb/encoding/vparquet3/wal_block.go
@@ -335,6 +335,9 @@ func (b *walBlock) AppendTrace(id common.ID, trace *tempopb.Trace, start, end ui
if !connected {
	dataquality.WarnDisconnectedTrace(b.meta.TenantID, dataquality.PhaseTraceFlushedToWal)
}
if b.buffer != nil && b.buffer.RootSpanName == "" {
	dataquality.WarnRootlessTrace(b.meta.TenantID, dataquality.PhaseTraceFlushedToWal)
}

start, end = b.adjustTimeRangeForSlack(start, end, 0)

3 changes: 3 additions & 0 deletions tempodb/encoding/vparquet4/compactor.go
@@ -124,6 +124,9 @@ func (c *Compactor) Compact(ctx context.Context, l log.Logger, r backend.Reader,
if !connected {
	c.opts.DisconnectedTrace()
}
if tr != nil && tr.RootSpanName == "" {
	c.opts.RootlessTrace()
}

c.opts.ObjectsCombined(int(compactionLevel), 1)
return sch.Deconstruct(pool.Get(), tr), nil
3 changes: 3 additions & 0 deletions tempodb/encoding/vparquet4/wal_block.go
@@ -335,6 +335,9 @@ func (b *walBlock) AppendTrace(id common.ID, trace *tempopb.Trace, start, end ui
if !connected {
	dataquality.WarnDisconnectedTrace(b.meta.TenantID, dataquality.PhaseTraceFlushedToWal)
}
if b.buffer != nil && b.buffer.RootSpanName == "" {
	dataquality.WarnRootlessTrace(b.meta.TenantID, dataquality.PhaseTraceFlushedToWal)
}

start, end = b.adjustTimeRangeForSlack(start, end, 0)
