
Equivalent of "logLevel" for spans #3205

Open
svrnm opened this issue Feb 13, 2023 · 39 comments
Labels
spec:trace Related to the specification/trace directory
triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted

Comments

@svrnm
Member

svrnm commented Feb 13, 2023

What are you trying to achieve?

Note: Maybe this has been discussed before and I didn't find the conclusion/reason behind it, so please point me to it if so.

A very common property of logs is the level (info, warn, debug, error, ...). This level is useful for plenty of applications, so I was wondering if an equivalent would be valuable for spans:

  • A super detailed trace could hold "debug" spans that can be dropped if not needed and kept in the circumstances where they are valuable (a related discussion happened in the #otel-php-auto-instr Slack channel a while back for a very verbose instrumentation; quoting @pdelewski: "[...] his verbosity is unexpected from application developer point of view, but might be useful from framework development side").
  • Sampling decisions might already take the Status of a span into account; a level could be even more helpful (keep 100% of error-level spans, 50% of warning level, 20% of info level, 0% of debug level, etc.; see the sketch below).
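
A level-aware sampler along those lines could look roughly like the following Go sketch. It assumes a hypothetical "level" span attribute set at span start; bySeverity and the ratios are illustrative, not an existing OTel API:

package sampler

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// bySeverity maps a hypothetical "level" attribute to a per-level sampling
// ratio and delegates the actual decision to the SDK's TraceIDRatioBased.
type bySeverity struct{}

var levelRatios = map[string]sdktrace.Sampler{
	"error": sdktrace.TraceIDRatioBased(1.0), // keep 100%
	"warn":  sdktrace.TraceIDRatioBased(0.5), // keep 50%
	"info":  sdktrace.TraceIDRatioBased(0.2), // keep 20%
	"debug": sdktrace.TraceIDRatioBased(0.0), // keep 0%
}

func (bySeverity) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	for _, attr := range p.Attributes {
		if attr.Key == "level" {
			if delegate, ok := levelRatios[attr.Value.AsString()]; ok {
				return delegate.ShouldSample(p)
			}
		}
	}
	// No level attribute set: keep the span.
	return sdktrace.AlwaysSample().ShouldSample(p)
}

func (bySeverity) Description() string { return "bySeverity" }

Note that TraceIDRatioBased keys off the trace ID, so all spans of a given level within one trace get the same decision; truly independent per-span ratios run into the broken-tree problem discussed further down.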

What did you expect to see?

This could be addressed either by updating the Span Status to allow more values or by adding an additional attribute via a semantic convention. Given that the tracing API is stable, I assume mapping this with a semantic convention would be best.

Additional Context

There might be alternative approaches to this, like setting a priority or importance on a span. I just wanted to get some feedback if this might be a valuable addition to spans&traces.

Updated 2024/01/26

Based on the discussions, a key point of this issue is the need to be able to drop a partial span tree, e.g.

Span 1 (important) -> Span 2 (not important) -> Span 3 (not important) -> Span 4 (important)

would become

Span 1 -> Span 4.

As stated throughout the discussions, this is not a trivial thing to do, but my hope is that there are potential ways to accomplish the same and that this issue can provide a place to discuss them.

@svrnm svrnm added the spec:trace Related to the specification/trace directory label Feb 13, 2023
@brettmc
Contributor

brettmc commented Feb 15, 2023

This is very interesting to me when thinking about PHP's auto-instrumentation. When developing auto-instrumentation implementations, we currently lack a mechanism to optionally create spans for certain methods which are possibly interesting but also potentially frequent, e.g. open-telemetry/opentelemetry-demo#659 or a database query's fetch operation.

I've been thinking over the past couple of days whether to invent a variable to control this, something like OTEL_TRACE_LEVEL or OTEL_TRACE_VERBOSITY.
I don't think log-like values of debug/info/warn/error align well with spans; perhaps something like quiet/minimal/normal/detailed?

I don't mind the idea of a priority or importance as a span attribute which could be used to ignore a trace (provided that it doesn't break parent/child relationships: what if an ignored span happened to have a child span that was not ignored?)

@pdelewski
Member

pdelewski commented Feb 15, 2023

Generally, it's a good idea to have a priority or level. One of the problems with this approach is deciding for the user: for some use cases a span generated from a given call might have high priority, but low priority for others; in other words, priority might depend on the context. @brettmc, recently I've been thinking more and more about configurability per specific call. Having said that, I think some calls should be mandatory.

@brettmc
Contributor

brettmc commented Feb 23, 2023

I found another use-case that could fit in here: using auto-instrumentation to create spans for some low-level i/o functions (e.g. writing to a file/stream) can have a side-effect of creating unwanted spans when doing things like emitting a log after an http request is processed. Because the logging happens outside of the http request processing, those spans become new root/parent spans, which is slightly annoying.
Some pseudo-code to illustrate:

on.request
  response = process(request) // <-- all auto-instrumented
  send response
  span.create(attribute: debug).activate()
  log(request, response) // <-- generates some auto-instrumented spans, which are orphaned
  span.end()

As a work-around for this, I discovered that with a bit of hackery I could create and activate a valid, non-recording span around the post-request code, which has the effect of collecting and silencing a bunch of annoying i/o spans.

Applying the concept of span attributes here could achieve the same result - in this case, I'm imagining creating a span at debug/silly/off/ignore level, and leaving it to the sampler to decide to ignore them, or not.
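
A rough Go equivalent of that work-around, assuming the tracer provider is configured with a ParentBased sampler (so children of a non-sampled context are dropped at creation time); log here stands in for the pseudo-code's logging call:

	// Clear the sampled flag on the current span context; with a
	// ParentBased sampler, spans started under quietCtx are then
	// created as non-recording and never exported.
	sc := trace.SpanContextFromContext(ctx).WithTraceFlags(0)
	quietCtx := trace.ContextWithSpanContext(ctx, sc)
	log(quietCtx, request, response)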

@svrnm
Member Author

svrnm commented Feb 27, 2023

@brettmc @pdelewski thanks for sharing your thoughts on this!

One of the problems with this approach is deciding for the user.

This is probably the hard issue that needs to be solved, but solving it will eventually give the true benefit of a decent solution. I think there is already some implicit priority within spans (root spans are important; spans of SpanKind Client, Server, Consumer, Producer are also important since they represent cross-service communication; spans with SpanKind Internal might be skipped more often; etc.), but as @brettmc outlined there are also corner cases like the example of i/o functions, or some end-users complaining that auto-instrumentation creates too many spans, etc.

@RichiCoder1

Sampling that works both in the broad context of a span's outcome (tail-based sampling) and on a span's parts (is this span or sub-span interesting?) would be an awesome way to cut down noise. The idea of introducing a priority value to spans would be a neat way to get there.

@hmm01i

hmm01i commented Mar 3, 2023

I would also be very interested in this functionality. I've been working on a project and basically using tracing instead of logging to debug and understand performance. But my app is a batch job that processes 5000+ items per request (each iteration over an item consisting of multiple steps and tasks). While developing, I've added nested spans under a request down to a pretty low level. But in production that would create extremely verbose traces. With just two instrumented spans I'm generating over 10000 spans for a single request trace. Using goroutines I even hit the default batch limit and spans were being dropped; I increased the limit and solved that specific problem. But at this point I have started commenting out span code in select functions for prod deployment. I could probably wrap my spans in a flag (if traceDebug == true {}), but when I'm adding attributes to spans within a function this would become tedious and messy.

An extremely simple example of a function I was gathering spans from during dev. There's probably little value to this span in production, but it really helped me visualize the steps in my process and debug as I was iterating on the app.

func ownerUser(ctx context.Context, owner AppOwner) string {
	var userEmail string
	// _, span := tracer.Start(ctx, "ownerUser", trace.WithAttributes(attribute.String("owner", owner.DisplayName)))
	// defer span.End()

	// search for email in owner fields
	m := regexp.MustCompile(`^([a-zA-Z0-9_\-\.]+)@example\.com`)

	// For now we are just checking UserPrincipalName
	userEmail = m.FindString(owner.UserPrincipalName)
	// span.SetAttributes(attribute.String("userEmail", userEmail)) // <- ref to the span that needs to be toggled together with the span start if I want to enable tracing through this function

	return userEmail
}

Supporting a Tracing Level would be extremely useful for controlling how detailed a trace is. It would be preferable not to remove instrumentation just because it is too noisy for production.

I feel like this pattern is well established in traditional logging. I would love to see it extended to tracing.

<edit: removed fictitious code example by ChatGPT>

@hmm01i

hmm01i commented Mar 3, 2023

Looking closer at opentelemetry-go, it looks like I might be able to add the functionality I want by writing a custom SpanProcessor that drops a span if it has a specific level attribute.

Using this as an inspiration: https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/span_processor_filter_example_test.go#L51

I might create something like this. Pseudo-code, just trying to capture the concept:

type TraceLevelFilter struct {
	// Next is the next SpanProcessor in the chain.
	Next SpanProcessor

	// Level is an int similar to a LogLevel scale
	Level int
}

func (f TraceLevelFilter) OnEnd(s ReadOnlySpan) {
	if s.Attributes["level"] < f.Level {
		// Drop spans below the requested level
		return
	}
	f.Next.OnEnd(s)
}

I'll experiment with this and let you know if it works as hoped.

@hmm01i

hmm01i commented Mar 3, 2023

I was able to implement the functionality I was looking for in Go by creating a custom SpanProcessor. I can specify a level (e.g. INFO) and spans below that level (e.g. DEBUG) can remain instrumented but are not exported.

I basically followed the pattern used in the filter example I linked above. This seems a little involved, and I don't really like having to explicitly code the batch exporter into the span processing chain. This is not a clearly documented configuration, and it breaks if an exporter is added as an option to NewTracerProvider(), i.e. all spans get exported.

Maybe there is a cleaner approach. But TBH the otel libraries and docs are a bit challenging to grok. I'm open to suggestions for improving.

Ideally it would be nicer if the attribute were a string, e.g. "trace_level": "DEBUG"; an int isn't as clear when reviewing spans. I probably could create a map[int]string of level names/numbers and update attributes in the filter (see the sketch below).

Anyways, I'm happy I can toggle spans without removing or commenting out code.
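
The map idea mentioned above might look like this sketch; the name translation happens in OnStart, where the span is still writable (the trace_level_name attribute is illustrative):

var traceLevelNames = map[int]string{
	traceLevelTRACE: "TRACE", traceLevelDEBUG: "DEBUG", traceLevelINFO: "INFO",
	traceLevelWARN: "WARN", traceLevelERROR: "ERROR", traceLevelFATAL: "FATAL",
}

// In OnStart the span is a ReadWriteSpan, so a readable name can be attached.
func (f TraceLevelFilter) OnStart(parent context.Context, s trace.ReadWriteSpan) {
	for _, a := range s.Attributes() {
		if a.Key == traceLevelKey {
			s.SetAttributes(attribute.String("trace_level_name",
				traceLevelNames[int(a.Value.AsInt64())]))
		}
	}
	f.Next.OnStart(parent, s)
}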

// Begin TraceLevel Implementation

// Define some const for convenience.
const (
	traceLevelKey = "trace_level"

	traceLevelTRACE = 0
	traceLevelDEBUG = 1
	traceLevelINFO  = 2
	traceLevelWARN  = 3
	traceLevelERROR = 4
	traceLevelFATAL = 5
)

// This could be set from a config or env var
var traceLevel = traceLevelINFO

// The struct handling the filtering logic 
type TraceLevelFilter struct {
	// Next is the next SpanProcessor in the chain.
	Next trace.SpanProcessor

	// Level is an int similar to a numeric LogLevel
	Level int
}

func (f TraceLevelFilter) OnStart(parent context.Context, s trace.ReadWriteSpan) {
	f.Next.OnStart(parent, s)
}
func (f TraceLevelFilter) Shutdown(ctx context.Context) error { return f.Next.Shutdown(ctx) }
func (f TraceLevelFilter) ForceFlush(ctx context.Context) error {
	return f.Next.ForceFlush(ctx)
}

// The actual filtering logic
func (f TraceLevelFilter) OnEnd(s trace.ReadOnlySpan) {
	for _, a := range s.Attributes() {
		if a.Key == traceLevelKey {
			if int(a.Value.AsInt64()) < f.Level {
				// Drop spans below the requested level
				return
			}
		}
	}
	// Continue to next SpanProcessor
	f.Next.OnEnd(s)
}

// setupTraceProvider sets up all the necessary pieces and should be called early in main()
func setupTraceProvider(exporter trace.SpanExporter) *trace.TracerProvider {
	filter := TraceLevelFilter{
		Next:  trace.NewBatchSpanProcessor(exporter, trace.WithMaxQueueSize(10000)),
		Level: traceLevel,
	}
	tp := trace.NewTracerProvider(
		trace.WithResource(resource.Default()),
		trace.WithSpanProcessor(filter),
	)
	return tp
}
// End TraceLevel implementation code

// Example usage of traceLevel attribute
// ownerEmail is a noisy function. Tracing is instrumented, but its spans are only exported when the filter level is set to traceLevelDEBUG or lower.
func ownerEmail(ctx context.Context, owner AppOwner) string {
	var userEmail string
	// start a child span from the parent context. this span should only be exported at debug level
	// (note: trace here is the API package go.opentelemetry.io/otel/trace, unlike the sdk/trace alias above)
	_, span := tracer.Start(ctx, "ownerUser",
		trace.WithAttributes(
			attribute.String("owner", owner.DisplayName),
			attribute.Int(traceLevelKey, traceLevelDEBUG), // <- Setting the level via an attribute
		))
	defer span.End()

	// search for email in owner fields
	m := regexp.MustCompile(`^([a-zA-Z0-9_\-\.]+)@example\.com`)

	// For now we are just checking UserPrincipalName
	userEmail = m.FindString(owner.UserPrincipalName)
	span.SetAttributes(attribute.String("userEmail", userEmail))

	return userEmail
}

@cyrille-leclerc
Member

cyrille-leclerc commented Mar 3, 2023

This feature would be great, as it would bring consistency to observability APIs, providing the same verbosity concept for developers when capturing logs and traces.

The Jenkins OpenTelemetry Plugin has been happily using this pattern a lot for 12+ months.
Our coding style in Java is:

Tracer tracer = logger.isLoggable(Level.FINE) ? this.tracer : TracerProvider.noop().get("noop");
Span span = tracer.spanBuilder("a-span-for-detailed-troubleshooting").startSpan();
...

See examples here: https://github.com/jenkinsci/opentelemetry-plugin/blob/opentelemetry-2.11.0/src/main/java/io/jenkins/plugins/opentelemetry/job/log/OverallLog.java

@ericmustin
Contributor

ericmustin commented Mar 3, 2023

As far as I understand, the approaches above all introduce broken traces, where a span's parent_id will point to a span that's been dropped.

Suppose a simple trace like this:
important span A => unimportant span B => unimportant span C => important span D

if you drop unimportant span B and unimportant span C, then important span D will have a parent_id pointing to a non-existent span.

The appropriate place to make these choices is a Sampler, at span creation time. The issue here imo is a combination of noisy instrumentation (which should be addressed at the specific instrumentation level, not at the broader api/sdk level), and a lack of understanding of how to write custom samplers.

I don't think we should add cruft to the span or trace data model to work around this. Instead, users should open issues with the instrumentation they find noisy, and more education ought to occur wrt how to write and compose custom samplers.
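
For readers unfamiliar with them, a minimal custom sampler of the kind described here might look like the following Go sketch (the "level" attribute and the dropDebug name are illustrative, not an existing API). Wrapping it in ParentBased means a span dropped at creation time takes its whole subtree with it, rather than orphaning children:

type dropDebug struct{}

func (dropDebug) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	for _, attr := range p.Attributes {
		// Drop spans that declare themselves debug-level at start time.
		if attr.Key == "level" && attr.Value.AsString() == "debug" {
			return sdktrace.SamplingResult{Decision: sdktrace.Drop}
		}
	}
	return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample}
}

func (dropDebug) Description() string { return "dropDebug" }

func newProvider() *sdktrace.TracerProvider {
	// ParentBased: children of a dropped (non-recording) span are dropped too.
	return sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.ParentBased(dropDebug{})),
	)
}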

@hmm01i

hmm01i commented Mar 3, 2023

@ericmustin You are correct that my approach results in orphaned spans if a parent is dropped but a child is not. But I have run into that problem for a variety of other reasons as well. To generate useful traces it's on me to architect my span relationships, pass context, and add attributes correctly. At this point I don't have an unimportant parent with important children. In my use case I'm excluding spans from the lowest child up, as a means of controlling depth of visibility. It's possible that as I work with more complex traces this approach will break down.

I will also look into Samplers as an alternative, so spans get filtered at creation time.

I will add that one of the reasons the TraceLevel is relevant is because I'm using spans where I'd traditionally use log messages and LogLevels. Perhaps this is not the intent of tracing, but spans provide significantly greater value, as they capture timing, express relationships, and can have tags, status and events.

It remains to be seen if this is a sustainable usage pattern.

@Aneurysm9
Member

Yes, it is possible to adjust tracing levels in OpenTelemetry for Golang to control which spans are exported at what level. OpenTelemetry provides several levels of tracing that can be used to categorize spans, including:

What? OTel Go maintainer here and this is not at all how things work. Is this supposed to be a description of how you would like things to work? If so, it needs to be clarified.

@jkwatson
Contributor

jkwatson commented Mar 4, 2023

Yes, it is possible to adjust tracing levels in OpenTelemetry for Golang to control which spans are exported at what level. OpenTelemetry provides several levels of tracing that can be used to categorize spans, including:

What? OTel Go maintainer here and this is not at all how things work. Is this supposed to be a description of how you would like things to work? If so, it needs to be clarified.

@Aneurysm9 I think you missed the preamble:

On a side note, when I asked ChatGPT if this was possible, it provided this response, which was very enticing. (When I tried to confirm this functionality I ended up at this GH issue.)

@svrnm
Member Author

svrnm commented Mar 4, 2023

As far as I understand, the approaches above all introduce broken traces, where a span's parent_id will point to a span that's been dropped.
Suppose a simple trace like this:
important span A => unimportant span B => unimportant span C => important span D
if you drop unimportant span B and unimportant span C, then important span D will have a parent_id pointing to a non-existent span.

Yes, this is a problem you introduce with that. My assumption was that there might be a way to resolve this to
important span A => important span D

The appropriate place to make these choices is a Sampler, at span creation time.

I read through Sampling and indeed I can see how this might be accomplished (i.e. use SpanKind INTERNAL to make a decision in ShouldSample and return DROP), but what is not clear to me: if I drop unimportant spans B and C, how do I connect span A and span D in that case? I cannot find that in the spec.

The issue here imo is a combination of noisy instrumentation (which should be addressed at the specific instrumentation level, not at the broader api/sdk level),

The initial premise of this issue was that there are examples in community-built automatic instrumentations (e.g. PHP) which are noisy, and that some users might like to see that noise while some users might not be interested in it. Sure, the PHP auto-instrumentation can come up with a one-off solution, but I assume the same can be relevant for other auto-instrumentations.

and a lack of understanding of how to write custom samplers.

Yes, that's the case for me.

@ericmustin
Contributor

@svrnm

To be clear, i'm not saying that Samplers alone today can provide the functionality you're hoping to provide (complete traces with arbitrary internal chunks removed), i'm saying that "Log levels" would generate fundamentally "broken" traces (ie, incomplete traces with arbitrary internal chunks removed), which i feel is an anti-pattern and should not be encouraged via specification. It doesn't help that there's huge blobs of totally imagined chatgpt code being thrown around in this issue 🙃 (would you mind editing your comment and removing that bit please @hmm01i ? TIA )

Rather than pushing forward this anti pattern, I think the first option should be to open issues with the "noisy" instrumentation and improve those instrumentations, such as by adding configuration to toggle on/off the "debatably" useless spans, or changing certain nested spans into span events on their parents, etc. Here's an example of how we've added such configuration in ruby.

If that isn't enough, I think the second option should be leaning into Sampler configuration to control cost. Right now the appropriate way to manage noisy-ness is via a Sampler, which guarantees at least "complete" traces, and also ensures the reduced overhead of "non recording spans", although it may not offer the flexibility to perfectly paper-over bad/noisy auto-instrumentation.

If neither of those solutions are practical, then I think the correct approach to address this at a specification level is something like what @brettmc suggests here wrt verbosity, but on a dynamic, per trace basis. This would also require modifications to the sdk tracer to handle re-parentage, as well as introduction of new sdk samplers that respect specific baggage keys that dictate verbosity, as well as an agreed definition of what the appropriate flavors of verbosity are and how that translates to specific span kinds (and potentially combinations of span kinds and span attributes). I'm not sure whether this would all be additive changes to the specification, and the burden on individual language SIGS would be pretty high to implement all this functionality, so I'm hesitant to encourage a proposal along those lines when it's not clear the first two options I mentioned have been exhausted. So I will, again, politely encourage exhausting the first two options (improve instrumentation and sample more).

Admittedly we leverage an internal implementation of verbosity at the moment that takes the approach I've outlined above; I can inquire about whether that's something we want to OSS (I am not the author of this work).

@svrnm
Member Author

svrnm commented Mar 6, 2023

To be clear, i'm not saying that Samplers alone today can provide the functionality you're hoping to provide (complete traces with arbitrary internal chunks removed), i'm saying that "Log levels" would generate fundamentally "broken" traces (ie, incomplete traces with arbitrary internal chunks removed), which i feel is an anti-pattern and should not be encouraged via specification.

The levels (or any other solution) do not by themselves lead to broken traces; they are first of all just an annotation on a span, like any other attribute. Like any other attribute, they can be used to make sampling decisions (they could also be used in a backend to filter spans based on that attribute, similar to any other attribute).

The question is: are they valuable, or is there maybe even something better?

It doesn't help that there's huge blobs of totally imagined chatgpt code being thrown around in this issue 🙃 (would you mind editing your comment and removing that bit please @hmm01i ? TIA )

Agreed with that; while I appreciate the time you spent researching that @hmm01i, I think it can be misleading for someone reading this issue.

Rather than pushing forward this anti pattern, I think the first option should be to open issues with the "noisy" instrumentation and improve those instrumentations, such as by adding configuration to toggle on/off the "debatably" useless spans, or changing certain nested spans into span events on their parents, etc. Here's an example of how we've added such configuration in ruby.

I disagree with that: there is nothing to fix. What is "noisy" and what is not lies in the eye of the end-user. The examples shared from the PHP auto-instrumentation are a good example: sometimes the extra spans from the framework are helpful, sometimes not; it's not even always the same for the same person. It all depends on what you want to accomplish.

Furthermore, the person implementing the tracing might not be the same person consuming the telemetry, so at implementation time it might not even be clear what is "noise" and what is relevant.

The configuration for that is so far a language-specific, out-of-spec solution, so part of the answer I am looking for could be a standardized configuration for it. @brettmc suggested something similar as well (i.e. introducing a variable like OTEL_TRACE_VERBOSITY).

If that isn't enough, I think the second option should be leaning into Sampler configuration to control cost. Right now the appropriate way to manage noisy-ness is via a Sampler, which guarantees at least "complete" traces, and also ensures the reduced overhead of "non recording spans", although it may not offer the flexibility to perfectly paper-over bad/noisy auto-instrumentation.

If I have 100 traces with 100 spans each, I can reduce the amount of data by either sampling the number of traces or by reducing the number of spans. Right now I can only do sampling of traces, which seems incomplete to me.

If neither of those solutions are practical, then I think the correct approach to address this at a specification level is something like what @brettmc suggests here wrt verbosity, but on a dynamic, per trace basis.

I have come to realize that my issue title was not chosen wisely, because the "logLevel" is just part of the solution, not the problem. What I am looking for is exactly that per-trace sampling capability. Would "Per-Trace Sampling" or "Span-Based Sampling" be a more appropriate title?

This would also require modifications to the sdk tracer to handle re-parentage, as well as introduction of new sdk samplers that respect specific baggage keys that dictate verbosity, as well as an agreed definition of what the appropriate flavors of verbosity are and how that translates to specific span kinds (and potentially combinations of span kinds and span attributes).

I don't think it is required to do all of that:

  • Re-parentage: yes, this is mandatory of course.
  • Respect specific baggage keys that dictate verbosity: while this sounds interesting, I don't think it's a mandatory feature; each service participating in a trace can have its own level of verbosity.
  • An agreed definition of what the appropriate flavors of verbosity are: yes, that's what those levels or any alternative would provide.
  • Translate to specific span kinds & span attributes: can you elaborate on that one?

I'm not sure whether this would all be additive changes to the specification, and the burden on individual language SIGS would be pretty high to implement all this functionality, so I'm hesitant to encourage a proposal along those lines when it's not clear the first two options I mentioned have been exhausted. So I will, again, politely encourage exhausting the first two options (improve instrumentation and sample more).

I agree that this might add significant additional functionality, but I see neither improved instrumentation nor "sample more" as the solution to this problem.

Just to add that, "out of scope" or "not right now" is not the answer I am looking for, but I anticipate that this might be the case.

Admittedly we leverage an internal implementation of verbosity at the moment that takes the approach I've outlined above; I can inquire about whether that's something we want to OSS (I am not the author of this work).

That would be interesting to see indeed.

@ericmustin
Contributor

Like any other attribute, they can be used to make sampling decisions (they could also be used in a backend to filter spans based on that attribute, similar to any other attribute).

While I think it would make sense to append an attribute signifying that a span has had some type of verbosity sampling applied to it at the time of the sampling decision (using a Sampler), what I understand is that you're proposing we add attributes signifying choices a sampler could/should make out of process. I disagree with this approach, mainly because I don't think it's possible to reliably do re-parentage out of process, at least via any of the existing processors in the otel-collector (maybe I am wrong here?).

Can you elaborate on that one

In the same way that a user may want verbosity to mean only keeping producer/consumer and client/server spans, some users may want to only keep client spans of a certain type, for example only those that contain db.statement attributes, or something along those lines.

Just to add that, "out of scope" or "not right now" is not the answer I am looking for, but I anticipate that this might be the case.

Sure, fair enough. I'll follow up with the ruby SIG.

@lmolkova
Contributor

lmolkova commented Mar 6, 2023

Span verbosity came up a lot in discussions around open-telemetry/oteps#172. One finding there was that custom samplers do not allow suppressing spans: they still create a new span and context, just without recording it.

So in theory verbosity can be implemented with samplers if verbosity is an attribute on the span which is set at start time and therefore visible to samplers, but we'd need another sampling result like 'suppressed' and would have to define what it means (no new context, no attributes attached, etc.).
Given that suppression is quite useful in other places (e.g. in Java instrumentations we use it to dedup auto-instrumentations), I think verbosity and dedup need a better story than custom sampling.

@hmm01i

hmm01i commented Mar 6, 2023

@ericmustin I've removed the ChatGPT code as requested. I can see how it would be confusing.

I have a couple of additional thoughts on this whole thing.

  1. I agree that broken spans are probably not a good practice or something to perpetuate.
  2. I believe the value of spans is dependent on the audience and circumstances. We can't assume that all spans always have the same value.
  3. My intent and desire is to have the ability to specify levels of verbosity of a trace, just as most software has highly verbose (DEBUG) logging that can be enabled to assist with troubleshooting. I want to be able to "dial down" the trace detail in a production app, but "dial up" in dev/non-prod.

A question related to 3: with the high value of traces and the ability to add events to a span, what is the value of traditional logging? In my most recent project I've basically replaced most of my logging with detailed spans. This may be an unconventional use of tracing, but traces provide so much more value than logs that I could easily see myself almost completely replacing logging with tracing in the future. (I suppose the main difference would be that logs are generally more concise and human-readable when sent to stdout.)

@pdelewski
Member

pdelewski commented Mar 9, 2023

Let me share my thoughts. The whole discussion regarding this topic started as a result of open-telemetry/opentelemetry-demo#659 mentioned by @brettmc; however, we thought about this problem before. It was solved by not installing one of the instrumented libraries (PSR-15), which means that we just disabled instrumentation for all methods/functions in the package, as hooking into them does not make sense from an application developer perspective. This can also be done selectively at compilation time with zero performance overhead (and seems to be easy in the case of PHP), or by applying some form of configuration already mentioned by @ericmustin (leaving the decision to the user).
Other cases IMO can be handled in the same way. So, for instance, I can describe my intention using pattern matching or a set of conditions to drop (or not create at all) spans that have no parent at all, have a parent with a specific name, or have an ancestor with a specific attribute. Therefore, I think most cases can be solved by eager filtering, deciding whether or not to create a span based on some form of configuration, and that form of formalism is something we can take into consideration when thinking about standardization in the spec.
Having said that, there might be cases where sampling will be more appropriate.

@svrnm
Member Author

svrnm commented Aug 4, 2023

Following up on this after a while, hoping to reignite the discussion:

From what I read throughout the comments, there are different situations where a span might be removed from a trace:

  1. a span has "lower value" (denoted via verbosity or a priority) and someone sets a threshold to not keep it,
  2. a rule has been set up to drop spans based on pattern matching, as @pdelewski suggested in the comment above,
  3. deduplication of spans, as mentioned by @lmolkova (via Instrumentation layers and suppressing duplicates oteps#172),
  4. a trace becomes big (see https://cloud-native.slack.com/archives/CJFCJHG4Q/p1691064852470689) and there is a need to sample down those traces (even if their spans are of "equal value").

For all of these, the main reason is to reduce the amount of data being generated and preferably to remove noisy spans, all before the data hits the backend and is stored.

I believe that this way of reducing data can be an extremely valuable addition to trace-based sampling (dropping the whole trace), but I owe you some data to back up this assumption.

There are now different potential solutions for that, which have been proposed throughout the discussion:

  1. Optimization: Do not create those spans in the first place
  2. Configuration to disable/enable certain spans of being generated
  3. Suppression, as Java has it implemented
  4. Verbosity, similar to how logs work (DEBUG, INFO, ERROR, etc.)
  5. Other ways of weighting (setting a numeric priority)
  6. Filter rules (pattern matching)
  7. Span-level sampling (vs "trace-level sampling", where a whole trace is dropped; here only spans are removed from the trace)

The challenging ones are those which decide at runtime whether a span is recorded (3?, 4, 5, 6, 7), because by removing the span, relationships could be broken:

  1. parent-child relationship: if span A is the parent of span B, and span A is removed, span B remains parentless
  2. attributes, events, exceptions, etc.: any data attached to that span which might be valuable is lost
  3. span links: if span C links to span D, and span D is removed, the span link from C points to a non-existent span
  4. signal correlation: if a log E has the span_id of span F set, but span F is removed, the log points to a non-existent span. The same is true for an exemplar

For (1) there is probably a requirement to climb up the tree and change the parent of a child span to its parent's parent.
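
A sketch of that climb, assuming something with the whole trace in hand (e.g. a tail-sampling processor) knows which spans were dropped; nothing in the SDKs does this today:

// reparent returns the nearest kept ancestor of a span whose direct
// parent may have been dropped; span IDs are plain strings here for brevity.
func reparent(parentOf map[string]string, dropped map[string]bool, spanID string) string {
	p := parentOf[spanID]
	for p != "" && dropped[p] {
		p = parentOf[p] // climb until a kept ancestor (or the root) is reached
	}
	return p
}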

(2) is either a non-issue (if a span is dropped, its attributes are also gone) or maybe there is an opportunity to "bubble them up" while changing the parent. This could for example be relevant if a span representing a network connection (HTTP) has some sub-spans for DNS, TLS and other connection-level interactions that might not be relevant all the time.

For (3) and (4) I am wondering if/how this is solved with trace-based sampling today. What I mean: if a trace is dropped by a sampler, what happens to potential span links or logs/metrics that point to that trace?

@adriangb

adriangb commented Nov 9, 2023

I've played around with custom samplers a bit, and one thing that seems unfortunate is that a Sampler gets a trace ID but not a span ID. I feel like it would be trivial to make the span ID available, since it's generated around the same place (in the Python implementation just a couple of LOC down) and is very cheap to generate. Then a SpanIdRatioBased sampler would be easy to implement, mirroring TraceIdRatioBased but using a sample rate that possibly comes from an attribute, from context, globally, or whatever implementers want.
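
For illustration, the ratio test itself could mirror what TraceIdRatioBased does with the trace ID, just fed the span ID instead; this is hypothetical, since today's Sampler API does not receive a span ID (uses encoding/binary):

// shouldSampleSpanID treats the span ID's 63 high bits as a uniform random
// value and compares them against a threshold derived from the rate, the
// same technique TraceIdRatioBased applies to trace IDs.
func shouldSampleSpanID(spanID [8]byte, rate float64) bool {
	x := binary.BigEndian.Uint64(spanID[:]) >> 1
	return x < uint64(rate*(1<<63))
}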

@Oberon00
Member

Oberon00 commented Nov 10, 2023

@adriangb The sampler intentionally does not receive the span ID, see #621. But I wonder if the arguments from there still hold; after all, an unsampled span also receives its own span ID, doesn't it? CC @carlosalberto

@adriangb

adriangb commented Nov 10, 2023

Thanks for linking me to that!

It seems to me like the reasoning was that there was no use case, so why not remove it? Because, as you say, the span ID is generated anyway, so even if it were expensive, the fact that the Sampler doesn't receive it changes nothing.

If so, I agree this use case seems like a strong argument to add it back. And to be clear, by "this use case" I don't mean any specific implementation, but rather giving users the ability to implement a custom Sampler along the lines of what's been discussed here, in #3725, and other places.

@brettmc
Contributor

brettmc commented Nov 14, 2023

@adriangb Span attributes are already available for making sampling decisions (and the spec mentions that span attributes should be set early so that they are available to custom samplers).
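
Concretely, that means passing the attribute at start time, e.g. in Go (the "level" attribute is illustrative):

	// Attributes passed via WithAttributes reach the sampler in
	// SamplingParameters; attributes added later with span.SetAttributes
	// are invisible to the sampling decision.
	_, span := tracer.Start(ctx, "db.fetch",
		trace.WithAttributes(attribute.String("level", "debug")),
	)
	defer span.End()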

I might be missing something, but the main use-case for having the trace ID available is ratio-based sampling. If you also tried to do ratio-based sampling based on a span ID, and based on that you dropped some spans within a trace, then that would lead to broken traces, wouldn't it? (Which is one of the potential issues identified up-thread here.)

@adriangb

span attribute are already available for making sampling decisions

Yes, they are, but you can only do all-or-nothing sampling: you don't have any source of randomness that you can sample a ratio from. The reason to make the span ID available is to allow span-ID ratio-based sampling based on an attribute.

If you also tried to do ratio-based sampling based on a span id, and based on that you dropped some spans within a trace, then that would lead to broken traces wouldn't it?

If you do it in an exporter, yes. If you do it in a Sampler and return a non-recording span, then all child spans are also not sampled; thus you prune an entire subtree of spans and the trace is not broken.

@nifrasinnovent

Hi folks,
Any implementation plan for this?

@adriangb

adriangb commented Dec 19, 2023

I opened https://github.com/open-telemetry/opentelemetry-python/pull/3574 as a proposal to address these feature requests. I'd love some feedback on it.

@adriangb

adriangb commented Jan 5, 2024

I was told to discuss that PR here further. Does anyone have feedback on it? As far as I can tell it allows resolving all of the use cases presented here with minimal code changes.

@svrnm
Member Author

svrnm commented Jan 6, 2024

Thanks for creating a prototype @adriangb, I am happy to take a look next week when normal operations continue :)

@adriangb

Hi @svrnm did you have a chance to look? Thanks!

@svrnm
Member Author

svrnm commented Jan 17, 2024

Yes, I did. I think the feature I was asking for is different from what you are looking for: your proposal focuses on having the span_id available while sampling, while my initial ask was around "sampling" individual spans (and potentially even keeping their child spans) based on different criteria. So I don't have much to add about the span_id being available during the shouldSample method, beyond the issues @Oberon00 pointed out already.

@adriangb

your proposal focuses on having the span_id available while sampling, while my initial ask was around "sampling" individual spans

The point of having the span ID available is that it gives you something random to sample off of. Consider that you could write (or eventually OTel could ship) a sampler that treats a "debug" level as 0% sampled and anything above as 100% sampled. Currently, you can write a sampler that has access to attributes and could use that to sample individual spans, but it's all or nothing; it can't statistically sample spans. This approach is compatible with that feature request but is more general, e.g. allowing a sample rate for a specific span.

my initial ask was around "sampling" individual spans (and potentially even keeping their child spans)

The keeping child spans thing is going to be a problem for any approach because as others have mentioned there's no easy way to avoid "broken" traces with gaps in the span tree.

@svrnm
Member Author

svrnm commented Jan 17, 2024

my initial ask was around "sampling" individual spans (and potentially even keeping their child spans)

The keeping child spans thing is going to be a problem for any approach because as others have mentioned there's no easy way to avoid "broken" traces with gaps in the span tree.

I am fully aware of that, but this is the subject of this issue.

@adriangb

Perhaps that is lost in the discussion. Several issues point to this issue, and for better or worse it is the most popular issue for something like a log level for spans, which certainly doesn't require that child spans be kept; from a re-skim of the title and OP I can't tell that this issue is strictly about that either. Would it not be useful to you at all if a non-sampled span (say, debug) didn't have its children emitted/recorded?

@svrnm
Copy link
Member Author

svrnm commented Jan 17, 2024

You are right, the initial issue description does not indicate that; for me it was an implicit part of the original problem statement because of the initial example from PHP (see screenshots here: open-telemetry/opentelemetry-demo#659): the verbose span sub-tree has 14 spans, the not-so-verbose one has 3, but the important ones are not exclusively at the top of that sub-tree (POST /getquote is, but not {closure} and calculate-quote), so even with some concept of which spans are important and which are not, you could not simply drop an unimportant span and then decide that all its children are unimportant as well.

@adriangb

I do see how that could be useful, but I also see how it will be more difficult to get across the finish line.

Also, I don't want to take over your issue. Do you want to edit the title and description to make the use case clearer, or should we open a new issue?

@adriangb

@svrnm do you want to edit this issue to make it clearer in one direction or another and if you want to keep the "child spans when parents are not sampled" aspect of this please let me know so we can move the rest of the discussion to different issue. Thanks.

@svrnm
Member Author

svrnm commented Jan 26, 2024

@svrnm do you want to edit this issue to make it clearer in one direction or another and if you want to keep the "child spans when parents are not sampled" aspect of this please let me know so we can move the rest of the discussion to different issue. Thanks.

done, see the updated description of this issue.
