-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[connector/spanmetrics] [processor/tailsampling] Tail sampling phases #30320
Conversation
…m could leverage the information about whether the trace is going to be sampled or not
…on attribute value
ff304d0
to
c5e6828
Compare
component: connector/spanmetrics, processor/tail_sampling | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: "Feature to support pre and post tail sampling phases, so other processors or connectors could see which traces will be exported." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need an update to reflect the latest changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes senses to split the PR into two, or provide two updates into the change log.
@@ -231,6 +245,14 @@ type Config struct { | |||
// ExpectedNewTracesPerSec sets the expected number of new traces sending to the tail sampling processor | |||
// per second. This helps with allocating data structures with closer to actual usage size. | |||
ExpectedNewTracesPerSec uint64 `mapstructure:"expected_new_traces_per_sec"` | |||
// Processor mode to configure tailsampling phase | |||
// during presample phase traces are marked as sampled but not dropped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the words you used above, "pre-sampling stage".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea in general, but I have questions about the implementation.
- Marking all spans with the sampling decision doesn't look right to me,
- Using the filter processor to drop spans that are not sampled sounds like a misuse of that processor: we are marking all spans to be dropped, when we really mean that a trace should be dropped. While the implications are the same, I feel like the solution is more like a hack than pieces working correctly together.
- There has to be a processor after this to remove the attribute from the spans
In any case, I would appreciate a review from the SIG Sampling folks, especially @jmacd and @kentquirk.
cumulative = "AGGREGATION_TEMPORALITY_CUMULATIVE" | ||
delta = "AGGREGATION_TEMPORALITY_DELTA" | ||
cumulative = "AGGREGATION_TEMPORALITY_CUMULATIVE" | ||
dynamicExemplarAttributeKeyName = "tail_sampling.sampled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the folks from SIG Sampling should have a say on this feature as a whole, but especially on this attribute, as it has the potential for becoming a semconv to be used by other components.
cc @jmacd, @kentquirk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attribute is configurable so I don't see a reason why to introduce it as a convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attribute is configurable so I don't see a reason why to introduce it as a convention.
It's more if we introduce the notion of a sampling key, does it align with what other groups have envisioned or can it be used across more projects.
error_mode: ignore | ||
traces: | ||
span: | ||
- attributes["tail_sampling.sampled"] == nil # Drop spans that are not sampled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A sampling decision is trace-based, not span-based. Either a whole trace is dropped, or nothing from the trace is dropped. I understand this specific component (filter) does span filtering, but I believe adding this as the main example will confuse users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say all samplings are span based. Tail-sampling itself does not guarantee that whole trace will be sampled or not. We can never put a strong guarantee that specific trace wont be sampled but we can guarantee that a specific span wont be sampled. Also there is no way to attach decision on a trace as all processors operate on a span level anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a collection of spans in memory, all of the spans belonging to the same trace will either be stored or discarded. Sure, late arriving spans might get a different sampling decision by the tail-sampling processor, but given the right conditions, the decision is consistent.
Also there is no way to attach decision on a trace as all processors operate on a span level anyway.
If you are storing traces (collections of spans belonging to the same trace ID) in memory in the first phase, you can certainly add the sampling decision to the root span only, instead of all spans in that tree.
But I'll defer to other folks with opinions related to sampling (@jmacd and @kentquirk).
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the code are solid, however, this need more consideration on how this fits into the wider process of including sampling decisions across the project(s).
I believe @jpkrohling has already pinged the appropriate people.
cumulative = "AGGREGATION_TEMPORALITY_CUMULATIVE" | ||
delta = "AGGREGATION_TEMPORALITY_DELTA" | ||
cumulative = "AGGREGATION_TEMPORALITY_CUMULATIVE" | ||
dynamicExemplarAttributeKeyName = "tail_sampling.sampled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attribute is configurable so I don't see a reason why to introduce it as a convention.
It's more if we introduce the notion of a sampling key, does it align with what other groups have envisioned or can it be used across more projects.
MaxPerDataPoint *int `mapstructure:"max_per_data_point"` | ||
Enabled bool `mapstructure:"enabled"` | ||
MaxPerDataPoint *int `mapstructure:"max_per_data_point"` | ||
DynamicConfig *ExemplarDynamicExportConfig `mapstructure:"dynamic"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this as a pointer doesn't provide much value due to the fact that ExamplarDynamicExportConfig
has the field Enabled
which will be false as the zero value.
component: connector/spanmetrics, processor/tail_sampling | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: "Feature to support pre and post tail sampling phases, so other processors or connectors could see which traces will be exported." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes senses to split the PR into two, or provide two updates into the change log.
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Description: Tail sampling phases
Link to tracking Issue: Issue 30319
Testing: Adjusted spanmetricsconnector tests
Documentation: Documented new properties and their defaults in README.md