-
Notifications
You must be signed in to change notification settings - Fork 50
Add details about supporting qps based sampler. #158
Conversation
trace/Sampling.md
Outdated
applied to a child `Span` of a **sampled** parent `Span`, the child `Span` keeps the sampling decision. | ||
applied to a child `Span` of a **sampled** parent `Span`, the child `Span` keeps the sampling | ||
decision. | ||
* `QPS` - sampler that tries to sample traces based on QPS (e.g. 1 trace/second). When applied to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure QPS
is a good name for this. It's common in Google to use QPS to mean rate of anything (not just queries) but I think this is less common elsewhere in my experience, but I would be interested to hear non-Google perspectives on this. What about "fixed rate sampler"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Jaeger we call them rate limiting samplers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for rate limiting - my previous suggestion of fixed rate isn't good because if the application produces fewer than this number of spans, the rate will actually be lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
trace/Sampling.md
Outdated
### QPS sampler implementation details | ||
The problem we are trying to solve is: | ||
1. Getting QPS based sampling. | ||
2. Providing real sampling probabilities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? Do you mean that we can attach to each span the actual sampling probability in use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my idea that we may want to do this.
I think Amazon calls it reservoir sampler
|
Reservoir sampling is an algorithm to obtain a fixed size sample from a continuous stream where each item (usually) has a fixed chance of being included in the sample. Seems like it could be useful as a technique for implementing this but I don't like it as a name since it is more descriptive of the "how" rather than the "what". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. I agree that "rate limiting samplers" sounds like a better name.
trace/Sampling.md
Outdated
applied to a child `Span` of a **sampled** parent `Span`, the child `Span` keeps the sampling decision. | ||
applied to a child `Span` of a **sampled** parent `Span`, the child `Span` keeps the sampling | ||
decision. | ||
* `QPS` - sampler that tries to sample traces based on QPS (e.g. 1 trace/second). When applied to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nevinheintze mentioned offline that we might want to set the default rate significantly lower than 1 trace/sec to avoid issues when there are a very large number of tasks - maybe 0.1 trace/sec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
trace/Sampling.md
Outdated
@@ -33,3 +37,26 @@ The OpenCensus library samples based on the following rules: | |||
3. If the span is a child of a local `Span` the sampling decision will be: | |||
* If a "span-scoped" `Sampler` is provided, use it to determine the sampling decision. | |||
* Else keep the sampling decision from the parent. | |||
|
|||
### QPS sampler implementation details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative here would be to use reservoir sampling I think, should we explore that? Sounds like it's what is used to accomplish this for other systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I would look into that. The advantage of this algorithm is that we update only one global atomic variable, happy to see a small pseudocode for the reservoir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A disadvantage for this is we most likely apply the probability using the trace-id (or last part of the trace-id) which we need to be randomly generated to ensure the correctness OR we generate a random number at that point.
What is your thought on this?
FWIW rate limited is less jargon anyway so good by me
|
This idea belongs to a different Google engineer who does not have a github account to link.