Rename or rejig Sampler #92
Comments
Here's a related issue with the Ruby tracer, which was accidentally gathering state for spans that are tossed: openzipkin/zipkin-ruby#40 |
I wouldn't do anything about this for the 1.0.0 release. Let's move it for the next big release. |
Since a year has passed, this probably needs another look in case (my) understanding is better now. |
I actually have an idea: we could keep the Sampler name but think about moving the sampling decision as late as possible, actually to the moment when the span is closed. That way users could do adaptive sampling (which is impossible ATM?). WDYT @adriancole ? |
You can think of adaptive sampling as a combination of parts.
First, it is expressed like reservoir sampling, say accepting up to 1k
traces/second. That's the target rate. Your actual rate will be different
and needs to be corrected (adapted).
In the easiest case, you never hit 1k traces/second, so you never have to
correct the rate. How would you know?
First, you need to count sample requests over time, like sample requests
per minute. Then, you need to count how many sample requests you approve
over time. When the latter rate is higher than the target, you need to
adapt the sample rate below 100% (you might even start at 0%).
Commonly, you would "check" your rate at a higher frequency, like once a
second. If you are higher than your target, you use some strategy to lower
the sample rate; if you are lower, you raise it. This is the adaptive
part.
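The counting-and-checking loop described above can be sketched roughly as follows. This is a hypothetical illustration, not Sleuth or Brave API: the class name, the 1% probability floor, and the 1.1 raise factor are all assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count sample requests and approvals, and once per
// "check" interval nudge the sample probability toward a target
// approvals-per-second rate.
public class AdaptiveSampler {
    private final long targetPerSecond;        // e.g. accept up to 1k traces/second
    private volatile double probability = 1.0; // start by keeping everything
    private final AtomicLong requested = new AtomicLong();
    private final AtomicLong approved = new AtomicLong();

    public AdaptiveSampler(long targetPerSecond) {
        this.targetPerSecond = targetPerSecond;
    }

    /** The before-the-fact decision, called once per root span. */
    public boolean isSampled() {
        requested.incrementAndGet();
        boolean keep = ThreadLocalRandom.current().nextDouble() < probability;
        if (keep) approved.incrementAndGet();
        return keep;
    }

    /** Called by a scheduler at the "check" frequency, e.g. once a second. */
    public void check() {
        long approvedThisInterval = approved.getAndSet(0);
        requested.getAndSet(0);
        if (approvedThisInterval > targetPerSecond) {
            // over budget: scale the probability down proportionally
            probability = probability * targetPerSecond / (double) approvedThisInterval;
        } else {
            // under budget: raise gently, with a small floor so we can recover from 0
            probability = Math.min(1.0, Math.max(0.01, probability * 1.1));
        }
    }

    public double probability() {
        return probability;
    }
}
```

A scheduled executor calling check() once a second would complete the loop; the strategy for raising and lowering the rate is exactly where real implementations differ.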
This isn't super-smart, because Zipkin doesn't have a capacity in terms of
traces: traces can have an arbitrary number of spans. However, it is
smarter than a fixed percentage of traces (because it allows for a budget).
If you want to get smarter, you can either get smarter about what you keep
(trace with error vs normal trace), or get smarter about how much you can
keep (system capacity).
Getting smarter about what you keep is not possible in Sleuth, as it cannot
see the result of downstream calls, and has no means to coordinate a
sampling decision. For this, you'd need to do something like send 100% to
zipkin-sparkstreaming and move the sample decision there.
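To make "what you keep" concrete, here is a hedged sketch of the kind of after-the-fact decision that could run where the whole trace is visible (e.g. in a zipkin-sparkstreaming job). The class name and the "key=value" tag convention are hypothetical:

```java
import java.util.List;

// Hypothetical after-the-fact decision over a completed trace: always keep
// traces that contain an error, and only a random fraction of normal traces.
public class LateSamplingDecision {
    /**
     * @param tags       tags collected from all spans in the trace (assumed "key=value" strings)
     * @param normalRate fraction of error-free traces to keep, in [0.0, 1.0]
     * @param roll       a uniform random number in [0.0, 1.0)
     */
    public static boolean keep(List<String> tags, double normalRate, double roll) {
        boolean hasError = tags.stream().anyMatch(t -> t.startsWith("error"));
        return hasError || roll < normalRate;
    }
}
```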
Getting smarter about how much you can keep is possible. For example, if
you read back a metric of spans accepted by a collector, you can do some
math to adjust toward a target rate based on that. The tricky part is
coordination, as multiple servers may be sampling (even if they are
equivalent nodes in the same cluster). For example, do you split 10k
spans/second equally between 10 nodes? What if one dies, or 10 more are
added?
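The coordination arithmetic above can be shown in a minimal sketch (hypothetical helper; real coordination, e.g. via ZooKeeper, also has to detect membership changes and rebalance):

```java
// Hedged sketch: divide a cluster-wide spans/second target evenly across
// the nodes currently known to be alive. When a node dies or joins, the
// per-node budget must be recomputed.
public class SampleBudget {
    public static long perNode(long clusterTargetPerSecond, int liveNodes) {
        if (liveNodes <= 0) {
            throw new IllegalArgumentException("need at least one live node");
        }
        return clusterTargetPerSecond / liveNodes; // e.g. 10k/s over 10 nodes -> 1k/s each
    }
}
```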
This gets into rather more complex code, like the ZooKeeper-coordinated
sampler used by Twitter (note this is a collector sampler):
https://github.com/openzipkin/zipkin/tree/master/zipkin-zookeeper
I don't want to distract the issue here too much; just think of adaptive
sampling as some function, where the hard part is modeling the flow of
information that tells you to accept more or less. You can get arbitrarily
fancy, and the edge cases can become really difficult.
|
Thanks for the analysis. Makes perfect sense. I simplified the adaptive sampling issue. I think that we should close this one for now. |
With this pull request we have rewritten the whole Sleuth internals to use Brave. That way we can leverage all the functionality & instrumentation that Brave already has (https://github.com/openzipkin/brave/tree/master/instrumentation). A migration guide is available here: https://github.com/spring-cloud/spring-cloud-sleuth/wiki/Spring-Cloud-Sleuth-2.0-Migration-Guide
fixes #711 - Brave instrumentation
fixes #92 - we move to Brave's Sampler
fixes #143 - Brave is capable of passing context
fixes #255 - we've moved away from the Zipkin Stream server
fixes #305 - Brave has gRPC instrumentation (https://github.com/openzipkin/brave/tree/master/instrumentation/grpc)
fixes #459 - Brave (openzipkin/brave#510) & Zipkin (openzipkin/zipkin#1754) will deal with the AWS X-Ray instrumentation
fixes #577 - messaging instrumentation has been rewritten
Sampled returning false in Sleuth is different than most instrumentation I've seen. For example, a span is managed regardless of whether it is sampled (i.e. it still generates callbacks etc). The Sampled state controls the exportable field, which in turn controls propagation and storage (and a special case for users of TraceManager.addAnnotation(key, value) as opposed to getCurrentSpan()). I'd suggest 2 paths, which could both be taken.
If controlling overhead is an intended goal of the Sampler, then maybe the before-the-fact decision is still the right choice. I'd have a few suggestions on that.