tracing: Add trace sampling configuration to the route, to override the route level #6986
4cb46dd to 2f4e82b
/retest
🔨 rebuilding
@dio Coverage now fixed.
@dio @mattklein123 Is this ok to merge?
@dio can you take a first pass?
Looks good overall; I can see this follows the pattern of the decorator. Just one question, and would you mind adding a release note to the version_history as an update?
1be06e8 to 1ca93dd
@dio Added version_history entry.
@objectiser can you fix format and I will take a look? /wait
Thanks, this generally looks great. Flushing out a few comments.
/wait
api/envoy/api/v2/route/route.proto
Outdated
// 'tracing.client_sampling' in the :ref:`HTTP Connection Manager
// <config_http_conn_man_runtime>`.
// Default: 100%
envoy.type.Percent client_sampling = 1;
Since we have an opportunity here to "redo" the API, can we use FractionalPercent? It performs better and also allows for smaller percentages. Also, I think having the defaults be 100% was a mistake in the original tracing code. I guess we should be consistent, but do you think we should consider changing them to 0, potentially? WDYT?
+1 to FractionalPercent.
Regarding changing the sampling: this was done in Istio a while ago, from 100% to 1%, but users still regularly raise issues about not being able to see tracing data and have to be directed to the FAQ on setting the sampling rate. So there are pros and cons with any setting, but 0 is possibly a good choice for Envoy, as it essentially means tracing is disabled by default and has to be explicitly configured with a suitable sampling rate.
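To illustrate the point above about FractionalPercent allowing smaller percentages, here is a minimal sketch. `FractionalPercentSketch` and `isSampled` are hypothetical names written for this example; they are not the generated envoy.type.FractionalPercent class or Envoy's actual sampling code, and the modulo-based sampling rule is an assumption. Only the three denominator choices (100, 10,000, 1,000,000) are meant to mirror the proto.

```cpp
#include <cstdint>

// Hypothetical stand-in for envoy.type.FractionalPercent (illustration only).
struct FractionalPercentSketch {
  enum class Denominator : uint32_t { Hundred = 100, TenThousand = 10000, Million = 1000000 };
  uint32_t numerator;
  Denominator denominator;
};

// Assumed sampling rule for this sketch: reduce the random value modulo the
// denominator and sample when the result is below the numerator.
inline bool isSampled(const FractionalPercentSketch& percent, uint64_t random_value) {
  const uint64_t denominator = static_cast<uint64_t>(percent.denominator);
  return (random_value % denominator) < percent.numerator;
}
```

With a denominator of 10,000, a numerator of 1 expresses a 0.01% rate, which a check against an integer out of 100 cannot represent.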
include/envoy/router/router.h
Outdated
* This method returns the client sampling percentage.
* @return the client sampling percentage
*/
virtual uint64_t getClientSampling() const PURE;
Per above these should return fractional percent objects.
}
ASSERT(stream_info_.downstreamRemoteAddress() != nullptr);

ASSERT(!cached_route_);
refreshCachedRoute();

if (!state_.is_internally_created_) {
Sorry, why is this new check necessary? Can you add a comment?
Sure, I can add a comment. Essentially, the call to mutateTracingRequestHeader was previously performed inside mutateRequestHeaders, which was subject to the same condition just above. So I thought that, even though the call has been pulled out of mutateRequestHeaders, it should still be subject to the same condition. Let me know if that is not correct.
Is this condition only true for the initial processing of a request at the first proxy? In that case, I assume this would prevent the request ID from being modified midway through a mesh.
source/common/router/config_impl.h
Outdated
@@ -650,6 +677,7 @@ class RouteEntryImplBase : public RouteEntry,
const std::multimap<std::string, std::string> opaque_config_;

const DecoratorConstPtr decorator_;
const RouteTracingConstPtr routeTracing_;
nit: route_tracing_
…he settings defined on the filter Signed-off-by: Gary Brown <gary@brownuk.com>
d3ccb69 to 69a0441
Thanks, looks great. A few small comments.
/wait
@@ -234,23 +234,30 @@ void ConnectionManagerUtility::mutateTracingRequestHeader(HeaderMap& request_hea
return;
}

envoy::type::FractionalPercent client_sampling = config.tracingConfig()->client_sampling_;
Can we use a const pointer for these variables to avoid the copy?
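A minimal sketch of what the const-pointer suggestion could look like: start with a pointer to the filter-level value and repoint it at the route-level value when a route override exists, so that no sampling object is copied. All type and function names below (`SamplingSketch`, `TracingSettingsSketch`, `effectiveClientSampling`) are hypothetical and exist only for this illustration; they are not the types used in the PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for envoy::type::FractionalPercent (illustration only).
struct SamplingSketch {
  uint32_t numerator;
  uint32_t denominator;
};

// Hypothetical holder for the sampling settings at one configuration level.
struct TracingSettingsSketch {
  SamplingSketch client_sampling;
};

// Returns a pointer to the effective client sampling without copying it:
// route-level settings win when present, otherwise the filter-level ones apply.
inline const SamplingSketch* effectiveClientSampling(const TracingSettingsSketch& filter_settings,
                                                     const TracingSettingsSketch* route_settings) {
  const SamplingSketch* client_sampling = &filter_settings.client_sampling;
  if (route_settings != nullptr) {
    client_sampling = &route_settings->client_sampling;
  }
  return client_sampling;
}
```

The returned pointer is only valid while the settings objects it points into are alive, which is the usual trade-off of avoiding the copy.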
source/common/router/config_impl.cc
Outdated
@@ -324,6 +324,22 @@ void DecoratorImpl::apply(Tracing::Span& span) const {

const std::string& DecoratorImpl::getOperation() const { return operation_; }

RouteTracingImpl::RouteTracingImpl(const envoy::api::v2::route::Tracing& tracing)
I don't think you are defaulting these to 100%? Can you double check this and add tests if that is what we want to do for defaults? (Or change the docs).
Based on your previous comment, I was wondering whether a default of 0% would be appropriate here, as they are specific to the route; however, an overall_sampling default of 0% does not seem right.
So I think it may be better to use 100% for consistency now; if the decision is later made to change to 0%, that can be done at both levels.
… change route tracing sampling defaults to 100% Signed-off-by: Gary Brown <gary@brownuk.com>
Thanks, looks great. 2 small comments. @dio any further comments?
/wait
source/common/router/config_impl.cc
Outdated
}
if (!tracing.has_random_sampling()) {
random_sampling_.set_numerator(10000);
random_sampling_.set_denominator(envoy::type::FractionalPercent::TEN_THOUSAND);
nit: I don't think there is any reason to actually set the denominator here to 10,000, right? I think just making it 100/100 like the others is sufficient? Mostly just trying to avoid questions later about why this is 10,000 here.
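As a rough sketch of the defaulting discussed here: an unset route-level field falls back to 100/100, and 10,000/10,000 denotes exactly the same rate, so the larger denominator adds nothing to the default. `FractionSketch`, `samplingOrDefault`, and `sameRate` are hypothetical names for this illustration only; a null pointer models an unset proto field.

```cpp
#include <cstdint>

// Hypothetical stand-in for a fractional percent value (illustration only).
struct FractionSketch {
  uint32_t numerator;
  uint32_t denominator;
};

// Returns the configured value, or the 100/100 default when the field is unset.
// A null pointer models "field not set" in this sketch.
inline FractionSketch samplingOrDefault(const FractionSketch* configured) {
  if (configured != nullptr) {
    return *configured;
  }
  return FractionSketch{100, 100};
}

// True when two fractions denote the same rate; cross-multiplication in 64 bits
// avoids floating-point comparison.
inline bool sameRate(const FractionSketch& a, const FractionSketch& b) {
  return static_cast<uint64_t>(a.numerator) * b.denominator ==
         static_cast<uint64_t>(b.numerator) * a.denominator;
}
```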
client_sampling.set_numerator(
tracing_config.has_client_sampling() ? tracing_config.client_sampling().value() : 100);
envoy::type::FractionalPercent random_sampling;
random_sampling.set_numerator(
nit: Can you add a small TODO/comment here noting that random sampling was historically an integer that defaulted to being out of 10,000, but that we should deprecate that and move to a straight fractional percent config? The way we have this now is pretty confusing for historical reasons.
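A sketch of the kind of conversion and TODO being requested, under stated assumptions: the configured value is taken to be a percentage in the range 0 to 100, which is scaled by 100 to land on the historical out-of-10,000 scale, and an unset value defaults to 10,000/10,000. `FractionSketch` and `legacyRandomSampling` are hypothetical names, and the scaling rule is an assumption of this example rather than something confirmed by the diff excerpt.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a fractional percent value (illustration only).
struct FractionSketch {
  uint32_t numerator;
  uint32_t denominator;
};

// Converts a percentage (assumed range 0..100) to a fraction out of 10,000.
// has_value == false models an unset field and yields the 100% default.
inline FractionSketch legacyRandomSampling(bool has_value, double percent_value) {
  // TODO (illustrative): random sampling was historically an integer out of
  // 10,000; deprecate this scaling in favor of a plain fractional percent config.
  const uint32_t numerator =
      has_value ? static_cast<uint32_t>(std::llround(percent_value * 100.0)) : 10000;
  return FractionSketch{numerator, 10000};
}
```

Rounding with `std::llround` guards against values such as 0.01 * 100 landing a hair above or below the intended integer.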
This is nice. Thanks! @mattklein123 no more comments from me.
Modulo Matt's.
Signed-off-by: Gary Brown <gary@brownuk.com>
@mattklein123 Done
Thank you!
/retest
🤷‍♀️ nothing to rebuild.
/azp run
Description: Enable trace sampling to be overridden at the route level.
Risk Level: low
Testing: added unit tests and manually updated examples/jaeger-tracing to try it out.
Docs Changes: Assume model docs are autogenerated. Not sure if any other docs need to be changed.
Release Notes:
Sampling associated with individual routes can now be overridden to define route specific values.
Fixes #6915