Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: Add trace sampling configuration to the route, to override the route level #6986

Merged
merged 9 commits into from
May 28, 2019

Conversation

objectiser
Copy link
Contributor

@objectiser objectiser commented May 17, 2019

Description: Enable trace sampling to be overridden at the route level.

Risk Level: low
Testing: added unit tests and manually updated examples/jaeger-tracing to try it out.
Docs Changes: Assume model docs are autogenerated. Not sure if any other docs need to be changed.
Release Notes:
Sampling associated with individual routes can now be overridden to define route specific values.

Fixes #6915

@objectiser objectiser force-pushed the routetraceconfig branch 2 times, most recently from 4cb46dd to 2f4e82b Compare May 17, 2019 18:02
@dio
Copy link
Member

dio commented May 18, 2019

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6986 (comment) was created by @dio.

see: more, trace.

@objectiser
Copy link
Contributor Author

@dio Coverage now fixed.

@objectiser
Copy link
Contributor Author

@dio @mattklein123 Is this ok to merge?

@mattklein123 mattklein123 self-assigned this May 22, 2019
@mattklein123
Copy link
Member

@dio can you take a first pass?

@dio dio changed the title tracing: Add trace sampling configuration to the route, to override t… tracing: Add trace sampling configuration to the route, to override the route level May 22, 2019
Copy link
Member

@dio dio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, I can see this is following the pattern of the decorator. Just one question and do you mind to add the release note to the version_history as an update?

source/common/http/conn_manager_impl.cc Show resolved Hide resolved
@objectiser
Copy link
Contributor Author

@dio Added version_history entry.

@mattklein123
Copy link
Member

@objectiser can you fix format and I will take a look?

/wait

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks generally looks great. Flushing out a few comments.

/wait

// 'tracing.client_sampling' in the :ref:`HTTP Connection Manager
// <config_http_conn_man_runtime>`.
// Default: 100%
envoy.type.Percent client_sampling = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have an opportunity here to "redo" the API can we use FractionalPercent? It performs better and also allows for smaller percentages. Also, I think having the defaults be 100% was a mistake in the original tracing code. I guess we should be consistent, but do you think we should consider changing them to 0 potentially? WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to FractionalPercent.

Regarding changing the sampling - this was done in Istio a while ago, from 100% to 1%, but users still raise issues regularly about not being able to see tracing data - and have to be directed to the FAQ on setting the sampling rate. So there are pros and cons with any setting - but possibly 0 is good for envoy as it essentially means tracing is disabled by default and has to be explicitly configured with a suitable sampling rate.

* This method returns the client sampling percentage.
* @return the client sampling percentage
*/
virtual uint64_t getClientSampling() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above these should return fractional percent objects.

source/common/http/async_client_impl.h Show resolved Hide resolved
}
ASSERT(stream_info_.downstreamRemoteAddress() != nullptr);

ASSERT(!cached_route_);
refreshCachedRoute();

if (!state_.is_internally_created_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry why is this new check necessary? Can you add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure can add a comment - essentially previously the call to mutateTracingRequestHeader was performed inside mutateRequestHeaders which was subject to the same condition just above. So thought, even though the call has been pulled out from mutateRequestHeaders, it should still be subject to the same condition? Let me know if not correct.

Is this condition only true for the initial processing of a request at the first proxy? In which case this would prevent the request id being modified mid way through a mesh I assume.

@@ -650,6 +677,7 @@ class RouteEntryImplBase : public RouteEntry,
const std::multimap<std::string, std::string> opaque_config_;

const DecoratorConstPtr decorator_;
const RouteTracingConstPtr routeTracing_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: route_tracing_

…he settings defined on the filter

Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great. A few small comments.

/wait

@@ -234,23 +234,30 @@ void ConnectionManagerUtility::mutateTracingRequestHeader(HeaderMap& request_hea
return;
}

envoy::type::FractionalPercent client_sampling = config.tracingConfig()->client_sampling_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use a const pointer for these variables to avoid the copy?

source/common/http/conn_manager_config.h Show resolved Hide resolved
@@ -324,6 +324,22 @@ void DecoratorImpl::apply(Tracing::Span& span) const {

const std::string& DecoratorImpl::getOperation() const { return operation_; }

RouteTracingImpl::RouteTracingImpl(const envoy::api::v2::route::Tracing& tracing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you are defaulting these to 100%? Can you double check this and add tests if that is what we want to do for defaults? (Or change the docs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your previous comment, I was wondering whether a default of 0% would be appropriate here, as they are specific to the route - however the overall_sampling default of 0% does not seem right.

So think it may be better to use 100% for consistency now, and then if the decision is made to change to 0% at a later date, that can be done at both levels.

… change route tracing sampling defaults to 100%

Signed-off-by: Gary Brown <gary@brownuk.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great. 2 small comments. @dio any further comments?

/wait

}
if (!tracing.has_random_sampling()) {
random_sampling_.set_numerator(10000);
random_sampling_.set_denominator(envoy::type::FractionalPercent::TEN_THOUSAND);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think there is any reason to actually set the denominator here to 10,000, right? I think just making it 100/100 like the others is sufficient? Mostly just trying to avoid questions later about why this is 10,000 here.

client_sampling.set_numerator(
tracing_config.has_client_sampling() ? tracing_config.client_sampling().value() : 100);
envoy::type::FractionalPercent random_sampling;
random_sampling.set_numerator(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you add a small TODO/comment here that random sampling historically was an integer and default to out of 10,000 but we should deprecate that and move to a straight fractional percent config? The way we have this now is pretty confusing for historical reasons.

@dio
Copy link
Member

dio commented May 27, 2019

This is nice. Thanks! @mattklein123 no more comments from me.

dio
dio previously approved these changes May 27, 2019
Copy link
Member

@dio dio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo Matt's.

Signed-off-by: Gary Brown <gary@brownuk.com>
@objectiser
Copy link
Contributor Author

@mattklein123 Done

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

/retest

@repokitteh-read-only
Copy link

🤷‍♀️ nothing to rebuild.

🐱

Caused by: a #6986 (review) was submitted by @mattklein123.

see: more, trace.

@mattklein123 mattklein123 merged commit e353807 into envoyproxy:master May 28, 2019
@dio
Copy link
Member

dio commented May 28, 2019

/azp run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overriding trace sampling configuration on a per route basis
3 participants