tracing: Enable decorated operation in outbound (egress) listener to … #1858
…be passed to inbound (ingress) listener to override server span operation Signed-off-by: Gary Brown <gary@brownuk.com>
@objectiser quick pre-review note that we will need docs for this change. It would be good to add this header to the router headers section that I mentioned in the issue, and also link to it from the tracing docs. Thank you.
Signed-off-by: Gary Brown <gary@brownuk.com>
x-envoy-decorator-operation
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this header is present on ingress requests, it's value will override any locally defined
s/it's/its
include/envoy/http/header_map.h
Outdated
@@ -234,6 +234,7 @@ class HeaderEntry {
  HEADER_FUNC(EnvoyUpstreamRequestTimeoutAltResponse) \
  HEADER_FUNC(EnvoyUpstreamRequestTimeoutMs) \
  HEADER_FUNC(EnvoyUpstreamServiceTime) \
  HEADER_FUNC(EnvoyDecoratorOperation) \
nit: Alpha order
include/envoy/router/router.h
Outdated
 * This method returns the operation name.
 * @return the operation name
 */
virtual const std::string getOperation() const PURE;
const std::string&
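The suggestion above is to return a reference instead of a copy. A minimal sketch of what that could look like follows; `Decorator` and `DecoratorImpl` here are simplified stand-ins written for illustration, not the actual Envoy classes, and the sketch uses `= 0` where Envoy uses its `PURE` macro.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the decorator interface under review.
class Decorator {
public:
  virtual ~Decorator() = default;

  // Returning a const reference avoids copying the string on every call.
  // The reference stays valid for as long as the decorator object lives.
  virtual const std::string& getOperation() const = 0;
};

// Hypothetical implementation that owns the operation name.
class DecoratorImpl : public Decorator {
public:
  explicit DecoratorImpl(std::string operation) : operation_(std::move(operation)) {}

  const std::string& getOperation() const override { return operation_; }

private:
  const std::string operation_;
};
```

With this shape, two calls to `getOperation()` on the same object refer to the same stored string, whereas returning `const std::string` by value would construct a new copy each time.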
ConnectionManagerImpl::chargeTracingStats(tracing_decision.reason,
                                          connection_manager_.config_.tracingStats());

if (tracing_decision.is_tracing) {
nit: Now that you are in a new function, to reduce nesting can you do:
if (!tracing_decision.is_tracing) {
  return;
}
active_span_ = connection_manager_.tracer_.startSpan(*this, *request_headers_, request_info_);
if (cached_route_.value() && cached_route_.value()->decorator()) {
  cached_route_.value()->decorator()->apply(*active_span_);
  if (connection_manager_.config_.tracingConfig()->operation_name_ ==
I didn't realize the header propagation was also part of this. Can you make that clearer in the docs? Also, can you add some comments here about the logic in place? It's a little hard to quickly untangle what we are doing and why.
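To make the propagation logic easier to follow, here is a rough sketch of the behavior as described in the PR title and the header docs: the egress side passes its decorated operation along in `x-envoy-decorator-operation`, and the ingress side lets that header override any locally defined operation. The `Direction` enum, the `HeaderFields` alias, and `resolveOperation()` are hypothetical names invented for this illustration; they are not Envoy APIs, and an empty string is assumed to mean "no operation defined".

```cpp
#include <map>
#include <string>

// Hypothetical, simplified header container used only for this sketch.
using HeaderFields = std::map<std::string, std::string>;

const std::string kDecoratorHeader = "x-envoy-decorator-operation";

enum class Direction { Ingress, Egress };

// Returns the operation name to apply to the span.
// An empty string means that no operation is defined.
std::string resolveOperation(Direction direction, HeaderFields& headers,
                             const std::string& local_operation) {
  if (direction == Direction::Egress) {
    // Egress: propagate the locally decorated operation to the next hop.
    if (!local_operation.empty()) {
      headers[kDecoratorHeader] = local_operation;
    }
    return local_operation;
  }

  // Ingress: a non-empty header value overrides the locally defined operation.
  const auto it = headers.find(kDecoratorHeader);
  if (it != headers.end() && !it->second.empty()) {
    return it->second;
  }
  return local_operation;
}
```

For example, an egress call with a local operation of `"checkout"` would set the header to `"checkout"`, and an ingress call that sees that header would return `"checkout"` even if its own route defines a different operation.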
@@ -12,6 +12,7 @@ depends on the :ref:`use_remote_address <config_http_conn_man_use_remote_address

Envoy will potentially sanitize the following headers:

* :ref:`x-envoy-decorator-operation <config_http_filters_router_x-envoy-decorator-operation>`
I think this needs to be implemented? (Remove header on external requests, need to add test for that also in conn_manager_utility)
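As a rough illustration of the behavior being requested here, the sketch below strips the header when a request is external and leaves it alone when the request is internal. `RequestHeaders` and `sanitizeDecoratorOperation()` are hypothetical names for this example only; the real change would live in the connection manager utility code, whose interface is not shown in this conversation.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified header container used only for this sketch.
using RequestHeaders = std::map<std::string, std::string>;

// Removes x-envoy-decorator-operation from external requests so that an
// untrusted client cannot choose the server span's operation name.
// Internal requests keep the header so the override can still apply.
void sanitizeDecoratorOperation(RequestHeaders& headers, bool is_internal_request) {
  if (!is_internal_request) {
    headers.erase("x-envoy-decorator-operation");
  }
}
```

A test for this would cover both cases: the header is gone after sanitizing an external request, and still present after sanitizing an internal one.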
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
if (request_headers_->EnvoyDecoratorOperation()) {
  decorator_operation = request_headers_->EnvoyDecoratorOperation()->value().c_str();
}
There is some work going on in Istio to be able to decode a request using a Swagger/OpenAPI spec. If that is the case, one might want to set the operation name based on the API spec. Would it make sense to have an interface for a filter to overwrite the operation name, in addition to the inbound header doing the full override?
I think that makes sense, but let's track in a separate issue?
Thanks for doing this!
Looks good. Few small things.
active_span_ = connection_manager_.tracer_.startSpan(*this, *request_headers_, request_info_);

// If a decorator has been defined, apply it to the active span
super nit: please end sentences with '.'. Here and below.
should be sorted.
@@ -502,6 +502,11 @@ void ConnectionManagerImpl::ActiveStream::decodeHeaders(HeaderMapPtr&& headers,
  state_.saw_connection_close_ = true;
}

std::string decorator_operation;
Instead of allocating a string here on the stack, it would be faster to just pass the pointer value of the header entry into the trace function as you did before. Any reason to switch it?
I got a core dump when running the coverage tests - although strangely not when doing the bazel.dev build which also runs the tests. The problem seems to be caused by the header entry being removed/unallocated.
[----------] 38 tests from HttpConnectionManagerImplTest
[ RUN ] HttpConnectionManagerImplTest.HeaderOnlyRequestAndResponse
[ OK ] HttpConnectionManagerImplTest.HeaderOnlyRequestAndResponse (2 ms)
[ RUN ] HttpConnectionManagerImplTest.InvalidPathWithDualFilter
[ OK ] HttpConnectionManagerImplTest.InvalidPathWithDualFilter (1 ms)
[ RUN ] HttpConnectionManagerImplTest.StartAndFinishSpanNormalFlow
pure virtual method called
terminate called without an active exception
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:101] Caught Aborted, suspect faulting address 0x3e8000008f7
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:85] Backtrace obj</lib/x86_64-linux-gnu/libc.so.6> thr<0> (use tools/stack_decode.py):
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<0> #0 0x7f26ff0d0428
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<0> #1 0x7f26ff0d2029
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<0> obj</usr/lib/x86_64-linux-gnu/libstdc++.so.6>
Can you put the code back? If it fails I can help you figure out what the issue is.
Done
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
57116af to 40b0b28
@@ -502,6 +502,8 @@ void ConnectionManagerImpl::ActiveStream::decodeHeaders(HeaderMapPtr&& headers,
  state_.saw_connection_close_ = true;
}

const HeaderEntry* decorator_operation = request_headers_->EnvoyDecoratorOperation();
So at minimum, I think you are grabbing a header before headers are cleaned, which means this can get deleted. Can you remove the local variable and do this inline below? (Or just pass the full constant header map ref to the trace function).
I'll give it a go tomorrow, although I think the problem is that the x-envoy-decorator-operation header is being removed in the sanitize step: previously, when I commented this line out, the tests/coverage worked.
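The lifetime problem discussed in this thread can be sketched as follows. If a pointer to the header entry is taken before the headers are cleaned, and the cleaning step removes that entry, the pointer dangles. Looking the header up after the cleaning step avoids this. `IncomingHeaders`, `cleanHeaders()`, and `operationAfterCleaning()` are hypothetical names for this illustration and do not reflect Envoy's actual header map interface.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified header container used only for this sketch.
using IncomingHeaders = std::map<std::string, std::string>;

const std::string kOperationHeader = "x-envoy-decorator-operation";

// Stand-in for the header cleaning step, which may delete entries.
void cleanHeaders(IncomingHeaders& headers, bool is_internal_request) {
  if (!is_internal_request) {
    headers.erase(kOperationHeader);
  }
}

// Returns the operation name from the header, or an empty string if absent.
std::string operationAfterCleaning(IncomingHeaders& headers, bool is_internal_request) {
  // Unsafe ordering (not done here): taking a pointer or iterator to the
  // entry at this point would leave it dangling if cleanHeaders() erases it.
  cleanHeaders(headers, is_internal_request);

  // Safe ordering: look the header up only after cleaning has finished.
  const auto it = headers.find(kOperationHeader);
  return it == headers.end() ? std::string() : it->second;
}
```

The same effect can be had by passing the whole constant header map into the tracing function and doing the lookup there, as suggested in the review comment.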
Signed-off-by: Gary Brown <gary@brownuk.com>
… internal request Signed-off-by: Gary Brown <gary@brownuk.com>
@mattklein123 Hi Matt, I need some help. https://github.com/envoyproxy/envoy/pull/1858/files#diff-ed63d9053f7d0b3777c96f2aaf8f2f2fR400 - this test is failing when the code for sanitizing the header is included. This suggests the test represents an external request, even though the comment https://github.com/envoyproxy/envoy/pull/1858/files#diff-ed63d9053f7d0b3777c96f2aaf8f2f2fR409 suggests it is internal. If this test is handling an external request, then I can change it to correctly check for a null header, and then introduce another test for internal request handling. If that is the case, how should I change the test to represent an internal request? The other option is to remove the sanitization code for now and deal with it in a separate PR. Thoughts?
@objectiser I can check it out locally and see, but from a quick glance, given that |
@mattklein123 Adding that header made the test work. However, it looks like the sanitization is not having the desired effect: I tried out the jaegertracing example, and the decorated op name is not being propagated from the egress listener to the ingress. Is it OK to remove the sanitization for now and look at it as a separate issue?
Sure, that's fine. Please open an issue so that we can track/remember.
Signed-off-by: Gary Brown <gary@brownuk.com>
…1858)

Description: Introduces an initial heuristic for attempting a connection using a socket with a bound interface. The heuristic is entirely contained within reportNetworkUsage(...), and as such is straightforward to iterate and experiment upon. In support of the heuristic, current network/configuration state is identified with a configuration_key that prevents stale information from influencing internal tracking. ~Current network state is stored and updated automatically.~ This allows static accessors to begin to report OS updates during Envoy's startup period, and ensures OS updates have a timely effect on new requests.

The initial heuristic is as follows: For both WiFi and cellular network types, there is a default socket configuration that does not use a bound socket, and an alternative "override" configuration that will attempt to use an interface associated with the *opposite* network type (a cellular-bound socket when the preferred network type is WiFi, and a WiFi-bound socket when network type is cellular). A "network fault" is defined as a request that terminates in error while having received no upstream bytes. If a connection has never handled a request without a fault, it is allowed only one fault before the alternative will be tried. If a connection has ever handled a request without a fault, it is allowed up to three consecutive faults before the alternative will be tried.

The above behavior is disabled by default, and may be enabled by setting enableInterfaceBinding(true) on the public EngineBuilders.

Risk Level: Moderate
Testing: Manual and new/updated coverage
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
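The fault-counting rule in the description above can be sketched roughly as follows. `FaultTracker` and `report()` are hypothetical names for this illustration; the referenced change keeps its logic inside reportNetworkUsage(...), whose real signature is not shown here. The sketch also assumes that the alternative is tried as soon as the consecutive fault count reaches the allowance (one or three), which is one reading of the wording.

```cpp
#include <cstdint>

// Hypothetical tracker for a single socket configuration.
class FaultTracker {
public:
  // Records the outcome of one request. Returns true when the alternative
  // ("override") socket configuration should be tried next.
  bool report(bool network_fault) {
    if (!network_fault) {
      // A request without a fault raises the allowance and resets the streak.
      ever_succeeded_ = true;
      consecutive_faults_ = 0;
      return false;
    }
    ++consecutive_faults_;
    // Assumption: one fault is allowed if nothing has ever succeeded,
    // otherwise up to three consecutive faults are allowed.
    const std::uint32_t allowed_faults = ever_succeeded_ ? 3 : 1;
    return consecutive_faults_ >= allowed_faults;
  }

private:
  bool ever_succeeded_ = false;
  std::uint32_t consecutive_faults_ = 0;
};
```

Under these assumptions, a tracker that has never seen a successful request signals a switch on its first fault, while one that has seen a success signals only on the third fault in a row.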
…be passed to inbound (ingress) listener to override server span operation
Resolves #1849
Signed-off-by: Gary Brown <gary@brownuk.com>