Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: Enable decorated operation in outbound (egress) listener to … #1858

Merged
merged 10 commits into from
Oct 18, 2017

Conversation

objectiser
Copy link
Contributor

…be passed to inbound (ingress) listener to override server span operation

Resolves #1849

Signed-off-by: Gary Brown gary@brownuk.com

…be passed to inbound (ingress) listener to override server span operation

Signed-off-by: Gary Brown <gary@brownuk.com>
@mattklein123
Copy link
Member

@objectiser quick pre-review note that we will need docs for this change. It would be good to add this header into the router headers section that I mentioned in the issue, and also link to their from the tracing docs. Thank you.

Signed-off-by: Gary Brown <gary@brownuk.com>
x-envoy-decorator-operation
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this header is present on ingress requests, it's value will override any locally defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/it's/its

@@ -234,6 +234,7 @@ class HeaderEntry {
HEADER_FUNC(EnvoyUpstreamRequestTimeoutAltResponse) \
HEADER_FUNC(EnvoyUpstreamRequestTimeoutMs) \
HEADER_FUNC(EnvoyUpstreamServiceTime) \
HEADER_FUNC(EnvoyDecoratorOperation) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Alpha order

* This method returns the operation name.
* @return the operation name
*/
virtual const std::string getOperation() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::string&

ConnectionManagerImpl::chargeTracingStats(tracing_decision.reason,
connection_manager_.config_.tracingStats());

if (tracing_decision.is_tracing) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Now that you are in a new function, to reduce nesting can you do:

if (!tracing_decision.is_tracing) {
  return;
}

active_span_ = connection_manager_.tracer_.startSpan(*this, *request_headers_, request_info_);
if (cached_route_.value() && cached_route_.value()->decorator()) {
cached_route_.value()->decorator()->apply(*active_span_);
if (connection_manager_.config_.tracingConfig()->operation_name_ ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize the header propagation was also part of this. Can you make that more clear in the docs. Also, can you add some comments here about the logic in place? It's a little hard to quickly untangle what we are doing and why.

@@ -12,6 +12,7 @@ depends on the :ref:`use_remote_address <config_http_conn_man_use_remote_address

Envoy will potentially sanitize the following headers:

* :ref:`x-envoy-decorator-operation <config_http_filters_router_x-envoy-decorator-operation>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be implemented? (Remove header on external requests, need to add test for that also in conn_manager_utility)

Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
if (request_headers_->EnvoyDecoratorOperation()) {
decorator_operation = request_headers_->EnvoyDecoratorOperation()->value().c_str();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, there is some work going on in Istio to be able to decode a request using swagger spec/openapi spec.. And if that is the case, one might want to set the operation name based on the API spec. Would it make sense to have an interface for a filter to overwrite the operation name, in addition to the inbound header name doing the full override?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense, but let's track in a separate issue?

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Few small things.


active_span_ = connection_manager_.tracer_.startSpan(*this, *request_headers_, request_info_);

// If a decorator has been defined, apply it to the active span
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: please end sentences with '.'. Here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be sorted.

@@ -502,6 +502,11 @@ void ConnectionManagerImpl::ActiveStream::decodeHeaders(HeaderMapPtr&& headers,
state_.saw_connection_close_ = true;
}

std::string decorator_operation;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of allocating a string here on the stack, it would be faster to just pass the pointer value of the header entry into the trace function as you did before. Any reason to switch it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a core dump when running the coverage tests - although strangely not when doing the bazel.dev build which also runs the tests. The problem seems to be caused by the header entry being removed/unallocated.

[----------] 38 tests from HttpConnectionManagerImplTest
[ RUN ] HttpConnectionManagerImplTest.HeaderOnlyRequestAndResponse
[ OK ] HttpConnectionManagerImplTest.HeaderOnlyRequestAndResponse (2 ms)
[ RUN ] HttpConnectionManagerImplTest.InvalidPathWithDualFilter
[ OK ] HttpConnectionManagerImplTest.InvalidPathWithDualFilter (1 ms)
[ RUN ] HttpConnectionManagerImplTest.StartAndFinishSpanNormalFlow
pure virtual method called
terminate called without an active exception
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:101] Caught Aborted, suspect faulting address 0x3e8000008f7
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:85] Backtrace obj</lib/x86_64-linux-gnu/libc.so.6> thr<0> (use tools/stack_decode.py):
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<0> #0 0x7f26ff0d0428
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<0> #1 0x7f26ff0d2029
[2017-10-17 19:29:59.086][2295][critical][backtrace] bazel-out/local-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<0> obj</usr/lib/x86_64-linux-gnu/libstdc++.so.6>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put the code back? If it fails I can help you figure out what the issue is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Gary Brown <gary@brownuk.com>
Signed-off-by: Gary Brown <gary@brownuk.com>
@@ -502,6 +502,8 @@ void ConnectionManagerImpl::ActiveStream::decodeHeaders(HeaderMapPtr&& headers,
state_.saw_connection_close_ = true;
}

const HeaderEntry* decorator_operation = request_headers_->EnvoyDecoratorOperation();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So at minimum, I think you are grabbing a header before headers are cleaned, which means this can get deleted. Can you remove the local variable and do this inline below? (Or just pass the full constant header map ref to the trace function).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a go tomorrow - although I think the problem is that the x-envoy-decorator-operation is being removed in the sanitize step, as previously when I commented this line out the tests/coverage worked.

Signed-off-by: Gary Brown <gary@brownuk.com>
… internal request

Signed-off-by: Gary Brown <gary@brownuk.com>
@objectiser
Copy link
Contributor Author

@mattklein123 Hi Matt, need some help.

https://github.com/envoyproxy/envoy/pull/1858/files#diff-ed63d9053f7d0b3777c96f2aaf8f2f2fR400 - this test is failing when code for sanitizing the header is included. So this suggests this test is representing an external request, even though the comment https://github.com/envoyproxy/envoy/pull/1858/files#diff-ed63d9053f7d0b3777c96f2aaf8f2f2fR409 suggests it is internal.

If this test is handling an external request, then I can change it to correctly check for a null header - and then introduce another test for internal request handling? If this is the case - how should I change the test to represent an internal request?

Other option is to remove the santization code for now and deal with in a separate PR? thoughts?

@mattklein123
Copy link
Member

@objectiser I can check it out locally and see, but from a quick glance, given that use_remote_address_ is being set to false, I think you just need to set XFF to an internal address. So in the dispatched headers add something like x-forwarded-for: 10.0.0.1

@objectiser
Copy link
Contributor Author

@mattklein123 Adding that header made the test work - however looks like the sanitization is not having the desired effect. I tried out the jaegertracing example, and the decorated op name is not being propagated from the egress listener to the ingress:

envoy-op-override

Is it ok to remove the sanitization for now and look at as a separate issue?

@mattklein123
Copy link
Member

Is it ok to remove the sanitization for now and look at as a separate issue?

Sure that's fine. Please open an issue so that we can track/remember.

Signed-off-by: Gary Brown <gary@brownuk.com>
jpsim pushed a commit that referenced this pull request Nov 28, 2022
…1858)

Description: Introduces an initial heuristic for attempting a connection using a socket with a bound interface. The heuristic is entirely contained within reportNetworkUsage(...), and as such is straightforward to iterate and experiment upon. In support of the heuristic, current network/configuration state is identified with a configuration_key that prevents stale information from influencing internal tracking. ~Current network state is stored and updated automatically.~ This allows static accessors to begin to report OS updates during Envoy's startup period, and ensures OS updates have a timely effect on new requests.

The initial heuristic is as follows:

For both WiFi and cellular network types, there is a default socket configuration that does not use a bound socket, and an alternative "override" configuration that will attempt to use an interface associated with the *opposite* network type (a cellular-bound socket when the preferred network type is WiFi, and a WiFi-bound socket when network type is cellular).

A "network fault" is defined as a request that terminates in error while having received no upstream bytes.
If a connection has never handled a request without a fault, it is allowed only one fault before the alternative will be tried.
If a connection has ever handled a request without a fault, it is allowed up to three consecutive faults before the alternative will be tried.

The above behavior is disabled by default, and may be enabled by settings enableInterfaceBinding(true) on the public EngineBuilders.

Risk Level: Moderate
Testing: Manual and new/updated coverage

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
…1858)

Description: Introduces an initial heuristic for attempting a connection using a socket with a bound interface. The heuristic is entirely contained within reportNetworkUsage(...), and as such is straightforward to iterate and experiment upon. In support of the heuristic, current network/configuration state is identified with a configuration_key that prevents stale information from influencing internal tracking. ~Current network state is stored and updated automatically.~ This allows static accessors to begin to report OS updates during Envoy's startup period, and ensures OS updates have a timely effect on new requests.

The initial heuristic is as follows:

For both WiFi and cellular network types, there is a default socket configuration that does not use a bound socket, and an alternative "override" configuration that will attempt to use an interface associated with the *opposite* network type (a cellular-bound socket when the preferred network type is WiFi, and a WiFi-bound socket when network type is cellular).

A "network fault" is defined as a request that terminates in error while having received no upstream bytes.
If a connection has never handled a request without a fault, it is allowed only one fault before the alternative will be tried.
If a connection has ever handled a request without a fault, it is allowed up to three consecutive faults before the alternative will be tried.

The above behavior is disabled by default, and may be enabled by settings enableInterfaceBinding(true) on the public EngineBuilders.

Risk Level: Moderate
Testing: Manual and new/updated coverage

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants