Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracer.activeSpan() returning null if Spring Cloud Gateway is present #1807

Closed
blake-bauman opened this issue Dec 15, 2020 · 7 comments
Closed

Comments

@blake-bauman
Copy link
Contributor

Description

After updating from Release Train 2020.0.0-M4 to 2020.0.0-M6, Tracer.activeSpan() is returning null if Spring Cloud Gateway is present. Issue is still present in RC1. It's as if OpenTracing isn't configured properly with the latest 2020.0.0 release and Spring Cloud Gateway. My guess is it has to do with #1757 breaking SCG integration?

If I revert to M4 or a Hoxton release, it works. If I remove Spring Cloud Gateway, it works (but of course, my routes no longer work).

Sample

I've created a sample project which demonstrates the problem here: https://github.com/blake-bauman/test-spring-cloud-sleuth-gateway

With SCG dependency included, the result is null, but remove the dependency and you get an active Span.

For reference, the application is:

@SpringBootApplication
public class DemoApplication {

    public static void main(final String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

@Bean
    public RouterFunction<ServerResponse> helloWorld(final WebClient.Builder builder, final Tracer tracer) {
        return route(GET("/hello"), request -> Mono.fromCallable(() -> "Active Span is " + tracer.activeSpan())
                                                   .flatMap(result -> ServerResponse.ok().contentType(TEXT_PLAIN).body(fromValue(result))));
    }
}

In our real project, we've also hooked up the trace data to the logging, and the trace data is also missing from the logs with SCG included. Without SCG, it shows up.

@marcingrzejszczak
Copy link
Contributor

Please check the documentation https://docs.spring.io/spring-cloud-sleuth/docs/3.0.0-RC1/reference/html/integrations.html#sleuth-http-server-webflux-integration

With the latest releases we turn on manual reactor instrumentation. It's reliable (it actually works!) and you'll see an immense performance improvement. You can check the following part of our tests to see how to make this work

@Configuration(proxyBeanMethods = false)
@EnableAutoConfiguration
static class TestManualConfiguration {
Span spanInFoo;
@Bean
RouterFunction<ServerResponse> handlers(org.springframework.cloud.sleuth.Tracer tracing,
CurrentTraceContext currentTraceContext, ManualRequestSender requestSender) {
return route(GET("/noFlatMap"), request -> {
ServerWebExchange exchange = request.exchange();
WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, exchange,
() -> LOGGER.info("noFlatMap"));
Flux<Integer> one = requestSender.getAll().map(String::length);
return ServerResponse.ok().body(one, Integer.class);
}).andRoute(GET("/withFlatMap"), request -> {
ServerWebExchange exchange = request.exchange();
WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, exchange,
() -> LOGGER.info("withFlatMap"));
Flux<Integer> one = requestSender.getAll().map(String::length);
Flux<Integer> response = one
.flatMap(size -> requestSender.getAll().doOnEach(sig -> WebFluxSleuthOperators
.withSpanInScope(sig.getContext(), () -> LOGGER.info(sig.getContext().toString()))))
.map(string -> {
WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, exchange,
() -> LOGGER.info("WHATEVER YEAH"));
return string.length();
});
return ServerResponse.ok().body(response, Integer.class);
}).andRoute(GET("/foo"), request -> {
ServerWebExchange exchange = request.exchange();
WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, exchange, () -> {
LOGGER.info("foo");
this.spanInFoo = tracing.currentSpan();
});
return ServerResponse.ok().body(Flux.just(1), Integer.class);
});
}

In order to make your code work

@SpringBootApplication
public class DemoApplication {

    public static void main(final String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

@Bean
    public RouterFunction<ServerResponse> helloWorld(final WebClient.Builder builder, org.springframework.cloud.sleuth.Tracer tracing, CurrentTraceContext currentTraceContext)) {
        return route(GET("/hello"), request -> Mono.fromCallable(() ->  WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, request.exchange(), () -> "Active Span is " + tracer.activeSpan())
                                                   .flatMap(result -> ServerResponse.ok().contentType(TEXT_PLAIN).body(fromValue(result))));
    }
}

The most important part is

WebFluxSleuthOperators.withSpanInScope(tracing, currentTraceContext, request.exchange(), () -> "Active Span is " + tracer.activeSpan())

When making such a call, this lambda () -> "Active Span is " + tracer.activeSpan()) will contain the current span in context.

@marcingrzejszczak
Copy link
Contributor

Another option (discouraged) is to set explicitly spring.sleuth.reactor.instrumentation-type=decorate-on-each but eventually I think we'll move away from that approach at all.

@blake-bauman
Copy link
Contributor Author

But the weird thing is, my code works perfectly fine until I add Spring Cloud Gateway as a dependency. Merely adding that dependency causes the code to break. If I'm on RC1 and no SCG, my code works just fine. Why is that?

@blake-bauman
Copy link
Contributor Author

Thanks for this. I have a couple of questions for clarification.

  1. We've got a bunch of code that gets shared among apps, some which have SCG and some which don't. Does the WebFluxSleuthOperators solution work whether instrumentation-type is set to manual or not?
  2. Does SCG still propagate the OpenTracing headers to the downstream service when instrumentation-type is set to manual?
  3. If instrumentation-type is set to manual and you're making a call with WebClient, do you have to use WebFluxSleuthOperators to get the headers sent?
  4. The SLF4J MDC no longer seems to have the tracing data when instrumentation-type is set to manual. How can we enable logging to include the tracing data?

@blake-bauman
Copy link
Contributor Author

Finally: what's the reason that including SCG forces this property to "manual" by default? Wouldn't it be best to keep the default the same between SCG apps and non-SCG apps for compatibility?

@marcingrzejszczak
Copy link
Contributor

We've got a bunch of code that gets shared among apps, some which have SCG and some which don't. Does the WebFluxSleuthOperators solution work whether instrumentation-type is set to manual or not?

What WebFluxSleuthOperators is it retrieves the tracing context that was set either in the ReactorContext or ServerWebExchange. The lamda that you pass to it will contain the tracing context put in scope. That means that your e.g. logging framework will contain proper MDC entries.

Does SCG still propagate the OpenTracing headers to the downstream service when instrumentation-type is set to manual?

SCG's instrumentation works regardless of the chosen instrumentation type. It reuses the ServerWebExchange to retrieve this information.

If instrumentation-type is set to manual and you're making a call with WebClient, do you have to use WebFluxSleuthOperators to get the headers sent?

WebClient should retrieve the current span from the ReactorContext so that should work out of the box. RestTemplate however, or anything else that operates on a ThreadLocal (like MDC and logging) requires the usage of WebFluxSleuthOperators.

The SLF4J MDC no longer seems to have the tracing data when instrumentation-type is set to manual. How can we enable logging to include the tracing data?

If you wrap logging in WebFluxSleuthOperators you will see MDC entries.

Finally: what's the reason that including SCG forces this property to "manual" by default? Wouldn't it be best to keep the default the same between SCG apps and non-SCG apps for compatibility?

Cause other instrumentation is broken and SCG works regardless of the chosen instrumentation type. The instrumentation type changes the way your custom code behaves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants