Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spring Boot 3 Webflux project missing traceId and spanId in logs #33372

Closed
sanderino666 opened this issue Nov 26, 2022 · 72 comments
Closed

Spring Boot 3 Webflux project missing traceId and spanId in logs #33372

sanderino666 opened this issue Nov 26, 2022 · 72 comments
Labels
status: superseded An issue that has been superseded by another theme: observability Issues related to observability

Comments

@sanderino666
Copy link

When upgrading to Spring Boot 3 (and Micrometer) I noticed the traceId and spanId are not added to the logs when it's a Webflux project.

Webflux: https://github.com/sanderino666/webflux-micrometer-logs

2022-11-26T17:27:44.991+01:00 INFO [,,] 19376 --- [ctor-http-nio-3] com.example.demo.DemoApplication : home() has been called

MVC: https://github.com/sanderino666/mvc-micrometer-logs

2022-11-26T17:24:12.050+01:00 INFO [,63823dabe8475ad8d31ec1fc81853dcc,d31ec1fc81853dcc] 1848 --- [nio-8080-exec-1] com.example.demomvc.DemoMvcApplication : home() has been called

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 26, 2022
@bclozel
Copy link
Member

bclozel commented Nov 26, 2022

This is the expected behavior. See spring-projects/spring-framework#29466

I'm keeping this opened for now as we need to find the best place to document this.

@bclozel bclozel self-assigned this Nov 26, 2022
@sanderino666
Copy link
Author

Interesting, we are switching from Spring Cloud Sleuth to Micrometer and with SCS this all worked just like it did in a MVC app. Was that sheer luck?

I am trying to understand the answers spring-projects/spring-framework#29466, does it mean we have to wrap each time we want pass the traceId and spanId to the logs?

@codependent
Copy link

I was also surprised when upgraded and didn't see the traceId/spanId in the logs anymore. It'd be really helpful to document the change of behaviour and how to make it work in webflux applications.

@codependent
Copy link

codependent commented Nov 28, 2022

I guess we are expected to open the scope using the observation located in the context, aren't we?

This allowed the creation of the traceId/spanId. However the WebClient doesn't seem to be propagating the correlation headers. Could you point at how we can achieve that? I think Sleuth automatically instrumented the WebClient, is there an equivalent with Micrometer Tracing or we should implement it by ourselves?

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        return webClient.get().uri("http://localhost:7654/helloWc")
            .retrieve()
            .bodyToMono(String::class.java)
            .doOnNext {
                logger.info("Retrieved hello")
            }
            .contextWrite{ context: Context ->
                val observation = context.get<Observation>(ObservationThreadLocalAccessor.KEY)
                observation.openScope()
                logger.info("Generated traceId/spanId")
                context
            }
    }

    @GetMapping("/helloWc")
    fun helloWc(): Mono<String> {
        return Mono.just("HelloWc")
            .map {
                logger.info("HelloWc map!")
                it
            }
            .contextWrite{ context: Context ->
                val observation = context.get<Observation>(ObservationThreadLocalAccessor.KEY)
                observation.openScope()
                logger.info("Propagated traceId - generated spanId")
                context
            }
    }
2022-11-28T23:34:15.761+01:00  INFO [server,638537626c6674288df2d3f47ac89a95,8df2d3f47ac89a95] 15811 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Generated traceId/spanId
2022-11-28T23:34:20.625+01:00  INFO [server,6385376b55b888c43c9ede86ca645b6f,3c9ede86ca645b6f] 15811 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Propagated traceId - generated spanId
2022-11-28T23:34:22.242+01:00  INFO [server,6385376b55b888c43c9ede86ca645b6f,3c9ede86ca645b6f] 15811 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc map!
2022-11-28T23:34:23.074+01:00  INFO [server,638537626c6674288df2d3f47ac89a95,8df2d3f47ac89a95] 15811 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved hello

Notice the Propagated traceId - generated spanId has a new traceId (6385376b55b888c43c9ede86ca645b6f) instead of the original one (638537626c6674288df2d3f47ac89a95)

@wilkinsona
Copy link
Member

/cc @marcingrzejszczak

@marcingrzejszczak
Copy link
Contributor

WebClient automatically injects trace context to the headers - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/test/java/org/springframework/web/reactive/function/client/WebClientObservationTests.java . We have also standalone samples for that here https://github.com/micrometer-metrics/micrometer-samples/tree/main/webclient

I guess we are expected to open the scope using the observation located in the context, aren't we?

You don't have to because you will pollute the threads. What we do is we require the ObservationThreadLocalAccessor.KEY to be there in Reactor Context and then we manually correlate observations - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/main/java/org/springframework/web/reactive/function/client/DefaultWebClient.java#L464

Have you created the WebFilter as a bean like this? https://github.com/micrometer-metrics/micrometer-samples/blob/main/webclient/src/main/java/com/example/micrometer/WebClientApplication.java#L48 Maybe your ObservationRegistry wasn't injected. For sure you shouldn't open scopes as you showed that in the sample.

@marcingrzejszczak
Copy link
Contributor

I am trying to understand the answers spring-projects/spring-framework#29466, does it mean we have to wrap each time we want pass the traceId and spanId to the logs?

For now yes, you can read more about it here https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries

@codependent
Copy link

@marcingrzejszczak thanks for the clarification, I'll go over the references you provided.

@codependent
Copy link

@marcingrzejszczak

I've updated the sample, now it looks like this:

@RestController
@SpringBootApplication
class DemotracingApplication(
    private val webClient: WebClient,
    private val observationRegistry: ObservationRegistry,
    private val tracer: Tracer
) {

    private val logger = LoggerFactory.getLogger(javaClass)

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        val observation: Observation = Observation.start("webclient-sample", observationRegistry)
        return Mono.just(observation).flatMap { span ->
            observation.scoped {
                logger.info("Hello!")
            }
            webClient.get().uri("http://localhost:7654/helloWc")
                .retrieve()
                .bodyToMono(String::class.java)
                .doOnNext {
                    observation.scoped {
                        logger.info("Retrieved hello")
                    }
                }
        }.doFinally { signalType ->
            observation.stop()
        }.contextWrite { it.put(ObservationThreadLocalAccessor.KEY, observation) }
    }

    @GetMapping("/helloWc")
    fun helloWc(): Mono<String> {
        val observation: Observation = Observation.start("webclient-samplewc", observationRegistry)
        observation.scoped {
            logger.info("HelloWc!")
        }
        return Mono.just("HelloWc")
            .map {
                observation.scoped {
                    logger.info("HelloWc map!")
                }
                it
            }
    }
}

The WebClient is defined as a bean:

@Configuration
internal class WebClientConfiguration {

    @Bean
    fun webClient (): WebClient {
        return WebClient.builder().build()
    }
}

I can see the generated traceId/spanId in the observation.scoped block logs. However, the service that is invoked by the WebClient still shows a different traceId:

2022-11-29T17:05:58.008+01:00  INFO [server,63862de503de9741d4bf5e608895f0b6,d4bf5e608895f0b6] 19035 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Hello!
2022-11-29T17:05:58.243+01:00  INFO [server,63862de60d0ff4fcb42ff242ce42f22c,b42ff242ce42f22c] 19035 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc!
2022-11-29T17:05:58.244+01:00  INFO [server,63862de60d0ff4fcb42ff242ce42f22c,b42ff242ce42f22c] 19035 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc map!
2022-11-29T17:05:58.265+01:00  INFO [server,63862de503de9741d4bf5e608895f0b6,d4bf5e608895f0b6] 19035 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved hello

What am I missing?

@wilkinsona
Copy link
Member

@codependent You should inject the auto-configured WebClient.Builder rather than creating a new one and then call build() on it to create your WebClient bean. Using the auto-configured builder ensures that the client's instrumented for observation.

@codependent
Copy link

@wilkinsona How strange, I updated it to use the WebClient.Builder but still see that requests aren't being correlated.

The sample project I'm using is available here in case you can have a look a it: https://github.com/codependent/demotracing

@Configuration
internal class WebClientConfiguration {

    @Bean
    fun webClient (webClientBuilder: WebClient.Builder): WebClient {
        return webClientBuilder.build()
    }
}
2022-11-29T17:51:10.870+01:00  INFO [server,6386387e6592debc20bd5a56f3a991a9,20bd5a56f3a991a9] 19544 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Hello!
2022-11-29T17:51:10.969+01:00  INFO [server,6386387ed0cd998c9e9e1aa80a2fd1ea,9e9e1aa80a2fd1ea] 19544 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc!
2022-11-29T17:51:10.970+01:00  INFO [server,6386387ed0cd998c9e9e1aa80a2fd1ea,9e9e1aa80a2fd1ea] 19544 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc map!
2022-11-29T17:51:10.994+01:00  INFO [server,6386387e6592debc20bd5a56f3a991a9,20bd5a56f3a991a9] 19544 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved hello

@codependent
Copy link

I had a look at the Webflux sample and adapted the code:

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    val traceId = tracer.currentSpan()!!.context().traceId()
                    logger.info("<ACCEPTANCE_TEST> <TRACE:{}> Hello!", traceId)
                    webClient.get().uri("http://localhost:7654/helloWc")
                        .retrieve()
                        .bodyToMono(String::class.java)
                        .doOnNext {
                            logger.info("Retrieved helloWc {}", it)
                        }
                }
        }
    }

    @GetMapping("/helloWc")
    fun helloWc(): Mono<String> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    val traceId = tracer.currentSpan()!!.context().traceId()
                    logger.info("<ACCEPTANCE_TEST> <TRACE:{}> HelloWc", traceId)
                    Mono.just(traceId)
            }
        }
    }

Now I can see that the log correlation almost works as expected:

2022-11-29T22:08:33.824+01:00  INFO [server,638674d1a9fb39a38867f5b4c3bbe2ec,8867f5b4c3bbe2ec] 20745 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638674d1a9fb39a38867f5b4c3bbe2ec> Hello!
2022-11-29T22:08:33.938+01:00  INFO [server,638674d1a9fb39a38867f5b4c3bbe2ec,49ff37d958d5d315] 20745 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638674d1a9fb39a38867f5b4c3bbe2ec> HelloWc
2022-11-29T22:08:33.960+01:00  INFO [server,,] 20745 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved helloWc io.micrometer.context.DefaultContextSnapshot$DefaultScope@7739ec0c

If you look at the last line of the log, that correspondes to the doOnNext block, it doesn't show the traceId/spanId.

Modifying the code to include another nested ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY) call inside the ´doOnNext{}´, the traceId/spanId are shown:

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    val traceId = tracer.currentSpan()!!.context().traceId()
                    logger.info("<ACCEPTANCE_TEST> <TRACE:{}> Hello!", traceId)
                    webClient.get().uri("http://localhost:7654/helloWc")
                        .retrieve()
                        .bodyToMono(String::class.java)
                        .doOnNext {
                            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                                .use { scope: ContextSnapshot.Scope ->
                                    logger.info("Retrieved helloWc {}", it)
                                }
                        }
                }
        }
    }
2022-11-29T22:12:58.103+01:00  INFO [server,638675da838c746176c5388aedc7127f,76c5388aedc7127f] 20849 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638675da838c746176c5388aedc7127f> Hello!
2022-11-29T22:12:58.215+01:00  INFO [server,638675da838c746176c5388aedc7127f,3fd552c513cb339b] 20849 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638675da838c746176c5388aedc7127f> HelloWc
2022-11-29T22:12:58.237+01:00  INFO [server,638675da838c746176c5388aedc7127f,76c5388aedc7127f] 20849 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved helloWc 638675da838c746176c5388aedc7127f

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

BTW, thanks for your answers.

@marcingrzejszczak
Copy link
Contributor

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing doOnNext just call tap() or handle() and then you'll get the MDC correlated for you.

I'm in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.

@codependent
Copy link

codependent commented Nov 29, 2022

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing doOnNext just call tap() or handle() and then you'll get the MDC correlated for you.

I'm in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.

Great to know!

One last thing, it'd be good to document how to proceed with Kotlin coroutines as well. This approach doesn't look quite idiomatic, is there a better (more idiomatic) way to do the MDC setup in this case?

    @PostMapping
    @ResponseStatus(CREATED)
    suspend fun registerMetric(
        Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
            logger.info("registerMetric()")
            metricEventProducer.send(authentication.clientId, metric).toMono()
        }.awaitSingle()
    }

@marcingrzejszczak
Copy link
Contributor

Hmm theoretically if you have micrometer-core on the classpath it should work out of the box? (https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/kotlin/io/micrometer/core/instrument/kotlin/AsContextElement.kt)

@codependent
Copy link

codependent commented Nov 30, 2022

@marcingrzejszczak micrometer-core is actually on the classpath, I'm setting the context using asContextElement:

class DemotracingApplication(
    private val webClient: WebClient,
    private val tracer: Tracer,
    private val observationRegistry: ObservationRegistry
) {

    private val logger = LoggerFactory.getLogger(javaClass)

    @GetMapping("/hello")
    suspend fun hello(): String {
        val context = observationRegistry.asContextElement()
        return withContext(context) {
            val traceId = tracer.currentSpan()?.context()?.traceId()
            logger.info("<ACCEPTANCE_TEST> <TRACE:{}> Hello!", traceId)
            val response: String = webClient.get().uri("http://localhost:7654/helloWc")
                .retrieve()
                .bodyToMono(String::class.java)
                .awaitSingle()
            response
        }


    }

    @GetMapping("/helloWc")
    suspend fun helloWc(): String {
        val context = observationRegistry.asContextElement()
        return withContext(context) {
            val traceId = tracer.currentSpan()?.context()?.traceId()
            logger.info("<ACCEPTANCE_TEST> <TRACE:{}> HelloWc", traceId)
            "Hi there"
        }
    }

}

Unfortunately MDC doesn't seem to be set and traceId is null:

2022-11-30T12:41:30.734+01:00  INFO [server,,] 23335 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:null> Hello!
2022-11-30T12:45:08.144+01:00  INFO [server,,] 23335 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:null> HelloWc

@marcingrzejszczak
Copy link
Contributor

I'm not an expert with kotlin - can you debug that the kotlin classes in micrometer do actually get called?

@codependent
Copy link

codependent commented Dec 1, 2022

What I've seen so far is:

@GetMapping("/hello")
suspend fun hello(): String {
    val context: CoroutineContext = observationRegistry.asContextElement()
    return withContext(context) {

When withContext is invoked it ends up calling KotlinObservationContextElement.updateThreadContext():

package io.micrometer.core.instrument.kotlin;

class KotlinObservationContextElement implements ThreadContextElement<ContextSnapshot.Scope> {

    @Override
    @SuppressWarnings("MustBeClosedChecker")
    public ContextSnapshot.Scope updateThreadContext(CoroutineContext coroutineContext) {
        return this.contextSnapshot.setThreadLocals(ObservationThreadLocalAccessor.KEY::equals);
    }
    public Scope setThreadLocals(Predicate<Object> keyPredicate) {
        Map<Object, Object> previousValues = null;
        for (ThreadLocalAccessor<?> accessor : this.contextRegistry.getThreadLocalAccessors()) {

Here the this.contextRegistry.getThreadLocalAccessors() is an empty Collection.

Unlike the Reactive sample, with coroutines ObservationThreadLocalAccessor.setValue(Observation value) is never called

    @Override
    public void setValue(Observation value) {
        Observation.Scope scope = value.openScope();
        scopes.set(scope);
    }

@rp199
Copy link

rp199 commented Dec 2, 2022

@codependent I'm also interested in getting this to work in a webflux+coroutine environment and I think I know why you sample code doesn't work.

As far as I've understood, the observationRegistry.asContextElement() relies on the ThreadLocal to load the ContextSnapshot (not 100% sure what is this, but I understand that it's where the "observation settings and values" for the given span are "stored"). In your example, the ThreadLocal would be empty as there's nothing that configures/decorates for that request (unless something leaks from other requests ofc) . I guess this extensions was intended to be used on the 1 thread per request model, where the ThreadLocal can be (most of the times) trusted.

As we are in the "reactor" world here, we need to have a way of building this context element from the reactor context instead. I managed to get it to work by using the following snippet on filter (I'm using functional endpoints):

val coContextElement = ContextSnapshot.setThreadLocalsFrom(
                coroutineContext[ReactorContext]!!.context,
                ObservationThreadLocalAccessor.KEY
            ).use {
                observationRegistry.asContextElement()
            }

withContext(coContextElement){
// Everything should work just fine inside this scope
}

I've quickly tested this and it seems to be working fine, I'm able to log with all MDC fields, including trace and span id.

Anyway, I think there could be a simpler way to achieve this, as this is verbose and also feels that is doing more steps than needed:

  • Get reactor context from coroutine context,
  • Get observation "settings" (not sure how to call it 😄 ) from reactor context.
  • Set Thread locals just to get the correct contextElement.
  • Shift the coroutine context to this new context element.

I'm deducting this based on my understanding of coroutines and reactor contexts, I don't understand much what happens behind the scenes on the tracing part, so be aware that I could be assuming wrong things here.

It would be nice to have a better way to achieve here. Maybe an extension that could build the context element directly from the reactor context?

Would it be also nice it we could manage to get the traceId/spanId and any other baggage fields from the reactor context directly, instead of having to having to inject a tracer and doing these workarounds (on spring sleuth, it was possible to set and extract baggage values purely on the reactor context, here I couldn't find a direct way).

@codependent
Copy link

@rp199 you nailed it!

I agree with you, things were quite easier with Sleuth, where all the instrumentation was completely transparent from our code. This new approach feels kind of invasive but I guess there must be good reasons (performance, bugs...) for it.

As you mentioned it'd be nice to have a simpler way to deal with the obvervations and context propagation.

@akoufa
Copy link

akoufa commented Dec 2, 2022

Hello to all. I have created the following helper (still hacky code, will improve it later)

suspend fun Logger.debugAsync(message: String) { Mono.just(Unit).handle { t, u -> error(message) u.next(t) }.awaitSingleOrNull()

This works and I am seeing traceId and spanId in the logs.

Also using the io.micrometer.context-propagation the traceId is successfully passed to the other microservices

@codependent
Copy link

codependent commented Dec 2, 2022

@rp199 you mentioned that you applied it using a filter, I guess you meant a HandlerFilterFunction implementation, right? Would you mind showing some code of the function|handler|filter?

I was wondering if there could be a way to move up that configuration to a traditional WebFilter and came across two problems

  1. Logger invocations inside the filter aren't being correlated
@Component
class ObservationContextPropagationFilter: WebFilter {

    private val logger = LoggerFactory.getLogger(javaClass)
    override fun filter(exchange: ServerWebExchange, chain: WebFilterChain): Mono<Void> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    logger.info("Filter")
                    chain.filter(exchange)
                }
        }
    }
}
2022-12-02T11:15:40.322+01:00  INFO [server,,] 34193 --- [ctor-http-nio-3] .e.d.ObservationContextPropagationFilter : Filter
  1. It doesn't seem to be possible to set the coroutine context from a WebFilter and propagate it downstream.

I'm scraching my head to see if I can come up with a way to configure the appropriate context without having to modify lots of places in a huge codebase.

@akoufa That's a nice workaround, the only downside is that we would need to wrap quite a few methods, for different log levels and number of arguments:

suspend fun Logger.observedInfo(message: String) {
    observedInfoGen { info(message) }
}

suspend fun Logger.observedInfo(message: String, arg: Any) {
    observedInfoGen { info(message, arg) }
}

suspend fun Logger.observedInfo(message: String, arg: Any, arg2: Any) {
    observedInfoGen{ info(message, arg, arg2) }
}

private suspend fun observedInfoGen(f: () -> Any?) {
    Mono.just(Unit)
        .handle { t: Unit, u: SynchronousSink<Any> ->
            f()
            u.next(t)
        }.awaitSingleOrNull()
}

Besides, when having multiple calls...

logger.observedInfo("Hey")
...
logger.observedInfo("My message is {}", "Hey")

... each one of them sets up the current Observation into the context. I wonder if it might be less performant that setting it up only once per request using withContext(coContextElement).

@akoufa
Copy link

akoufa commented Dec 2, 2022

@codependent Yes unfortunately my solution has a downside to adapt every log statement or at least the important ones like debug, error needed for log correlation when debugging. But I did not found any other less invasive solution yet.

I think you just need to add .contextCapture() . It work's without it but according to the docs they are adding it after the handle or tap . But I did not figure out yet why it seems to work without it. Maybe @marcingrzejszczak can help here

@marcingrzejszczak
Copy link
Contributor

I think you just need to add .contextCapture() . It work's without it but according to the docs they are adding it after the handle or tap . But I did not figure out yet why it seems to work without it. Maybe @marcingrzejszczak can help here

  • .contextCapture() - that will take all the thread locals and put them to the reactor context
  • handle and tap will reuse whatever is there in the context and put it back to thread loacl

You don't need to use contextCapture if the reactor context already contains all the necessary information (like the ObservationThreadLocalAccessor.KEY entry)

@bclozel bclozel added type: documentation A documentation update theme: observability Issues related to observability and removed status: waiting-for-triage An issue we've not yet triaged labels Dec 2, 2022
@bclozel bclozel added the status: waiting-for-triage An issue we've not yet triaged label Feb 17, 2023
@VinishKumar1
Copy link

VinishKumar1 commented Feb 20, 2023

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well? @marcingrzejszczak

@bclozel
Copy link
Member

bclozel commented Feb 20, 2023

@VinishKumar1 this part will be covered with #34201

@akoufa
Copy link

akoufa commented Feb 20, 2023

@bclozel When I define the depedencies listed above, I get the following in the logs 2023-02-20 13:31:58.285 TRACE [00000000000000000000000000000000,0000000000000000,,,] but only for log statements inside mono blocks like:

mono {
// CoroutineScope
}

Is this expected ? Aren't Coroutines supported yet ?

@akoufa
Copy link

akoufa commented Feb 20, 2023

@marcingrzejszczak Yes using the manual approach e.g. creating a coroutine wrapper around handle did work. But now after removing the manual code and updating the libs to the latest versions, as you suggested above, I am getting the following:

  • Log statements in "normal" Reactor functions:
    2023-02-20 14:27:49.378 TRACE [a2b033b003e3640540db74bfe264f6f7,c4d8665e7925da14,12345,,]
  • Log statements inside Coroutines like mono {}
    2023-02-20 14:27:49.378 TRACE [00000000000000000000000000000000,0000000000000000,,,]

@akoufa
Copy link

akoufa commented Feb 20, 2023

@bclozel @marcingrzejszczak I have created a sample project:

https://github.com/akoufa/springbootissues

In this project I see wrong trace id and span id log statements inside mono {} coroutine blocks. Is there anything I may have configured wrongly ?

@fernandofraga109
Copy link

Please check this entry in the docs https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries . For your convenience I'm copying it below

In Reactor 3.5.3 release (through this PR) an option to turn on automated context propagation was added. To use this, please ensure that you’re using the following projects at minimum in the following versions:

To use the feature call the new Reactor’s Hook method (e.g. in your public static void main method) like this

Hooks.enableAutomaticContextPropagation();

So just manually bump the versions of all of the libraries presented above and you will have this feature available. I guess we're planning to add this support to the next Boot release cc @bclozel ?

@marcingrzejszczak using new versions, the behavior of the traceId and SpanId would be the same as Sleuth had? I tried using those but it is still not propagating the traceId and spanId to the other threads.

11:27:45->[XNIO-1 task-2] [INFO] - [63f8c961d374727e2acbbd4585e64428, 3ef262641c3e01ce] - log entry 1
11:29:57->[XNIO-1 task-2] [INFO] - [63f8c961d374727e2acbbd4585e64428, 3ef262641c3e01ce] - log entry 2
11:29:57->[reactor-http-nio-2] [INFO] - [, ] - log entry 3
11:29:57->[reactor-http-nio-2] [INFO] - [, ] - log entry 4

@bclozel
Copy link
Member

bclozel commented Feb 24, 2023

After checking with several samples, #34201 will indeed fix the problem initially reported. I'm closing this issue as superseded. We don't expect this issue to cover all possible behavior differences with Sleuth, so others should have dedicated issues.

@fernandofraga109 please create a new issue with a sample application demonstrating the behavior. It's hard to know without a concrete application where the problem is. It's very likely not in Spring Boot, but probably more in Spring Framework, Reactor Netty or Micrometer directly. We can start with an issue in Spring Framework. Thanks!

@bclozel bclozel closed this as not planned Won't fix, can't repro, duplicate, stale Feb 24, 2023
@bclozel bclozel added status: superseded An issue that has been superseded by another and removed status: waiting-for-triage An issue we've not yet triaged labels Feb 24, 2023
@silverwind85
Copy link

I have the same problem like you @fernandofraga109.
The first thread shows tracking Id, the second thread doesn't ( not propagating ). You opened some issue @fernandofraga109?
I don't use spring security so this solution is not needed @EnableReactiveMethodSecurity(useAuthorizationManager = false)
I have following version: > > * Reactor 3.5.4

  • Micrometer Context-Propagation 1.0.3
  • Micrometer 1.10.4
  • Micrometer Tracing 1.0.2

@fernandofraga109
Copy link

fernandofraga109 commented Mar 28, 2023

I have the same problem like you @fernandofraga109. The first thread shows tracking Id, the second thread doesn't ( not propagating ). You opened some issue @fernandofraga109? I don't use spring security so this solution is not needed @EnableReactiveMethodSecurity(useAuthorizationManager = false) I have following version: > > * Reactor 3.5.4

  • Micrometer Context-Propagation 1.0.3
  • Micrometer 1.10.4
  • Micrometer Tracing 1.0.2

Hi @silverwind85,
I was able to use the micrometer correctly...after some time I realized it works only with reactor version: '3.5.3+'

Kafka and Rabbit (messaging frameworks)
They need to have in the factory this setting below
.setObservationEnabled(true);

Reactive programming
Despite the new libraries and stuff, the property below needs to be added for each context opened.
.contextCapture()
Sample:
objectSampleService.getObjectSampleDTO(entityId)
.flatMap(objectSampleDTO -> this.sendMessage(objectSampleDTO)
.doOnError(throwable -> this.fallbackProcessing(entityId, throwable))
.contextCapture()
.subscribe();

I hope that helps you.

@silverwind85
Copy link

I have the same problem like you @fernandofraga109. The first thread shows tracking Id, the second thread doesn't ( not propagating ). You opened some issue @fernandofraga109? I don't use spring security so this solution is not needed @EnableReactiveMethodSecurity(useAuthorizationManager = false) I have following version: > > * Reactor 3.5.4

  • Micrometer Context-Propagation 1.0.3
  • Micrometer 1.10.4
  • Micrometer Tracing 1.0.2

Hi @silverwind85, I was able to use the micrometer correctly...after some time I realized it works only with reactor version: '3.5.3+'

Kafka and Rabbit (messaging frameworks) They need to have in the factory this setting below .setObservationEnabled(true);

Reactive programming Despite the new libraries and stuff, the property below needs to be added for each context opened. .contextCapture() Sample: objectSampleService.getObjectSampleDTO(entityId) .flatMap(objectSampleDTO -> this.sendMessage(objectSampleDTO) .doOnError(throwable -> this.fallbackProcessing(entityId, throwable)) .contextCapture() .subscribe();

I hope that helps you.

I reproduced case my issue https://github.com/silverwind85/TraceIdExample
example logs:
TrackingId: -6909011762361598389 TrackingId: -6909011762361598389 TrackingId: null
In the GlobalErrorAttributes class
traceId is null. This worked on SpringBoot 2 and Sleuth before migration. Is there any solution to make the traceId propagate everywhere @bclozel @marcingrzejszczak

@lukasT
Copy link

lukasT commented Apr 18, 2023

@silverwind85 I had the same problem. Lately I was able to get traceIds back in logs with spring-boot-starter version 3.0.5 without using the fix to set

@EnableReactiveMethodSecurity(useAuthorizationManager = false)

What I had to do:

  1. call Hooks.enableAutomaticContextPropagation() in the application's main class
  2. define a configuration like this:
@Configuration
class ObservationConfig(
    private val observationHandlers: Set<ObservationHandler<*>>
) {
    @Bean
    fun observationRegistry(tracer: Tracer): ObservationRegistry {
        val observationRegistry = ObservationRegistry.create()
        val observationConfig = observationRegistry.observationConfig()
        observationHandlers.forEach(observationConfig::observationHandler)
        return observationRegistry
    }
}

I found that as I debugged and saw that somehow the observationHandlers were not registered for the ObservationRegistry, thus making it return true when called isNoop().

I found that within io.micrometer.observation.SimpleObservationRegistry.isNoop()

@silverwind85
Copy link

@Configuration
class ObservationConfig(
    private val observationHandlers: Set<ObservationHandler<*>>
) {
    @Bean
    fun observationRegistry(tracer: Tracer): ObservationRegistry {
        val observationRegistry = ObservationRegistry.create()
        val observationConfig = observationRegistry.observationConfig()
        observationHandlers.forEach(observationConfig::observationHandler)
        return observationRegistry
    }
}

I added this bean but it didn't help
example
TrackingId: -4489328965775781104 TrackingId: -4489328965775781104 TrackingId: null
@lukasT You can check it in this repo https://github.com/silverwind85/TraceIdExample

@lukasT
Copy link

lukasT commented Apr 20, 2023

@silverwind85 sadly I wasn't able to create a pull-request for your repository but I was able to make it work when:

  1. implementing the ObservationConfig as mentioned earlier
  2. inject private val tracer: io.micrometer.tracing.Tracer instead of private val currentTraceContext: CurrentTraceContext and get current traceId like this:
val traceId = tracer.currentSpan()?.context()?.traceId()
  1. using an ErrorController instead of GlobalErrorAttributes like this:
@RestControllerAdvice
open class ErrorController(
    private val tracer: Tracer
) {

    @ExceptionHandler(Throwable::class)
    open fun handleAny(
        ex: Throwable,
    ): ResponseEntity<String> {
        val traceId = tracer.currentSpan()?.context()?.traceId()
        println("${ErrorController::class.simpleName}: TrackingId: $traceId")

        return ResponseEntity.ok(
            "whoops"
        )
    }
}

outcome:

RestApi: TrackingId: 644119754b98bf7d4260c359ebf57b73
JokesRetrieval: TrackingId: 644119754b98bf7d4260c359ebf57b73
ErrorController: TrackingId: 644119754b98bf7d4260c359ebf57b73

@silverwind85
Copy link

@lukasT Thanks for you help but I need use GlobalErrorAttributes extend DefaultErrorAttributes. Too many changes in the application. From my perspective, this should be corrected. I'm wonderd other people don't have this problem

@rsercano
Copy link

rsercano commented May 19, 2023

This's way too complicated after Sleuth, still no solution to propagate same traceID between microservices ?

P.s. I'm having same issue with Java spring and graphql client

@gabrielfmagalhaes
Copy link

Total noob question here, but from what I could see, even though I use Hooks.enableAutomaticContextPropagation(); to implement auto propagation, when publishing messages to be consumed in other microservices, it would still require some manual configurations and set ObservationThreadLocalAccessor.KEY to be in the Reactor Context

I plan to upgrade the SpringBoot in some applications, and I have a custom library that implements Sleuth, creates/reads spans, and sets in a ThreadLocalSpan (it works for MVC and Webflux projects once Sleuth automatically correlates the observations). By using Micrometer, from what I could see, this method wouldn't work until enabling ObservationThreadLocalAccessor.KEY in every webflux project

@rsercano
Copy link

rsercano commented Jun 9, 2023

This's way too complicated after Sleuth, still no solution to propagate same traceID between microservices ?

P.s. I'm having same issue with Java spring and graphql client

I can confirm that 3.0.7 RELEASE have fixed the main problem for me, still initial request have wrong traceIDs, probably another issue but after that every request have same traceID in between microservices.

@bclozel
Copy link
Member

bclozel commented Jun 9, 2023

@gabrielfmagalhaes if you require assistance for instrumenting your library, please reach out to the micrometer community on the Slack channel so we can help you.

@rsercano if there is a bug with graphql instrumentation, please share a sample application demonstrating the problem and create a new issue in the spring graphql project.

To anyone here: this issue is closed and is not meant to address any issue related to observability and Spring. Adding more comments about somewhat related problems is not helping. The issue title might be too broad in the first place.

@Storje
Copy link

Storje commented Jun 30, 2023

Even with Hooks.enableAutomaticContextPropagation(), unable to get the traceId in logs for a WebFilter ? Is there a workaround, please ?

@roja-polukonda
Copy link

@lukasT Thanks for you help but I need use GlobalErrorAttributes extend DefaultErrorAttributes. Too many changes in the application. From my perspective, this should be corrected. I'm wonderd other people don't have this problem

@silverwind85 I am encountering a similar issue and also need to use GlobalErrorAttributes extending DefaultErrorAttributes. Have you found any fix or workaround for the missing trace IDs?

@bclozel
Copy link
Member

bclozel commented Aug 1, 2024

@roja-polukonda I don't understand the link between DefaultErrorAttributes and tracing. You won't get help on this issue. Have you considered asking a question on StackOverflow showing what your application is doing?

@roja-polukonda
Copy link

roja-polukonda commented Aug 2, 2024

@bclozel In our application, we use DefaultErrorAttributes as a global exception handler, which used to log trace IDs and span IDs perfectly with Sleuth. However, post migrating to Spring WebFlux 3.x, these trace IDs are missing from the logs of this class (GlobalErrorAttributes extending DefaultErrorAttributes) . Switching the exception handling method involves too many changes in most of the applications. Therefore, I am looking to see if others are facing a similar issue and found this thread, so I asked around to see if a workaround is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: superseded An issue that has been superseded by another theme: observability Issues related to observability
Projects
None yet
Development

No branches or pull requests