
ThreadLocals not cleared with Hooks.enableAutomaticContextPropagation(), same TraceId for every request #363

Closed · kzander91 opened this issue Sep 25, 2023 · 8 comments
Labels: bug (A general bug) · Milestone: 1.1.6


kzander91 commented Sep 25, 2023

I originally opened this issue in reactor: reactor/reactor-core#3584, but it turned out that this is an issue in micrometer-tracing instead.
Worked in: 1.1.4
Broken in: 1.1.5, 1.1.6-SNAPSHOT, 1.2.0-SNAPSHOT
Also related: #356


Expected Behavior

Automatic context propagation (enabled with Hooks.enableAutomaticContextPropagation()) should restore thread-locals around each request and clear them once the request completes.
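
For context, the hook builds on the context-propagation library's ThreadLocalAccessor contract: Reactor snapshots registered thread-local values into the Context, restores them on whatever thread runs a signal, and is supposed to reset them afterwards. A minimal sketch of that contract, with an illustrative key and thread-local (not Micrometer Tracing's actual accessor):

import io.micrometer.context.ContextRegistry;

public class TraceIdAccessorDemo {

    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public static void register() {
        ContextRegistry.getInstance().registerThreadLocalAccessor(
                "demo.traceId",    // key under which the value travels in the Context
                TRACE_ID::get,     // snapshot the value on the originating thread
                TRACE_ID::set,     // restore it on whatever thread runs the signal
                TRACE_ID::remove); // reset it afterwards; the step that stops happening in this bug
    }

}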

Actual Behavior

After the first request handled by a particular thread, every subsequent request handled by that thread will reuse the same traceId.

Steps to Reproduce

Demo project: reproducer.zip

import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Hooks;

@SpringBootApplication
public class Demo1Application {

    public static void main(String[] args) {
        // Pin Reactor Netty to a single event-loop thread so the traceId reuse is obvious.
        System.setProperty("reactor.netty.ioWorkerCount", "1");
        Hooks.enableAutomaticContextPropagation();
        SpringApplication.run(Demo1Application.class, args);
    }

}

@RestController
@RequiredArgsConstructor
class MyController {

    private final Tracer tracer;

    @GetMapping("/")
    public String traceId() {
        // Echo the handling thread and the current traceId so reuse is visible.
        return Thread.currentThread().getName() + '-' + tracer.currentSpan().context().traceId();
    }

}

Run it and issue a few requests to localhost:8080; the output will look something like this:

With reactor.netty.ioWorkerCount=1:

C:\>curl localhost:8080
reactor-http-nio-1-65102dd571ea1cb67a25a401c93412b0
C:\>curl localhost:8080
reactor-http-nio-1-65102dd571ea1cb67a25a401c93412b0
C:\>curl localhost:8080
reactor-http-nio-1-65102dd571ea1cb67a25a401c93412b0
C:\>curl localhost:8080
reactor-http-nio-1-65102dd571ea1cb67a25a401c93412b0
C:\>curl localhost:8080
reactor-http-nio-1-65102dd571ea1cb67a25a401c93412b0
C:\>curl localhost:8080
reactor-http-nio-1-65102dd571ea1cb67a25a401c93412b0

With reactor.netty.ioWorkerCount=2:

C:\>curl localhost:8080
reactor-http-nio-1-65102d16b3d992cb7131386bf683cd9e
C:\>curl localhost:8080
reactor-http-nio-2-65102d0dc6d2b48f307d67a6db6fde3a
C:\>curl localhost:8080
reactor-http-nio-1-65102d16b3d992cb7131386bf683cd9e
C:\>curl localhost:8080
reactor-http-nio-2-65102d0dc6d2b48f307d67a6db6fde3a
C:\>curl localhost:8080
reactor-http-nio-1-65102d16b3d992cb7131386bf683cd9e
C:\>curl localhost:8080
reactor-http-nio-2-65102d0dc6d2b48f307d67a6db6fde3a

With reactor.netty.ioWorkerCount=3:

C:\>curl localhost:8080
reactor-http-nio-2-65102dfc81d58e1d99c2f64353b410a6
C:\>curl localhost:8080
reactor-http-nio-3-65102dfc7b1ff1d38702f4b2e4cb94a6
C:\>curl localhost:8080
reactor-http-nio-1-65102dfde7c20d2604248dd885469458
C:\>curl localhost:8080
reactor-http-nio-2-65102dfc81d58e1d99c2f64353b410a6
C:\>curl localhost:8080
reactor-http-nio-3-65102dfc7b1ff1d38702f4b2e4cb94a6
C:\>curl localhost:8080
reactor-http-nio-1-65102dfde7c20d2604248dd885469458

As we can see, requests handled by a given thread always reuse the same traceId, indicating that the thread-locals populated by automatic context propagation are not cleared/reset after each request.

Without automatic context propagation, tracer.currentSpan() returns null.
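
(As an aside, that also means the demo endpoint above throws a NullPointerException without the hook, since it dereferences currentSpan() unconditionally. A null-safe variant of the handler, sketch only, using io.micrometer.tracing.Span:)

@GetMapping("/")
public String traceId() {
    // currentSpan() may be null when no span is active on this thread.
    Span span = tracer.currentSpan();
    return Thread.currentThread().getName() + '-'
            + (span != null ? span.context().traceId() : "<no current span>");
}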

Your Environment

  • Reactor version(s) used: 3.5.10
  • Other relevant libraries' versions (e.g. Netty, ...):
    • Spring Boot 3.1.4
    • Spring Framework 6.0.12
    • Micrometer Tracing: 1.1.5 (Also broken in 1.1.6-SNAPSHOT, 1.2.0-SNAPSHOT)
    • Micrometer Observation: 1.11.4
  • JVM version (java -version):
openjdk version "17.0.8.1" 2023-08-24
OpenJDK Runtime Environment Temurin-17.0.8.1+1 (build 17.0.8.1+1)
OpenJDK 64-Bit Server VM Temurin-17.0.8.1+1 (build 17.0.8.1+1, mixed mode, sharing)
xsreality commented:

I think I am also hitting this issue after upgrading to SB 3.1.4 (Micrometer Tracing 1.1.5). The traces for actuator endpoints are using the same trace ID and becoming child traces instead of independent traces.

[screenshot: actuator endpoint traces sharing a single trace ID]

marcingrzejszczak added this to the 1.1.6 milestone Sep 26, 2023
marcingrzejszczak added the bug (A general bug) label Sep 26, 2023
marcingrzejszczak added a commit that referenced this issue Sep 26, 2023
Without this change we clear the scopes map when spans end. The problem is that the scoping hierarchy gets broken: even though the OTLA works well (scopes are properly opened and closed), the spans are not.

With this change we no longer remove the scopes when a TraceContext is cleared.

Fixes gh-363
marcingrzejszczak (Contributor) commented:

Fixed via 5a750bb


vpavic commented Oct 9, 2023

Looking at the commit that resolved this, it doesn't appear there's anything preventing a regression like this from sneaking back in.
As this is quite a fundamental thing, could you ensure it's properly tested?

shakuzen (Member) commented:

Yes, we should definitely do that. I've opened #397 to track it. Pull requests welcome.


nathandeamer commented Oct 15, 2023

> I think I am also hitting this issue after upgrading to SB 3.1.4 (Micrometer Tracing 1.1.5). The traces for actuator endpoints are using the same trace ID and becoming child traces instead of independent traces.

I think I'm also seeing this, but with Spring Cloud Gateway and the latest Micrometer Tracing 1.1.6: #410


chemicL commented Oct 31, 2023

@vpavic #421 is my proposed test for this case. @shakuzen @jonatan-ivanov I tested it against 1.1.5 and it fails due to the scope invalidation following an Observation stop signal.


vpavic commented Nov 14, 2023

Thanks for the ping @chemicL.

The test case in your PR is focused on the Reactor scenario, and while this issue is too, the root cause behind it isn't really tied to Reactor, and it affected a broader set of users (we're not using Reactor in any of our projects).

This isn't a critique of your PR by any means, just an observation that the tests covering this should probably also include something that isn't specific to a particular runtime.


chemicL commented Nov 14, 2023

@vpavic understood :) The interesting thing here is the unique setup that Reactor creates with the context-propagation support we introduced. It was hard to replicate without Reactor in the equation, though not impossible. Have you experienced this bug without Reactor? I'd argue that using Reactor is both a means to an end (creating an environment in which we can reliably replicate the issue) and validation that the place where the bug surfaced is also covered by the tests.

If you have an idea that avoids using Reactor in the test for this feature, I think it would be valuable to add.
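
For illustration, a runtime-agnostic test could take roughly this shape, driving an Observation scope directly instead of going through Reactor. This is only a sketch: it assumes micrometer-tracing-test's SimpleTracer wired to a DefaultTracingObservationHandler behaves like the library's existing fixtures, and whether this alone would have tripped the 1.1.5 regression without Reactor in the loop is exactly the open question:

import static org.assertj.core.api.Assertions.assertThat;

import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import io.micrometer.tracing.handler.DefaultTracingObservationHandler;
import io.micrometer.tracing.test.simple.SimpleTracer;
import org.junit.jupiter.api.Test;

class ThreadLocalCleanupTest {

    SimpleTracer tracer = new SimpleTracer();
    ObservationRegistry registry = ObservationRegistry.create();

    @Test
    void threadLocalsAreClearedAfterObservationStops() {
        registry.observationConfig()
                .observationHandler(new DefaultTracingObservationHandler(tracer));

        Observation observation = Observation.start("request", registry);
        try (Observation.Scope scope = observation.openScope()) {
            // Inside the scope a span must be current on this thread.
            assertThat(tracer.currentSpan()).isNotNull();
        }
        observation.stop();

        // The regression: after stop, the thread kept the previous span as
        // "current", so the next request on that thread reused its traceId.
        assertThat(tracer.currentSpan()).isNull();
    }

}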
