Deadlock when using Prometheus Exemplars #40972

Closed
anvo1115 opened this issue Jun 2, 2024 · 9 comments

Comments

@anvo1115

anvo1115 commented Jun 2, 2024

It seems that issue #33070 has resurfaced.

We use the following dependencies:

[INFO] +- org.springframework.boot:spring-boot-starter-actuator:jar:3.2.4:compile
[INFO] |  +- org.springframework.boot:spring-boot-actuator-autoconfigure:jar:3.2.4:compile
[INFO] |  +- io.micrometer:micrometer-observation:jar:1.12.4:compile
[INFO] |  |  \- io.micrometer:micrometer-commons:jar:1.12.4:compile
[INFO] |  \- io.micrometer:micrometer-jakarta9:jar:1.12.4:compile

and it seems that PrometheusExemplarsAutoConfiguration (https://github.com/spring-projects/spring-boot/blob/v3.2.4/spring-boot-project/spring-boot-actuator-autoconfigure/src/main/java/org/springframework/boot/actuate/autoconfigure/tracing/prometheus/PrometheusExemplarsAutoConfiguration.java) causes a deadlock.

Please refer to the attached thread dump:
thread_dump_with_deadlock.txt

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jun 2, 2024
@wilkinsona
Member

@anvo1115 Thanks for the report, but, as far as I can tell from the thread dump, that doesn't look deadlocked to me. reactor-http-epoll-3 is waiting to lock 0x00000000cf5c3e78 which is held by main. However, main is waiting to lock 0x00000000f19e0530 which isn't held by any other thread in the dump. In other words, judging by the thread dump, once the request that's being made by MicroserviceWebClient has completed, processing can proceed.
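One way to cross-check a reading like this is to ask the JVM itself whether it sees a lock cycle. The sketch below uses the standard `ThreadMXBean.findDeadlockedThreads()` call; the class and method names (`DeadlockCheck`, `deadlockedThreadNames`) are made up for illustration. Note that this call only reports cycles among object monitors and ownable synchronizers, so a thread that is merely waiting for some other work to finish will not show up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Boot.
final class DeadlockCheck {

    /** Returns a description of each thread the JVM reports as deadlocked (empty if none). */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no cycle of monitors or ownable synchronizers is found.
        long[] ids = threads.findDeadlockedThreads();
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            // An entry is null if the thread died between the two calls.
            if (info != null) {
                names.add(info.getThreadName() + " waiting on " + info.getLockName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadNames());
    }
}
```

An empty result is consistent with "not deadlocked in the JVM's sense", but it does not rule out a stall where one thread holds a lock while waiting for work that itself needs that lock.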

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Jun 3, 2024
@anvo1115
Author

anvo1115 commented Jun 3, 2024

The issue is that the request never completes.
However, after we excluded PrometheusExemplarsAutoConfiguration, the request went through.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jun 3, 2024
@anvo1115
Author

anvo1115 commented Jun 3, 2024

image

@wilkinsona
Member

Unfortunately, the thread dump doesn't explain why the request did not complete. None of the threads appear to be in the process of making an HTTP request so it's not clear why the thread that's waiting for one to complete is stuck. If you would like us to spend some more time investigating, please spend some time providing a complete yet minimal sample that reproduces the problem. You can share it with us by pushing it to a separate repository on GitHub or by zipping it up and attaching it to this issue.

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jun 3, 2024
@spring-projects-issues
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Jun 10, 2024
@wilkinsona
Member

Having looked again at spring-projects/spring-framework#32996, I think I now understand what's happening here.

The main thread is making an HTTP request during bean creation, while Framework's singleton lock is held. Although the request is reactive and uses WebClient, block() is being called, so the main thread cannot proceed until the request has completed. reactor-http-epoll-3 is the thread that's performing the HTTP request. WebClient has been instrumented and the observation for the request is being stopped. This results in an attempt to update the last exemplar, which gets stuck because it tries to use LazyTracingSpanContextSupplier, which in turn needs to retrieve the Tracer. Doing so requires Framework's singleton lock, which cannot be obtained as it's held by main.
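The shape of the stall can be sketched in plain Java, without Spring or Reactor. Everything here is a stand-in: `SINGLETON_LOCK` plays the role of Framework's singleton lock, the `worker` thread plays reactor-http-epoll-3, and the latch plays the result that block() waits for. Timeouts are added only so that the sketch terminates; in the reported scenario there is no timeout and both threads wait indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: models the locking pattern, not Spring's actual classes.
final class SingletonLockStall {

    // Hypothetical stand-in for Framework's singleton lock.
    private static final ReentrantLock SINGLETON_LOCK = new ReentrantLock();

    /** Returns true if the simulated request completed while the caller still held the lock. */
    static boolean requestCompletesWhileLockHeld(long waitMillis) {
        CountDownLatch requestDone = new CountDownLatch(1);
        SINGLETON_LOCK.lock(); // "main" is in the middle of creating a bean
        try {
            Thread worker = new Thread(() -> {
                try {
                    // Finishing the request needs a bean lookup (the Tracer in the issue),
                    // and that lookup needs the lock that "main" is holding.
                    if (SINGLETON_LOCK.tryLock(waitMillis * 4, TimeUnit.MILLISECONDS)) {
                        try {
                            requestDone.countDown();
                        } finally {
                            SINGLETON_LOCK.unlock();
                        }
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            }, "worker");
            worker.setDaemon(true);
            worker.start();
            try {
                // Equivalent of block(): wait for the request while still holding the lock.
                return requestDone.await(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false;
            }
        } finally {
            SINGLETON_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        // Prints "completed while lock held: false"
        System.out.println("completed while lock held: " + requestCompletesWhileLockHeld(200));
    }
}
```

The worker can only finish once the caller gives up waiting and releases the lock, which is why neither side makes progress when the wait has no timeout.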

This should already be fixed in Framework 6.2.0-M3 but we need to work something out for earlier releases.

@anvo1115, you could avoid the problem by not doing things on multiple threads while also blocking. That would either mean that you stop calling block or that you use an imperative HTTP client.
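To illustrate why keeping the work on one thread helps, here is a minimal plain-Java sketch under the assumption that the lock in question is reentrant for the thread that holds it. `SINGLETON_LOCK`, `lookUpBean` and `requestOnCallingThread` are hypothetical names that only model the locking, not Spring's API.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: models the locking pattern, not Spring's actual classes.
final class SameThreadLookup {

    // Hypothetical stand-in for Framework's singleton lock.
    private static final ReentrantLock SINGLETON_LOCK = new ReentrantLock();

    /** Simulates a bean lookup that needs the lock; returns false if it cannot be acquired. */
    static boolean lookUpBean() {
        if (!SINGLETON_LOCK.tryLock()) {
            return false;
        }
        try {
            return true;
        } finally {
            SINGLETON_LOCK.unlock();
        }
    }

    /** Simulates an imperative request made during bean creation on the calling thread. */
    static boolean requestOnCallingThread() {
        SINGLETON_LOCK.lock(); // bean creation in progress
        try {
            // The request and its follow-up lookup run on the thread that already
            // holds the lock, so re-acquiring it succeeds immediately.
            return lookUpBean();
        } finally {
            SINGLETON_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("lookup succeeded: " + requestOnCallingThread());
    }
}
```

Because no second thread is involved, nothing has to wait for the lock holder to finish.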

On our side, it's becoming increasingly apparent that we need a better way of breaking the MeterRegistry <-> Tracer cycle that exemplars cause. We'll discuss this with the observability team.

@wilkinsona wilkinsona removed the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Jun 11, 2024
@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Jun 11, 2024
@wilkinsona wilkinsona removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue labels Jun 11, 2024
@wilkinsona
Member

@anvo1115 it would be interesting to know if defining the following bean works around the problem for you:

	@Bean
	TracerSpanContextSupplier spanContextSupplier(Tracer tracer) {
		return new TracerSpanContextSupplier(tracer);
	}
	
	static class TracerSpanContextSupplier implements SpanContextSupplier {

		private final Tracer tracer;

		TracerSpanContextSupplier(Tracer tracer) {
			this.tracer = tracer;
		}

		@Override
		public String getTraceId() {
			Span currentSpan = currentSpan();
			return (currentSpan != null) ? currentSpan.context().traceId() : null;
		}

		@Override
		public String getSpanId() {
			Span currentSpan = currentSpan();
			return (currentSpan != null) ? currentSpan.context().spanId() : null;
		}

		@Override
		public boolean isSampled() {
			Span currentSpan = currentSpan();
			if (currentSpan == null) {
				return false;
			}
			Boolean sampled = currentSpan.context().sampled();
			return sampled != null && sampled;
		}

		private Span currentSpan() {
			return this.tracer.currentSpan();
		}

	}

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Jun 11, 2024
@spring-projects-issues
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Jun 18, 2024
@spring-projects-issues
Collaborator

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@spring-projects-issues spring-projects-issues closed this as not planned Jun 25, 2024
@spring-projects-issues spring-projects-issues removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged labels Jun 25, 2024