
Spring Boot startup with Lettuce and Micrometer metrics runs into Spring context singleton lock #2814

Closed
nicbillen opened this issue Dec 27, 2023 · 14 comments
Labels
for: external-project For an external project and not something we can fix

Comments

@nicbillen

nicbillen commented Dec 27, 2023

Hello everyone,
we updated our Spring Boot service from version 3.1.6 to version 3.2.0 (Spring Cloud version 2023.0.0). The application connects via Spring Data (Lettuce) to a Redis Sentinel deployment. For an encrypted connection to Redis it uses SSL bundles. With Spring Boot 3.1.6 everything works fine, but after the update to the new version the application no longer starts. With trace logging we see these messages (some parts are masked with XXX):

81474969","logger_name":"jdk.event.security","thread_name":"lettuce-nioEventLoop-4-1","level":"DEBUG","level_value":10000}

{"@timestamp":"2023-12-21T11:00:19.790904008Z","@version":"1","message":"[id: 0x7e9b7544, L:/XXX - R:redis-XXX/XXX:26379] HANDSHAKEN: protocol:TLSv1.3 cipher suite:TLS_AES_128_GCM_SHA256","logger_name":"io.netty.handler.ssl.SslHandler","thread_name":"lettuce-nioEventLoop-4-1","level":"DEBUG","level_value":10000}

{"@timestamp":"2023-12-21T11:00:19.805736687Z","@version":"1","message":"[channel=0xd036595a, /XXX -> redis-XXX/XXX:26379, epid=0x2, chid=0x1] Received: 125 bytes, 1 commands in the stack","logger_name":"io.lettuce.core.protocol.CommandHandler","thread_name":"lettuce-nioEventLoop-4-1","level":"DEBUG","level_value":10000}

{"@timestamp":"2023-12-21T11:00:19.805879889Z","@version":"1","message":"[channel=0xd036595a, /XXX -> redis-XXX/XXX:26379, epid=0x2, chid=0x1] Buffer: %6\r\n$6\r\nserver\r\n$5\r\nredis\r\n$7\r\nversion\r\n$5\r\n6.2.7\r\n$5\r\nproto\r\n:3\r\n$2\r\nid\r\n:1799500\r\n$4\r\nmode\r\n$8\r\nsentinel\r\n$7\r\nmodules\r\n*0","logger_name":"io.lettuce.core.protocol.CommandHandler","thread_name":"lettuce-nioEventLoop-4-1","level":"TRACE","level_value":5000}

{"@timestamp":"2023-12-21T11:00:19.806053831Z","@version":"1","message":"[channel=0xd036595a, /XXX -> redis-XXX/XXX:26379, epid=0x2, chid=0x1] Stack contains: 1 commands","logger_name":"io.lettuce.core.protocol.CommandHandler","thread_name":"lettuce-nioEventLoop-4-1","level":"DEBUG","level_value":10000}

{"@timestamp":"2023-12-21T11:00:19.816972165Z","@version":"1","message":"Decode done, empty stack: true","logger_name":"io.lettuce.core.protocol.RedisStateMachine","thread_name":"lettuce-nioEventLoop-4-1","level":"DEBUG","level_value":10000}

The “Decode done, empty stack: true” message is the last one in the log. Afterwards the application gets stuck and seems to wait for something.
We haven’t reproduced it in a minimal local setup yet, but maybe someone is able to spot something. I have found the following Lettuce issue, which may be related: redis/lettuce#2379

For us this is a critical problem, because we need the updates.

All the best
Nicolas Billen

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Dec 27, 2023
@mp911de
Member

mp911de commented Dec 27, 2023

Please attach a thread dump to inspect why the application is hanging.
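(A thread dump can be captured, for example, with jstack -l <pid> > thread-dump.txt, or by sending kill -3 <pid> to print it to the process output; both are standard JDK/Unix tools.)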

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Dec 27, 2023
@nicbillen
Author

Hello,
here is the thread dump:

thread-dump.txt

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 27, 2023
@mp911de
Member

mp911de commented Dec 28, 2023

The netty thread lettuce-nioEventLoop-4-1, which tries to connect to Redis, is blocked by a Spring Framework lock while the main thread awaits completion of the Redis connection.

Here's the trace of the blocked thread:

"lettuce-nioEventLoop-4-1" #29 daemon prio=5 os_prio=0 cpu=670.91ms elapsed=9.12s tid=0x00007f165789c6d0 nid=0x3d waiting for monitor entry  [0x00007f1654611000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.getSingletonFactoryBeanForTypeCheck(AbstractAutowireCapableBeanFactory.java:976)
	- waiting to lock <0x00000000d5f0ed50> (a java.util.concurrent.ConcurrentHashMap)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.getTypeForFactoryBean(AbstractAutowireCapableBeanFactory.java:892)
	at org.springframework.beans.factory.support.AbstractBeanFactory.isTypeMatch(AbstractBeanFactory.java:620)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doGetBeanNamesForType(DefaultListableBeanFactory.java:573)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.getBeanNamesForType(DefaultListableBeanFactory.java:540)
	at org.springframework.beans.factory.BeanFactoryUtils.beanNamesForTypeIncludingAncestors(BeanFactoryUtils.java:260)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1635)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1395)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory$DependencyObjectProvider.getObject(DefaultListableBeanFactory.java:2068)
	at org.springframework.boot.actuate.autoconfigure.tracing.prometheus.PrometheusExemplarsAutoConfiguration$LazyTracingSpanContextSupplier$$Lambda$953/0x00000001007270c0.get(Unknown Source)
	at org.springframework.util.function.SingletonSupplier.get(SingletonSupplier.java:106)
	at org.springframework.util.function.SingletonSupplier.obtain(SingletonSupplier.java:127)
	at org.springframework.boot.actuate.autoconfigure.tracing.prometheus.PrometheusExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.currentSpan(PrometheusExemplarsAutoConfiguration.java:89)
	at org.springframework.boot.actuate.autoconfigure.tracing.prometheus.PrometheusExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.isSampled(PrometheusExemplarsAutoConfiguration.java:80)
	at io.prometheus.client.exemplars.DefaultExemplarSampler.doSample(DefaultExemplarSampler.java:46)
	at io.prometheus.client.exemplars.DefaultExemplarSampler.sample(DefaultExemplarSampler.java:34)
	at io.micrometer.prometheus.PrometheusTimer.updateLastExemplar(PrometheusTimer.java:128)
	at io.micrometer.prometheus.PrometheusTimer.recordNonNegative(PrometheusTimer.java:118)
	at io.micrometer.core.instrument.AbstractTimer.record(AbstractTimer.java:259)
	at io.lettuce.core.metrics.MicrometerCommandLatencyRecorder.recordCommandLatency(MicrometerCommandLatencyRecorder.java:91)
	at io.lettuce.core.metrics.MicrometerCommandLatencyRecorder.recordCommandLatency(MicrometerCommandLatencyRecorder.java:76)
	at io.lettuce.core.protocol.CommandHandler.recordLatency(CommandHandler.java:912)
	at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:769)
	at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:657)
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:597)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.9/Unknown Source)

That isn't something we can fix in Spring Data Redis because the metrics recorder is trying to access a bean from the Spring context. Paging @marcingrzejszczak and @wilkinsona. I think it is best to create an issue at https://github.com/spring-projects/spring-boot/issues for the time being.
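For context, Lettuce's Micrometer integration reaches the registry roughly as sketched below. This is a hand-wired illustration (Spring Boot wires it through its auto-configuration instead), and the class name is illustrative:

import io.lettuce.core.metrics.MicrometerCommandLatencyRecorder;
import io.lettuce.core.metrics.MicrometerOptions;
import io.lettuce.core.resource.ClientResources;
import io.micrometer.core.instrument.MeterRegistry;

class LettuceMetricsWiringSketch {

  static ClientResources clientResources(MeterRegistry meterRegistry) {
    // The recorder wraps the MeterRegistry; Lettuce calls it from its netty I/O
    // thread whenever a command completes, which is where the PrometheusTimer,
    // the exemplar sampler and the lazy Tracer lookup show up in the trace above.
    return ClientResources.builder()
        .commandLatencyRecorder(
            new MicrometerCommandLatencyRecorder(meterRegistry, MicrometerOptions.create()))
        .build();
  }
}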

@mp911de mp911de closed this as not planned Dec 28, 2023
@mp911de mp911de added for: external-project For an external project and not something we can fix and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Dec 28, 2023
@mp911de mp911de changed the title After Spring Boot update 3.1.6 to 3.2.0 the application doesn't start anymore (Redis Sentinel + TLS) Spring Boot startup with Lettuce and Micrometer metrics runs into Spring context singleton lock Dec 28, 2023
@marcingrzejszczak
Contributor

cc @jonatan-ivanov - exemplars and early bean access again

@nicbillen
Author

I have now also tested with Spring Boot 3.2.1, and the problem is still there. Is there a ticket in one of the other projects, or should I open one?

@marcingrzejszczak
Contributor

Waiting for @jonatan-ivanov to look at this

@jonatan-ivanov
Member

I'm not sure what is going on in this case, but the issue we are facing looks approximately like this:

  1. In order to support exemplars, PrometheusMeterRegistry needs a Tracer to get the traceId and the spanId when metrics are recorded
  2. If Brave is used (a Tracer implementation), its reporter needs a MeterRegistry to record metrics about the published spans

This is a circular dependency that is supposed to be resolved using LazyTracingSpanContextSupplier.

It seems that when Lettuce tries to record metrics, Micrometer tries to get the Tracer bean from the context. I guess we will need a reproducer to see what is going on.
Also, could you please inject both Tracer and MeterRegistry into a component before the app tries to connect to Redis, to see whether Spring Framework is able to create both of these beans?
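A minimal sketch of such a check, assuming the io.micrometer.tracing.Tracer facade is used; the class name is illustrative:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Component;

@Component
class TracerAndRegistryStartupCheck {

  TracerAndRegistryStartupCheck(Tracer tracer, MeterRegistry meterRegistry) {
    // If this constructor runs during startup, Spring was able to create both
    // beans eagerly without hitting a circular dependency.
  }
}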

As a workaround, if you don't need exemplars, you can create your own PrometheusMeterRegistry bean in which you don't specify an exemplar sampler (or inject a null one). Alternatively, you can create a "noop" SpanContextSupplier that returns false from isSampled and null from the other two getters (this does not solve the real issue, but it hides it so that you can upgrade).
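A minimal sketch of the first option (no exemplar sampler, so exemplars are disabled), using the three-argument PrometheusMeterRegistry constructor; the class name is illustrative:

import io.micrometer.core.instrument.Clock;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.prometheus.client.CollectorRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class NoExemplarsMetricsConfiguration {

  @Bean
  PrometheusMeterRegistry prometheusMeterRegistry() {
    // No ExemplarSampler is passed, so exemplars are disabled and the lazy
    // Tracer lookup can never be triggered from the Lettuce I/O thread.
    return new PrometheusMeterRegistry(
        PrometheusConfig.DEFAULT,
        CollectorRegistry.defaultRegistry,
        Clock.SYSTEM);
  }
}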

@nicbillen
Author

nicbillen commented Jan 18, 2024

If I inject a Tracer or a MeterRegistry via a @Bean method in the application class, the context does not start.

public class Application {
  private static final Logger LOG = LoggerFactory.getLogger(Application.class);

  public static void main(String[] arguments) {
    SpringApplication.run(Application.class, arguments);
  }

  @Bean
  public MethodValidationPostProcessor methodValidationPostProcessor() {
    return new MethodValidationPostProcessor();
  }

  // Note: this declares a MeterRegistry bean named "tracer" that itself requires a
  // MeterRegistry, which leads to the circular-reference error shown below.
  @Bean
  public MeterRegistry tracer(MeterRegistry meterRegistry) {
    LOG.info("Tracer {} is injected", meterRegistry);
    return meterRegistry;
  }
}

Currently I have no idea where I could inject it in a better way, before the Redis configuration is loaded. But I will also test the workaround with our own PrometheusMeterRegistry.

Error Message:

Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'webMvcObservationFilter' defined in class path resource [org/springframework/boot/actuate/autoconfigure/observation/web/servlet/WebMvcObservationAutoConfiguration.class]: Unsatisfied dependency expressed through method 'webMvcObservationFilter' parameter 0: Error creating bean with name 'observationRegistry' defined in class path resource [org/springframework/boot/actuate/autoconfigure/observation/ObservationAutoConfiguration.class]: Error creating bean with name 'tracingAwareMeterObservationHandler' defined in class path resource [org/springframework/boot/actuate/autoconfigure/observation/ObservationAutoConfiguration$MeterObservationHandlerConfiguration$TracingAndMetricsObservationHandlerConfiguration.class]: Unsatisfied dependency expressed through method 'tracingAwareMeterObservationHandler' parameter 0: Error creating bean with name 'tracer' defined in XXX.Application: Unsatisfied dependency expressed through method 'tracer' parameter 0: Error creating bean with name 'tracer': Requested bean is currently in creation: Is there an unresolvable circular reference?

@nicbillen
Author

The workaround doesn't work for us. Without a sampler the service starts, but all metrics are gone. With the SpanContextSupplier-based sampler the service doesn't start (same as before).

This is our configuration:

@Configuration
@ConditionalOnProperty(value = "management.prometheus.metrics.export.enabled", havingValue = "true")
public class CustomPrometheusMeterRegistry {
  private static final Logger LOG = LoggerFactory.getLogger(CustomPrometheusMeterRegistry.class);

  /**
   * See https://github.com/spring-projects/spring-data-redis/issues/2814
   *
   * @return custom registry
   */
  @Bean
  public PrometheusMeterRegistry prometheusMeterRegistry(SpanContextSupplier spanContextSupplier) {
    PrometheusMeterRegistry micrometerRegistry = new PrometheusMeterRegistry(
        PrometheusConfig.DEFAULT,
        CollectorRegistry.defaultRegistry,
        Clock.SYSTEM,
        new DefaultExemplarSampler(spanContextSupplier)
    );

    LOG.info("Creating custom prometheus meter registry {}", micrometerRegistry);

    return micrometerRegistry;
  }
}

@nicbillen
Author

nicbillen commented Jan 18, 2024

The "noop" SpanContextSupplier also results in no metrics:

public class CustomSpanContextSupplier implements SpanContextSupplier {
  @Override
  public String getTraceId() {
    return null;
  }

  @Override
  public String getSpanId() {
    return null;
  }

  @Override
  public boolean isSampled() {
    return false;
  }
}

  @Bean
  public PrometheusMeterRegistry prometheusMeterRegistry() {
    PrometheusMeterRegistry micrometerRegistry = new PrometheusMeterRegistry(
        PrometheusConfig.DEFAULT,
        CollectorRegistry.defaultRegistry,
        Clock.SYSTEM,
        new DefaultExemplarSampler(new CustomSpanContextSupplier())
    );

    LOG.info("Creating custom prometheus meter registry {}", micrometerRegistry);

    return micrometerRegistry;
  }

@nicbillen
Author

Injecting the Tracer into the prometheusMeterRegistry bean also results in the same hanging behaviour:

  @Bean
  public PrometheusMeterRegistry prometheusMeterRegistry(Tracer tracer, SpanContextSupplier spanContextSupplier) {
    PrometheusMeterRegistry micrometerRegistry = new PrometheusMeterRegistry(
        PrometheusConfig.DEFAULT,
        CollectorRegistry.defaultRegistry,
        Clock.SYSTEM,
        new DefaultExemplarSampler(spanContextSupplier)
    );

    LOG.info("Creating custom prometheus meter registry {}, {}", micrometerRegistry, tracer);

    return micrometerRegistry;
  }

Log message:

{"@timestamp":"2024-01-18T13:57:22.132652015Z","@version":"1","message":"Creating custom prometheus meter registry io.micrometer.prometheus.PrometheusMeterRegistry@d641499, Tracer{spanHandler=[io.micrometer.tracing.brave.bridge.CompositeSpanHandler@75345e47, 

[...]

{"@timestamp":"2024-01-18T13:57:41.88513261Z","@version":"1","message":"Decode done, empty stack: true","logger_name":"io.lettuce.core.protocol.RedisStateMachine","thread_name":"lettuce-nioEventLoop-4-1","level":"DEBUG","level_value":10000}

[hanging service]

@Zernov-A

I am encountering the same problem in my project.
As @jonatan-ivanov mentioned, the Tracer bean should already be initialized before the application tries to connect to Redis. So I tried to register a bean that injects the Tracer, and the application no longer hangs.

    @Bean
    MicrometerOptions micrometerOptions(Tracer tracer) {
        return MicrometerOptions.create();
    }

A MicrometerOptions bean is already present in LettuceMetricsAutoConfiguration and is needed for the MicrometerCommandLatencyRecorder; giving it a Tracer parameter forces the Tracer to be created before Lettuce starts recording metrics. But this code is not straightforward and clear, it's just a kind of trick, so we need to find a more transparent and better solution.

Here are my thoughts:

  • Do we truly require the LazyTracingSpanContextSupplier with a lazy Tracer dependency? Evaluating the necessity of this component might lead us to a cleaner solution.
  • Maybe we can move the key expiration listener initialization from the bean initialization step (in RedisKeyValueAdapter) to a phase when the application context is fully configured? This should prevent the deadlock. Currently, if we use the EnableKeyspaceEvents.ON_STARTUP option, the key expiration listener is created and requests are sent to Redis during bean initialization, and as a result we get a deadlock with the lazy Tracer initialization.

@nicbillen
Author

I can confirm it. In my case it is now working too.

Here is the complete config:

@Configuration
@ConditionalOnProperty(value = "management.prometheus.metrics.export.enabled", havingValue = "true")
public class CustomPrometheusMeterRegistry {
  private static final Logger LOG = LoggerFactory.getLogger(CustomPrometheusMeterRegistry.class);

  /** Workaround for Micrometer-Redis Bug
   *
   * See https://github.com/spring-projects/spring-data-redis/issues/2814
   *
   * @return Micrometer options
   */
  @Bean
  MicrometerOptions micrometerOptions(Tracer tracer) {
    return MicrometerOptions.create();
  }
}

It may be important to mention that one has to import the correct Tracer. It works with import io.micrometer.tracing.Tracer.
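That is, the Tracer parameter of the micrometerOptions @Bean method above is the Micrometer Tracing facade; if Brave is on the classpath there is also a brave.Tracer with the same simple name, which makes the import easy to mix up:

import io.micrometer.tracing.Tracer;  // Micrometer Tracing facade; the import confirmed to work in this thread
// import brave.Tracer;               // Brave's own type shares the simple name and is easy to pick by accident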

@tyagiarpit

Was there a more permanent solution identified?
