Performance degradation in WebFlux #777

Closed
lukasznowak-viacom opened this issue Jul 15, 2024 · 3 comments

Comments


lukasznowak-viacom commented Jul 15, 2024

We recently started moving towards a reactive stack based on WebFlux. Unfortunately, during PoC tests we encountered a performance degradation compared to our old MVC-based solution. After investigating, we found that the micrometer-tracing library is called by the reactive stack in a blocking way, and because of that we see much higher CPU usage and higher overall latencies. Does this mean that it is not adapted to reactive programming at all?

We are using:

  • webflux 6.1.6
  • micrometer-tracing 1.2.5
  • jvm 21

Please find the stack trace from BlockHound (https://github.com/reactor/BlockHound) captured in our service:

[ERROR] [2024-07-09 08:09:01,148] [client-settings-service,, ,,] [reactor-http-epoll-11] r.n.h.s.HttpServerOperations: [9a3a060a-4034, L:/172.17.0.6:9304 - R:/10.242.106.249:50730] Error starting response. Replying error status-
reactor.blockhound.BlockingOperationError: Blocking call! jdk.internal.misc.Unsafe#park
	at java.base/jdk.internal.misc.Unsafe.park(Unknown Source) ~[na:na]
	at java.base/java.util.concurrent.locks.LockSupport.park(Unknown Source) ~[na:na]
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source) ~[na:na]
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source) ~[na:na]
	at java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(Unknown Source) ~[na:na]
	at java.base/java.util.concurrent.locks.ReentrantLock.lock(Unknown Source) ~[na:na]
	at java.base/java.lang.ref.ReferenceQueue.poll(Unknown Source) ~[na:na]
	at brave.internal.recorder.PendingSpans.expungeStaleEntries(PendingSpans.java:129) ~[brave-5.16.0.jar!/:na]
	at brave.internal.collect.WeakConcurrentMap.getIfPresent(WeakConcurrentMap.java:50) ~[brave-5.16.0.jar!/:na]
	at brave.internal.recorder.PendingSpans.get(PendingSpans.java:59) ~[brave-5.16.0.jar!/:na]
	at brave.internal.recorder.PendingSpans.getOrCreate(PendingSpans.java:64) ~[brave-5.16.0.jar!/:na]
	at brave.Tracer._toSpan(Tracer.java:410) ~[brave-5.16.0.jar!/:na]
	at brave.Tracer.nextSpan(Tracer.java:365) ~[brave-5.16.0.jar!/:na]
	at io.micrometer.tracing.brave.bridge.BraveSpanBuilder.span(BraveSpanBuilder.java:78) ~[micrometer-tracing-bridge-brave-1.2.5.jar!/:1.2.5]
	at io.micrometer.tracing.brave.bridge.BraveSpanBuilder.start(BraveSpanBuilder.java:166) ~[micrometer-tracing-bridge-brave-1.2.5.jar!/:1.2.5]
	at io.micrometer.tracing.handler.PropagatingReceiverTracingObservationHandler.onStart(PropagatingReceiverTracingObservationHandler.java:74) ~[micrometer-tracing-1.2.5.jar!/:1.2.5]
	at io.micrometer.tracing.handler.PropagatingReceiverTracingObservationHandler.onStart(PropagatingReceiverTracingObservationHandler.java:35) ~[micrometer-tracing-1.2.5.jar!/:1.2.5]
	at io.micrometer.observation.ObservationHandler$FirstMatchingCompositeObservationHandler.onStart(ObservationHandler.java:149) ~[micrometer-observation-1.12.5.jar!/:1.12.5]
	at io.micrometer.observation.SimpleObservation.notifyOnObservationStarted(SimpleObservation.java:214) ~[micrometer-observation-1.12.5.jar!/:1.12.5]
	at io.micrometer.observation.SimpleObservation.start(SimpleObservation.java:161) ~[micrometer-observation-1.12.5.jar!/:1.12.5]
	at org.springframework.web.server.adapter.HttpWebHandlerAdapter$ObservationSignalListener.doFirst(HttpWebHandlerAdapter.java:383) ~[spring-web-6.1.6.jar!/:6.1.6]
	at reactor.core.publisher.MonoTapRestoringThreadLocals.subscribe(MonoTapRestoringThreadLocals.java:58) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4568) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:265) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4568) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:265) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoDeferContextual.subscribe(MonoDeferContextual.java:55) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.netty.http.server.HttpServer$HttpServerHandle.onStateChange(HttpServer.java:1176) ~[reactor-netty-http-1.1.18.jar!/:1.1.18]
	at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:710) ~[reactor-netty-core-1.1.18.jar!/:1.1.18]
	at reactor.netty.transport.ServerTransport$ChildObserver.onStateChange(ServerTransport.java:481) ~[reactor-netty-core-1.1.18.jar!/:1.1.18]
	at reactor.netty.http.server.HttpServerOperations.onInboundNext(HttpServerOperations.java:652) ~[reactor-netty-http-1.1.18.jar!/:1.1.18]
	at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:114) ~[reactor-netty-core-1.1.18.jar!/:1.1.18]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at reactor.netty.http.server.HttpTrafficHandler.channelRead(HttpTrafficHandler.java:240) ~[reactor-netty-http-1.1.18.jar!/:1.1.18]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289) ~[netty-handler-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:801) ~[netty-transport-classes-epoll-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[netty-transport-classes-epoll-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[netty-transport-classes-epoll-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]
jonatan-ivanov (Member) commented

Thank you for the issue.
I've edited your comment to improve the formatting of the stacktrace. You might want to check out this Mastering Markdown guide for future reference.

I'm not sure there is anything we can do on the Micrometer Tracing side. As you can see in the stack trace, Micrometer Tracing calls brave.Tracer.nextSpan, which eventually boils down to a call to java.lang.ref.ReferenceQueue::poll (from WeakConcurrentMap), which acquires a ReentrantLock, and locking it is a blocking call.

It seems Brave's WeakConcurrentMap was borrowed from Rafael Winterhalter’s WeakConcurrentMap, which is described as a "concurrent, lock-free (as in lock-less) hash map". The implementation uses java.lang.ref.ReferenceQueue's poll method, which might look non-blocking according to its Javadoc, but it seems to block on a lock (or I might be missing something).
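If the BlockHound report itself (rather than measured latency) turns out to be the main concern, BlockHound can also be told to allow blocking calls inside a specific method. A minimal sketch, assuming you install BlockHound programmatically; the class and method names come from the stack trace above:

```java
import reactor.blockhound.BlockHound;

public class BlockHoundSetup {

    public static void main(String[] args) {
        // Treat the short-lived lock inside Brave's PendingSpans cleanup as allowed,
        // so BlockHound no longer reports this particular call path.
        BlockHound.install(builder -> builder
                .allowBlockingCallsInside("brave.internal.recorder.PendingSpans", "expungeStaleEntries"));
    }
}
```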

I think you can verify this by injecting brave.Tracer into one of your components and calling nextSpan on it; you should see the same error from BlockHound, but without Micrometer Tracing in the stack.
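Something along these lines, as a minimal sketch (assuming brave.Tracer is auto-configured as a bean via the Brave bridge; the controller class and endpoint path are hypothetical):

```java
import brave.Span;
import brave.Tracer;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
class BraveBlockingCheckController {

    private final Tracer tracer;

    BraveBlockingCheckController(Tracer tracer) {
        this.tracer = tracer;
    }

    @GetMapping("/blockhound-check")
    Mono<String> check() {
        // The supplier is evaluated on the Netty event loop when the request is handled,
        // so BlockHound should flag the same ReferenceQueue.poll/ReentrantLock path,
        // this time without Micrometer Tracing in the stack trace.
        return Mono.fromSupplier(() -> {
            Span span = tracer.nextSpan().name("blockhound-check").start();
            span.finish();
            return "ok";
        });
    }
}
```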

I would open an issue for Brave and another one for Rafael's weak-lock-free repo and reference this issue from there, but I think this should have minimal impact on your project.

Do you have profiling data that shows the lock causing higher CPU usage and higher latencies? (I don't think the lock itself should cause higher CPU usage, but because of Reactor, your throughput and your CPU utilization can be higher compared to MVC.)


If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.


github-actions bot commented Aug 1, 2024

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.
