Explore acquisition scheduler offloading #217

mp911de · 2024-10-16T12:26:21Z

Reactor Pool uses a single-threaded acquisition method to dispatch pool allocation requests as the consequence of its non-blocking synchronization means. This can cause congestion on a single thread and negatively impact scalability.

We should explore whether we could obtain a Scheduler from a Connection in case it is Wrapped.

The text was updated successfully, but these errors were encountered:

[resolves #217] Signed-off-by: Mark Paluch <mark.paluch@broadcom.com>

chemicL · 2024-10-16T13:49:41Z

Thanks! Linking the original issue reactor/reactor-pool#247. I'll test this change and follow-up. It would be neat to offer a generic reactor-pool mechanism but this approach is still great if all the r2dbc drivers can take advantage of it as demonstrated in the postgresql case.

chemicL · 2024-10-16T19:00:21Z

I can confirm this change brings a scalability boost where all event loops are used evenly for the processing that follows an acquisition (either a new connection or re-use from the pool). And it also feels more aligned with the SPI of r2dbc (compared to an implicit contract in reactor-pool).

pkgonan · 2024-11-02T02:05:50Z

@mp911de
Hi,

When i change the r2dbc-pool version 1.0.1.RELEASE to 1.0.2.RELEASE, i got an error like below.

{"timestamp":"2024-11-02T02:00:07.300Z","level":"WARN","thread":"parallel-16","mdc":{"trace_id":"14d93682b73679ce02ef7dd04fa32152","trace_flags":"01","span_id":"4cbab10951b82bc9"},"logger":"org.springframework.boot.actuate.r2dbc.ConnectionFactoryHealthIndicator","message":"Health check failed","context":"default","exception":"io.r2dbc.spi.R2dbcTimeoutException: Connection acquisition timed out after 3000ms\n\tat io.r2dbc.pool.ConnectionPool.lambda$null$10(ConnectionPool.java:163)\n\tat reactor.core.publisher.Mono.lambda$onErrorMap$28(Mono.java:3848)\n\tat reactor.core.publisher.Mono.lambda$onErrorResume$30(Mono.java:3938)\n\tat reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onError$3(TracingSubscriber.java:76)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onError(TracingSubscriber.java:76)\n\tat reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onError$3(TracingSubscriber.java:76)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onError(TracingSubscriber.java:76)\n\tat reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)\n\tat reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:296)\n\tat reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:281)\n\tat reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:420)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:64)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:64)\n\tat reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:64)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:64)\n\tat reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270)\n\tat reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285)\n\tat io.sentry.spring.jakarta.webflux.SentryScheduleHook.lambda$apply$0(SentryScheduleHook.java:23)\n\tat reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)\n\tat reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\nCaused by: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 3000ms in 'Connection acquisition from [MariadbConnectionFactory-proxy [MariadbConnectionFactory{configuration=r2dbc:mariadb://test.cluster-test.ap-northeast-1.rds.amazonaws.com/test?password=***&username=test&allowPublicKeyRetrieval=true}]]' (and no fallback has been configured)\n\t... 22 common frames omitted\n"}

pkgonan · 2024-11-02T02:50:09Z

@mp911de
A little while ago, I did a downgrade from 1.0.2.RELEASE to 1.0.1.RELEASE and the deployment was successful.

My guess is that there is a bug in 1.0.2.RELEASE.

rbraeunlich · 2024-11-25T09:19:00Z

Hi @mp911de,
I can confirm that this introduced a bug. I use r2dbc with Spring Boot 3.3.x and Postgres. My application crashed with:

	java.lang.ClassCastException: class io.r2dbc.postgresql.PostgresqlConnection cannot be cast to class reactor.core.scheduler.Scheduler (io.r2dbc.postgresql.PostgresqlConnection and reactor.core.scheduler.Scheduler are in unnamed module of loader 'app')
	at io.r2dbc.pool.ConnectionPool.lambda$null$4(ConnectionPool.java:118) ~[r2dbc-pool-1.0.2.RELEASE.jar:1.0.2.RELEASE]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) ~[reactor-core-3.6.12.jar:3.6.12]
	... 208 common frames omitted
Wrapped by: org.springframework.transaction.CannotCreateTransactionException: Could not open R2DBC Connection for transaction
	at org.springframework.r2dbc.connection.R2dbcTransactionManager.lambda$doBegin$5(R2dbcTransactionManager.java:231) ~[spring-r2dbc-6.1.15.jar:6.1.15]

Downgrading to 1.0.1.RELEASE solved this for now.

mp911de · 2024-11-28T15:24:02Z

This is indeed a regression that becomes visible with implementations that do not return a scheduler (or when using an older R2DBC Postgres driver version). I fixed it with #220. Care to test against 1.0.3.BUILD-SNAPSHOT, available from https://oss.sonatype.org/content/repositories/snapshots whether the fix addresses the issue for you?

mp911de added the type: enhancement A general enhancement label Oct 16, 2024

mp911de added a commit that referenced this issue Oct 16, 2024

Explore acquisition scheduler offloading.

fd0ca3c

[resolves #217] Signed-off-by: Mark Paluch <mark.paluch@broadcom.com>

mp911de self-assigned this Oct 16, 2024

mp911de added this to the 1.0.2.RELEASE milestone Oct 16, 2024

mp911de mentioned this issue Oct 16, 2024

Explore acquisition scheduler offloading pgjdbc/r2dbc-postgresql#668

Closed

mp911de closed this as completed in 76dfa90 Oct 17, 2024

mp911de mentioned this issue Nov 28, 2024

Wrapped in ConnectionPool unwrapping doesn't guard against implementations returning this #220

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explore acquisition scheduler offloading #217

Explore acquisition scheduler offloading #217

mp911de commented Oct 16, 2024

chemicL commented Oct 16, 2024

chemicL commented Oct 16, 2024 •

edited

Loading

pkgonan commented Nov 2, 2024

pkgonan commented Nov 2, 2024 •

edited

Loading

rbraeunlich commented Nov 25, 2024 •

edited

Loading

mp911de commented Nov 28, 2024

Explore acquisition scheduler offloading #217

Explore acquisition scheduler offloading #217

Comments

mp911de commented Oct 16, 2024

chemicL commented Oct 16, 2024

chemicL commented Oct 16, 2024 • edited Loading

pkgonan commented Nov 2, 2024

pkgonan commented Nov 2, 2024 • edited Loading

rbraeunlich commented Nov 25, 2024 • edited Loading

mp911de commented Nov 28, 2024

chemicL commented Oct 16, 2024 •

edited

Loading

pkgonan commented Nov 2, 2024 •

edited

Loading

rbraeunlich commented Nov 25, 2024 •

edited

Loading