Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore acquisition scheduler offloading #217

Closed
mp911de opened this issue Oct 16, 2024 · 6 comments
Closed

Explore acquisition scheduler offloading #217

mp911de opened this issue Oct 16, 2024 · 6 comments
Assignees
Labels
type: enhancement A general enhancement
Milestone

Comments

@mp911de
Copy link
Member

mp911de commented Oct 16, 2024

Reactor Pool uses a single-threaded acquisition method to dispatch pool allocation requests as the consequence of its non-blocking synchronization means. This can cause congestion on a single thread and negatively impact scalability.

We should explore whether we could obtain a Scheduler from a Connection in case it is Wrapped.

@mp911de mp911de added the type: enhancement A general enhancement label Oct 16, 2024
mp911de added a commit that referenced this issue Oct 16, 2024
[resolves #217]

Signed-off-by: Mark Paluch <mark.paluch@broadcom.com>
@mp911de mp911de self-assigned this Oct 16, 2024
@mp911de mp911de added this to the 1.0.2.RELEASE milestone Oct 16, 2024
@chemicL
Copy link

chemicL commented Oct 16, 2024

Thanks! Linking the original issue reactor/reactor-pool#247. I'll test this change and follow-up. It would be neat to offer a generic reactor-pool mechanism but this approach is still great if all the r2dbc drivers can take advantage of it as demonstrated in the postgresql case.

@chemicL
Copy link

chemicL commented Oct 16, 2024

I can confirm this change brings a scalability boost where all event loops are used evenly for the processing that follows an acquisition (either a new connection or re-use from the pool). And it also feels more aligned with the SPI of r2dbc (compared to an implicit contract in reactor-pool).

@pkgonan
Copy link

pkgonan commented Nov 2, 2024

@mp911de
Hi,

When i change the r2dbc-pool version 1.0.1.RELEASE to 1.0.2.RELEASE, i got an error like below.

{"timestamp":"2024-11-02T02:00:07.300Z","level":"WARN","thread":"parallel-16","mdc":{"trace_id":"14d93682b73679ce02ef7dd04fa32152","trace_flags":"01","span_id":"4cbab10951b82bc9"},"logger":"org.springframework.boot.actuate.r2dbc.ConnectionFactoryHealthIndicator","message":"Health check failed","context":"default","exception":"io.r2dbc.spi.R2dbcTimeoutException: Connection acquisition timed out after 3000ms\n\tat io.r2dbc.pool.ConnectionPool.lambda$null$10(ConnectionPool.java:163)\n\tat reactor.core.publisher.Mono.lambda$onErrorMap$28(Mono.java:3848)\n\tat reactor.core.publisher.Mono.lambda$onErrorResume$30(Mono.java:3938)\n\tat reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onError$3(TracingSubscriber.java:76)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onError(TracingSubscriber.java:76)\n\tat reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onError$3(TracingSubscriber.java:76)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onError(TracingSubscriber.java:76)\n\tat reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)\n\tat reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:296)\n\tat reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:281)\n\tat reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:420)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:64)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:64)\n\tat reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.lambda$onNext$1(TracingSubscriber.java:64)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:97)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.withActiveSpan(TracingSubscriber.java:91)\n\tat io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:64)\n\tat reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270)\n\tat reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285)\n\tat io.sentry.spring.jakarta.webflux.SentryScheduleHook.lambda$apply$0(SentryScheduleHook.java:23)\n\tat reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)\n\tat reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\nCaused by: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 3000ms in 'Connection acquisition from [MariadbConnectionFactory-proxy [MariadbConnectionFactory{configuration=r2dbc:mariadb://test.cluster-test.ap-northeast-1.rds.amazonaws.com/test?password=***&username=test&allowPublicKeyRetrieval=true}]]' (and no fallback has been configured)\n\t... 22 common frames omitted\n"}

@pkgonan
Copy link

pkgonan commented Nov 2, 2024

@mp911de
A little while ago, I did a downgrade from 1.0.2.RELEASE to 1.0.1.RELEASE and the deployment was successful.

My guess is that there is a bug in 1.0.2.RELEASE.

@rbraeunlich
Copy link

rbraeunlich commented Nov 25, 2024

Hi @mp911de,
I can confirm that this introduced a bug. I use r2dbc with Spring Boot 3.3.x and Postgres. My application crashed with:

	java.lang.ClassCastException: class io.r2dbc.postgresql.PostgresqlConnection cannot be cast to class reactor.core.scheduler.Scheduler (io.r2dbc.postgresql.PostgresqlConnection and reactor.core.scheduler.Scheduler are in unnamed module of loader 'app')
	at io.r2dbc.pool.ConnectionPool.lambda$null$4(ConnectionPool.java:118) ~[r2dbc-pool-1.0.2.RELEASE.jar:1.0.2.RELEASE]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) ~[reactor-core-3.6.12.jar:3.6.12]
	... 208 common frames omitted
Wrapped by: org.springframework.transaction.CannotCreateTransactionException: Could not open R2DBC Connection for transaction
	at org.springframework.r2dbc.connection.R2dbcTransactionManager.lambda$doBegin$5(R2dbcTransactionManager.java:231) ~[spring-r2dbc-6.1.15.jar:6.1.15]

Downgrading to 1.0.1.RELEASE solved this for now.

@mp911de
Copy link
Member Author

mp911de commented Nov 28, 2024

This is indeed a regression that becomes visible with implementations that do not return a scheduler (or when using an older R2DBC Postgres driver version). I fixed it with #220. Care to test against 1.0.3.BUILD-SNAPSHOT, available from https://oss.sonatype.org/content/repositories/snapshots whether the fix addresses the issue for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

4 participants