Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown is hanging #38164

Closed
shawkins opened this issue Jan 12, 2024 · 24 comments
Closed

Shutdown is hanging #38164

shawkins opened this issue Jan 12, 2024 · 24 comments
Labels
area/narayana Transactions / Narayana kind/bug Something isn't working triage/out-of-date This issue/PR is no longer valid or relevant

Comments

@shawkins
Copy link
Contributor

Describe the bug

See https://issues.redhat.com/browse/JBTM-3830 and keycloak/keycloak#23786 - for some of tests where the server has a failed start up, the process will fail to exit due to the TransactionReaper shutdown not completing.

Expected behavior

The keycloak server should stop.

Actual behavior

The keycloak server shutdown hangs.

How to Reproduce?

Add a RepeatedTest annotation to https://github.com/keycloak/keycloak/blob/5d3df39f04d64bcfe3acc077de837a57bd838b64/quarkus/tests/integration/src/test/java/org/keycloak/it/cli/dist/FipsDistTest.java#L124 - this only occurs about 10% of the time. In the output when it is hanging the test will capture the thread dump.

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

3.2.9

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@shawkins shawkins added the kind/bug Something isn't working label Jan 12, 2024
@quarkus-bot quarkus-bot bot added the area/securepipeline issues related to ensure Quarkus can be used in a secure pipeline setups like FIPS or similar label Jan 12, 2024
Copy link

quarkus-bot bot commented Jan 12, 2024

/cc @Karm (securepipeline), @jerboaa (securepipeline)

@shawkins
Copy link
Contributor Author

Unfortunately a simple change to address a possible race condition in narayana did not address this. I was hoping some more eyes could help determine if the remaining element can be forcibly terminated in this case.

@geoand geoand added the area/narayana Transactions / Narayana label Jan 15, 2024
@mmusgrov
Copy link
Contributor

@shawkins Can you add some thread dumps so that we can see which threads are waiting?

@shawkins
Copy link
Contributor Author

shawkins commented Jan 15, 2024

@mmusgrov other than the main thread being held at shutdown as shown in https://issues.redhat.com/browse/JBTM-3830 there are no other threads doing work - see the attached for the full thread dump.
threaddump.txt

Since no real work has been done at this point by Keycloak, the only thing that should have initiated a transaction is the liquibase schema migrations. Somehow the failed overall startup must leave that as being seen as on-going.

@Karm Karm removed the area/securepipeline issues related to ensure Quarkus can be used in a secure pipeline setups like FIPS or similar label Jan 16, 2024
@shawkins
Copy link
Contributor Author

To follow up the underlying issue is with TransactionReaper.terminate(false) - instead of immediately shutting down, the original timeout of any lingering transactions is still being honored. Thus the shutdown will hang until the transaction(s) times out.

The lingering transaction is not from the liquibase migration as mentioned above, but is from a transaction that is managed by Keycloak's TransactionalSessionHandler - https://github.com/keycloak/keycloak/blob/ed96b13312198569b7e68647eefe44d5ee24c739/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/transaction/TransactionalSessionHandler.java#L27

In the affected test cases the server does not complete a normal startup, and a session that was started does not have the corresponding close method called prior to TransactionReaper.terminate.

So a question on the quarkus/jaxrs side - should there be a guarentee that our CloseSessionHandler is called? https://github.com/keycloak/keycloak/blob/ed96b13312198569b7e68647eefe44d5ee24c739/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/integration/jaxrs/CloseSessionHandler.java#L68

cc @geoand

@geoand
Copy link
Contributor

geoand commented Jan 30, 2024

Not sure I undertstand, where were you hoping closeSession to be called from?

@shawkins
Copy link
Contributor Author

@geoand both our create and close are expected to be called via the handler chain on QuarkusResteasyReactiveRequestContext.

@geoand
Copy link
Contributor

geoand commented Jan 30, 2024

I admit I have not understood the details of the issue, but I don't see why you expect that would be called when no HTTP requests have been handled and therefore there is no REST layer code involved.

@shawkins
Copy link
Contributor Author

I admit I have not understood the details of the issue, but I don't see why you expect that would be called when no HTTP requests have been handled and therefore there is no REST layer code involved.

see the correction in #38164 (comment) - there is REST session being created which is responsible for the left-over transaction.

@geoand
Copy link
Contributor

geoand commented Jan 30, 2024

there is REST session

Is this a Keycloak thing?

@shawkins
Copy link
Contributor Author

Is this a Keycloak thing?

Keycloak is defining a CreateSessionHandler [1] and a CloseSessionHandler [2] which get installed as handlers / filters in the QuarkusResteasyReactiveRequestContext [3].

When transactions are cleaned up appropriately I can see paired calls to the create and close methods via their respective handlers - for example:

        at org.keycloak.quarkus.runtime.transaction.TransactionalSessionHandler.close(TransactionalSessionHandler.java:57)
        at org.keycloak.quarkus.runtime.integration.jaxrs.CloseSessionHandler.closeSession(CloseSessionHandler.java:69)
        at org.keycloak.quarkus.runtime.integration.jaxrs.CloseSessionHandler.filter(CloseSessionHandler.java:65)
        at org.jboss.resteasy.reactive.server.handlers.ResourceResponseFilterHandler.handle(ResourceResponseFilterHandler.java:25)
        at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
        at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)

The KeycloakSession, which is put into the routing context by the CreateSessionHandler, is implemented by a DefaultKeycloakSession [4] and provides access to the underlying transaction manager.

[1] https://github.com/keycloak/keycloak/blob/5373f3c97a801c000fb0ac6b54b2148b1e6c276b/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/integration/resteasy/CreateSessionHandler.java
[2] https://github.com/keycloak/keycloak/blob/5373f3c97a801c000fb0ac6b54b2148b1e6c276b/quarkus/runtime/src/main/java/org/keycloak/quarkus/runtime/integration/jaxrs/CloseSessionHandler.java
[3] https://github.com/quarkusio/quarkus/blob/2b4a129ce2ca85f43fa0b047541bd8da00d0d382/extensions/resteasy-reactive/quarkus-resteasy-reactive/runtime/src/main/java/io/quarkus/resteasy/reactive/server/runtime/QuarkusResteasyReactiveRequestContext.java
[4] https://github.com/keycloak/keycloak/blob/5373f3c97a801c000fb0ac6b54b2148b1e6c276b/services/src/main/java/org/keycloak/services/DefaultKeycloakSession.java

@geoand
Copy link
Contributor

geoand commented Jan 30, 2024

Just to clarify things a little:

Handlers and Filters are different things.

The former are a RESTEasy Reactive concept that essentially comprise the basic unit of an HTTP request handling pipeline - extensions can contribute to these as Keycloak does.
The latter are a JAX-RS concept that are meant to be called before and after a Resource method has been invoked - these are usually contributed by applications, but nothing stops extensions for doing so as well.

With that it mind, it seems like CloseSessionHandler is incorrectly named, but that's just a minor thing.

If I understand correctly, you start a transaction when an HTTP request is being processed and you want to close it when the processing is complete, correct?
What I still don't understand is the condition under which you are not seeing the transaction being closed.

@shawkins
Copy link
Contributor Author

If I understand correctly, you start a transaction when an HTTP request is being processed and you want to close it when the processing is complete, correct?

Yes that is the expectation.

What I still don't understand is the condition under which you are not seeing the transaction being closed.

That's not clear to me either yet. There are some other stacks that end up in the logs, such as #26604, but even with resolving that another stack pops up - and the orphaned transaction problem persists. I was hoping there was something simple / guarenteed here about how close runs.

@shawkins
Copy link
Contributor Author

shawkins commented Jan 30, 2024

After clearing up the #26604 here's what the log looks like:

2024-01-30 13:23:41,191 INFO  [org.keycloak.broker.provider.AbstractIdentityProviderMapper] (main) Registering class org.keycloak.broker.provider.mappersync.ConfigSyncEventListener
2024-01-30 13:23:43,845 INFO  [org.keycloak.quarkus.runtime.storage.legacy.liquibase.QuarkusJpaUpdaterProvider] (main) Initializing database schema. Using changelog META-INF/jpa-changelog-master.xml

UPDATE SUMMARY
Run:                        117
Previously run:               0
Filtered out:                 0
-------------------------------
Total change sets:          117

2024-01-30 13:23:47,987 INFO  [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (main) Node name: localhost-49123, Site name: null
2024-01-30 13:23:48,099 INFO  [org.keycloak.services] (main) KC-SERVICES0050: Initializing master realm
2024-01-30 13:23:49,023 INFO  [org.infinispan.CLUSTER] (main) ISPN000080: Disconnecting JGroups channel `ISPN`

Then a log of transaction begin via TransactionSessionHandler:

	at org.keycloak.quarkus.runtime.transaction.TransactionalSessionHandler.create(TransactionalSessionHandler.java:39)
	at org.keycloak.quarkus.runtime.integration.resteasy.CreateSessionHandler.handle(CreateSessionHandler.java:42)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

Then a bunch of repeating errors, but no call to CloseSessionHandler.close:

2024-01-30 13:23:49,181 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (executor-thread-1) Uncaught server error: org.infinispan.commons.IllegalLifecycleStateException: ISPN000323: realms is in 'TERMINATED' state and so it does not accept new invocations. Either restart it or recreate the cache container.
	at org.infinispan.cache.impl.SimpleCacheImpl.getDataContainer(SimpleCacheImpl.java:1307)
	at org.infinispan.cache.impl.SimpleCacheImpl.get(SimpleCacheImpl.java:559)
	at org.infinispan.cache.impl.StatsCollectingCache.get(StatsCollectingCache.java:54)
	at org.infinispan.cache.impl.EncoderCache.get(EncoderCache.java:762)
	at org.keycloak.models.cache.infinispan.CacheManager.get(CacheManager.java:95)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealmByName(RealmCacheSession.java:466)
	at org.keycloak.services.managers.RealmManager.getRealmByName(RealmManager.java:100)
	at org.keycloak.services.resources.RealmsResource.resolveRealmAndUpdateSession(RealmsResource.java:180)
	at org.keycloak.services.resources.RealmsResource.getRealmResource(RealmsResource.java:196)
	at org.keycloak.services.resources.RealmsResource$quarkusrestinvoker$getRealmResource_40080643164dd4bff8adf35fd68260654e9c3eab.invoke(Unknown Source)
	at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:49,192 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (executor-thread-1) Failed to create error page: org.infinispan.commons.IllegalLifecycleStateException: ISPN000323: realms is in 'TERMINATED' state and so it does not accept new invocations. Either restart it or recreate the cache container.
	at org.infinispan.cache.impl.SimpleCacheImpl.getDataContainer(SimpleCacheImpl.java:1307)
	at org.infinispan.cache.impl.SimpleCacheImpl.get(SimpleCacheImpl.java:559)
	at org.infinispan.cache.impl.StatsCollectingCache.get(StatsCollectingCache.java:54)
	at org.infinispan.cache.impl.EncoderCache.get(EncoderCache.java:762)
	at org.keycloak.models.cache.infinispan.CacheManager.get(CacheManager.java:95)
	at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealmByName(RealmCacheSession.java:466)
	at org.keycloak.services.managers.RealmManager.getRealmByName(RealmManager.java:100)
	at org.keycloak.services.error.KeycloakErrorHandler.resolveRealm(KeycloakErrorHandler.java:162)
	at org.keycloak.services.error.KeycloakErrorHandler.getResponse(KeycloakErrorHandler.java:97)
	at org.keycloak.services.error.KeycloakErrorHandler.toResponse(KeycloakErrorHandler.java:67)
	at org.jboss.resteasy.reactive.server.core.RuntimeExceptionMapper.mapException(RuntimeExceptionMapper.java:99)
	at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.mapExceptionIfPresent(ResteasyReactiveRequestContext.java:337)
	at org.jboss.resteasy.reactive.server.handlers.ExceptionHandler.handle(ExceptionHandler.java:15)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:49,193 ERROR [org.keycloak.headers.DefaultSecurityHeadersProvider] (executor-thread-1) MediaType not set on path /realms/master/, with response status 500
2024-01-30 13:23:49,197 ERROR [io.quarkus.vertx.http.runtime.QuarkusErrorHandler] (executor-thread-1) HTTP Request to //realms/master/ failed, error id: b0f9c8d9-c005-4338-8ee8-de1068b9b1fd-1: jakarta.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
	at org.keycloak.headers.DefaultSecurityHeadersProvider.addHeaders(DefaultSecurityHeadersProvider.java:75)
	at org.keycloak.services.filters.KeycloakSecurityHeadersFilter.filter(KeycloakSecurityHeadersFilter.java:43)
	at org.jboss.resteasy.reactive.server.handlers.ResourceResponseFilterHandler.handle(ResourceResponseFilterHandler.java:25)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:50,206 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (executor-thread-1) Uncaught server error: java.lang.NullPointerException: Cannot invoke "io.quarkus.arc.ArcContainer.getContexts(java.lang.Class)" because the return value of "io.quarkus.arc.Arc.container()" is null
	at io.quarkus.arc.impl.ClientProxies.getDelegate(ClientProxies.java:27)
	at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.arc$delegate(Unknown Source)
	at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.setOtherHttpContextObject(Unknown Source)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusCurrentRequest.set(QuarkusCurrentRequest.java:34)
	at org.jboss.resteasy.reactive.server.core.CurrentRequestManager.set(CurrentRequestManager.java:12)
	at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.handleRequestScopeActivation(ResteasyReactiveRequestContext.java:640)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.handleRequestScopeActivation(QuarkusResteasyReactiveRequestContext.java:39)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.requireCDIRequestScope(AbstractResteasyReactiveContext.java:257)
	at org.jboss.resteasy.reactive.server.handlers.ParameterHandler.handle(ParameterHandler.java:43)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:137)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

2024-01-30 13:23:50,207 ERROR [io.quarkus.vertx.http.runtime.QuarkusErrorHandler] (executor-thread-1) HTTP Request to //realms/master/ failed, error id: b0f9c8d9-c005-4338-8ee8-de1068b9b1fd-2: java.lang.NullPointerException: Cannot invoke "io.quarkus.arc.ArcContainer.getContexts(java.lang.Class)" because the return value of "io.quarkus.arc.Arc.container()" is null
	at io.quarkus.arc.impl.ClientProxies.getDelegate(ClientProxies.java:27)
	at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.arc$delegate(Unknown Source)
	at io.quarkus.vertx.http.runtime.CurrentVertxRequest_ClientProxy.getOtherHttpContextObject(Unknown Source)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusCurrentRequest.get(QuarkusCurrentRequest.java:21)
	at org.jboss.resteasy.reactive.server.core.CurrentRequestManager.get(CurrentRequestManager.java:8)
	at org.keycloak.quarkus.runtime.integration.resteasy.QuarkusKeycloakContext.getResteasyReactiveRequestContext(QuarkusKeycloakContext.java:68)
	at org.keycloak.quarkus.runtime.integration.resteasy.QuarkusKeycloakContext.createHttpRequest(QuarkusKeycloakContext.java:40)
	at org.keycloak.services.DefaultKeycloakContext.getHttpRequest(DefaultKeycloakContext.java:151)
	at org.keycloak.services.DefaultKeycloakContext.getRequestHeaders(DefaultKeycloakContext.java:96)
	at org.keycloak.services.error.KeycloakErrorHandler.getResponse(KeycloakErrorHandler.java:82)
	at org.keycloak.services.error.KeycloakErrorHandler$1.run(KeycloakErrorHandler.java:62)
	at org.keycloak.services.error.KeycloakErrorHandler$1.run(KeycloakErrorHandler.java:59)
	at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:382)
	at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:365)
	at org.keycloak.services.error.KeycloakErrorHandler.toResponse(KeycloakErrorHandler.java:58)
	at org.jboss.resteasy.reactive.server.core.RuntimeExceptionMapper.mapException(RuntimeExceptionMapper.java:99)
	at org.jboss.resteasy.reactive.server.core.ResteasyReactiveRequestContext.mapExceptionIfPresent(ResteasyReactiveRequestContext.java:337)
	at org.jboss.resteasy.reactive.server.handlers.ExceptionHandler.handle(ExceptionHandler.java:15)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:150)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
...

The end result is that this transaction will cause the transaction reaper to be stuck at shutdown holding the main thread (until https://issues.redhat.com/browse/JBTM-3830 is addressed):

"main" #1 prio=5 os_prio=0 cpu=12170.97ms elapsed=114.04s tid=0x00007f4924012f80 nid=0x2603f3 in Object.wait()  [0x00007f492a095000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@17.0.1/Native Method)
	- waiting on <no object reference available>
	at java.lang.Object.wait(java.base@17.0.1/Object.java:338)
	at com.arjuna.ats.arjuna.coordinator.TransactionReaper.shutdown(TransactionReaper.java:760)
	- locked <0x00000000e2a1fb80> (a com.arjuna.ats.arjuna.coordinator.TransactionReaper)
	at com.arjuna.ats.arjuna.coordinator.TransactionReaper.terminate(TransactionReaper.java:969)
	- locked <0x00000000e2a23b90> (a java.lang.Class for com.arjuna.ats.arjuna.coordinator.TransactionReaper)
	at io.quarkus.narayana.jta.runtime.NarayanaJtaRecorder.lambda$handleShutdown$3(NarayanaJtaRecorder.java:171)
	at io.quarkus.narayana.jta.runtime.NarayanaJtaRecorder$$Lambda$1062/0x000000080152c488.run(Unknown Source)
	at io.quarkus.runtime.StartupContext.runAllInReverseOrder(StartupContext.java:84)
	at io.quarkus.runtime.StartupContext.close(StartupContext.java:75)
	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
	at io.quarkus.runtime.Application.start(Application.java:101)
	at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
	at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
	at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:117)
	at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:33)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
	at picocli.CommandLine.execute(CommandLine.java:2170)
	at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:124)
	at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:107)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.1/Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.1/NativeMethodAccessorImpl.java:77)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.1/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@17.0.1/Method.java:568)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)

@geoand
Copy link
Contributor

geoand commented Jan 31, 2024

Is there any simple way for me to reproduce this?

@shawkins
Copy link
Contributor Author

Other than repeatively running our test case, no I don't have a simple reproducer.

For reference the exception that is expected to cause premature shutdown / abnormal startup is:

java.lang.RuntimeException: Unable to start HTTP server
	at io.quarkus.vertx.http.runtime.VertxHttpRecorder.doServerStart(VertxHttpRecorder.java:849)
	at io.quarkus.vertx.http.runtime.VertxHttpRecorder.startServer(VertxHttpRecorder.java:336)
	at io.quarkus.deployment.steps.VertxHttpProcessor$openSocket189362710.deploy_0(Unknown Source)
	at io.quarkus.deployment.steps.VertxHttpProcessor$openSocket189362710.deploy(Unknown Source)
	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
	at io.quarkus.runtime.Application.start(Application.java:101)
	at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
	at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
	at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:117)
	at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:33)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
	at picocli.CommandLine.execute(CommandLine.java:2170)
	at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:124)
	at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:107)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
	at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: malformed sequence
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
	at io.quarkus.vertx.http.runtime.VertxHttpRecorder.doServerStart(VertxHttpRecorder.java:770)
	... 25 more
Caused by: java.lang.IllegalArgumentException: malformed sequence
	at org.bouncycastle.asn1.bc.ObjectStore.<init>(Unknown Source)
	at org.bouncycastle.asn1.bc.ObjectStore.getInstance(Unknown Source)
	at org.bouncycastle.jcajce.provider.ProvBCFKS$BCFIPSKeyStoreSpi.engineLoad(Unknown Source)
	at java.base/java.security.KeyStore.load(KeyStore.java:1473)
	at io.vertx.core.net.impl.KeyStoreHelper.loadKeyStore(KeyStoreHelper.java:266)
	at io.vertx.core.net.KeyStoreOptionsBase.getHelper(KeyStoreOptionsBase.java:188)
	at io.vertx.core.net.KeyStoreOptionsBase.getKeyManagerFactory(KeyStoreOptionsBase.java:206)
	at io.vertx.core.net.impl.SSLHelper.lambda$build$6(SSLHelper.java:246)
	at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:180)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:277)
	at io.vertx.core.impl.ContextBase.lambda$internalExecuteBlocking$2(ContextBase.java:199)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1512)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

@geoand
Copy link
Contributor

geoand commented Jan 31, 2024

But in that case I still don't see how an HTTP request is being handled as the HTTP server has not come up

@shawkins
Copy link
Contributor Author

@geoand right that is confusing. Trapping this a bit more in the debugger, I can see resteasy reactive logic adding the QuarkusResteasyReactiveRequestContext as a task - at the same time the malformed sequence is being thrown from starting the HTTP server. Then when that task runs, it triggers the creation of the transaction. Stack for adding the task:

        at org.jboss.threads.EnhancedQueueExecutor.execute(EnhancedQueueExecutor.java:799)
        at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:219)
        at org.jboss.resteasy.reactive.server.handlers.RestInitialHandler.beginProcessing(RestInitialHandler.java:48)
        at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:23)
        at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:10)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1286)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:144)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$1.handle(HttpServerCommonHandlers.java:58)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$1.handle(HttpServerCommonHandlers.java:36)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1286)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:144)
        at io.quarkus.resteasy.reactive.server.runtime.ResteasyReactiveRecorder$13.handle(ResteasyReactiveRecorder.java:382)
        at io.quarkus.resteasy.reactive.server.runtime.ResteasyReactiveRecorder$13.handle(ResteasyReactiveRecorder.java:375)
        at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1286)
        at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:177)
        at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:144)
        at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:68)
        at io.vertx.ext.web.impl.RouterImpl.handle(RouterImpl.java:37)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$2.handle(HttpServerCommonHandlers.java:82)
        at io.quarkus.vertx.http.runtime.options.HttpServerCommonHandlers$2.handle(HttpServerCommonHandlers.java:65)
        at io.quarkus.vertx.http.runtime.VertxHttpRecorder$1.handle(VertxHttpRecorder.java:177)
        at io.quarkus.vertx.http.runtime.VertxHttpRecorder$1.handle(VertxHttpRecorder.java:153)
...

@geoand
Copy link
Contributor

geoand commented Jan 31, 2024

It seems to me like some HTTP request is trying to be served before everything is up and running

@mmusgrov
Copy link
Contributor

The recent merged narayana component upgrade, pr#39009, included a small change to how we handle shutdown. @shawkins would it be possible for you to retest to see if there are any other issues related to narayana?

@shawkins
Copy link
Contributor Author

would it be possible for you to retest to see if there are any other issues related to narayana?

Yes, I'll try the scenario again.

I think the cause of the orphaned transaction was identified as well - keycloak/keycloak#27682

@shawkins
Copy link
Contributor Author

No reoccurances in our test case after 50 runs. Instead I see logs like:

2024-03-11 21:00:02,410 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffffc0a801f3:9ced:65efa90f:1a aborting with 1 threads active!

Which would not have occurred when exhibiting the hanging behavior.

@mmusgrov
Copy link
Contributor

So @geoand and @shawkins can we close this one as fixed by the Narayana component upgrade?

@geoand
Copy link
Contributor

geoand commented Mar 18, 2024

👌

@geoand geoand closed this as completed Mar 18, 2024
@geoand geoand added the triage/out-of-date This issue/PR is no longer valid or relevant label Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/narayana Transactions / Narayana kind/bug Something isn't working triage/out-of-date This issue/PR is no longer valid or relevant
Projects
None yet
Development

No branches or pull requests

4 participants