Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JtraTransactionManager keeps reference to already closed scope (and Transaction/connection ThreadLocal) when JtaTxnListener#afterCompletion is called by a different thread #3423

Closed
VictorJimenezKwast opened this issue Jun 18, 2024 · 3 comments · Fixed by #3424
Assignees
Labels
Milestone

Comments

@VictorJimenezKwast
Copy link
Contributor

Expected behavior

When the afterCompletion is called on a JtaTxnListener it should invalidate the corresponding transaction regardless of which thread registered this transaction.

Actual behavior

We're running Ebean 14.3.1 on a JEE container (WildFly) with the JtaTransactionManager enabled. WildFly includes a process that keeps track of (long) running transactions called the TransactionReaper. This background process aborts transactions and kills the underlying connection when it exceeds a certain pre-configured timeout.

When the Reaper detects a timeout it:

  • calls the afterCompletion on the transaction (from the TransactionReaper thread)
  • returns the connection to the pool
  • causing the container to not call afterCompletion on the original thread since the Reaper has already done so.

This causes issues because the JtaTransactionManager stores the current transaction in a ThreadLocal (via TransactionScopeManager). This ThreadLocal is only cleaned up in the afterCompletion which does not work in this case because the TransactionReaper runs in a different Thread. After this has occurred all subsequent queries with Ebean will remain broken on that thread.

Relevant logging:

TransactionReaper killing a transaction:

15:44:33,167 WARN  (Transaction Reaper) [com.arjuna.ats.arjuna] ARJUNA012117: TransactionReaper::check processing TX 0:ffff7f000101:-18aac15b:66718ee9:31f in state  RUN
15:44:33,168 WARN  (Transaction Reaper Worker 0) [com.arjuna.ats.arjuna] ARJUNA012095: Abort of action id 0:ffff7f000101:-18aac15b:66718ee9:31f invoked while multiple threads active within it.
15:44:33,171 WARN  (Transaction Reaper Worker 0) [com.arjuna.ats.arjuna] ARJUNA012381: Action id 0:ffff7f000101:-18aac15b:66718ee9:31f completed with multiple threads - thread default task-1 was in progress with java.base@11.0.20.1/java.net.SocketInputStream.$$YJP$$_jr$jni$socketRead0(Native Method)
java.base@11.0.20.1/java.net.SocketInputStream.socketRead0(SocketInputStream.java)
15:44:33,171 WARN  (Transaction Reaper Worker 0) [com.arjuna.ats.arjuna] ARJUNA012108: CheckedAction::check - atomic action 0:ffff7f000101:-18aac15b:66718ee9:31f aborting with 1 threads active!
15:44:33,192 DEBUG (Transaction Reaper Worker 0) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] HealthcareDS: returnConnection(4a46625, false) [1/20]
15:44:33,192 WARN  (default task-1) [com.arjuna.ats.arjuna] ARJUNA012077: Abort called on already aborted atomic action 0:ffff7f000101:-18aac15b:66718ee9:31f
15:44:33,194 WARN  (Transaction Reaper Worker 0) [com.arjuna.ats.arjuna] ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffff7f000101:-18aac15b:66718ee9:31f

Exception that occurs on the original thread:

15:44:33,196 ERROR (default task-1) [org.jboss.as.ejb3.invocation] WFLYEJB0034: Jakarta Enterprise Beans Invocation failed on component DebugServiceImpl for method public abstract void com.nedap.healthcare.administration.crm.client.DebugService.debug(): javax.ejb.EJBException: javax.persistence.PersistenceException: Query threw SQLException:Statement cancelled due to timeout or client request Query was:SELECT * from debug_lock FOR UPDATE

Exceptions that occur on subsequent requests on the same Thread:

Caused by: javax.persistence.PersistenceException: Query threw SQLException:IJ031040: Connection is not associated with a managed connection: org.jboss.jca.adapters.jdbc.jdk8.WrappedConnectionJDK8@17af4927 Bind values:[null] Query was:<redacted>
	at deployment.ioserver.ear//io.ebean.config.dbplatform.SqlCodeTranslator.translate(SqlCodeTranslator.java:85) [ebean-api-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebean.config.dbplatform.DatabasePlatform.translate(DatabasePlatform.java:212) [ebean-api-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.query.CQueryEngine.translate(CQueryEngine.java:139) [ebean-core-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.query.DefaultOrmQueryEngine.translate(DefaultOrmQueryEngine.java:37) [ebean-core-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.core.OrmQueryRequest.translate(OrmQueryRequest.java:58) [ebean-core-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.query.CQuery.createPersistenceException(CQuery.java:647) [ebean-core-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.query.CQueryEngine.find(CQueryEngine.java:395) [ebean-core-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.query.DefaultOrmQueryEngine.findId(DefaultOrmQueryEngine.java:162) [ebean-core-14.3.1-javax.jar:14.3.1-javax]
	at deployment.ioserver.ear//io.ebeaninternal.server.core.OrmQueryRequest.findId(OrmQueryRequest.java:344) [ebean-core-14.3.1-javax.jar:14.3.1-javax]

Possible solutions

I've tried two ways to solve this:

  1. de-thread-local the Transaction (scope).
    This broke a lot of behavior when I tried it out and probably needs a bigger refactor (or and attempt from someone more knowledgeable) to get it working.

  2. Mark transaction as invalid in afterCompletion and check this flag when accessing the ThreadLocal transaction.
    I've made a PR via Mark transaction as inactive in JtaTransactionManager during AfterCom… #3422 and tested that against our stack and this works for WildFly but it's a bit hard to oversee if this keeps working in other stacks that use the JtaTransactionManager. It also re-uses (abuses) the existing active flag for this which might warrant a separate flag.

@rbygrave
Copy link
Member

Great write up, very good.

  1. de-thread-local the Transaction (scope).

My thought is that isn't a good path in that with the current ThreadLocal will be useful for any ebean use between the JTA transaction demarcation. So good to stick with it if we can.

  1. Mark transaction as invalid in afterCompletion

Yes, this seems reasonable. We actually have the SpiTransaction there so we can be a little more explicit - follow the gist of the Draft PR but be more explicit.

It also re-uses (abuses) the existing active flag for this which might warrant a separate flag.

I think the active flag to false is the right thing to here - I think that's pretty much exactly what we want to do in effect.

rbygrave added a commit that referenced this issue Jun 20, 2024
…and Transaction/connection ThreadLocal) when JtaTxnListener#afterCompletion is called by a different thread

Bugfix for "Long/slow transaction reaper" in Wildfire that can close/rollback a transaction in a different thread. Fix is to use active flag to inactivate the transaction and detect that case in JtaTransactionManager.getCurrentTransaction()
@rbygrave
Copy link
Member

Hi @VictorJimenezKwast would you like to review that PR. Pretty much follows the PR you prepared. Thanks!!

@VictorJimenezKwast
Copy link
Contributor Author

Approved the PR and tested against our stack. Thanks for picking this up so quickly.

rbygrave added a commit that referenced this issue Jun 20, 2024
…and Transaction/connection ThreadLocal) when JtaTxnListener#afterCompletion is called by a different thread (#3424)

Bugfix for "Long/slow transaction reaper" in Wildfire that can close/rollback a transaction in a different thread. Fix is to use active flag to inactivate the transaction and detect that case in JtaTransactionManager.getCurrentTransaction()
@rbygrave rbygrave self-assigned this Jun 20, 2024
@rbygrave rbygrave added the bug label Jun 20, 2024
@rbygrave rbygrave added this to the 14.3.2 milestone Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants