Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recalculation process blocked waiting for lock release #1975

Open
lampajr opened this issue Sep 11, 2024 · 0 comments
Open

Recalculation process blocked waiting for lock release #1975

lampajr opened this issue Sep 11, 2024 · 0 comments

Comments

@lampajr
Copy link
Member

lampajr commented Sep 11, 2024

Describe the bug

Looks like that some processes are blocked waiting for {{lock}} to be released at

void withRecalculationLock(Runnable runnable) {
recalculationLock.lock();
try {
runnable.run();
} finally {
recalculationLock.unlock();
}
}

This could block the thread for long time, especially considering that the recalculation could take long time:

2024-09-09 16:08:13,718 797ee630d871 quarkus-run.jar[7] WARN  [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[horreum.run.pool-2,5,main] has been blocked for 61418 ms, time limit is 60000 ms: io.vertx.core.VertxException: Thread blocked
	at java.base@17.0.11/jdk.internal.misc.Unsafe.park(Native Method)
	at java.base@17.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
	at java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
	at java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)
	at java.base@17.0.11/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
	at java.base@17.0.11/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
	at io.hyperfoil.tools.horreum.svc.DatasetServiceImpl.withRecalculationLock(DatasetServiceImpl.java:554)
	at io.hyperfoil.tools.horreum.svc.DatasetServiceImpl.onNewDataset(DatasetServiceImpl.java:563)
	at io.hyperfoil.tools.horreum.svc.DatasetServiceImpl_ClientProxy.onNewDataset(Unknown Source)
	at io.hyperfoil.tools.horreum.svc.ServiceMediator.newDataset(ServiceMediator.java:141)
	at io.hyperfoil.tools.horreum.svc.ServiceMediator_ClientProxy.newDataset(Unknown Source)
	at io.hyperfoil.tools.horreum.svc.RunServiceImpl.createDataset(RunServiceImpl.java:1355)
	at io.hyperfoil.tools.horreum.svc.RunServiceImpl.transform(RunServiceImpl.java:1334)

and as a consequence, for async processed events, it could force db transaction to be killed after hitting the default timeout:

2024-09-09 16:09:12,440 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:12 in state  RUN
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:13 in state  RUN
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:14 in state  RUN
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:15 in state  RUN
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:16 in state  RUN
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:17 in state  RUN
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff0a000264:9bc3:66df1d30:12 invoked while multiple threads active within it.
2024-09-09 16:09:12,441 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check processing TX 0:ffff0a000264:9bc3:66df1d30:18 in state  RUN
2024-09-09 16:09:12,442 797ee630d871 quarkus-run.jar[7] WARN  [com.arj.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012381: Action id 0:ffff0a000264:9bc3:66df1d30:12 completed with multiple threads - thread horreum.run.pool-2 was in progress with java.base@17.0.11/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)
java.base@17.0.11/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base@17.0.11/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
io.hyperfoil.tools.horreum.svc.DatasetServiceImpl.withRecalculationLock(DatasetServiceImpl.java:554)
io.hyperfoil.tools.horreum.svc.DatasetServiceImpl.onNewDataset(DatasetServiceImpl.java:563)
io.hyperfoil.tools.horreum.svc.DatasetServiceImpl_ClientProxy.onNewDataset(Unknown Source)
io.hyperfoil.tools.horreum.svc.ServiceMediator.newDataset(ServiceMediator.java:141)
io.hyperfoil.tools.horreum.svc.ServiceMediator_ClientProxy.newDataset(Unknown Source)
io.hyperfoil.tools.horreum.svc.RunServiceImpl.createDataset(RunServiceImpl.java:1355)
io.hyperfoil.tools.horreum.svc.RunServiceImpl.transform(RunServiceImpl.java:1334)
io.hyperfoil.tools.horreum.svc.RunServiceImpl_Subclass.transform$$superforward(Unknown Source)

As a final result this could prevent the queues to be properly drained.

Additional context

The reason for adding that {{recalculationLock}} seems has been the following:
""" it appears thatin the past we hit an issue (or feature of postgres SSI) : https://wiki.postgresql.org/wiki/SSI, which is the internal mechanism of Postgres to ensure that all snapshots are serializable. He must have experienced TX rollbaks that he could not explain. But, we are not using serlializable isolation level any more, i think we reduced the isolation level to repeatable reads. so postgres will not be using SSI any more """

To Reproduce

I was not able to reproduce the same locally yet.

Version

What is the version of Horreum ?

If you are using a development branch; what is the commit id ?

0.14.1

but after quick look, if the root cause is always the lock I think that all versions including master branch are affected as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant