Locking on orientdb #9575

Closed
holocentric-bmsnext opened this issue Apr 6, 2021 · 3 comments

@holocentric-bmsnext

OrientDB Version: <3.1.7>

Java Version: <openjdk 13.0.2 2020-01-14>

OS: <Linux api-content-7c984f76ff-jrwnn 4.15.0-1102-azure #113~16.04.1-Ubuntu SMP Wed Dec 9 20:42:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux>

Expected behavior

Actual behavior

<OrientDB is unresponsive: we cannot log in or list any databases from the Studio start page.>

Steps to reproduce

This happens after a lot of parallel reads and writes against multiple databases, so I don't have exact steps to reproduce it. However, I have the full thread dump and log output from OrientDB; if required, I will email them separately.

There were 60+ threads BLOCKED in the thread dump, all waiting to acquire [0x00000003c00353a8] or [0x00000003c00e88c0].

As you can see, the thread listing databases already holds [0x00000003c00e88c0], but it never returns. The server stayed like that for at least an hour, and eventually we had to shut down all databases and restart OrientDB.

Do you know what com.orientechnologies.orient.server.OServer.listDatabases is doing? We have a lot of databases (1068), each with a lot of files (around 2000 per database).

"OrientDB HTTP Connection /10.30.0.190:2480<-/10.30.0.100:35688": waiting to acquire [0x00000003c00e88c0], holding [0x00000003c00353a8]
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:453)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:79)
	at com.orientechnologies.orient.server.OSystemDatabase.openSystemDatabase(OSystemDatabase.java:93)
	at com.orientechnologies.orient.server.OSystemDatabase.execute(OSystemDatabase.java:102)
	at com.orientechnologies.orient.server.security.ODefaultServerSecurity.getSystemUser(ODefaultServerSecurity.java:329)
	at com.orientechnologies.orient.server.security.authenticator.OSystemUserAuthenticator.isAuthorized(OSystemUserAuthenticator.java:88)
	at com.orientechnologies.orient.server.security.ODefaultServerSecurity.isAuthorized(ODefaultServerSecurity.java:355)
	at com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetListDatabases.execute(OServerCommandGetListDatabases.java:66)
	at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.service(ONetworkProtocolHttpAbstract.java:253)
	at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.execute(ONetworkProtocolHttpAbstract.java:811)
	at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:67)



"OrientDB (/10.30.0.190:2424) <- BinaryClient (/10.30.0.8:51416)": running, holding [0x00000003c00e88c0]
	at sun.nio.fs.UnixNativeDispatcher.stat0(Native Method)
	at sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:291)
	at sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:70)
	at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:52)
	at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
	at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
	at java.nio.file.Files.readAttributes(Files.java:1737)
	at java.nio.file.Files.isDirectory(Files.java:2192)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$scanDatabaseDirectory$6(OrientDBEmbedded.java:969)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded$$Lambda$233/292680111.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.scanDatabaseDirectory(OrientDBEmbedded.java:967)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.listDatabases(OrientDBEmbedded.java:841)
	at com.orientechnologies.orient.server.OServer.listDatabases(OServer.java:1303)
	at com.orientechnologies.orient.server.OConnectionBinaryExecutor.executeSubscribeDistributedConfiguration(OConnectionBinaryExecutor.java:1782)
	at com.orientechnologies.orient.client.remote.message.OSubscribeDistributedConfigurationRequest.execute(OSubscribeDistributedConfigurationRequest.java:35)
	at com.orientechnologies.orient.server.OConnectionBinaryExecutor.executeSubscribe(OConnectionBinaryExecutor.java:1768)
	at com.orientechnologies.orient.client.remote.message.OSubscribeRequest.execute(OSubscribeRequest.java:74)
	at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.sessionRequest(ONetworkProtocolBinary.java:355)
	at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:239)
	at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:67)
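
For anyone triaging a similar hang, the "60+ threads BLOCKED on two locks" observation can be reproduced from a live JVM with the standard `java.lang.management` API. The following is only a minimal sketch: the class name `BlockedThreadSummary` is hypothetical, and it inspects the JVM it runs in, so for a running OrientDB server you would more likely take the dump externally (for example with `jstack <pid>`) or run similar code over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of OrientDB.
public class BlockedThreadSummary {

    /** Counts BLOCKED threads per contended monitor, keyed by "<lock> held by <owner thread>". */
    static Map<String, Integer> summarize(ThreadInfo[] infos) {
        Map<String, Integer> waitersPerLock = new TreeMap<>();
        for (ThreadInfo info : infos) {
            // Entries can be null if a thread died while the dump was taken.
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            String key = info.getLockName() + " held by " + info.getLockOwnerName();
            waitersPerLock.merge(key, 1, Integer::sum);
        }
        return waitersPerLock;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true: also collect locked monitors and ownable synchronizers.
        ThreadInfo[] dump = threads.dumpAllThreads(true, true);
        summarize(dump).forEach((lock, waiters) ->
            System.out.println(waiters + " thread(s) blocked on " + lock));
    }
}
```

A summary like this makes it easy to see which thread owns the contended monitor, which is the thread whose stack is worth reading first.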

Thanks,
Kathy

@lvca
Member

lvca commented Jun 25, 2021

Could you please send the entire stack trace of all the threads?

@holocentric-bmsnext
Author

I don't have the entire stack trace anymore, but I have managed to avoid the lock and improve performance by moving many of the databases out of the default storage and into custom storage.

I discovered that listDatabases and scanDatabaseDirectory were called on every command executed remotely. When the directory contains a lot of folders and multiple execute threads run at once, they somehow lock each other up.

If I move most of the databases into custom storage like this, by changing the server config XML, things speed up because scanDatabaseDirectory doesn't scan custom storages. This is only a workaround, but it clearly shows where the issue is.

. . . .
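
To illustrate the contention described above, here is a minimal sketch of one general way to keep a directory scan off the hot path: cache the listing as an immutable snapshot with a time-to-live, and do the filesystem work without holding any shared lock. This is not the fix that shipped in OrientDB (the referenced commits are not shown here); `CachedDatabaseLister`, its TTL parameter, and `scanCount` are all hypothetical names, and treating every subdirectory as a database is a simplifying assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not an OrientDB class.
public class CachedDatabaseLister {

    /** Immutable result of one directory scan. */
    private static final class Snapshot {
        final List<String> names;
        final long takenAtNanos;

        Snapshot(List<String> names, long takenAtNanos) {
            this.names = names;
            this.takenAtNanos = takenAtNanos;
        }
    }

    private final Path root;
    private final long ttlNanos;
    private final AtomicInteger scans = new AtomicInteger();
    private volatile Snapshot snapshot;

    public CachedDatabaseLister(Path root, long ttlMillis) {
        this.root = root;
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    }

    /** Returns the cached listing while it is fresh; otherwise rescans without taking a lock. */
    public List<String> listDatabases() {
        Snapshot current = snapshot;
        if (current != null && System.nanoTime() - current.takenAtNanos < ttlNanos) {
            return current.names;
        }
        // Concurrent callers may rescan redundantly here, but none of them blocks the others.
        Snapshot fresh = new Snapshot(scan(), System.nanoTime());
        snapshot = fresh;
        return fresh.names;
    }

    /** Number of filesystem scans performed so far (for diagnostics). */
    public int scanCount() {
        return scans.get();
    }

    private List<String> scan() {
        scans.incrementAndGet();
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                // Assumption: each subdirectory is one database.
                if (Files.isDirectory(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return Collections.unmodifiableList(names);
    }
}
```

The trade-off is staleness: a database created or dropped on disk is only noticed after the TTL expires, so a real implementation would also need to invalidate the snapshot on create and drop.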

tglman added a commit that referenced this issue Sep 13, 2021
tglman added a commit that referenced this issue Sep 14, 2021
@tglman tglman added this to the 3.1.x milestone Feb 23, 2022
@tglman
Member

tglman commented Jun 14, 2022

Hi,

Fixes for this case have been made and released, so I am closing this. Please re-open if you still experience problems.

Regards

@tglman tglman closed this as completed Jun 14, 2022
@tglman tglman modified the milestones: 3.1.x, 3.1.19 Jul 18, 2022