4 node cluster - fail to deploy DB over the network on startup #8994

Closed
ygyg70 opened this issue Aug 17, 2019 · 4 comments


ygyg70 commented Aug 17, 2019

OrientDB Version: 3.0.23

Java Version: 11.0.2

OS: Windows 10

Expected behavior

A 4-node cluster in which every node is a master for all data, embedded in a Servlet container (Jetty).
All 4 nodes should join the cluster.

Actual behavior

Deploying the DB fails, usually on the 3rd or 4th node and usually at the 2nd chunk (out of 6).
The receiving node throws the following exception:
java.io.EOFException: Unexpected end of ZLIB input stream
at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:245)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
at java.base/java.util.zip.ZipInputStream.read(ZipInputStream.java:195)
at com.orientechnologies.common.io.OIOUtils.copyStream(OIOUtils.java:205)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.extractFile(OZIPCompressionUtil.java:97)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.uncompressDirectory(OZIPCompressionUtil.java:83)
at com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage.restore(OLocalPaginatedStorage.java:294)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.restore(OrientDBEmbedded.java:418)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$6.call(ODistributedAbstractPlugin.java:1991)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$6.call(ODistributedAbstractPlugin.java:1930)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.executeInDistributedDatabaseLock(ODistributedAbstractPlugin.java:1770)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.installDatabaseOnLocalNode(ODistributedAbstractPlugin.java:1930)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.installDatabaseFromNetwork(ODistributedAbstractPlugin.java:1597)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.requestDatabaseFullSync(ODistributedAbstractPlugin.java:1418)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.requestFullDatabase(ODistributedAbstractPlugin.java:1100)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$3.call(ODistributedAbstractPlugin.java:997)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$3.call(ODistributedAbstractPlugin.java:948)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.executeInDistributedDatabaseLock(ODistributedAbstractPlugin.java:1770)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.installDatabase(ODistributedAbstractPlugin.java:947)
at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installNewDatabasesFromCluster(OHazelcastPlugin.java:1439)
at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:300)
at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:1194)
at com.orientechnologies.orient.server.OServer.activate(OServer.java:469)

com.orientechnologies.orient.core.exception.ODatabaseException: Cannot create database 'db_name'
at com.orientechnologies.orient.core.db.OrientDBEmbedded.restore(OrientDBEmbedded.java:424)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$6.call(ODistributedAbstractPlugin.java:1991)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$6.call(ODistributedAbstractPlugin.java:1930)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.executeInDistributedDatabaseLock(ODistributedAbstractPlugin.java:1770)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.installDatabaseOnLocalNode(ODistributedAbstractPlugin.java:1930)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.installDatabaseFromNetwork(ODistributedAbstractPlugin.java:1597)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.requestDatabaseFullSync(ODistributedAbstractPlugin.java:1418)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.requestFullDatabase(ODistributedAbstractPlugin.java:1100)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$3.call(ODistributedAbstractPlugin.java:997)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$3.call(ODistributedAbstractPlugin.java:948)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.executeInDistributedDatabaseLock(ODistributedAbstractPlugin.java:1770)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.installDatabase(ODistributedAbstractPlugin.java:947)
at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installNewDatabasesFromCluster(OHazelcastPlugin.java:1439)
at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:300)
at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:1194)
at com.orientechnologies.orient.server.OServer.activate(OServer.java:469)
Caused by: java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.logAndPrepareForRethrow(OAbstractPaginatedStorage.java:5918)
at com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage.restore(OLocalPaginatedStorage.java:330)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.restore(OrientDBEmbedded.java:418)
... 88 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:245)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
at java.base/java.util.zip.ZipInputStream.read(ZipInputStream.java:195)
at com.orientechnologies.common.io.OIOUtils.copyStream(OIOUtils.java:205)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.extractFile(OZIPCompressionUtil.java:97)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.uncompressDirectory(OZIPCompressionUtil.java:83)
at com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage.restore(OLocalPaginatedStorage.java:294)
... 89 more

On the sending node there is no error.

After this error, the node that sent the data stays in BACKUP status and doesn't recover. The receiving node stays in SYNCHRONIZING status and doesn't recover either.
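
For reference, the `EOFException` above is the error `java.util.zip` raises when compressed input ends before the deflate stream is complete. Whether a chunk really arrives incomplete here is not confirmed anywhere in this thread; the sketch below only reproduces the same message with plain JDK classes by reading a deliberately truncated in-memory ZIP. The class and method names (`TruncatedZipDemo`, `buildZip`, `readPrefix`) are hypothetical, and none of this is OrientDB code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; it uses only java.util.zip, nothing from OrientDB.
class TruncatedZipDemo {

  // Build a small in-memory ZIP holding one entry of incompressible bytes.
  static byte[] buildZip() throws IOException {
    byte[] payload = new byte[64 * 1024];
    new Random(42).nextBytes(payload);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry("data.bin"));
      zip.write(payload);
      zip.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Extract every entry from the first `length` bytes of the archive.
  // Returns null if extraction completes, or the EOFException message if
  // the compressed data ends early.
  static String readPrefix(byte[] zipBytes, int length) throws IOException {
    byte[] buffer = new byte[8192];
    try (ZipInputStream zip =
        new ZipInputStream(new ByteArrayInputStream(zipBytes, 0, length))) {
      while (zip.getNextEntry() != null) {
        while (zip.read(buffer) != -1) {
          // Discard the data; only the read itself matters here.
        }
      }
      return null;
    } catch (EOFException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] zipBytes = buildZip();
    System.out.println("full archive:      " + readPrefix(zipBytes, zipBytes.length));
    System.out.println("truncated archive: " + readPrefix(zipBytes, zipBytes.length / 2));
  }
}
```

If the same thing happens during a network transfer, the receiver would fail in `InflaterInputStream.fill` exactly as in the trace above, while the sender could plausibly see no error of its own.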

Steps to reproduce

I can privately share the database files that reproduce the issue.
It seems to happen more often on nodes with fewer processors: always with 2 processors, less frequently with 4. I don't have a 4-node setup with more than 4 processors per node, so I can't tell beyond that.

ygyg70 changed the title from "Fail to deploy DB over the ntwork" to "4 node cluster - fail to deploy DB over the network on startup" on Aug 19, 2019

ygyg70 commented Aug 20, 2019

I took a thread dump while the transfer seemed to be stuck. Is this relevant?

"OrientDB SyncDatabase node=node_name db=db_name@15422" prio=5 tid=0x107 nid=NA waiting
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:273)
at java.io.PipedInputStream.receive(PipedInputStream.java:231)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at com.orientechnologies.orient.server.distributed.impl.task.TeeOutputStream.write(TeeOutputStream.java:30)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
- locked <0x3c68> (a java.io.BufferedOutputStream)
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
at java.util.zip.ZipOutputStream.write(ZipOutputStream.java:332)
- locked <0x3c69> (a java.util.zip.ZipOutputStream)
at com.orientechnologies.common.io.OIOUtils.copyStream(OIOUtils.java:206)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.addFile(OZIPCompressionUtil.java:208)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.addFolder(OZIPCompressionUtil.java:140)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.addFolder(OZIPCompressionUtil.java:126)
at com.orientechnologies.orient.core.compression.impl.OZIPCompressionUtil.compressDirectory(OZIPCompressionUtil.java:48)
at com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage.backup(OLocalPaginatedStorage.java:241)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.backup(ODistributedStorage.java:1663)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.backup(ODatabaseDocumentAbstract.java:2334)
at com.orientechnologies.orient.server.distributed.impl.task.OBackgroundBackup.run(OBackgroundBackup.java:110)
at java.lang.Thread.run(Thread.java:834)
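
For reference, the top frames of this dump (`PipedOutputStream.write` into `PipedInputStream.awaitSpace`) are what a writer thread looks like when the pipe's buffer is full and nothing is reading from the other end. Whether that is the cause of the failure or only a symptom is not established in this thread. The sketch below reproduces just that blocking behaviour with plain JDK classes; `BlockedPipeDemo`, `run`, and the buffer sizes are hypothetical choices for illustration, not OrientDB code.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; it uses only java.io piped streams.
class BlockedPipeDemo {

  // Returns { 1 if the writer was still blocked before draining, else 0,
  //           number of bytes eventually drained by the reader }.
  static long[] run() throws Exception {
    PipedInputStream in = new PipedInputStream(1024); // small pipe buffer
    PipedOutputStream out = new PipedOutputStream(in);

    Thread writer = new Thread(() -> {
      // Write four times the pipe capacity, then close to signal end of data.
      try (PipedOutputStream o = out) {
        o.write(new byte[4096]);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }, "demo-writer");
    writer.setDaemon(true);
    writer.start();

    // Nobody is reading yet, so the writer cannot finish: it fills the
    // 1024-byte buffer and then parks inside PipedInputStream.awaitSpace.
    writer.join(500);
    boolean blocked = writer.isAlive();
    System.out.println("writer still alive: " + blocked + ", state: " + writer.getState());

    // Draining the pipe lets the writer make progress and terminate.
    long drained = 0;
    byte[] buffer = new byte[512];
    int n;
    while ((n = in.read(buffer)) != -1) {
      drained += n;
    }
    writer.join();
    return new long[] {blocked ? 1 : 0, drained};
  }

  public static void main(String[] args) throws Exception {
    long[] result = run();
    System.out.println("blocked=" + (result[0] == 1) + ", drained=" + result[1]);
  }
}
```

In the demo the writer only resumes once the reader starts consuming; a writer whose reader never resumes would stay parked indefinitely, which would be consistent with a sender that reports no error.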


ygyg70 commented Aug 21, 2019

This issue is preventing us from deploying to production; any comment would be appreciated.


ygyg70 commented Aug 30, 2019

13 days and no comment - should I assume no one is looking at submitted issues? Is there any other way to get help?

tglman (Member) commented Jun 1, 2020

Hi,

Many of these problems have been fixed in more recent versions. If you can update, this is quite likely fixed.
Regards
