Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.0M2] order by @rid desc query renders orientdb nonresponsive #2966

Closed
ghost opened this issue Oct 19, 2014 · 9 comments
Closed

[2.0M2] order by @rid desc query renders orientdb nonresponsive #2966

ghost opened this issue Oct 19, 2014 · 9 comments
Assignees
Milestone

Comments

@ghost
Copy link

ghost commented Oct 19, 2014

Problem 1: Queries like the following should be very fast:

select from Address order by @rid desc

The Address class has ~10 million records, so even with a disk-sort, this should be pretty fast. With an inverse cluster cursor, the query should be very fast, according to http://www.orientechnologies.com/docs/2.0/orientdb.wiki/SQL-Query.html.

But in fact, the query renders orientdb completely non-responsive. I can't even log into the server using the console (socket timeout). Seems like I triggered some kind of bug.

Note that the same problem occurs with ascending sort.

Problem 2: Probably as a side effect of Problem 1 (I had to kill the server), orientdb on restart now starts rebuilding indices. This takes a long time, and clients are unable to connect until it's ready. Such long startups renders orientdb a dubious choice in high-availability settings. Should be fixed in 2.0 IMO (that is, at least make the database available while indices are rebuilding)

This is a non-distributed database, running on a 64-bit machine/jvm with plenty of memory.

Thanks.

@ghost
Copy link
Author

ghost commented Oct 19, 2014

Just a followup that any order-by against non-indexed fields (on large clusters) makes every other concurrent query very slow. The actual order-by seems to take forever to complete (I gave up waiting), even for a small dataset like 10 million records and a very simple class.

The exact same query against the same dataset in MySQL completes within seconds! Something is clearly wrong there.

By the way, here's what OrientDB dumps when I kill the process during the @Rid query above:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.5-b02 mixed mode):

"OrientDB WAL Flush Task (demo)" #34 daemon prio=5 os_prio=0 tid=0x000000001fadd800 nid=0x1fc0 runnable [0x00000000215ae
000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c2774de8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionOb
ject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.j
ava:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)

    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"OrientDB Write Cache Flush Task (demo)" #25 daemon prio=5 os_prio=0 tid=0x000000001f6a4000 nid=0x275c runnable [0x00000
000219af000]
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.offer(ScheduledThreadPoolExecutor.java:1010
)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.add(ScheduledThreadPoolExecutor.java:1037)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.add(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ScheduledThreadPoolExecutor.reExecutePeriodic(ScheduledThreadPoolExecutor.java:346)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:296
)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Thread-7" #24 daemon prio=5 os_prio=0 tid=0x000000001f493800 nid=0x1ef0 waiting on condition [0x00000000216ae000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c10effd8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionOb
ject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.j
ava:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)

    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"OrientDB <- BinaryClient (/127.0.0.1:65414)" #22 daemon prio=5 os_prio=0 tid=0x000000001f88c800 nid=0xa54 waiting for m
onitor entry [0x00000000214ae000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.orientechnologies.orient.core.index.hashindex.local.cache.OReadWriteDiskCache.updateCache(OReadWriteDiskC
ache.java:654)
at com.orientechnologies.orient.core.index.hashindex.local.cache.OReadWriteDiskCache.doLoad(OReadWriteDiskCache.
java:299)
at com.orientechnologies.orient.core.index.hashindex.local.cache.OReadWriteDiskCache.load(OReadWriteDiskCache.ja
va:267)
at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readFullEntry(OPaginatedClus
ter.java:1305)
at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster
.java:530)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedS
torage.java:1355)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedS
torage.java:716)
at com.orientechnologies.orient.core.db.raw.ODatabaseRaw.read(ODatabaseRaw.java:288)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.executeReadRecord(ODatabaseRecordAbstract
.java:915)
at com.orientechnologies.orient.core.tx.OTransactionNoTx.loadRecord(OTransactionNoTx.java:79)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordTx.load(ODatabaseRecordTx.java:277)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordTx.load(ODatabaseRecordTx.java:43)
at com.orientechnologies.orient.core.iterator.OIdentifiableIterator.readCurrentRecord(OIdentifiableIterator.java
:290)
at com.orientechnologies.orient.core.iterator.ORecordIteratorClusters.hasNext(ORecordIteratorClusters.java:149)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchFromTarget(OCommandExecutorSQLSelect.jav
a:1093)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:
377)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:338)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:64
)
at com.orientechnologies.orient.core.storage.OStorageEmbedded.executeCommand(OStorageEmbedded.java:85)
at com.orientechnologies.orient.core.storage.OStorageEmbedded.command(OStorageEmbedded.java:74)
at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:73)
at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:84)
at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.command(ONetworkProtocolBi
nary.java:1102)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkPro
tocolBinary.java:342)
at com.orientechnologies.orient.server.network.protocol.binary.OBinaryNetworkProtocolAbstract.execute(OBinaryNet
workProtocolAbstract.java:180)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:65)

"DestroyJavaVM" #21 prio=5 os_prio=0 tid=0x0000000002b72800 nid=0x2878 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"OrientDB ONetworkProtocolHttpDb listen at 0.0.0.0:2480-2490" #19 prio=5 os_prio=0 tid=0x000000001f81f800 nid=0x1fe0 wai
ting for monitor entry [0x00000000203de000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.text.resources.FormatData.getContents(FormatData.java:120)
at sun.util.resources.ParallelListResourceBundle.loadLookupTablesIfNecessary(ParallelListResourceBundle.java:168
)
at sun.util.resources.ParallelListResourceBundle.handleKeySet(ParallelListResourceBundle.java:134)
at sun.util.resources.ParallelListResourceBundle.keySet(ParallelListResourceBundle.java:143)
at sun.util.resources.ParallelListResourceBundle.containsKey(ParallelListResourceBundle.java:129)
at sun.util.resources.ParallelListResourceBundle$KeySet.contains(ParallelListResourceBundle.java:208)
at sun.util.resources.ParallelListResourceBundle.containsKey(ParallelListResourceBundle.java:129)
at sun.util.resources.ParallelListResourceBundle$KeySet.contains(ParallelListResourceBundle.java:208)
at sun.util.resources.ParallelListResourceBundle.containsKey(ParallelListResourceBundle.java:129)
at java.text.DateFormatSymbols.initializeData(DateFormatSymbols.java:702)
at java.text.DateFormatSymbols.(DateFormatSymbols.java:146)
at sun.util.locale.provider.DateFormatSymbolsProviderImpl.getInstance(DateFormatSymbolsProviderImpl.java:85)
at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:359)
at java.text.DateFormatSymbols.getInstanceRef(DateFormatSymbols.java:349)
at java.text.SimpleDateFormat.(SimpleDateFormat.java:603)
at java.text.SimpleDateFormat.(SimpleDateFormat.java:580)
at com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetConnections.(O
ServerCommandGetConnections.java:39)
at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.registerStatelessComma
nds(ONetworkProtocolHttpAbstract.java:657)
at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.config(ONetworkProtoco
lHttpAbstract.java:87)
at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpDb.config(ONetworkProtocolHttpD
b.java:44)
at com.orientechnologies.orient.server.network.OServerNetworkListener.run(OServerNetworkListener.java:220)

"OrientDB ONetworkProtocolBinary listen at 0.0.0.0:2424-2430" #17 prio=5 os_prio=0 tid=0x000000001f82a800 nid=0x2b34 run
nable [0x00000000202df000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:3326)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at java.util.logging.LogManager$LoggerContext.processParentHandlers(LogManager.java:874)
at java.util.logging.LogManager$LoggerContext.addLocalLogger(LogManager.java:811)
- locked <0x00000006c0f23ad0> (a java.util.logging.LogManager$LoggerContext)
at java.util.logging.LogManager$LoggerContext.addLocalLogger(LogManager.java:758)
at java.util.logging.LogManager.addLogger(LogManager.java:1162)
at java.util.logging.LogManager.demandLogger(LogManager.java:556)
at java.util.logging.Logger.demandLogger(Logger.java:455)
at java.util.logging.Logger.getLogger(Logger.java:502)
at com.orientechnologies.common.log.OLogManager.log(OLogManager.java:81)
at com.orientechnologies.common.log.OLogManager.config(OLogManager.java:151)
at com.orientechnologies.orient.server.OClientConnectionManager.connect(OClientConnectionManager.java:114)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.config(ONetworkProtocolBin
ary.java:126)
at com.orientechnologies.orient.server.network.OServerNetworkListener.run(OServerNetworkListener.java:220)

"Timer-0" #12 daemon prio=5 os_prio=0 tid=0x000000001f63c000 nid=0x289c waiting for monitor entry [0x000000001fdde000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.logging.LogManager$LoggerContext.findLogger(LogManager.java:692)
- waiting to lock <0x00000006c0f23ad0> (a java.util.logging.LogManager$LoggerContext)
at java.util.logging.LogManager.getLogger(LogManager.java:1226)
at java.util.logging.LogManager.demandLogger(LogManager.java:551)
at java.util.logging.Logger.demandLogger(Logger.java:455)
at java.util.logging.Logger.getLogger(Logger.java:502)
at com.orientechnologies.common.log.OLogManager.log(OLogManager.java:81)
at com.orientechnologies.common.log.OLogManager.debug(OLogManager.java:106)
at com.orientechnologies.orient.server.OClientConnectionManager$1.run(OClientConnectionManager.java:70)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

"Service Thread" #10 daemon prio=9 os_prio=0 tid=0x000000001e7a2000 nid=0x904 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C1 CompilerThread3" #9 daemon prio=9 os_prio=2 tid=0x000000001e718000 nid=0x2afc waiting on condition [0x00000000000000
00]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread2" #8 daemon prio=9 os_prio=2 tid=0x000000001e70f000 nid=0x28c8 waiting on condition [0x00000000000000
00]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #7 daemon prio=9 os_prio=2 tid=0x000000001e70b000 nid=0x1c88 waiting on condition [0x00000000000000
00]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #6 daemon prio=9 os_prio=2 tid=0x000000001e709000 nid=0x1c08 waiting on condition [0x00000000000000
00]
java.lang.Thread.State: RUNNABLE

"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001e706800 nid=0x1e18 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001e706000 nid=0x2598 waiting on condition [0x000000000000000
0]
java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x0000000002c68800 nid=0x23e4 in Object.wait() [0x000000001e6df000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006c0e5d7e8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
- locked <0x00000006c0e5d7e8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:158)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x000000001c70c800 nid=0x1b08 in Object.wait() [0x000000001e5df000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006c0e5d9a0> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
- locked <0x00000006c0e5d9a0> (a java.lang.ref.Reference$Lock)

"VM Thread" os_prio=2 tid=0x000000001c708800 nid=0x24a0 runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x0000000002b88000 nid=0x14b0 runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x0000000002b89800 nid=0x1b14 runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x0000000002b8b000 nid=0x25f0 runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000002b8d800 nid=0x34c runnable

"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x0000000002b90000 nid=0x2854 runnable

"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x0000000002b91000 nid=0x14e8 runnable

"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x0000000002b94000 nid=0x203c runnable

"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x0000000002b95800 nid=0x2628 runnable

"VM Periodic Task Thread" os_prio=2 tid=0x000000001e7a3000 nid=0x708 waiting on condition

JNI global references: 1557

Heap
PSYoungGen total 773632K, used 309248K [0x000000076af80000, 0x00000007b7d00000, 0x00000007c0000000)
eden space 309248K, 100% used [0x000000076af80000,0x000000077dd80000,0x000000077dd80000)
from space 464384K, 0% used [0x000000077dd80000,0x000000077dd80000,0x000000079a300000)
to space 464384K, 0% used [0x000000079b780000,0x000000079b780000,0x00000007b7d00000)
ParOldGen total 2786816K, used 2786593K [0x00000006c0e00000, 0x000000076af80000, 0x000000076af80000)
object space 2786816K, 99% used [0x00000006c0e00000,0x000000076af48730,0x000000076af80000)
Metaspace used 19735K, capacity 20068K, committed 20352K, reserved 1067008K
class space used 2454K, capacity 2572K, committed 2688K, reserved 1048576K

@lvca
Copy link
Member

lvca commented Oct 20, 2014

@luigidellaquila I remember to wrote this optimization a couple of months ago, when the "order by @Rid desc" was optimized with a descendent iterator on class. A few tests took 0.1 to browse last 20 records in a class with millions, in reverse. Did you know why this seems to don't work anymore?

@lvca lvca added this to the 2.0 Final milestone Oct 20, 2014
@ghost
Copy link
Author

ghost commented Oct 20, 2014

The most worrying part of this test is this: any query against non-indexed properties completely kills the orientdb server, once you have a few million records. Best case is that other queries gets very slow, worst case is that orientdb stops responding entirely.

I think a) some kind of resource throttling is needed and 2) order-by on non-indexed fields are unreasonably slow (we're talking hours).

Out of curiosity, how is order-by implemented? Mayby shuffle the result records through a temporary b-tree? I actually let my order-by query on the 10-million records suite run for hours, and it eventually ended in an OutOfMemory error.

@luigidellaquila
Copy link
Member

@JavaLama in current implementation order by is implemented with normal in-memory sort algorithms.
We are working on these algorithms and we also plan to implement a disk based strategy for big result sets.
The time ordering takes can also depend on record structure (eg. deserializing a property value on a record with hundreds of properties is much more expensive than deserializing the same property on a small record).
In 2.0 we started working on these issues, but probably some of them will be completely addressed in 2.1

Anyway, from your thread dump it seems that your query is somehow locked.

Are you working on an Milestone release? Or are you on a SNAPSHOT?

thanks

Luigi

@luigidellaquila
Copy link
Member

@JavaLama do you think you can provide a memory dump together with the thread dump?
It would be very helpful

thanks

@ghost
Copy link
Author

ghost commented Oct 20, 2014

I'm using the milestone release.

I don't think further investigation is necessary since you mentioned it's in-memory sorting. That completely explain everything (gc pressure causes slowdown and eventually OOM).

I'm afraid orientdb is not usable to me at this point, until disk-based sorting is in place. On the other hand, everything else works beautifully so I'm prepared to revisit it in 2.1 or so.

@luigidellaquila
Copy link
Member

@JavaLama if all you need is RID based sorting, I should be able to fix it quickly.
For sorting by general (schemaless) properties on big quantities, I will be back with news soon, but it will probably be released after 2.0

@ghost
Copy link
Author

ghost commented Oct 20, 2014

@luigidellaquila No, I actually don't need RID based sorting now. I'm more in need of big-quantity sorting on schemaless properties. At least it shouldn't make orientdb grind to a halt.

Great to hear that it's being worked on.

@lvca
Copy link
Member

lvca commented Jan 20, 2015

Fixed in 2.0-final.

@lvca lvca closed this as completed Jan 20, 2015
@lvca lvca modified the milestones: 2.0 Final, 2.1 Jan 20, 2015
@lvca lvca assigned lvca and unassigned luigidellaquila Jan 20, 2015
@lvca lvca added the sql label Jan 20, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants