[2.0M2] order by @rid desc query renders orientdb nonresponsive #2966
Comments
Just a follow-up: any order-by against non-indexed fields (on large clusters) makes every other concurrent query very slow. The order-by itself seems to take forever to complete (I gave up waiting), even on a small dataset like 10 million records and a very simple class. The exact same query against the same dataset in MySQL completes within seconds! Something is clearly wrong there. By the way, here's what OrientDB dumps when I kill the process during the @rid query above:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.5-b02 mixed mode):
"OrientDB WAL Flush Task (demo)" #34 daemon prio=5 os_prio=0 tid=0x000000001fadd800 nid=0x1fc0 runnable [0x00000000215ae
"OrientDB Write Cache Flush Task (demo)" #25 daemon prio=5 os_prio=0 tid=0x000000001f6a4000 nid=0x275c runnable [0x00000
"Thread-7" #24 daemon prio=5 os_prio=0 tid=0x000000001f493800 nid=0x1ef0 waiting on condition [0x00000000216ae000]
"OrientDB <- BinaryClient (/127.0.0.1:65414)" #22 daemon prio=5 os_prio=0 tid=0x000000001f88c800 nid=0xa54 waiting for m
"DestroyJavaVM" #21 prio=5 os_prio=0 tid=0x0000000002b72800 nid=0x2878 waiting on condition [0x0000000000000000]
"OrientDB ONetworkProtocolHttpDb listen at 0.0.0.0:2480-2490" #19 prio=5 os_prio=0 tid=0x000000001f81f800 nid=0x1fe0 wai
"OrientDB ONetworkProtocolBinary listen at 0.0.0.0:2424-2430" #17 prio=5 os_prio=0 tid=0x000000001f82a800 nid=0x2b34 run
"Timer-0" #12 daemon prio=5 os_prio=0 tid=0x000000001f63c000 nid=0x289c waiting for monitor entry [0x000000001fdde000]
"Service Thread" #10 daemon prio=9 os_prio=0 tid=0x000000001e7a2000 nid=0x904 runnable [0x0000000000000000]
"C1 CompilerThread3" #9 daemon prio=9 os_prio=2 tid=0x000000001e718000 nid=0x2afc waiting on condition [0x00000000000000
"C2 CompilerThread2" #8 daemon prio=9 os_prio=2 tid=0x000000001e70f000 nid=0x28c8 waiting on condition [0x00000000000000
"C2 CompilerThread1" #7 daemon prio=9 os_prio=2 tid=0x000000001e70b000 nid=0x1c88 waiting on condition [0x00000000000000
"C2 CompilerThread0" #6 daemon prio=9 os_prio=2 tid=0x000000001e709000 nid=0x1c08 waiting on condition [0x00000000000000
"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001e706800 nid=0x1e18 runnable [0x0000000000000000]
"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001e706000 nid=0x2598 waiting on condition [0x000000000000000
"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x0000000002c68800 nid=0x23e4 in Object.wait() [0x000000001e6df000]
"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x000000001c70c800 nid=0x1b08 in Object.wait() [0x000000001e5df000]
"VM Thread" os_prio=2 tid=0x000000001c708800 nid=0x24a0 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x0000000002b88000 nid=0x14b0 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x0000000002b89800 nid=0x1b14 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x0000000002b8b000 nid=0x25f0 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000002b8d800 nid=0x34c runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x0000000002b90000 nid=0x2854 runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x0000000002b91000 nid=0x14e8 runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x0000000002b94000 nid=0x203c runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x0000000002b95800 nid=0x2628 runnable
"VM Periodic Task Thread" os_prio=2 tid=0x000000001e7a3000 nid=0x708 waiting on condition
JNI global references: 1557
Heap
@luigidellaquila I remember writing this optimization a couple of months ago, when "order by @rid desc" was optimized with a descending iterator on the class. In a few tests, browsing the last 20 records of a class with millions of records, in reverse, took 0.1. Do you know why this no longer seems to work?
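The idea behind a descending iterator can be sketched as follows, assuming a single cluster modeled as a sorted map from record position to payload. `ReverseClusterScan` and `lastRecordsDescending` are hypothetical names, not OrientDB's API, and a real class spanning several clusters would also have to walk clusters in descending id order. The point is that "last N by @rid" only needs to read N entries backwards from the end, so nothing is loaded in full or sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: one cluster maps record positions (the numeric part of a RID
// such as #12:345) to record payloads, kept in ascending position order.
public class ReverseClusterScan {

    // Returns the payloads of the `limit` records with the highest positions, highest first.
    // Walks the ordered map backwards and stops early, so the work grows with `limit`,
    // not with the number of records in the cluster; nothing is sorted.
    public static List<String> lastRecordsDescending(NavigableMap<Long, String> cluster, int limit) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : cluster.descendingMap().entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            result.add(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> cluster = new TreeMap<>();
        for (long pos = 0; pos < 100_000; pos++) {
            cluster.put(pos, "record-" + pos);
        }
        // Loosely analogous to "order by @rid desc limit 3" on a single-cluster class.
        System.out.println(lastRecordsDescending(cluster, 3));
        // prints [record-99999, record-99998, record-99997]
    }
}
```

Because `TreeMap.descendingMap()` is a lazy view, the loop costs one lookup of the last entry plus `limit` steps, which is why such a query can return quickly even on a class with millions of records.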
The most worrying part of this test is that any query against non-indexed properties completely kills the OrientDB server once you have a few million records. In the best case, other queries get very slow; in the worst case, OrientDB stops responding entirely. I think a) some kind of resource throttling is needed, and b) order-by on non-indexed fields is unreasonably slow (we're talking hours). Out of curiosity, how is order-by implemented? Maybe the result records could be shuffled through a temporary B-tree? I actually let my order-by query on the 10-million-record suite run for hours, and it eventually ended in an OutOfMemoryError.
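Point a) could take many forms. One minimal sketch, assuming a hypothetical `HeavyQueryThrottle` wrapped around whatever code path runs expensive queries (nothing like it is confirmed to exist in OrientDB 2.0), is a fair semaphore that caps how many heavy queries run at once and rejects the overflow after a short wait:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical throttle: caps how many expensive queries (for example, an order-by on a
// non-indexed field) may run at once, so cheaper queries keep getting CPU and memory.
public class HeavyQueryThrottle {
    private final Semaphore permits;
    private final long waitMillis;

    public HeavyQueryThrottle(int maxConcurrentHeavyQueries, long waitMillis) {
        this.permits = new Semaphore(maxConcurrentHeavyQueries, true); // fair: FIFO admission
        this.waitMillis = waitMillis;
    }

    // Runs the query if a permit frees up within the wait budget; otherwise rejects it
    // instead of letting it pile onto an already overloaded server.
    public <T> T run(Supplier<T> query) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for a query slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException("too many heavy queries in flight; try again later");
        }
        try {
            return query.get();
        } finally {
            permits.release(); // always hand the slot back, even if the query throws
        }
    }

    public int availablePermits() {
        return permits.availablePermits();
    }
}
```

Rejecting rather than queueing without bound keeps light queries responsive, but it does not make any single in-memory sort cheaper; that is point b).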
@JavaLama in the current implementation, order by is implemented with normal in-memory sort algorithms. Anyway, from your thread dump it seems that your query is somehow locked. Are you working on a milestone release, or on a SNAPSHOT? Thanks, Luigi
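A full in-memory sort has to hold every matching record at once. When a query only needs the first N rows, as in browsing the last 20 records mentioned earlier, a bounded heap keeps memory at O(N) regardless of input size. This is a standard top-N technique offered as an illustration, not a description of what OrientDB does; `TopN` and its signature are hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative top-N selection: keeps only the `limit` best rows seen so far,
// so memory stays O(limit) even when the input has millions of rows.
public class TopN {

    public static <T> List<T> topN(Iterator<T> rows, int limit, Comparator<T> order) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // Heap ordered opposite to the requested order: its head is the "worst" row kept,
        // i.e. the first one to evict when a better row arrives.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, order.reversed());
        while (rows.hasNext()) {
            T row = rows.next();
            if (heap.size() < limit) {
                heap.add(row);
            } else if (order.compare(row, heap.peek()) < 0) {
                heap.poll();
                heap.add(row);
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order); // the final sort touches only `limit` rows
        return result;
    }
}
```

This only helps when a limit is present; an order-by over the whole result set still needs every row, which is where disk-based sorting comes in.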
@JavaLama do you think you could provide a memory dump together with the thread dump? Thanks
I'm using the milestone release. I don't think further investigation is necessary, since you mentioned it's in-memory sorting. That completely explains everything (GC pressure causes the slowdown and eventually the OOM). I'm afraid OrientDB is not usable for me at this point, until disk-based sorting is in place. On the other hand, everything else works beautifully, so I'm prepared to revisit it in 2.1 or so.
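Disk-based sorting, as requested here (and hinted at by the temporary B-tree idea above), is classically done as an external merge sort: sort memory-sized chunks, spill each to a temporary file, then k-way merge the files. A minimal sketch, assuming string sort keys without line breaks and the default temp directory as spill space; this is a textbook technique, not OrientDB's implementation, and `ExternalSort` is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Illustrative external merge sort over string sort keys. Assumes keys contain no
// line breaks, because each temporary run file stores one key per line.
public class ExternalSort {

    // The smallest unread key of one run, plus the reader that produced it.
    private static final class RunHead {
        final String key;
        final BufferedReader reader;

        RunHead(String key, BufferedReader reader) {
            this.key = key;
            this.reader = reader;
        }
    }

    // Streams `keys` to `sink` in ascending order while holding at most `chunkSize`
    // keys in memory during run creation, plus one key per run during the merge.
    public static void sort(Iterator<String> keys, int chunkSize, Consumer<String> sink) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: cut the input into chunks, sort each in memory, spill it to disk.
            List<String> chunk = new ArrayList<>();
            while (keys.hasNext()) {
                chunk.add(keys.next());
                if (chunk.size() == chunkSize || !keys.hasNext()) {
                    Collections.sort(chunk);
                    Path run = Files.createTempFile("sort-run-", ".txt");
                    runs.add(run);
                    Files.write(run, chunk, StandardCharsets.UTF_8);
                    chunk.clear();
                }
            }
            // Phase 2: k-way merge; the heap always holds the next key from each open run.
            List<BufferedReader> readers = new ArrayList<>();
            try {
                PriorityQueue<RunHead> heap = new PriorityQueue<>(Comparator.comparing((RunHead h) -> h.key));
                for (Path run : runs) {
                    BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                    readers.add(reader);
                    String first = reader.readLine();
                    if (first != null) {
                        heap.add(new RunHead(first, reader));
                    }
                }
                while (!heap.isEmpty()) {
                    RunHead head = heap.poll();
                    sink.accept(head.key);
                    String next = head.reader.readLine();
                    if (next != null) {
                        heap.add(new RunHead(next, head.reader));
                    }
                }
            } finally {
                for (BufferedReader reader : readers) {
                    reader.close();
                }
            }
        } finally {
            // Remove the temporary run files even if sorting failed part-way.
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }
}
```

Peak memory is bounded by `chunkSize` plus one key per run rather than by the result size, which is what avoids the GC pressure and OutOfMemoryError described above; the trade-off is extra disk I/O.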
@JavaLama if all you need is RID-based sorting, I should be able to fix it quickly.
@luigidellaquila No, I actually don't need RID-based sorting right now. I'm more in need of sorting large volumes of data on schemaless properties. At the very least, it shouldn't make OrientDB grind to a halt. Great to hear that it's being worked on.
Fixed in 2.0-final.
Problem 1: Queries like the following should be very fast:
The Address class has ~10 million records, so even with a disk-sort, this should be pretty fast. With an inverse cluster cursor, the query should be very fast, according to http://www.orientechnologies.com/docs/2.0/orientdb.wiki/SQL-Query.html.
In fact, however, the query renders OrientDB completely unresponsive. I can't even log into the server using the console (socket timeout). It seems I triggered some kind of bug.
Note that the same problem occurs with ascending sort.
Problem 2: Probably as a side effect of Problem 1 (I had to kill the server), OrientDB now starts rebuilding indices on restart. This takes a long time, and clients are unable to connect until it's done. Such long startups make OrientDB a dubious choice in high-availability settings. IMO this should be fixed in 2.0 (that is, at least make the database available while indices are rebuilding).
This is a non-distributed database, running on a 64-bit machine/jvm with plenty of memory.
Thanks.