
Significant slowdown when changing the return record limit from 1000 to 2000 #8703

Closed
ASXHOLE opened this issue Dec 17, 2018 · 5 comments
Assignees
Milestone

Comments

@ASXHOLE

ASXHOLE commented Dec 17, 2018

OrientDB Version: 3.10

Java Version: JDK1.8.0_111

OS: CentOs 7

Expected behavior

Actual behavior

Why is the query very slow (more than 33 seconds) when limiting the result to 2000 records, but very fast (within 1 second) when limiting it to 1000 records?

Steps to reproduce

MATCH
  {class: Con, as: Con1, where: (CON_NO = '3620067544')}
  .out('hasTag')
  .in('hasTag')
  {as: Con2, where: (CON_NO != '3620067544')}
RETURN Con1.CON_NO, Con2.CON_NO
LIMIT 5000

@luigidellaquila
Member

Hi @ASXHOLE

Any chance to have a dataset to reproduce the problem?

Thanks

Luigi

@luigidellaquila luigidellaquila self-assigned this Dec 17, 2018
@ASXHOLE
Author

ASXHOLE commented Dec 18, 2018

Hi, @luigidellaquila
I'm sorry, I made a mistake: my OrientDB version is actually 3.0.10.

My dataset is large, so I split it into 10 MB zip parts. (You need to remove the trailing ".zip" suffix before unzipping, e.g. "OrientDB_data.zip.001".)

OrientDB_data.zip.001.zip
OrientDB_data.zip.002.zip
OrientDB_data.zip.003.zip
OrientDB_data.zip.004.zip
OrientDB_data.zip.005.zip
OrientDB_data.zip.006.zip
OrientDB_data.zip.007.zip
OrientDB_data.zip.008.zip
OrientDB_data.zip.009.zip
OrientDB_data.zip.010.zip

@Tracy0231

Is there any progress on this? I have the same problem...

@luigidellaquila
Copy link
Member

Hi @ASXHOLE @Tracy0231

I checked your DB and came to the conclusion that the main problem is that the MATCH executor does some early loading of the connected patterns, and in your specific case the situation is as follows:

  • the node with CON_NO = '3620067544' is connected to four other nodes: TAG_ID = 1001058, 1003050, 1003113, 1003118
  • the node with TAG_ID = 1001058 is connected to ~1600 other nodes, so with LIMIT 1000 all of them are loaded and checked, and you get the result in a few seconds
  • the other TAG_IDs have more than two million connected nodes each, so with LIMIT 2000 both TAG_ID = 1001058 and TAG_ID = 1003050 are loaded, and the executor eagerly loads around 2,500,000 other nodes.
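
A fan-out check along these lines makes the asymmetry visible (a sketch, assuming the Con/hasTag schema from the query above; the class, edge, and property names come from the issue, the query shape is illustrative):

```sql
-- For each tag vertex reachable from the starting contract,
-- count how many vertices point back to it via 'hasTag'.
SELECT TAG_ID, in('hasTag').size() AS fanOut
FROM (SELECT expand(out('hasTag')) FROM Con WHERE CON_NO = '3620067544')
ORDER BY fanOut DESC
```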

All this deserves some optimization: we can easily do some lazy loading and save a lot of resources (both execution time and memory consumption). I added it to my TODO list and hopefully I'll manage to do it soon.
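
Until that optimization lands, a possible workaround (a sketch, not a verified fix) is to express the traversal with a plain SELECT and expand(), which streams results so the LIMIT can cut the work short without MATCH's eager pattern expansion:

```sql
-- Same traversal as the MATCH above, written as a streaming SELECT.
-- Names (Con, hasTag, CON_NO) are taken from the issue; behavior
-- depends on the execution plan your OrientDB version produces.
SELECT CON_NO
FROM (SELECT expand(out('hasTag').in('hasTag'))
      FROM Con WHERE CON_NO = '3620067544')
WHERE CON_NO <> '3620067544'
LIMIT 2000
```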

In general, please also consider that having many supernodes (i.e. vertices with millions of connected edges) in a graph is not considered good practice.

Thanks

Luigi

@luigidellaquila
Member

Hi @ASXHOLE @Tracy0231

I just pushed a fix to the 3.0.x branch; now the query takes around ten seconds even with LIMIT 2000.

The fix will be released with v 3.0.13

Thanks

Luigi

@luigidellaquila luigidellaquila added this to the 3.0.13 milestone Dec 27, 2018