Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traverse with a while condition is too slow #3513

Closed
RCNTEC opened this issue Feb 2, 2015 · 7 comments
Closed

Traverse with a while condition is too slow #3513

RCNTEC opened this issue Feb 2, 2015 · 7 comments
Assignees
Milestone

Comments

@RCNTEC
Copy link

RCNTEC commented Feb 2, 2015

Hi!
I have a graph database with the following vertices: CompanyUnit, CompanyUnitEmployeeRelaiton, Employee, ActionEmployeeRelation, Action, ActionServiceRelation, Service. They are connected with uniquely named lightweight edges in the same order as mentioned above, so we can traverse from CompanyUnit to Service:
CompanyUnit ->cu_cuer-> CompanyUnitEmployeeRelation ->cuer_e-> Employee ->e_aer-> ActionEmployeeRelation ->aer_a-> Action ->a_asr-> ActionServiceRelation ->asr_s-> Service
CompanyUnits might have lower level CompanyUnits. Each vertex has four datetime fields version_start_of_life, version_end_of_life, start_date, end_date, where version_start_of_life and version_end_of_life represent the dates when the record is created and deleted (or updated) respectively, start_date and end_date show the period during which the record is valid.

In order to retrieve all services provided by some CompanyUnit in some period (start_date-end_date) from a particular point in time (version_start_of_life, version_end_of_life), I use the following query (#17:12 is the rid of the company unit to start from):
select @rid, @class, name from ( traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e'), out('e_aer'), out('aer_a'), out('a_asr'), out('asr_s') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' ) where @class = 'Service' order by name limit 1000000,
out('CompanyUnits') - traverses all lower CompanyUnits,
out('cu_cuer') - all CompanyUnitEmployeeRelations,
out('cuer_e') - all Employees and so on..
The problem is that this traverse is extremly slow (~ 50 secs.)

Trying to find the cause of this performance issue I tried to consecutively add the levels to traverse:

  1. traverse out('CompanyUnits') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' limit 1000000 - retrieves all lower CompanyUnits of the current one (Distributed version of Key/Value Server [moved] #17:12). 393 records returned in 3 secs.
    If I remove the while condition and run the following traverse
    traverse out('CompanyUnits') from #17:12 limit 1000000 It returns 53946 records in 12 seconds.
  2. traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' limit 1000000 - traverses from the company unit to employees. 1675 records returned in 4 secs
    Same without condition:
    traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e') from #17:12 limit 1000000 returned 74247 records in 21 secs.
  3. traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e'), out('e_aer'), out('aer_a') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' limit 1000000 - traverses from the company unit to actions. 29488 records returnd in 40,765 secs.
    The same traverse without condition took too long so I interrupted it..

I also need to traverse in the reverse direction from Services to Employees. So if I traverse from 10 Service rids:
select @rid, @class, name, surname, middle_name from ( traverse in('e_aer'), in('aer_a'), in('a_asr'), in('asr_a') from [ #22:584, #22:585, #22:586, #22:587, #22:588, #22:589, #22:590, #22:591, #22:592, #22:593 ] while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' ) where @class = 'Employee' limit 1000000 returns 498 records in 10 secs.
If I traverse from one rid:
select @rid, @class, name, surname, middle_name from ( traverse in('e_aer'), in('aer_a'), in('a_asr'), in('asr_a') from #22:584 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' ) where @class = 'Employee' limit 1000000 returns 217 records in 3.4 secs.
Again, these traverses are too slow

The number of records contained in different classes in the database:
CompanyUnit - 87,128
cu_cuer - 49,016
CompanyUnitEmployeeRelation - 6,025
cuer_e - 36,782
Employee - 126,242
e_aer - 3,713,388
ActionEmployeeRelation - 1,091,370
aer_a - 4,113,841
Action - 151,693
a_asr - 36,943
ActionServiceRelation - 9,124
asr_s - 16,271
Service - 4,521

I'm using orientdb-community-2.0

I wonder if there are any ways to speed up these traverses.

@RCNTEC RCNTEC changed the title Traverse with a while condition is very slow Traverse with a while condition is too slow Feb 2, 2015
@lvca
Copy link
Member

lvca commented Jun 18, 2015

In 2.1-rc4 we largely improved traversing speed. Could you try with this release?

@lvca lvca added this to the 2.1 GA milestone Jun 18, 2015
@lvca
Copy link
Member

lvca commented Jul 3, 2015

Any chance to try it with 2.1-rc5?

@lvca lvca self-assigned this Jul 3, 2015
@StarpTech
Copy link

@lvca How did you improved the performance. I`m very interesting in it.

@lvca
Copy link
Member

lvca commented Jul 3, 2015

We did a lot of optimization in 2.1. The internal structure of the document is lighter, we improved also out(), in() and both() functions by reducing the creation of Java Objects (therefore less stress to the JVM GC). Shortest path has an A* algorithm and recognize earlier when 2 nodes are not connected.

@lvca
Copy link
Member

lvca commented Jul 29, 2015

Any news on this?

@RCNTEC
Copy link
Author

RCNTEC commented Jul 30, 2015

Hey! Sorry for my late reply. I'm currently busy with another project and didn't have time to check out if there was any performance improvement in that traversal. I'll report here as soon as I do it.

@lvca
Copy link
Member

lvca commented Jul 30, 2015

I'm closing this issue, in case you don't ind any benefit, please reopen or comment this. Thanks.

@lvca lvca closed this as completed Jul 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants