-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Traverse with a while condition is too slow #3513
Comments
In 2.1-rc4 we largely improved traversing speed. Could you try with this release? |
Any chance to try it with 2.1-rc5? |
@lvca How did you improved the performance. I`m very interesting in it. |
We did a lot of optimization in 2.1. The internal structure of the document is lighter, we improved also out(), in() and both() functions by reducing the creation of Java Objects (therefore less stress to the JVM GC). Shortest path has an A* algorithm and recognize earlier when 2 nodes are not connected. |
Any news on this? |
Hey! Sorry for my late reply. I'm currently busy with another project and didn't have time to check out if there was any performance improvement in that traversal. I'll report here as soon as I do it. |
I'm closing this issue, in case you don't ind any benefit, please reopen or comment this. Thanks. |
Hi!
I have a graph database with the following vertices: CompanyUnit, CompanyUnitEmployeeRelaiton, Employee, ActionEmployeeRelation, Action, ActionServiceRelation, Service. They are connected with uniquely named lightweight edges in the same order as mentioned above, so we can traverse from CompanyUnit to Service:
CompanyUnit ->cu_cuer-> CompanyUnitEmployeeRelation ->cuer_e-> Employee ->e_aer-> ActionEmployeeRelation ->aer_a-> Action ->a_asr-> ActionServiceRelation ->asr_s-> Service
CompanyUnits might have lower level CompanyUnits. Each vertex has four datetime fields
version_start_of_life
,version_end_of_life
,start_date
,end_date
, whereversion_start_of_life
andversion_end_of_life
represent the dates when the record is created and deleted (or updated) respectively,start_date
andend_date
show the period during which the record is valid.In order to retrieve all services provided by some CompanyUnit in some period (start_date-end_date) from a particular point in time (version_start_of_life, version_end_of_life), I use the following query (
#17:12
is the rid of the company unit to start from):select @rid, @class, name from ( traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e'), out('e_aer'), out('aer_a'), out('a_asr'), out('asr_s') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' ) where @class = 'Service' order by name limit 1000000
,out('CompanyUnits') - traverses all lower CompanyUnits,
out('cu_cuer') - all CompanyUnitEmployeeRelations,
out('cuer_e') - all Employees and so on..
The problem is that this traverse is extremly slow (~ 50 secs.)
Trying to find the cause of this performance issue I tried to consecutively add the levels to traverse:
traverse out('CompanyUnits') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' limit 1000000
- retrieves all lower CompanyUnits of the current one (Distributed version of Key/Value Server [moved] #17:12). 393 records returned in 3 secs.If I remove the while condition and run the following traverse
traverse out('CompanyUnits') from #17:12 limit 1000000
It returns 53946 records in 12 seconds.traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' limit 1000000
- traverses from the company unit to employees. 1675 records returned in 4 secsSame without condition:
traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e') from #17:12 limit 1000000
returned 74247 records in 21 secs.traverse out('CompanyUnits'), out('cu_cuer'), out('cuer_e'), out('e_aer'), out('aer_a') from #17:12 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' limit 1000000
- traverses from the company unit to actions. 29488 records returnd in 40,765 secs.The same traverse without condition took too long so I interrupted it..
I also need to traverse in the reverse direction from Services to Employees. So if I traverse from 10 Service rids:
select @rid, @class, name, surname, middle_name from ( traverse in('e_aer'), in('aer_a'), in('a_asr'), in('asr_a') from [ #22:584, #22:585, #22:586, #22:587, #22:588, #22:589, #22:590, #22:591, #22:592, #22:593 ] while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' ) where @class = 'Employee' limit 1000000
returns 498 records in 10 secs.If I traverse from one rid:
select @rid, @class, name, surname, middle_name from ( traverse in('e_aer'), in('aer_a'), in('a_asr'), in('asr_a') from #22:584 while version_start_of_life <= '2014-09-30 00:00:00' and version_end_of_life >= '2014-09-30 00:00:00' and start_date <= '2014-09-30 23:59:59' and end_date >= '2014-09-01 00:00:00' ) where @class = 'Employee' limit 1000000
returns 217 records in 3.4 secs.Again, these traverses are too slow
The number of records contained in different classes in the database:
CompanyUnit - 87,128
cu_cuer - 49,016
CompanyUnitEmployeeRelation - 6,025
cuer_e - 36,782
Employee - 126,242
e_aer - 3,713,388
ActionEmployeeRelation - 1,091,370
aer_a - 4,113,841
Action - 151,693
a_asr - 36,943
ActionServiceRelation - 9,124
asr_s - 16,271
Service - 4,521
I'm using orientdb-community-2.0
I wonder if there are any ways to speed up these traverses.
The text was updated successfully, but these errors were encountered: