Indexes across classes #5069

TeamXceleratorDev · 2015-10-07T16:18:47Z

Both OrientDB and Neo4J don't provide index support across classes at this time. For models with big data, that can be problematic. Indexes are required for performance, but all of the properties needed for the index have to be co-located in the same class. As a result, we are having to de-normalize the model if you will in order to arrange the data for the index.

If it is possible, can you please add a custom index feature that allows advanced users to index data across classes? That would be a huge improvement and really separate it from the competition :)

luigidellaquila · 2015-10-07T16:35:13Z

Hi @TeamXceleratorDev
OrientDB, since v2.1, supports multiple inheritance, so as a temporary work-around you can consider creating a parent class for all the classes you want to index together and then defining the index on that one.

Anyway, this proposal will be given strong consideration

Thanks

TeamXceleratorDev · 2015-10-07T21:37:53Z

Thanks for the quick reply. I did try out your suggestion, but it won't work for us. The reason is that when the included sub-classes have the same property names, they can collide. For example, suppose you had classes A, B, and C, where each class has a "name" property. If you try to use multiple inheritance as a work-around for indexing, it can produce a weird situation. For now, I am thinking of creating a custom class whose properties are links to the real vertices in other classes. I can then index the custom class for fast look-ups.

Thanks
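One possible reading of the collision described above, as a minimal Python sketch. This is not OrientDB code: the records, record IDs, and function names are hypothetical, and plain dictionaries stand in for indexes. It contrasts an index defined on a shared parent class (keyed on "name" alone) with an index on a separate lookup class that links back to the real records.

```python
from collections import defaultdict

# Hypothetical records from three unrelated classes that all define "name".
records = [
    {"class": "A", "rid": "#10:0", "name": "Smith"},
    {"class": "B", "rid": "#11:0", "name": "Smith"},
    {"class": "C", "rid": "#12:0", "name": "Jones"},
]


def build_parent_index(rows):
    """Index every subclass record under the shared property name only."""
    index = defaultdict(list)
    for row in rows:
        index[row["name"]].append(row["rid"])
    return index


def build_link_index(rows):
    """Index a lookup class whose entries link to the real records,
    keeping the owning class as part of the key."""
    index = defaultdict(list)
    for row in rows:
        index[(row["class"], row["name"])].append(row["rid"])
    return index


parent_index = build_parent_index(records)
link_index = build_link_index(records)
```

In this sketch, `parent_index["Smith"]` returns records from both A and B, while `link_index[("A", "Smith")]` returns only the record from A.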

smolinari · 2015-10-08T03:52:24Z

I am curious, if the index is on multiple classes, how would one add this multi-class index, if not on the same property types and names? Can you give a rough use case example of what a multi-class index should look like?

Scott

TeamXceleratorDev · 2015-10-11T04:42:50Z

Sure thing Scott. Let's take an example where we have three classes: Employer, Employee, and Address. You could have edges or LinkSets tying employees to employers. You could also connect address vertices to their respective employees and employers. Again, this is just an example. Let's say we have thousands of employers and millions of employees and addresses. Naturally, we still want our queries to run as fast as possible for frequent use cases.

Say that a regular query is to find the street address for employees who work for a particular company, have a given last name, and are maybe in a given state. The where clause might check for "Home Depot", "Smith", and "Florida", as an example. When you go against millions and millions of vertices without an index, it is painfully slow and uses a lot of memory. I suspect that is true for Neo4J as well. A query against an Orient index, by contrast, is very, very fast and uses virtually no memory. However, I can't create an index to assist this query because the 3 criteria properties belong to separate classes. That is my main point.

For now, I have tried other techniques, such as the graph equivalent of a View class, which contains the fields that I want to index against, with respective links/edges to the underlying vertices. This gives me sub-second queries against millions of records, and this is not a trite tutorial use case. But using denormalization / views, if you will, is messy. I would much rather have a clean set of classes in my graph and simply be allowed to create indexes that can go against properties connected via a link/edge, not just properties directly on the class. This would be a GAMECHANGER. Seriously, it would allow adopters of Orient to store massive, massive data while still achieving sub-second queries without flooding memory. I hope that made sense :)
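The "View class" technique described above can be sketched in plain Python under some loud assumptions: dictionaries stand in for vertices, the `employer` and `address` keys stand in for links or edges, and all sample data and names are hypothetical. Nothing here is OrientDB API. The point is only that the links are followed once, at build time, and the result is stored under a composite key.

```python
from collections import defaultdict

# Hypothetical in-memory graph: each dict stands in for a vertex, and the
# "employer" / "address" keys stand in for links or edges.
employers = {"e1": {"name": "Home Depot"}, "e2": {"name": "Acme"}}
addresses = {
    "a1": {"street": "1 Palm Ave", "state": "Florida"},
    "a2": {"street": "9 Pine St", "state": "Georgia"},
    "a3": {"street": "4 Bay Rd", "state": "Florida"},
}
employees = [
    {"last_name": "Smith", "employer": "e1", "address": "a1"},
    {"last_name": "Smith", "employer": "e1", "address": "a2"},
    {"last_name": "Smith", "employer": "e2", "address": "a3"},
]


def build_view_index(employees, employers, addresses):
    """Follow each employee's links once and store the result under a
    composite key, so later lookups need no traversal."""
    index = defaultdict(list)
    for emp in employees:
        employer = employers[emp["employer"]]
        address = addresses[emp["address"]]
        key = (employer["name"], emp["last_name"], address["state"])
        index[key].append(address["street"])
    return index


view_index = build_view_index(employees, employers, addresses)
streets = view_index[("Home Depot", "Smith", "Florida")]
```

The lookup for ("Home Depot", "Smith", "Florida") is a single dictionary access. The cost moves to keeping the view in sync whenever an employee, employer, or address changes, which is the messiness the comment refers to.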

smolinari · 2015-10-11T06:43:33Z

Ok. Thanks for the great reply. I understand the situation now, so it made sense to me. I just am not sure if it is feasible or not. I guess one of the ODB crew would need to answer that.

On another note, in a document database world, which ODB offers, I would have long since denormalized addresses directly into the Employer and Employee vertexes, as sub-documents. I don't think normalizing this data is necessary. It rarely changes and the size of the duplicated data is nominal. So, one issue is resolved directly.

Another denormalization I'd do is to add the name of the Employer to the Employee vertex. This would change along with the link/ edge between the vertexes. The duplication would also be nominal.

So, now you have all the data necessary in the Employee vertex and can (I think) create a single index on it.

I am not sure about compound indexes on root and sub-document attributes though. Still, if that isn't possible, it wouldn't be hard to create a "location" property in the root, which uses data from the address sub-documents.

Scott
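The denormalization suggested above can be sketched as a toy Python store. This is an illustration only, not OrientDB code: the `EmployeeStore` class, its methods, and the sample data are all hypothetical. Each employee carries a copy of the employer name and an embedded address, and a single composite index covers all three fields. The copy has to be rewritten whenever the employee is re-linked to another employer.

```python
class EmployeeStore:
    """Toy store: employees carry a copy of the employer name and an
    embedded address, and one composite index covers all three fields."""

    def __init__(self):
        self.employees = {}
        self.index = {}  # (employer_name, last_name, state) -> set of ids

    def _key(self, emp):
        return (emp["employer_name"], emp["last_name"], emp["address"]["state"])

    def put(self, emp_id, last_name, employer_name, address):
        # Drop the stale index entry first so a re-link cannot leave one behind.
        old = self.employees.get(emp_id)
        if old is not None:
            self.index[self._key(old)].discard(emp_id)
        emp = {
            "last_name": last_name,
            "employer_name": employer_name,
            "address": address,
        }
        self.employees[emp_id] = emp
        self.index.setdefault(self._key(emp), set()).add(emp_id)

    def find(self, employer_name, last_name, state):
        return sorted(self.index.get((employer_name, last_name, state), set()))


store = EmployeeStore()
store.put("emp1", "Smith", "Home Depot", {"street": "1 Palm Ave", "state": "Florida"})
# Re-linking the employee to another employer rewrites the duplicated name.
store.put("emp1", "Smith", "Acme", {"street": "1 Palm Ave", "state": "Florida"})
```

After the second `put`, the employee is found under "Acme" and no longer under "Home Depot", which is the bookkeeping that comes with duplicating the employer name.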

TeamXceleratorDev · 2015-10-14T15:02:18Z

Basically, it would be awesome to have the "option" to include index fields during index creation that involve traversing edges. In Oracle terms, DBAs will sometimes create materialized views to provide such a capability. My company is evaluating Orient for use with 300+M records for a single component. With such massive data, queries require indexes to perform well. Regular indexes work for most cases, but are unable to handle cases where query criteria go against traversed properties.

TeamXceleratorDev · 2015-10-21T14:32:41Z

@robfrank, please let me know if I can be of any assistance with this enhancement. I would be happy to discuss it with you at your convenience :)

robfrank · 2015-11-09T08:07:50Z

@TeamXceleratorDev I'm doing some spikes to uncover the best way to implement this and other use cases. I hope to have a beta, or maybe more than a beta, version for 2.2. Stay tuned :)

TeamXceleratorDev · 2015-11-11T06:34:50Z

Great news @robfrank. I mentioned it at work (major financial company), and people are interested in seeing it in action. We have complex queries against millions and millions of records. With Oracle and Neo4J, massive amounts of memory are needed to hold results in memory while the query results are being finalized. Having a mechanism for creating more complex indexes would allow us to have an O(1) lookup without taking up CPU and memory resources. It will be a real game-changer. Candidly, I am surprised more people haven't spoken up about this topic. I am not sure people grasp how critical indexes are against big data. Supporting indexes beyond the properties of a single class will be huge.

grossvater · 2015-11-13T08:52:46Z

@robfrank are you working on this for 2.2? I would like to experiment with it if you have something working.

smolinari · 2016-11-21T12:40:31Z

@robfrank - does the closing of this issue mean it is now part of 3.0?

Scott

robfrank · 2017-07-13T11:52:44Z

@smolinari we implemented this feature in the Enterprise Edition (EE): http://orientdb.com/docs/3.0.x/indexing/Full-Text-Index.html#cross-class-search-enterprise-edition

smolinari · 2017-07-13T12:34:16Z

Thanks for the note Roberto.

I'm waiting for the much more important indexing on data that doesn't have schema definitions, i.e. schema-less indexing. Until that happens, ODB can't really say it is a NoSQL competitor in my book.

Is that feature planned at all?

Scott

robfrank · 2017-07-13T12:44:36Z

AFAIK, no, it is not planned.

smolinari · 2017-07-13T15:30:40Z

That's too bad. You are sort of false advertising without it.

Scott

grossvater · 2017-08-09T08:58:46Z

Any chance of having it implemented in the community edition as well?

robfrank · 2017-08-09T09:07:35Z

No, there are no plans to move it to the community.

luigidellaquila added the enhancement label Oct 7, 2015

lvca assigned robfrank Oct 12, 2015

tglman mentioned this issue Jun 15, 2016

Roadmap 3.0 #6005

Closed

34 tasks

smolinari mentioned this issue Sep 19, 2016

[OEP 11] Lucene Improvements orientechnologies/orientdb-labs#11

Open

11 tasks

robfrank added this to the 3.0 milestone Sep 23, 2016

robfrank added a commit that referenced this issue Nov 4, 2016

adds support for cross classes search index

3085b60

refs: #5069

robfrank added a commit that referenced this issue Nov 7, 2016

adds auto-creation of CrossClassSearchIndex inside OLuceneIndexFactory

8236ebc

refs: #5069

robfrank added a commit that referenced this issue Nov 8, 2016

refactor global search index to enable search of a single term across…

ff1f5a5

… all indexes refs: #5069

robfrank added a commit that referenced this issue Nov 9, 2016

refactor: renaming of some classes with OLucene prefix

4ed8214

refs: #5069

robfrank closed this as completed Nov 21, 2016

robfrank modified the milestones: 3.0, 3.0-M1 Apr 12, 2017
