Indexes across classes #5069

Closed
TeamXceleratorDev opened this issue Oct 7, 2015 · 17 comments

@TeamXceleratorDev

Both OrientDB and Neo4J don't provide index support across classes at this time. For models with big data, that can be problematic. Indexes are required for performance, but all of the properties needed for the index have to be co-located in the same class. As a result, we are having to de-normalize the model, if you will, to arrange the data for the index.

If it is possible, can you please add a custom index feature that allows advanced users to index data across classes? That would be a huge improvement and really separate it from the competition :)

@luigidellaquila
Member

Hi @TeamXceleratorDev
OrientDB, since v 2.1, supports multiple inheritance, so as a temporary work-around you can consider creating a parent class for all the classes you want to index together and then define the index on that one.

Anyway, this proposal will be given strong consideration.

Thanks

@TeamXceleratorDev
Author

Thanks for the quick reply. I did try out your suggestion, but it won't work for us. The reason is that when the included sub-classes have the same property names, they can collide. For example, suppose you had classes A, B, and C, each with a "name" property. If you try to use multiple inheritance as a work-around for indexing, it can produce a weird situation. For now, I am thinking of creating a custom class whose properties are links to the real vertices in other classes. I can then index the custom class for fast look-ups.

Thanks
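
To make the two work-arounds above concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not OrientDB's API, and it assumes one possible reading of the "collision": an index on a shared parent class, keyed by the common "name" property, mixes records from all sub-classes under the same key. The record ids and the `parent_index` / `link_index` names are made up for illustration.

```python
from collections import defaultdict

# Toy records standing in for vertices of three unrelated classes.
# Classes A, B, C and the "name" property come from the comment above;
# the record ids ("#A:0", ...) are hypothetical.
records = [
    {"rid": "#A:0", "cls": "A", "name": "Smith"},
    {"rid": "#B:0", "cls": "B", "name": "Smith"},
    {"rid": "#C:0", "cls": "C", "name": "Jones"},
]

# Work-around 1: one index on a shared parent class, keyed only by the
# common property. Records from different sub-classes share a key.
parent_index = defaultdict(list)
for rec in records:
    parent_index[rec["name"]].append(rec["rid"])

# Work-around 2: a dedicated lookup class that keeps the source class in
# the key and stores only a link (the record id) to the real vertex.
link_index = defaultdict(list)
for rec in records:
    link_index[(rec["cls"], rec["name"])].append(rec["rid"])

print(parent_index["Smith"])       # records of both A and B
print(link_index[("A", "Smith")])  # only the A record
```

Under this reading, the separate lookup class avoids the ambiguity because the class is part of the key, at the cost of maintaining one extra record per indexed vertex.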

@smolinari
Contributor

I am curious, if the index is on multiple classes, how would one add this multi-class index, if not on the same property types and names? Can you give a rough use case example of what a multi-class index should look like?

Scott

@TeamXceleratorDev
Author

Sure thing, Scott. Let's take an example where we have three classes: Employer, Employee, and Address. You could have edges or LinkSets tying employees to employers. You could also connect address vertices to their respective employees and employers. Again, this is just an example.

Let's say we have thousands of employers and millions of employees and addresses. Naturally, we still want our queries to run as fast as possible for frequent use cases. Say that a regular query is to find the street address for employees who work for a particular company, have a given last name, and maybe live in a given state. The where clause might check for "Home Depot", "Smith", and "Florida", for example. When you go against millions and millions of vertices without an index, it is painfully slow and uses a lot of memory. I suspect that is true for Neo4J as well. A query against an Orient index, by contrast, is very, very fast and uses virtually no memory. However, I can't create an index to assist this query because the 3 criteria properties belong to separate classes. That is my main point.

For now, I have tried other techniques, such as the graph equivalent of a View class, which contains the fields that I want to index against with respective links/edges to the underlying vertices. This gives me sub-second queries against millions of records, not just a trite tutorial use case. But using denormalization / views, if you will, is messy. I would much rather have a clean set of classes in my graph and simply be able to create indexes that can go against properties connected via a link/edge, not just properties directly on the class. This would be a GAMECHANGER. Seriously, it would allow adopters of Orient to store massive, massive data while still achieving sub-second queries without flooding memory. I hope that made sense :)
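
The difference between scanning with traversal and looking up a precomputed "view" index can be sketched as follows. This is a toy in-memory model in plain Python, not OrientDB code. The class names and the "Home Depot" / "Smith" / "Florida" criteria come from the comment above; the ids, street values, second employer, and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph: dictionaries stand in for vertices, and the
# "works_for" / "lives_at" fields stand in for links or edges.
employers = {"e1": {"name": "Home Depot"}, "e2": {"name": "Acme"}}
addresses = {
    "a1": {"street": "1 Palm Ave", "state": "Florida"},
    "a2": {"street": "9 Pine St", "state": "Georgia"},
    "a3": {"street": "5 Bay Rd", "state": "Florida"},
}
employees = {
    "p1": {"last_name": "Smith", "works_for": "e1", "lives_at": "a1"},
    "p2": {"last_name": "Smith", "works_for": "e1", "lives_at": "a2"},
    "p3": {"last_name": "Smith", "works_for": "e2", "lives_at": "a3"},
}


def find_by_scan(employer, last_name, state):
    """Visit every employee and follow both links; cost grows with the data."""
    hits = []
    for emp in employees.values():
        addr = addresses[emp["lives_at"]]
        if (employers[emp["works_for"]]["name"] == employer
                and emp["last_name"] == last_name
                and addr["state"] == state):
            hits.append(addr["street"])
    return hits


def build_view_index():
    """Precompute (employer, last_name, state) -> address ids, like a view class."""
    index = defaultdict(list)
    for emp in employees.values():
        key = (employers[emp["works_for"]]["name"],
               emp["last_name"],
               addresses[emp["lives_at"]]["state"])
        index[key].append(emp["lives_at"])
    return index


view_index = build_view_index()


def find_by_index(employer, last_name, state):
    """One dictionary lookup, then dereference the stored address links."""
    ids = view_index.get((employer, last_name, state), [])
    return [addresses[a]["street"] for a in ids]


print(find_by_scan("Home Depot", "Smith", "Florida"))
print(find_by_index("Home Depot", "Smith", "Florida"))
```

Both functions return the same streets; the indexed version pays the traversal cost once, when the index is built, which is also why the view has to be kept in sync by hand whenever an underlying vertex or link changes.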

@smolinari
Contributor

Ok. Thanks for the great reply. I understand the situation now, so it made sense to me. I just am not sure if it is feasible or not. I guess one of the ODB crew would need to answer that.

On another note, in a document database world, which ODB offers, I would have long denormalized addresses directly into the Employer and Employee vertexes, as sub-documents. I don't think normalizing this data is necessary. It rarely changes and the size of duplicated data is nominal. So, one issue is resolved directly.

Another denormalization I'd do is to add the name of the Employer to the Employee vertex. This would change along with the link/edge between the vertexes. The duplication would also be nominal.

So, now you have all the data necessary in the Employee vertex and can (I think) create a single index on it.

I am not sure about compound indexes on root and sub-document attributes though. Still, if that isn't possible, it wouldn't be hard to create a "location" property in the root, which uses data from the address sub-documents.

Scott
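
A rough sketch of the denormalized layout described above, again as a toy in-memory model in plain Python rather than OrientDB code. The `EmployeeStore` class and its method names are hypothetical. The point it illustrates is that the duplicated employer name, and any index built on it, has to be updated in the same step as the employer link.

```python
class EmployeeStore:
    """Toy store: each employee document embeds its address and employer name."""

    def __init__(self):
        self.docs = {}   # employee id -> document
        self.index = {}  # (employer_name, last_name, state) -> set of employee ids

    @staticmethod
    def _key(doc):
        # Composite key built from root fields and one sub-document field.
        return (doc["employer_name"], doc["last_name"], doc["address"]["state"])

    def insert(self, emp_id, last_name, employer_name, address):
        doc = {"last_name": last_name,
               "employer_name": employer_name,
               "address": address}
        self.docs[emp_id] = doc
        self.index.setdefault(self._key(doc), set()).add(emp_id)

    def change_employer(self, emp_id, new_employer_name):
        """Update the duplicated name and re-index the document together."""
        doc = self.docs[emp_id]
        old_key = self._key(doc)
        self.index[old_key].discard(emp_id)
        if not self.index[old_key]:
            del self.index[old_key]
        doc["employer_name"] = new_employer_name
        self.index.setdefault(self._key(doc), set()).add(emp_id)

    def streets(self, employer_name, last_name, state):
        ids = self.index.get((employer_name, last_name, state), set())
        return sorted(self.docs[i]["address"]["street"] for i in ids)


store = EmployeeStore()
store.insert("p1", "Smith", "Home Depot",
             {"street": "1 Palm Ave", "state": "Florida"})
store.change_employer("p1", "Acme")
print(store.streets("Home Depot", "Smith", "Florida"))
print(store.streets("Acme", "Smith", "Florida"))
```

Whether a real compound index can span root and sub-document attributes is left open in the comment above; the sketch simply assumes a key can be derived from both.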

@TeamXceleratorDev
Author

Basically, it would be awesome to have the "option" to include index fields during index creation that involve traversing edges. In Oracle terms, DBAs will sometimes create materialized views to provide such capability. My company is evaluating Orient for use with 300+M records for a single component. With such massive data, queries require indexes to perform well. Regular indexes work for most cases, but are unable to handle cases where query criteria go against traversed properties.

@TeamXceleratorDev
Author

@robfrank, please let me know if I can be of any assistance with this enhancement. I would be happy to discuss it with you at your convenience :)

@robfrank
Contributor

robfrank commented Nov 9, 2015

@TeamXceleratorDev I'm doing some spikes to uncover the best way to implement this and other use cases. I hope to have a beta, or maybe more than beta, version for 2.2. Stay tuned :)

@TeamXceleratorDev
Author

Great news, @robfrank. I mentioned it at work (major financial company), and people are interested in seeing it in action. We have complex queries against millions and millions of records. With Oracle and Neo4J, massive amounts of memory are needed to hold results while the query is being finalized. Having a mechanism for creating more complex indexes would allow us to have an O(1) lookup without taking up CPU and memory resources. It will be a real game-changer. Candidly, I am surprised more people haven't spoken up about this topic. I am not sure people grasp how critical indexes are against big data. Supporting indexes beyond the properties of a single class will be huge.

@grossvater

@robfrank are you working on this for 2.2? I would like to experiment with it if you have something working.

@smolinari
Contributor

@robfrank - does the closing of this issue mean it is now part of 3.0?

Scott

@robfrank robfrank modified the milestones: 3.0, 3.0-M1 Apr 12, 2017
@robfrank
Contributor

@smolinari
Contributor

smolinari commented Jul 13, 2017

Thanks for the note Roberto.

I'm waiting for the much more important indexing on data that doesn't have schema definitions, i.e. schema-less indexing. Until that happens, ODB can't really say it is a NoSQL competitor in my book.

Is that feature planned at all?

Scott

@robfrank
Contributor

AFAIK, no, it is not planned.

@smolinari
Contributor

That's too bad. You are sort of false advertising without it.

Scott

@grossvater

Any chance of having it implemented in the community edition as well?

@robfrank
Contributor

robfrank commented Aug 9, 2017

No, there are no plans to move it to the community.


5 participants