Support Include() like method on queries based on a kind of navigation between documents #21
Comments
I think eager loading can be done manually in most cases. If YesSql had to handle references, it would need to support identity maps and manage the lifetime of the collection, and it would probably have to replace all objects by their ids, which means custom serialization. Lots of work and potential bugs IMO. |
Eager loading can be done manually in every case, but that means Orchard won't be optimal by default unless you program it. I think the final purpose of these methods is to let a user mark a field for eager loading in a projection query through the admin UI. That would make a big difference in ease of use and performance for regular users. Regarding implementation, I don't think we need anything complex; a simple one will be enough for us. The identity map is only needed while the query is being resolved; we don't need to maintain it at request level or anything like that. So I don't see complexity in the identity map or in the lifetime of the collection, especially because YesSql doesn't track changes. The lifetime of the collection should be the same one you currently use for first-level content items. The last part is the one I know least about: "probably replace all objects by their ids meaning custom serialization." The key point is to find an elegant way of doing that while keeping serialization as automatic as possible. Maybe we could mark the navigation property within a field as non-serializable with the [JsonIgnore] attribute and add a custom attribute to the same property that defines the relation and the keys involved. This custom attribute could be provided by YesSql and would not affect serialization. YesSql could detect it in the C# type definition of the field to determine how to populate the navigation property. |
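A minimal sketch of the attribute idea from the comment above, under stated assumptions: `ReferenceAttribute`, `ReferenceInspector`, `Product` and `Provider` are hypothetical names for illustration and are not part of YesSql. To stay dependency-free it uses the `[JsonIgnore]` attribute from `System.Text.Json.Serialization` rather than the Json.NET one. The navigation object is kept out of the JSON, and reflection finds which property to populate and from which key.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text.Json.Serialization;

// Hypothetical marker; YesSql does not ship this attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ReferenceAttribute : Attribute
{
    public ReferenceAttribute(string keyProperty) => KeyProperty = keyProperty;

    // Name of the property on the same class that stores the referenced document id.
    public string KeyProperty { get; }
}

public class Provider
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public int ProviderId { get; set; }   // stored key, serialized as usual

    [JsonIgnore]                          // keeps the navigation object out of the JSON
    [Reference(nameof(ProviderId))]       // tells a loader which key fills this property
    public Provider Provider { get; set; }
}

public static class ReferenceInspector
{
    // Returns (navigation property, key property name) pairs declared on a type.
    public static (PropertyInfo Navigation, string Key)[] Find(Type type) =>
        type.GetProperties()
            .Select(p => (Property: p, Attr: p.GetCustomAttribute<ReferenceAttribute>()))
            .Where(x => x.Attr != null)
            .Select(x => (Navigation: x.Property, Key: x.Attr.KeyProperty))
            .ToArray();
}
```

A loader could call `ReferenceInspector.Find(typeof(Product))` once per type and cache the result, so reflection is not repeated for every document.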
In fact, if we follow my proposal and design the fields so that they throw an exception when accessed without having been loaded first, then Orchard is optimal by default. If a user defines a projection query and the view accesses a referenced content item through a field that was not eager loaded, an exception is thrown. We catch that exception and show a message saying the field needs to be marked for eager loading. So in every case Orchard is either optimal or it doesn't work. |
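The "throw unless it was loaded" behaviour described above could look roughly like this. `Reference<T>` is a hypothetical wrapper invented for this sketch, not an Orchard or YesSql type; it only illustrates failing loudly instead of lazy loading.

```csharp
using System;

// Hypothetical wrapper: holds the id of a referenced document and refuses
// to hand out the value until it has been loaded explicitly.
public sealed class Reference<T> where T : class
{
    private T _value;

    public Reference(int id) => Id = id;

    public int Id { get; }
    public bool IsLoaded { get; private set; }

    // Called by whatever performs the eager load (an Include-like step).
    public void SetLoaded(T value)
    {
        _value = value;
        IsLoaded = true;
    }

    // Throws instead of issuing a hidden query when the value is missing.
    public T Value => IsLoaded
        ? _value
        : throw new InvalidOperationException(
            $"Reference to {typeof(T).Name} #{Id} was not eager loaded; mark it for inclusion in the query.");
}
```

The calling layer can catch `InvalidOperationException` and turn it into the admin message mentioned in the comment.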
Let's use an example to see all the impact:
Saving
Serialized like this
Loading
NB: There would be no way to filter which referenced documents are included, like
Alternate solution
Maybe you just need to reify the relationship: save a document that holds the information linking the two, and index it. That way a single query returns all references of multiple root aggregates, and you can assign them back. It could also be used for filtering, and it could handle aggregation vs. composition. With Dapper we could also load the products and the categories with a single query, then ask the relationship object for the referenced elements. The property would just be a marker. It would also handle deletion: when a referenced document is deleted, all its references are automatically deleted. It's funny, as I had an intern work on this topic for me 10 years ago.
|
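To make the reified-relationship idea concrete, here is an in-memory sketch. All type names (`ProductCategoryLink`, `LinkResolver`, `Product`, `Category`) are assumptions for illustration; the `links` and `categories` arguments stand in for the results of one query each, and no YesSql API is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public List<Category> Categories { get; } = new List<Category>();
}

// Hypothetical relationship document: one record per (product, category) pair.
public class ProductCategoryLink
{
    public int ProductId { get; set; }
    public int CategoryId { get; set; }
}

public static class LinkResolver
{
    // Assigns categories back to their products using the link records.
    public static void Attach(
        IEnumerable<Product> products,
        IEnumerable<ProductCategoryLink> links,
        IEnumerable<Category> categories)
    {
        var byId = categories.ToDictionary(c => c.Id);
        var bySource = links.ToLookup(l => l.ProductId, l => l.CategoryId);

        foreach (var product in products)
            foreach (var categoryId in bySource[product.Id])
                if (byId.TryGetValue(categoryId, out var category))
                    product.Categories.Add(category);
    }
}
```

Because the links are separate records, filtering which references are included comes down to filtering `links` before calling `Attach`.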
Hahaha, I love this aspect of our profession: as devs we commonly become obsessed with solving a challenging problem. It follows us for many years, giving us opportunities over time to refine our solution in unexpected ways. |
If support for collections is developed (#22), it would be interesting for this feature to support relations between documents in different collections. To allow that scenario, the relation would need to store the collection key along with the document id. |
Regarding the alternate solution you provided, this is how you showed that an index for Include should be declared:
What if there are references to more entities and you need all of them at some point? The optimal way would be to put all the references in the same index, wouldn't it?
The problem is that, as far as I know, it is not possible to declare ReferenceIndex as a generic with a variable number of type parameters. So, should we provide one index per relation? Another question is how this would work when you want to include nested entities, for example PromotionCategoryLine with its nested Promotion (supposing they belong to different aggregates).
And would we query it this way? |
What are other document databases doing in this case? |
Siaqodb, which is a hybrid ObjectDb (GraphDb/DocumentDb), provides an Include method that accepts an array of strings giving the complete path of the entities to load. A pure document database like MongoDB provides database references (https://docs.mongodb.com/manual/reference/database-references/), which rely completely on the client, so the client needs an extra query per nested entity. So, for complex graphs where denormalizing is not acceptable, I guess performance will be bad. YesSql, however, is built on a relational database, so it would be great if it offered functionality similar to a pure document database but with better performance when retrieving related entities. What do you think of going with your initial proposal instead of the alternate one based on ReferenceIndexes? Maybe it could use an Include method similar to the one provided by Siaqodb. |
Well, the initial proposal doesn't take advantage of the relational nature of the underlying database, so in these scenarios it is equivalent to the document database solution, because it executes extra queries to retrieve nested entities. |
I am on Skype, can I call you? |
I've just started my holidays and I've been thinking about this again. What do you think of this other approach? We could have a new kind of index called RelationIndex, declared once per document type. Each RelationIndex would have the following fields: When a document is saved, neither the related objects nor the foreign keys need to be stored in the document. So not only are [Reference] properties not serialized, as you pointed out previously in this thread, but foreign keys are not stored either, so they are not serialized within the document and we don't need to worry about them. For storing, updating and deleting relation data in RelationIndexes, YesSql could provide explicit methods for adding and removing relation index records. For storing data in relation indexes, YesSql could offer methods like this:
YesSql would provide an Include method based on the information stored in relation indexes and on reflection over the type used for the include. It would get all the nested documents, no matter how deep they are in the hierarchy, with only two extra SQL queries: one to get all the ids and another to get the contents for those ids:
Include would use reflection to determine how to assign the values retrieved from the database to a property marked as [Reference]. That is, loading a value for Provider, a property of type Provider, would behave differently from loading a value for the Categories property, which is an IEnumerable. For a query like this
What do you think? Can this be the start of something? |
I forgot to show a sample of a relation between a property of a nested object within a document and another document. Continuing with the same types we have been using, let's suppose that Promotion and PromotionProductLines are stored in the same document.
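As a rough illustration of the reflection step described in this comment (assigning a single object to a property like Provider, but a collection to an IEnumerable property like Categories), here is a sketch. `NavigationAssigner` and the sample classes are hypothetical; the proposed Include method does not exist in YesSql, and the `loaded` argument stands in for documents already fetched by the two queries.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class Provider { public int Id { get; set; } }
public class Category { public int Id { get; set; } }

public class Product
{
    public Provider Provider { get; set; }
    public IEnumerable<Category> Categories { get; set; }
}

public static class NavigationAssigner
{
    // Assigns loaded documents to a navigation property: a single object for
    // a scalar property, a typed List<T> for an IEnumerable<T> property.
    public static void Assign(object owner, string propertyName, IEnumerable<object> loaded)
    {
        PropertyInfo property = owner.GetType().GetProperty(propertyName)
            ?? throw new ArgumentException($"No property '{propertyName}'.", nameof(propertyName));
        Type type = property.PropertyType;

        if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(IEnumerable<>))
        {
            // Build a List<T> of the element type so the assignment is type-safe.
            Type itemType = type.GetGenericArguments()[0];
            var list = (IList)Activator.CreateInstance(typeof(List<>).MakeGenericType(itemType));
            foreach (var item in loaded) list.Add(item);
            property.SetValue(owner, list);
        }
        else
        {
            // Scalar navigation: at most one related document is expected.
            property.SetValue(owner, loaded.SingleOrDefault());
        }
    }
}
```

This sketch only handles properties declared exactly as `IEnumerable<T>`; properties declared as `List<T>` or arrays would need additional branches.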
|
An edge case is when you want to reference, from a document, an object within another document or, even more difficult, to reference from an object within one document another object within another document. For example, suppose we have an Order document containing OrderLines, each with a Product that has a relation with PromotionProductLines as before, but PromotionProductLines is stored within the Promotion document. So we have two types of documents, orders and promotions, with a relation between their nested objects. We should try to extend our solution to cover this scenario. One solution is to redefine the fields of a RelationIndex
Then maybe we will need to extend the [Reference] annotation to indicate the target root type containing the nested object and the path to the target nested class, but I'm not sure we need it.
We could have new overloads for methods for adding/removing RelationIndexes of nested objects:
We have to keep in mind that the session.AddRelation() and session.RemoveRelation() methods will always change two index tables, one per entity participating in the relation. Then an include like this
This SQL would provide the ids of the documents to load, plus the ids and paths needed to locate the related entities within both documents. The include would then assign the related values in the document tree using reflection. |
Two corrections to my previous comment. When I said:
Instead of that, we should have one RelationIndex table per relation, not per document type. At the end of the day, what I'm proposing is to use a many-to-many table to also represent many-to-one and one-to-one relations. We can name this index by ordering lexicographically the names of the two document types involved in the relation. With this convention, no matter which of the two entities is the source or the target in an include, we always read the same table when the entities involved are the same. So, another refinement for RelationIndex: the index name will be DocumentType1_DocumentType2_RelationIndex and its fields should be renamed to this:
Then an include like this
Then, if we define a property for the other side of the relation with the [Reference] annotation like this:
We will be able to navigate the relation in the opposite direction, using an include like this
|
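The naming convention from this comment (DocumentType1_DocumentType2_RelationIndex, with the two type names ordered lexicographically) can be sketched as below. `RelationIndexNaming` is a hypothetical helper, and ordinal string comparison is an assumption; the comment does not say which ordering rules to use.

```csharp
using System;

public static class RelationIndexNaming
{
    // Orders the two document type names with an ordinal comparison so that
    // (Order, Promotion) and (Promotion, Order) map to the same index name.
    public static string For(string typeA, string typeB) =>
        string.CompareOrdinal(typeA, typeB) <= 0
            ? $"{typeA}_{typeB}_RelationIndex"
            : $"{typeB}_{typeA}_RelationIndex";
}
```

With this helper, `RelationIndexNaming.For("Promotion", "Order")` and `RelationIndexNaming.For("Order", "Promotion")` both yield `Order_Promotion_RelationIndex`, so an include reads the same table from either side of the relation.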
In this example we want to relate the Product at position 0 within the OrderLine at position 2 within the root object of Document 1, of type Order, with the PromotionProductLines at position 3 within the root object of type Promotion in Document 2.
IMO the weak part of this proposal is that user code needs to provide the positions of the objects along the path to a nested object when it adds a RelationIndexRecord, and it also needs to keep those positions in sync if they change. Another limitation is that we can only create relations between nested objects in separate documents when the path to those objects consists of properties whose types are serialized in the document, or of IEnumerables. However, IMO this is not a big limitation; in fact it looks sufficient for the most common scenarios. |
The advantage is that YesSql would let us use the document database's strength of loading big hierarchies within a document very fast, while also providing a fast way of loading branches of that hierarchy stored in other documents. That avoids the only alternative we currently have in document databases, which is to denormalize. So with YesSql we would have two options: denormalize, which causes other issues related to synchronization, or create relations between documents, with low impact on performance but a bit of extra work to keep relations in sync on changes. |
Well, I think I lied, because without Include you can always write specific code for each scenario where you want to combine data loaded from different documents. In the end, what Include provides is a convenient and flexible way of doing it without writing very similar code again and again for each scenario. |
@sebastienros I've started to work on this issue to see which limitations I run into. For the moment, my initial design has changed:
Here is my first commit; it is WIP and in fact only a first step. jersiovic@c2f0ece A constraint of the current implementation is that subentities within a document tree that have a property with the Reference tag must have an Id property, and the same holds for the referenced entity on the target side of the relation. Those ids are part of the information stored in relation indexes and help the Include method build the resulting object tree. To-Do:
|
Any progress on this? :) |
Mentioned in #355: I have a use case that seems to require something similar to this. OrchardCore ContentItems with a ContentPart containing a field with a JObject. All of the parts' JObjects' keys are aggregated into a Keys table (with a unique row for each unique key). A Structures table then maps the Keys Id column to a StructureId, so that one StructureId has multiple rows matching it to various KeyId values. Example: Keys Structures
Edit: Got my above use case working with the following 2 ReduceIndexes.
using YesSql.Indexes;

// One row per distinct key name; KeyName is the reduce key.
public class RecordKeyIndex : ReduceIndex
{
    public string KeyName { get; set; }
    public int Count { get; set; }
}

// One row per distinct combination of keys; KeysHashId is the reduce key.
public class RecordStructureIndex : ReduceIndex
{
    public int KeysHashId { get; set; }
    public int Count { get; set; }
}
In each case, make the non-Count value the key. For the structure index, calculate an integer hash for a given combination of keys in the dynamic data. The
If a hash-to-keys lookup is needed, you can just add a JArray field that stores a copy of the keys in the structures table.
|
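The comment above does not show how the integer hash for a combination of keys is calculated. One possible sketch, assuming order and duplicates should not matter, is an FNV-1a hash over the sorted key names. `KeysHash` is a hypothetical helper, and the choice of FNV-1a is an assumption, not the commenter's method. A hand-rolled hash is used because `string.GetHashCode()` is not stable across processes on .NET Core, which makes it unsuitable for a value stored in an index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeysHash
{
    // FNV-1a (32-bit) over the sorted, de-duplicated key names, so the same
    // set of keys always produces the same value regardless of order.
    public static int Compute(IEnumerable<string> keys)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        unchecked
        {
            uint hash = offsetBasis;
            foreach (var key in keys.Distinct().OrderBy(k => k, StringComparer.Ordinal))
            {
                foreach (char ch in key)
                {
                    hash = (hash ^ (byte)ch) * prime;         // low byte of the UTF-16 unit
                    hash = (hash ^ (byte)(ch >> 8)) * prime;  // high byte of the UTF-16 unit
                }
                hash = (hash ^ 0xFF) * prime;                 // marks the end of each key
            }
            return (int)hash;
        }
    }
}
```

As with any 32-bit hash, different key sets can collide, which is one reason to keep the copy of the keys mentioned in the comment alongside the hash.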
Wouldn’t the Apple ID just be a guid, and used as the discriminator? |
Pure relational databases have their pros and cons, and pure document databases have theirs too. Building a NoSQL database on top of a SQL database is a smart move that makes the best of both worlds easily available for the different problems we face when developing an app.
Following this philosophy, it would be interesting for YesSql to also support relations between documents.
Usually we see those relations in ORMs for lazy loading. Personally I'm against lazy loading, because in the long term devs tend to overuse it and hurt the performance of the system, as I have seen many times in Orchard 1 when a content type has content fields pointing to other content items.
What is interesting about relations is that YesSql can offer a kind of Include() method for queries, helping us easily avoid N+1 queries in a transparent way. We would delegate to YesSql the work of optimizing the retrieval of the documents related to a query's results, using only one extra query to retrieve the documents on the other side of a relation.
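To illustrate the difference between N+1 queries and "one extra query", here is a self-contained sketch with an in-memory stand-in for the store. `FakeStore`, `IncludeSketch` and the sample classes are assumptions made for this example; none of them is YesSql API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Provider
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public int ProviderId { get; set; }
    public Provider Provider { get; set; }
}

// In-memory stand-in for a document store that counts round trips.
public class FakeStore
{
    private readonly Dictionary<int, Provider> _providers;

    public FakeStore(IEnumerable<Provider> providers) =>
        _providers = providers.ToDictionary(p => p.Id);

    public int QueryCount { get; private set; }

    // Simulates a single "WHERE Id IN (...)" query.
    public List<Provider> GetProviders(IEnumerable<int> ids)
    {
        QueryCount++;
        return ids.Where(_providers.ContainsKey).Select(id => _providers[id]).ToList();
    }
}

public static class IncludeSketch
{
    // One extra query for the whole result set instead of one per product.
    public static void IncludeProviders(IReadOnlyCollection<Product> products, FakeStore store)
    {
        var ids = products.Select(p => p.ProviderId).Distinct();
        var loaded = store.GetProviders(ids).ToDictionary(p => p.Id);

        foreach (var product in products)
            if (loaded.TryGetValue(product.ProviderId, out var provider))
                product.Provider = provider;
    }
}
```

Loading the provider inside a loop over products would call `GetProviders` once per product; collecting the distinct ids first keeps `QueryCount` at 1 regardless of how many products the original query returned.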
In Orchard 2 this will be very useful for dealing optimally with fields that point to other content items. I've built a similar solution as an extension for IContentQuery, applicable to content fields that point to other content items, such as taxonomy fields and MediaPickerFields; it has boosted performance in Orchard 1 sites. The problem is that it is still not as clean and elegant as I would like for submitting a PR ... :)
To declare a relation we will need to mark "navigation fields" with metadata established by YesSql, which of course will be independent of Orchard concepts. That metadata will establish the keys involved in the relation and the type of the relation: many-to-one, one-to-one or many-to-many. Orchard fields, depending on their type, will supply this metadata along with their serialization according to their needs.
On the Orchard side, fields should be developed so that they cannot be accessed if their navigation property has not been loaded through the include extension. But YesSql should offer a way of explicitly loading one field based on a relation. IMO the important thing is not to offer lazy loading, because it is the root cause of performance issues introduced by devs, but to give them an explicit-load alternative that helps them be conscious of what they are doing.
To keep storing documents simple and avoid tracking changes on related documents, we could store only the documents explicitly asked to be stored. YesSql would not be responsible for storing the related documents of documents that were explicitly stored.