Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos: Add Include support #16920

Open
AndriySvyryd opened this issue Aug 2, 2019 · 27 comments
Open

Cosmos: Add Include support #16920

AndriySvyryd opened this issue Aug 2, 2019 · 27 comments

Comments

@AndriySvyryd
Copy link
Member

AndriySvyryd commented Aug 2, 2019

And support mapping owned types to other containers

@ajcvickers
Copy link
Member

ajcvickers commented Aug 5, 2019

@AndriySvyryd 3.0? Can you do in-memory first?

This is cross-document Include--backlogging.

However...

can you do in-memory Include?

@ajcvickers ajcvickers added this to the Backlog milestone Aug 5, 2019
@AndriySvyryd
Copy link
Member Author

AndriySvyryd commented Aug 20, 2019

Note to implementor: Consider #12776 and #18022

@TeaBaerd
Copy link

Preview 5 appears to have this functioning. Is there a reason it was removed in later work?

@AndriySvyryd
Copy link
Member Author

@TeaBaerd The query pipeline had to be rewritten to remove the Relinq dependency

@TeaBaerd
Copy link

Bummer. Thank you for the explanation @AndriySvyryd.

@AndriySvyryd
Copy link
Member Author

Don't forget to vote (👍) for the features that are most important for you so we can prioritize better.

@TheFanatr
Copy link
Contributor

If I am missing something please let me know, and I mean no offense by this.

I would like to mention that, in my experience, not having Include support is very detrimental to the usefulness of the Cosmos DB provider; instead of being able to write queries "organically" using the expected APIs, the query must now be done somehow else when attempting to use Include, which, in my case, is a lot. This effectively translates into having to use separate operations, which increases cost. The worse part, however, is that the exception being thrown when the query is executed does not make it obvious as to why the operation failed, so this functionality can lead to wild goose chases, as in the following case.

I was attempting to query an inverse collection property and then filter it into a dictionary with each element key being a property of a navigation property on an element of the aforementioned collection, which would require Include to do efficiently, but instead of it working or me receiving a message in the exception stating that Include is not available in the Cosmos DB provider, I received the following exception when executing ToDictionaryAsync on an IQueryable, far away from the call to Include.

InvalidOperationException: The LINQ expression 'LeftJoin<Alias, Platform, Nullable<Guid>, TransparentIdentifier<Alias, Platform>>(
outer: Where<Alias>(
source: DbSet<Alias>,
predicate: (a) => Property<Nullable<Guid>>(a, "ProfileID") == (Unhandled parameter: __p_0)),
inner: DbSet<Platform>,
outerKeySelector: (a) => Property<Nullable<Guid>>(a, "PlatformID"),
innerKeySelector: (p) => Property<Nullable<Guid>>(p, "ID"),
resultSelector: (o, i) => new TransparentIdentifier<Alias, Platform>(
Outer = o,
Inner = i
))' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to either AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync(). See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.

This initially led me to believe that my problem was related to #16730, but as I continued to change the rest of my query to eventually use ToListAsync, I suddenly came to the realization hours later that the one thing I need in order for the processing that needs to be done to not be unwieldy was not working. This means that in order to actually build the dictionary I was attempting to out of this data, I have to individually load the navigation property whose property is needed for the key, using nothing other than separate connections to the database.

Cosmos DB is paid for in RU/s; if people are legitimately expected to use this provider, then it would be appreciated if this basic quality of life feature, that can do what would otherwise require n requests more or raw SQL which is the exact thing being avoided by using supposedly built-in Entity Framework Core functionality, could be implemented.

The following is what was attempted before finding true culprit.

await entry.Collection(profile => profile.Aliases).Query().Include(alias => alias.Platform).ToDictionaryAsync(alias => alias.Platform.Name, alias => alias.Identification);
await entry.Collection(profile => profile.Aliases).Query().Include(alias => alias.Platform).Select(alias => new { Platform = alias.Platform.Name, Identification = alias.Identification }).ToDictionaryAsync(alias => alias.Platform, alias => alias.Identification);
(await entry.Collection(profile => profile.Aliases).Query().Include(alias => alias.Platform).Select(alias => new { Platform = alias.Platform.Name, Identification = alias.Identification }).ToListAsync()).ToDictionary(alias => alias.Platform, alias => alias.Identification);

The final solution is a lot more wasteful clunky than it would have been with include, but it works.

aliases = new Dictionary<string, string> { };
await foreach (Alias alias in entry.Collection(profile => profile.Aliases).Query().AsAsyncEnumerable())
{
    await context.Entry(alias).Reference(alias => alias.Platform).LoadAsync();
    aliases[alias.Platform.Name] = alias.Identification;
}

I've seen somewhere that Cosmos DB may not even support Include operations, but even in that case it is a nuisance to not be able to complete the operation at least semantically efficiently, as in not having to use an await foreach loop, and just being able to use a one-liner instead.

I would appreciate any help to make what I ended up with more efficient.

@AndriySvyryd
Copy link
Member Author

AndriySvyryd commented Oct 29, 2019

@TheFanatr Your understanding is mostly correct. Include is indeed a useful feature, but EF will only be able to issue a more efficient query if the included data is in the same collection and the same partition. It's unlikely that we would support other kind of Include, as it would be equivalent to the code you posted above.

Consider embedding the related data by using owned types if performance is a concern.

@MariaCobretti
Copy link

can't wait for this feature

@kevinchatham
Copy link

Please add support for this 🙏

@braidenstiller
Copy link

Is there any news if this feature will be supported in / by v6?

@ajcvickers
Copy link
Member

@braidenstiller We're currently figuring that out.

@helshabini
Copy link

Hello Guys, I'm sorry I've been scouring the docs for hours on end so I'm just gonna ask a related question here because I can't find any answer in the docs. I understand there is a limitation with the include function and it is not available for the Cosmos provider.

If so, what alternatives do I have other than embedding the one document into another? I mean, embedding works fine, except that I cannot directly query for the embedded entity from the context. So I am forced to traverse the hierarchy.

Example:

//Models
public class MainEntity
{
    public string Id { get; set; }
    public string Name { get; set; }
    public List<SubEntity> SubEntities { get; set; }
}
public class SubEntity
{
    public string Id { get; set; }
    public string Value { get; set; }
}

The problem is that querying for var mainEntity = context.MainEntities.ToList() returns a null SubEntities list (understandable so), but there is no way for me to get them except if I query context.SubEntities.Where(e => e.MainEntityId == mainEntity.Id).ToList()

Regardless of the inefficiency, it results in very clunky code, with null checks and so on, not mention the need to copy those results into another object to return a non-context Dto back to the user (it is a WebApi).

On the other hand if I try to use embedded entities:

//Context
public DbSet< MainEntity> MainEntities { get; set; }
public DbSet< SubEntity > SubEntities { get; set; }

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<MainEntity>().OwnMany(entity => entity.SubEntities);
}

This all works fine, except that there is no way for me to query the context for SubEntities directly.

context.SubEntities.ToList(); throws System.InvalidOperationException: Cannot create a DbSet for 'SubEntity' because it is configured as an owned entity type and must be accessed through its owning entity type.

I completely understand why is this happening, but it kinda seems I'm out of options. I can't use Include, and I can't use embedding, and using neither results in a very ugly situation. Am I missing something here? Is there a way to make this cleaner that I just couldn't find?

@braidenstiller
Copy link

Is it at all possible to Include (or something equivalent) cross-document (but not cross-collection) related data right now?

@roji
Copy link
Member

roji commented Aug 25, 2021

@helshabini any others, the Cosmos provider will default to implicit ownership in 6.0; this means that navigations will be automatically populated when you load the root entity type, without requiring Include.

This issue could still track implementing cross-document include.

@helshabini
Copy link

That's great news. This makes the Cosmos provider far more usable. Thanks.

@braidenstiller
Copy link

+1 on still tracking cross document include as this would be incredibly useful. Some workloads require cross document relationships, so being able to load them in a idiomatic way would be great.

@roji
Copy link
Member

roji commented Aug 25, 2021

@braidenstiller I'm not a Cosmos expert, but the crucial point here is that Cosmos itself doesn't seem to support cross-document joins, which is a typical characteristic of document databases (as opposed to relational databases). Assuming I haven't missed anything, that would mean that EF Core would have to pull back documents and perform joins client-side, which is something I strongly believe it should not do, since that's likely to be extremely inefficient. If that's something you really want, you should be able to easily express this as follows:

_ = ctx.Foo
    .AsEnumerable()
    .Join(ctx.Bar, f => f.Id, b => b.FooId, (f, b) => new { Foo = f, Bar = b });

The AsEnumerable triggers the client-side evaluation, and makes it extremely clear what is going on here (and the fact that this is a potentially problematic query perf-wise). That shouldn't be hidden behind an innocuous-looking Include (especially for people coming from relational databases, who are used to that happening database-side).

@braidenstiller
Copy link

braidenstiller commented Aug 25, 2021

@roji that's a fair argument. I was coming at this as a limitation that ef core could potentially address instead as a feature enhancement on top of cosmos.

Unfortunately that's correct, due to lack of include support on the cosmos side the query would have to be done client side.

Personally I think I the potential performance tradeoff is worth it as some relationships just can't be embedded (unbound related lists for example). Official cosmos documentation even advises to use related documents in these cases:
https://docs.microsoft.com/en-us/azure/cosmos-db/modeling-data#referencing-data

Include would be useful here but perhaps it could be enabled in configuration as a feature flag and not implicitly executed.

I haven't tested that explicit join syntax exactly but I'm fairly certain even that isn't supported at the moment. You have to retrieve the Id and query it explicitly (and importantly seperatly) from the root document query.

@roji
Copy link
Member

roji commented Aug 25, 2021

Official cosmos documentation even advises to use related documents in these cases:
https://docs.microsoft.com/en-us/azure/cosmos-db/modeling-data#referencing-data

That's a fair point and thanks for the link. My gut feeling is that the misuse potential of an EF Core Include implementation (considerably) outweighs the advantage, especially given that it's already possible to express the same operation explicitly via a client-side join (and if it isn't currently, it should be). This feels like the type of thing that shouldn't be made too easy, and especially too implicit/under the hood, but rather be a very explicit thing users opt into on a query-by-query basis.

I haven't tested that explicit join syntax exactly but I'm fairly certain even that isn't supported at the moment. You have to retrieve the Id and query it explicitly (and importantly seperatly) from the root document query.

That may be true, but that depends on exactly what your join conditions are like (you may do a join on any arbitrary properties on both the principal and dependent sides).

@braidenstiller
Copy link

@roji
I'm all for supporting this functionality in a way that I can retrieve the principal and dependents in one ef transaction. Whether or not that's facilited by .Include doesn't necessarily matter, I just want that functionality from ef core. I would tend to agree with your sentiment that achieving this with the standard Include syntax may cause more issues than it's worth, especially for people who are coming entirely from a relational database way of thinking.

Maybe the way forward is, resolve this issue, and track another for the join capabilities (I think there may be one already).

@roji
Copy link
Member

roji commented Dec 9, 2021

I did some research into this, and I do believe we can do something which makes sense.

First, as pointed out by @braidenstiller above, the Cosmos docs do recommend using normalized modeling in certain scenarios, i.e. splitting information across multiple documents as an alternative to embedding all data in a single document.

To implement this efficiently, we could use a multiple query loading strategy which would be somewhat similar to the split query implemented for relational databases. Assuming a one-to-many Blogs/Posts model and a query with a predicate over Blogs, we'd:

  • Send a principal query to get all Blogs matching the predicate
  • Send a dependent query with an array parameter containing all Blog ids, to get all their Posts:
var blogIds = new[] { "2", "3", "123" };
var query = new QueryDefinition("SELECT c.Id, c.Name FROM Posts c WHERE ARRAY_CONTAINS(@BlogIds, c.BlogId)")
    .WithParameter("@BlogIds", blogIds);

The above approach efficiently uses an index over c.BlogId. However, before implementation we should confirm that a large number of IDs can be sent like this, without some arbitrary Cosmos upper limit or perf degradation.

Note that the above assumes a model where each Post has a BlogId, similar to relational foreign key modeling. Cosmos also supports each Blog having a list of Post IDs instead, in which case we'd read that property and load the Posts directly by their IDs in the second query. Although we don't have to support this model style, Cosmos many-to-many modeling (#23523) involves an array of IDs on each side (instead of a join table); the one-to-many array modeling could be done as part of that.

Note that the above is only a design - we don't currently have plans to implement this for 7.0.

@ajcvickers ajcvickers added this to the Backlog milestone Dec 10, 2021
@jazzmanro
Copy link

@roji The problem with not having at least a feature-flag-controlled client-side (inefficient) Include() is with 3rd party libraries where you can't control the implementation. Not sure if this is a good example but consider OData. I'm having a small application with a small dataset where there's not a big problem to do some client-side joining. Of course OData + EFCore makes a great development stack but on top of CosmosDB I can't use $expand properly to pull also navigation property data. I don't (think) I have control over OData implementation so I cannot simply explicitly .AsEnumerable(). This may be true with other libraries also. Ideally for such situations in which performance is not a concern I could explicitly turn on some client-side processing flag.

@roji
Copy link
Member

roji commented Dec 14, 2021

@jazzmanro the proposal in the above comment (#19526) would implement a pretty efficient join that's comparable to the split query feature we have for relational database, and doesn't pull entire document collections like AsEnumerable would do. Assuming there's no blocker to implementing this, it would be available by default without any need for an opt in (so OData would work out of the box).

@jazzmanro
Copy link

@roji sounds really great, wasn't sure exactly what the proposal would cover. Can't wait to have this.

@arolariu
Copy link

arolariu commented Nov 14, 2023

Almost two years later of "silence" - are there any plans for this feature (.Include() support) to be available in EF Core 9 or in the short term future?

I am running into a scenario where I have an aggregator root (entity) that contains another entity that has its own identity. I want to store the aggregator root and the entity in separate documents (call them "Invoices" and "Merchants")... yet if I do this, I will be unable to perform the following query:

DbSet<Invoices>().Include(invoice => invoice.Merchant)
or..
DbSet<Invoices>().Include(invoice => invoice.Merchant).ToList() 

As @TheFanatr has pointed out previously, I get into the same problem and wild goose chasing - the $exception field being populated with LINQ expression could not be translated... try to perform client-side evaluation by using .ToList - which is of no help.


@alfred-zaki
Copy link

following

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests