-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple Include() warning: false positive for ThenInclude()? #29665
Comments
@Timovzl can you submit a minimal code sample for this? |
Something to consider is that although it doesn't produce a cross product, it does lead to duplicated details being brought back. i.e. every row has parent, child and grandchild details. I don't know how the perf impact of this compares to a true cross product (e.g. including sibling child collections) but it must have some impact and might be the reason for the warning. FWIW I think the documentation on this confuses Cartesian explosion with duplication. https://learn.microsoft.com/en-gb/ef/core/querying/single-split-queries#single-queries. I think they are different problems that should perhaps have different warnings and different solutions. |
Can you elaborate on the duplication? Assuming one-to-one relationships, what data duplication would occur in the above query? |
@roji I'm assuming one-to-many because I took it as a given that the warning does not appear until 2 collections are involved and interpreted the surprise at the warning being due to the fact the query doesn't produce a cross product (which is true) and not because it didn't involve collections. However, I'm making assumptions and agree the naming of the navigations suggests one-to-one, so will zip it till OP provides a sample as you requested :) |
Sorry, the one-to-one-looking example was my mistake. It should indeed have been a one-to-many relationship. @roji I have created a minimal repro: public static class Program
{
public static async Task Main()
{
var builder = WebApplication.CreateBuilder();
builder.Services.AddSingleton<IHostedService, TestRunner>();
builder.Services.AddDbContextFactory<TestDbContext>(options => options.UseSqlServer(
@"Data Source=(LocalDB)\MSSQLLocalDB;Integrated Security=True;Initial Catalog=MultipleIncludeTest;"));
using var app = builder.Build();
await app.StartAsync();
await app.StopAsync();
}
private class TestRunner : IHostedService
{
private IDbContextFactory<TestDbContext> DbContextFactory { get; }
public TestRunner(IDbContextFactory<TestDbContext> dbContextFactory)
{
this.DbContextFactory = dbContextFactory;
}
public async Task StartAsync(CancellationToken cancellationToken)
{
await using var dbContext = await this.DbContextFactory.CreateDbContextAsync();
await dbContext.Database.EnsureDeletedAsync();
await dbContext.Database.EnsureCreatedAsync();
// The warning appears when this query is first compiled
await dbContext.Parents
.Include(parent => parent.Children)
.ThenInclude(child => child.Children)
.ToListAsync();
}
public Task StopAsync(CancellationToken cancellationToken)
{
return Task.CompletedTask;
}
}
public class TestDbContext : DbContext
{
public DbSet<Parent> Parents { get; private init; }
public TestDbContext(DbContextOptions<TestDbContext> options)
: base(options)
{
}
}
public class Parent
{
public int Id { get; set; }
public IReadOnlyCollection<Child> Children { get; } = new List<Child>();
}
public class Child
{
public int Id { get; set; }
public IReadOnlyCollection<Grandchild> Children { get; } = new List<Grandchild>();
}
public class Grandchild
{
public int Id { get; set; }
}
} |
@Timovzl then you seem to be loading two collection navigations (one-to-many relationships), which is exactly what the warning says:
The threshold for this warning is somewhat arbitrary - and loading multiple collection navigations isn't even necessarily problematic (it depends on the actual cardinality of the data). But as far as I can see the warning is working as expected? |
@roji I think that the warning is incorrectly assuming a cross product (Cartesian explosion) here. Judging by the docs, avoiding the Cartesian explosion problem is the purpose of the warning, wouldn't you agree? Now, here is an example where that warning would be appropriate: this.DbContext.Parents
.Include(parent => parent.Children)
.Include(parent => parent.Pets); It joins two tables that are siblings of one another. For each parent, it has to include every related child and every related pet. Which pets should be in the same row as which children? There is no clear answer, so every possible combination is made. The result is multiplicative. A fairly manageable parent with 100 children and 100 pets suddenly explodes into a result set of 10K rows. Worth a warning, indeed. There are two clear cases where the Cartesian explosion does not occur:
I hope scenario 2 is self-explanatory. Including a one-to-one relationships results in the exact same number of rows as before. Each one-to-one relationship causes multiplication by The repro is for scenario 1. This one isn't multiplicative either, because it is always clear which grandchild belongs with which child. A parent with 10 children and a total of 100 grandchildren will result in exactly 100 rows. The number of rows is linear in the number of matching entities. There is no cause for a warning. Just to illustrate, let me draw the result set with 2 parents, each with 2 children, each with 3 grandchildren. That is 12 grandchildren total (or 12 + 2 + 2 entities total), returned in precisely 12 rows.
This is a normal, performant, unproblematic result set, as returned by a query like this: SELECT *
FROM Parents p
LEFT JOIN Children c ON c.ParentId = p.Id
LEFT JOIN Grandchildren g ON g.ParentId = c.Id It seems to me that the warning is missing some intelligence: It disregards the fact that the relationships are properly nested and thus cause no multiplication (scenario 1). It also disregards the fact that one-to-one relationships cause no multiplication (scenario 2). |
While it's true that there is some duplication for What we can say for certain about the performance is that the regular repetition of the parent's columns does not change the order of magnitude. The space complexity remains linear in the data size. Say the parent and child have roughly equal amounts of data. The repetition of the parent's columns then causes the result set's data size to double, which is the same order of magnitude. If we chose to warn on this, we would be equally warranted to warn if the caller had chosen split queries as their default: making multiple round trips is very likely to perform worse than returning twice the data! We would have to warn always, unless the caller made their choice explicit. 😄 The Cartesian explosion, however, causes multiplication. That increases the order of magnitude, because the space complexity is now quadratic in the data size. That very quickly becomes worse than making an extra round trip. (Note that #22435 is tracking a potential best-of-both-worlds solution.)
I had overlooked this when I first read that section. Agreed. They talk about preventing the Cartesian explosion, but provide an example that does not demonstrate it. |
@Timovzl @stevendarby you are of course right about the order of magnitude difference between cross product (Cartesian explosion) resulting from mulitple sibling collection joins, and the far less problematic duplication resulting from joins in general.
The original intent behind the warning was to cover both; note that the warning doesn't say Cartesian explosion or cross product; just that multiple collection navigations are being included. But I agree that the two are sufficiently different so as at least warrant different warnings (or possibly not warn for duplication in general). I also agree that our documentation conflates the two concepts a bit. I'll make that better. |
Thanks @roji. I know at this point no decision has been made and it needs further thought but: suppose you do go down the path of not warning for duplication in general, for me it would also beg the question of whether it's correct for split query to "kick in" in the presence of any collection navigation rather than just those that produce a cross product. Note that there is already a slight disconnect in that it splits on the first collection but wouldn't warn about this without a second collection included. Anyway, perhaps just something to consider alongside this in future :) |
We should discuss this, but I was always under the impression that this warning was intended to warn against the cartesian explosion specifically. |
Changing the actual behavior of split query is a bit more extreme, i.e. stop doing split query for the Include/ThenInclude case (but I agree it may make sense (and would be consistent with the warning discussion). We'll discuss. |
Design decision: we'll change the warning to cover cross products only (multiple sibling collection joins, Cartesian explosion). We'll leave split query the way it is: we're splitting the query because the user explicitly told us to (opt-in), and data duplication can still be significant in some scenarios. Also, a change here would obviously be more extreme than the warning change, and there's no huge downside in splitting the query given that the user has already opted into it (especially after #10878). |
@roji Will you also be excluding one-to-one relationships from that detection, since these cannot cause a cross product? I mean, of course, where EF can know about them, i.e. where |
@Timovzl AFAIK we already don't warn for one-to-one relationships, are you seeing a different behavior? If so, can you please post the code? |
@roji As I'm trying to create a minimal repro for that scenario as well, I find myself unable to reproduce it. Originally, I had to include I suppose one-to-one is not a warning after all. 🙃 |
Note: I've submitted dotnet/EntityFramework.Docs#4201 to improve our docs around this. |
There is warning that protects us against inadvertent cross products. It occurs when we use multiple calls to
Include()
in a single query:We can suppress it with an explicit call to
.AsSingleQuery()
if we know what we are doing.To my surprise, the warning also pops up for
.Include(parent => parent.Child).ThenInclude(child => child.Grandchild)
.The reason I am surprised is that this seems to give the warning whenever
ThenInclude()
is used, as if that entire method should be avoided. Moreover, anInclude()
followed by aThenInclude()
does not introduce a cross product and thus seems no cause for concern.Is this a false positive?
The text was updated successfully, but these errors were encountered: