Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Include() warning: false positive for ThenInclude()? #29665

Open
Timovzl opened this issue Nov 23, 2022 · 17 comments
Open

Multiple Include() warning: false positive for ThenInclude()? #29665

Timovzl opened this issue Nov 23, 2022 · 17 comments

Comments

@Timovzl
Copy link

Timovzl commented Nov 23, 2022

There is warning that protects us against inadvertent cross products. It occurs when we use multiple calls to Include() in a single query:

Compiling a query which loads related collections for more than one collection navigation, 
either via 'Include' or through projection, but no 'QuerySplittingBehavior' has been configured. 
By default, Entity Framework will use 'QuerySplittingBehavior.SingleQuery', 
which can potentially result in slow query performance. 
See https://go.microsoft.com/fwlink/?linkid=2134277 for more information. 
To identify the query that's triggering this warning call 
'ConfigureWarnings(w => w.Throw(RelationalEventId.MultipleCollectionIncludeWarning))'.

We can suppress it with an explicit call to .AsSingleQuery() if we know what we are doing.

To my surprise, the warning also pops up for .Include(parent => parent.Child).ThenInclude(child => child.Grandchild).

The reason I am surprised is that this seems to give the warning whenever ThenInclude() is used, as if that entire method should be avoided. Moreover, an Include() followed by a ThenInclude() does not introduce a cross product and thus seems no cause for concern.

Is this a false positive?

@roji
Copy link
Member

roji commented Nov 23, 2022

@Timovzl can you submit a minimal code sample for this?

@stevendarby
Copy link
Contributor

stevendarby commented Nov 24, 2022

Something to consider is that although it doesn't produce a cross product, it does lead to duplicated details being brought back. i.e. every row has parent, child and grandchild details.

I don't know how the perf impact of this compares to a true cross product (e.g. including sibling child collections) but it must have some impact and might be the reason for the warning.

FWIW I think the documentation on this confuses Cartesian explosion with duplication. https://learn.microsoft.com/en-gb/ef/core/querying/single-split-queries#single-queries. I think they are different problems that should perhaps have different warnings and different solutions.

@roji
Copy link
Member

roji commented Nov 24, 2022

Something to consider is that although it doesn't produce a cross product, it does lead to duplicated details being brought back. i.e. every row has parent, child and grandchild details.

Can you elaborate on the duplication? Assuming one-to-one relationships, what data duplication would occur in the above query?

@stevendarby
Copy link
Contributor

stevendarby commented Nov 24, 2022

@roji I'm assuming one-to-many because I took it as a given that the warning does not appear until 2 collections are involved and interpreted the surprise at the warning being due to the fact the query doesn't produce a cross product (which is true) and not because it didn't involve collections. However, I'm making assumptions and agree the naming of the navigations suggests one-to-one, so will zip it till OP provides a sample as you requested :)

@Timovzl
Copy link
Author

Timovzl commented Nov 28, 2022

However, I'm making assumptions and agree the naming of the navigations suggests one-to-one

Sorry, the one-to-one-looking example was my mistake. It should indeed have been a one-to-many relationship.

@roji I have created a minimal repro:

public static class Program
{
	public static async Task Main()
	{
		var builder = WebApplication.CreateBuilder();

		builder.Services.AddSingleton<IHostedService, TestRunner>();
		builder.Services.AddDbContextFactory<TestDbContext>(options => options.UseSqlServer(
			@"Data Source=(LocalDB)\MSSQLLocalDB;Integrated Security=True;Initial Catalog=MultipleIncludeTest;"));

		using var app = builder.Build();

		await app.StartAsync();
		await app.StopAsync();
	}

	private class TestRunner : IHostedService
	{
		private IDbContextFactory<TestDbContext> DbContextFactory { get; }

		public TestRunner(IDbContextFactory<TestDbContext> dbContextFactory)
		{
			this.DbContextFactory = dbContextFactory;
		}

		public async Task StartAsync(CancellationToken cancellationToken)
		{
			await using var dbContext = await this.DbContextFactory.CreateDbContextAsync();

			await dbContext.Database.EnsureDeletedAsync();
			await dbContext.Database.EnsureCreatedAsync();

			// The warning appears when this query is first compiled
			await dbContext.Parents
				.Include(parent => parent.Children)
				.ThenInclude(child => child.Children)
				.ToListAsync();
		}

		public Task StopAsync(CancellationToken cancellationToken)
		{
			return Task.CompletedTask;
		}
	}

	public class TestDbContext : DbContext
	{
		public DbSet<Parent> Parents { get; private init; }

		public TestDbContext(DbContextOptions<TestDbContext> options)
			: base(options)
		{
		}
	}

	public class Parent
	{
		public int Id { get; set; }
		public IReadOnlyCollection<Child> Children { get; } = new List<Child>();
	}

	public class Child
	{
		public int Id { get; set; }
		public IReadOnlyCollection<Grandchild> Children { get; } = new List<Grandchild>();
	}

	public class Grandchild
	{
		public int Id { get; set; }
	}
}

@roji
Copy link
Member

roji commented Nov 28, 2022

@Timovzl then you seem to be loading two collection navigations (one-to-many relationships), which is exactly what the warning says:

Compiling a query which loads related collections for more than one collection navigation [...]

The threshold for this warning is somewhat arbitrary - and loading multiple collection navigations isn't even necessarily problematic (it depends on the actual cardinality of the data). But as far as I can see the warning is working as expected?

@Timovzl
Copy link
Author

Timovzl commented Nov 28, 2022

@roji I think that the warning is incorrectly assuming a cross product (Cartesian explosion) here.

Judging by the docs, avoiding the Cartesian explosion problem is the purpose of the warning, wouldn't you agree?

Now, here is an example where that warning would be appropriate:

this.DbContext.Parents
	.Include(parent => parent.Children)
	.Include(parent => parent.Pets);

It joins two tables that are siblings of one another. For each parent, it has to include every related child and every related pet. Which pets should be in the same row as which children? There is no clear answer, so every possible combination is made. The result is multiplicative. A fairly manageable parent with 100 children and 100 pets suddenly explodes into a result set of 10K rows. Worth a warning, indeed.

There are two clear cases where the Cartesian explosion does not occur:

  1. We are not including from multiple tables that are siblings of each other.
  2. We are including from sibling tables, but at most 1 of them is a one-to-many relationship.

I hope scenario 2 is self-explanatory. Including a one-to-one relationships results in the exact same number of rows as before. Each one-to-one relationship causes multiplication by 1, which does not increase the number of resulting rows. Note that this scenario is also causing the warning - wrongfully, I would say.

The repro is for scenario 1. This one isn't multiplicative either, because it is always clear which grandchild belongs with which child. A parent with 10 children and a total of 100 grandchildren will result in exactly 100 rows. The number of rows is linear in the number of matching entities. There is no cause for a warning.

Just to illustrate, let me draw the result set with 2 parents, each with 2 children, each with 3 grandchildren. That is 12 grandchildren total (or 12 + 2 + 2 entities total), returned in precisely 12 rows.

Parent | Child | Grandchild
i	A	1
i	A	2
i	A	3
i	B	4
i	B	5
i	B	6
ii	C	7
ii	C	8
ii	C	9
ii	D	10
ii	D	11
ii	D	12

This is a normal, performant, unproblematic result set, as returned by a query like this:

SELECT *
FROM Parents p
LEFT JOIN Children c ON c.ParentId = p.Id
LEFT JOIN Grandchildren g ON g.ParentId = c.Id

It seems to me that the warning is missing some intelligence: It disregards the fact that the relationships are properly nested and thus cause no multiplication (scenario 1). It also disregards the fact that one-to-one relationships cause no multiplication (scenario 2).

@Timovzl
Copy link
Author

Timovzl commented Nov 28, 2022

@stevendarby

Something to consider is that although it doesn't produce a cross product, it does lead to duplicated details being brought back. i.e. every row has parent, child and grandchild details.
I don't know how the perf impact of this compares to a true cross product (e.g. including sibling child collections) but it must have some impact and might be the reason for the warning.

While it's true that there is some duplication for JOINs with one-to-many relationships, that is normal for relational result sets.

What we can say for certain about the performance is that the regular repetition of the parent's columns does not change the order of magnitude. The space complexity remains linear in the data size. Say the parent and child have roughly equal amounts of data. The repetition of the parent's columns then causes the result set's data size to double, which is the same order of magnitude. If we chose to warn on this, we would be equally warranted to warn if the caller had chosen split queries as their default: making multiple round trips is very likely to perform worse than returning twice the data! We would have to warn always, unless the caller made their choice explicit. 😄

The Cartesian explosion, however, causes multiplication. That increases the order of magnitude, because the space complexity is now quadratic in the data size. That very quickly becomes worse than making an extra round trip. (Note that #22435 is tracking a potential best-of-both-worlds solution.)

FWIW I think the documentation on this confuses Cartesian explosion with duplication. https://learn.microsoft.com/en-gb/ef/core/querying/single-split-queries#single-queries. I think they are different problems that should perhaps have different warnings and different solutions.

I had overlooked this when I first read that section. Agreed. They talk about preventing the Cartesian explosion, but provide an example that does not demonstrate it.

@roji
Copy link
Member

roji commented Nov 29, 2022

@Timovzl @stevendarby you are of course right about the order of magnitude difference between cross product (Cartesian explosion) resulting from mulitple sibling collection joins, and the far less problematic duplication resulting from joins in general.

[...] avoiding the Cartesian explosion problem is the purpose of the warning, wouldn't you agree?

The original intent behind the warning was to cover both; note that the warning doesn't say Cartesian explosion or cross product; just that multiple collection navigations are being included. But I agree that the two are sufficiently different so as at least warrant different warnings (or possibly not warn for duplication in general).

I also agree that our documentation conflates the two concepts a bit. I'll make that better.

@stevendarby
Copy link
Contributor

Thanks @roji. I know at this point no decision has been made and it needs further thought but: suppose you do go down the path of not warning for duplication in general, for me it would also beg the question of whether it's correct for split query to "kick in" in the presence of any collection navigation rather than just those that produce a cross product. Note that there is already a slight disconnect in that it splits on the first collection but wouldn't warn about this without a second collection included. Anyway, perhaps just something to consider alongside this in future :)

@ajcvickers
Copy link
Contributor

We should discuss this, but I was always under the impression that this warning was intended to warn against the cartesian explosion specifically.

@roji
Copy link
Member

roji commented Nov 29, 2022

Changing the actual behavior of split query is a bit more extreme, i.e. stop doing split query for the Include/ThenInclude case (but I agree it may make sense (and would be consistent with the warning discussion). We'll discuss.

@roji
Copy link
Member

roji commented Dec 5, 2022

Design decision: we'll change the warning to cover cross products only (multiple sibling collection joins, Cartesian explosion).

We'll leave split query the way it is: we're splitting the query because the user explicitly told us to (opt-in), and data duplication can still be significant in some scenarios. Also, a change here would obviously be more extreme than the warning change, and there's no huge downside in splitting the query given that the user has already opted into it (especially after #10878).

@Timovzl
Copy link
Author

Timovzl commented Dec 6, 2022

@roji Will you also be excluding one-to-one relationships from that detection, since these cannot cause a cross product? I mean, of course, where EF can know about them, i.e. where .HasOne().WithOne() was used to specify the relationship.

@roji
Copy link
Member

roji commented Dec 6, 2022

@Timovzl AFAIK we already don't warn for one-to-one relationships, are you seeing a different behavior? If so, can you please post the code?

@Timovzl
Copy link
Author

Timovzl commented Dec 6, 2022

@roji As I'm trying to create a minimal repro for that scenario as well, I find myself unable to reproduce it.

Originally, I had to include AsSingleQuery() for both the scenarios: the one with the ThenInclude(), and the one where I include both a 1:1 navigation and a 1:N navigation. Leaving out AsSingleQuery() for any one of these resulted in the warning we discussed. I'd swear I have tested it with EF 7 as well. However, as I now outcomment AsSingleQuery() from the latter scenario, in the solution where I originally discovered it, no warnings appear. For the former scenario, they do appear.

I suppose one-to-one is not a warning after all. 🙃

@roji
Copy link
Member

roji commented Jan 9, 2023

Note: I've submitted dotnet/EntityFramework.Docs#4201 to improve our docs around this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants