Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupJoin is doing aggregate in memory instead of on DB #13887

Closed
shaulbehr opened this issue Nov 5, 2018 · 2 comments
Closed

GroupJoin is doing aggregate in memory instead of on DB #13887

shaulbehr opened this issue Nov 5, 2018 · 2 comments
Labels
area-query closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. customer-reported type-enhancement

Comments

@shaulbehr
Copy link

GroupJoin() is emitting incorrect Sql, resulting in inefficient queries.

Steps to reproduce

The basic building blocks:
Blog (BlogId, BlogName)
Post (PostId, BlogId, Title, Body)
with the obvious FKs.

I want to count how many posts in each blog have a title starting with the letter "B", ordered by that count descending:

var posts = db.Posts.Where(x => x.Title.StartsWith("B"));
var q2 = db.Blogs
                    .GroupJoin(posts, b => b.BlogId, p => p.BlogId,
                        (b, p) => new {b.BlogId, Count = p.Count()})
                    .OrderByDescending(c=>c.Count)
                    .ToList();

This emits the following Sql (using the Sql Server):

SELECT [b].[BlogId], [b].[BlogName], [t].[PostId], [t].[BlogId], [t].[Body], [t].[Title]
FROM [Blogs] AS [b]
LEFT JOIN (
    SELECT [x].[PostId], [x].[BlogId], [x].[Body], [x].[Title]
    FROM [Posts] AS [x]
    WHERE [x].[Title] LIKE N'B' + N'%' AND (LEFT([x].[Title], LEN(N'B')) = N'B')
) AS [t] ON [b].[BlogId] = [t].[BlogId]
ORDER BY [b].[BlogId]

Similarly when using Npgsql:

SELECT b."BlogId", b."Value", t."PostId", t."BlogId", t."Body", t."Title"
	FROM "Blogs" AS b
	LEFT JOIN (
	    SELECT x."PostId", x."BlogId", x."Body", x."Title"
	    FROM "Posts" AS x
	    WHERE x."Title" LIKE 'B%'
	) AS t ON b."BlogId" = t."BlogId"
	ORDER BY b."BlogId"

As you can see, this is loading all posts into memory, in their entirety, which is going to be hugely expensive if I've got tens of thousands of posts in each blog. All I want is a count.
The Sql I would want to see would be something more like:

select b.BlogId, b.BlogName, count(t.PostId) as [Count]
from Blogs as b
left join (select p.PostId, p.BlogId, p.Title, p.Body
              from Posts as p 
              where p.Title like 'B%') as t
on b.BlogId = t.BlogId
group by b.BlogId, b.BlogName
order by [Count] desc

Further technical details

EF Core version: 2.1.4
Database Provider: Microsoft.EntityFrameworkCore.SqlServer and Npgsql
Operating system: Windows 10
IDE: (e.g. Visual Studio 2017 15.8.2)

@milosloub
Copy link

I have exactly the same problem. Is there any workaround how to produce "LEFT JOIN (Select SOME_AGGREGATE FROM etc.)" ?

@smitpatel
Copy link
Contributor

We decided not to do this for GroupJoin. GroupBy should be used to write equivalent query. See #17068 #12795

@smitpatel smitpatel added the closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. label Mar 16, 2020
@smitpatel smitpatel removed this from the Backlog milestone Mar 16, 2020
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. customer-reported type-enhancement
Projects
None yet
Development

No branches or pull requests

4 participants