Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relational: Support translating GroupBy() to SQL #2341

Closed
divega opened this issue Jun 6, 2015 · 100 comments
Closed

Relational: Support translating GroupBy() to SQL #2341

divega opened this issue Jun 6, 2015 · 100 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@divega
Copy link
Contributor

divega commented Jun 6, 2015

Updated as of 3/14/2018 based on implementation introduced

LINQ's GroupBy() operators can sometimes be translated to SQL's GROUP BY clauses, in particular when aggregate functions are applied in the projection.

Scope for 2.1 release

Our current intention is for the scope of the work in 2.1 to improve how LINQ's GroupBy is evaluated in this particular scenario:

Grouping on a simple expression that references a single column or multiple columns (using an anonymous type) of the query root and then projecting any aggregate operation from (Count, Sum, Average, Min, Max) and/or any individual property that is part of the grouping key (this will be translated to GROUP BY in SQL)

What is supported in 2.1.0-preview2

Apart from what is supported in 2.1.0-preview1 (details below), we have also added some more patterns

  • Grouping by constant/variables
  • Grouping by properties of reference navigations
  • Ordering or filtering on grouping key or aggregate
  • Aggregate being DTO/nominal type

Examples

// Grouping by constant or a variable from closure
db.Orders.GroupBy(o => 2).Select(g => g.Count());
var a = 5;
db.Orders.GroupBy(o => a).Select(g => g.Count());

// Grouping by scalar properties from reference navigations
db.Orders.GroupBy(o => o.Customer.City).Select(g => g.Count());

// Ordering by Key/Aggregate after GroupBy
db.Orders.GroupBy(o => o.CustomerID).OrderBy(o => o.Count()).Select(g => new { g.Key, Count = g.Count() });

// Filtering on Key/Aggregate after GroupBy (Translates to Having clause in SQL)
db.Orders.GroupBy(o => o.CustomerID).Where(o => o.Count() > 0).Select(g => new { g.Key, Count = g.Count() });

// Projecting aggregate into nominal type
db.Orders.GroupBy(o => o.CustomerID).Select(g => new CustomerCountInfo { g.Key, Count = g.Count() });

And a few bugfixes - #11218 #11157 #11176

What is supported in 2.1.0-preview1

// Grouping by single column projecting aggregate or key
db.Orders.GroupBy(o => o.CustomerId).Select(g => g.Count());
db.Orders.GroupBy(o => o.CustomerId).Select(g => new { CustomerId = g.Key, Count = g.Count() });

// Grouping by multiple columns (using anonymous type) projecting aggregate or key or nested key
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate }).Select(g => g.Sum(o => o.Cost));
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { CustomerId = g.Key.CustomerId, Sum = g.Sum(o => o.Cost) });
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { Key = g.Key, Sum = g.Sum(o => o.Cost) }); // Where Key will be anonymous object

// Grouping after complex query root
(from o in db.Orders.Where(o => o.OrderID < 10400).OrderBy(o => o.OrderDate).Take(100)
    join c in db.Customers.Where(c => c.CustomerID != "DRACD" && c.CustomerID != "FOLKO").OrderBy(c => c.City).Skip(10).Take(50)
        on o.CustomerID equals c.CustomerID
    group o by c.CustomerID)
.Select(g => new { g.Key, Count = g.Average(o => o.OrderID) });

db.Orders.OrderBy(o => o.OrderID)
    .Skip(80)
    .Take(500)
    .GroupBy(o => o.CustomerID)
    .Select(g => g.Max(o => o.OrderID));

// All above examples have group of entity types after GroupBy

// Selecting Group of anonymous types containing multiple columns
db.Orders.GroupBy(o => o.CustomerId, new {o.OrderDate, o.Price}).Select(g => g.Sum(t => t.Price));

Scenarios that we are not planning to improve in the 2.1 release

1. Grouping on constants (available in 2.1.0-preview2)
2. Grouping on an entity (e.g. a reference navigation property)
3. Projecting non-aggregate scalar subqueries after grouping, e.g. FirstOrDefault()
4. Making groups of multiple entityTypes using anonymous types.
5. Using Key/Aggregate values after GroupBy in joins (#10012)

All the scenarios above present different variations depending on what happens after the GroupBy, e.g. is there an aggregate function or not, is the key mentioned in the projection or not, etc. These scenarios will still result in client evaluation.

We would appreciate if customers that care about EF Core supporting any of those scenarios that are scoped out from 2.1 to create individual issues for them, up-vote them, and keep the discussion there.

@alexvaluyskiy
Copy link

I have these classes

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }

    public List<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

If I use this code

var query1 = from p in context.Posts
             where p.BlogId == 1
             group p by p.Title into g
             select new
             {
                 Title = g.Key,
                 Count = g.Count()
             };

EF7 generates this SQL

SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
FROM [Post] AS [p]
WHERE [p].[BlogId] = 1

without grouping and projection and groups data on the client side

If I use Group by with Inner Join with this code

var query = from p in context.Posts
            join b in context.Blogs on p.BlogId equals b.BlogId
            where p.BlogId == 1
            group p by p.Title into g
            select new
            {
                Title = g.Key,
                Count = g.Count()
            };

I get this error

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType1`2[EF7Test.Post,EF7Test.Blog]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[EF7Test.Post]'."}

EF 7.0.0-beta6-13735

@leus
Copy link

leus commented Oct 2, 2015

Just another test case:

public class Categoria
{
    public int Id { get; set; }
    public int? CategoriaId { get; set; }
    public string Nombre { get; set; }
}

var q = from c in dbContext.Categorias
                join cc in dbContext.Categorias on c.CategoriaId equals cc.Id
                group c by c.CategoriaId into g
                select new { Id = g.Key.Value, total = g.Count() };

Throws the following exception:

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType3`2[nyp.DataModels.Categoria,nyp.DataModels.Categoria]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[nyp.DataModels.Categoria]'."}

This is for a recursive tree-like structure. Here I'm just trying to count children.

@rowanmiller
Copy link
Contributor

@anpete should this go to backlog for now?

@anpete
Copy link
Contributor

anpete commented Dec 7, 2015

@rowanmiller Sounds good.

@rowanmiller rowanmiller removed the pri0 label Dec 7, 2015
@rowanmiller rowanmiller modified the milestones: Backlog, 7.0.0 Dec 7, 2015
@YehudahA
Copy link

YehudahA commented Dec 9, 2015

Why it's go to backlog?!
It's critical for for information and reports systems.
For example, I have 2 million records and I need to show sales by customer. Grouping in client side, is impossible!

@ErikEJ
Copy link
Contributor

ErikEJ commented Dec 9, 2015

View/ Indexed view?

@rowanmiller
Copy link
Contributor

Or a raw SQL query. It's not that we aren't going to support it, just that we don't have time to get this translating (and stabilize it) before the 7.0.0 release. This one will be one of the first things we tackle after that.

@rjperes
Copy link

rjperes commented Jul 22, 2016

What would be the recommended approach using SQL (FromSql)?
The problem I see is that we need an entity to materialize the records into, and I think it would need to be mapped.

@divega divega changed the title Relational: Support translating of GroupBy() to SQL Relational: Support translating GroupBy() to SQL Aug 8, 2016
@divega
Copy link
Contributor Author

divega commented Aug 8, 2016

@rjperes that would require a feature we are planning to add in 1.1.0: the ability to boostrap queries using arbitrary types: #1862.

@chris-oswald
Copy link

@rowanmiller I was hoping this was a 1.1 milestone, but I don't see this in 1.2 either. Any idea when this will get pulled off the backlog? I know we have work-arounds in the mean time.

@rowanmiller
Copy link
Contributor

@chris-oswald not yet, we are planning for the next release this week.

@simeyla
Copy link

simeyla commented Dec 8, 2016

It's good to at least see it is listed as item #2 in the Critical O/RM features section.

Even if all situations cannot be accounted for - I hope we are going to at least get the simplest cases covered.

And to be fair @chris-oswald there is literally nothing in the current 1.2 feature list - you scared me for a moment!

@tuespetre
Copy link
Contributor

Heads up: I have made a lot of headway in this area. Expect a PR soon 😸

@mchenx
Copy link

mchenx commented Feb 20, 2018

This seems still not suppprted... I'm currently working on a project and facing the performance issue as it doesn't translate to select count(*), xxxx group by xxxx , it actually fetches all the data from DB and does group by locally which causes the performance issue, coz I have got a million records...

var counts = (from t in _dbContext.VoteTickets
join m in _dbContext.VoteItemData
on t.VoteItemId equals m.Id
where m.City == city
group t by m.District into grp
select new
{
District = grp.Key,
Count = grp.Count()
})
.ToArray();

@jzabroski
Copy link

jzabroski commented Feb 20, 2018 via email

@smitpatel
Copy link
Contributor

upgrading EF6 to EFCore will bring unexpected surprises.
EF6 to EF Core is not in-place upgrade. EF Core is rewrite of EF6 and in many cases have behavioral differences. Any application expecting the group by constant to work same as EF6 is gonna fail regardless of it client eval or throw. By doing client eval, there is higher probability of external tools may just work.

@Ben-Pattinson
Copy link

@mchenx I'm affraid you won't get any joy with EF Core anytime soon. I recommend doing the simple stuff in EF Core and anything complicated in dapper or similar. It really is WAY easier than even full EF. Yes there's hard coded strings, but with string interpolation, nameof and a data layer, you can really make it robust.
Give it a shot, you'll be surprised.

@ethanli83
Copy link

@Ben-Pattinson, maybe you will be interested in this project I worked on EFSqlTranslator. It translates EFCore lambda into string and executes it on Dapper.

@Ben-Pattinson
Copy link

@ethanli83, very nice! I must say, initially I was rather sceptical, having been badly burnt by EF's horrific SQL generation. Assuming you haven't cherry picked your examples of SQL generation, they are nice and readable, and moreover, sane. Very un-EF like. You'll get into trouble for stuff like that, the Union will be along in a moment to give you a good talking to :) Comprehensive examples too, keep up the good work!

@ethanli83
Copy link

Thank you @Ben-Pattinson, I will take a look at Union!~

@anpete
Copy link
Contributor

anpete commented Feb 27, 2018

@Ben-Pattinson SQL generation is something we have tried really hard to improve in EF Core. Do you have some examples of where you feel we can improve?

@Ben-Pattinson
Copy link

@anpete (sorry for the rambling reply)
I probably should have been more specific, I was referring to full framework's EF sql. I have spent literally days trying to get full EF to output sql that did what I meant. Trying to decipher the generated sql was often a nightmare. Getting the same operation was minutes in dapper. This is why I settled on a hybrid approach. Use EF for CRUD and simple selects, dapper for anything complex. When I started porting to core, I was pleased to see the SQL is indeed much more readable, which is a great improvement.... however imho the lack of group-by support makes it a bit of a joke.
The silent nature in which the group-by executes client-side is particularly nasty for the unwary. It reinforces my position of allways checking the SQL EF outputs - which the new generation does really help with.
I can work round this new limitation by simply extending my dapper use, but I'm sure you can see the problems anyone not on a hybrid approach will have.

I appreciate the problems you have when you need to support the same linq syntax for many different types of query, but when personal projects are running rings around your official data layer implementation, something isn't right.

Personally, all I ever wanted was to write sql in my data layer and have it compile time validated against my data layer. It's so much easier and flexible than translating between unrelated syntaxes. I've got that now with dapper, nameof, string interpolation and t4 templates. It would have been nice to get slicker intellisense support, but it's pretty usable.

@jzabroski
Copy link

jzabroski commented Feb 27, 2018 via email

@jzabroski
Copy link

jzabroski commented Feb 27, 2018 via email

@anpete
Copy link
Contributor

anpete commented Feb 28, 2018

@Ben-Pattinson Thanks for the feedback and glad to hear that you have come up with a solution that works for you.

A couple of responses to the points you make: LINQ GroupBy is very different from SQL GROUP BY and so we need client eval here for many cases anyway. It was a shame we weren't able to get to GroupBy with aggregates translation until 2.1, but it is almost here. One reason for this prioritization was that we have the FromSql API, which gives you a Dapper-like experience for creating queries with raw SQL. Have you tried it? If so, it would be great if you could let us know why it is not working for you - It could be because we didn't have Query Types, which are also landing in 2.1, and make the FromSql experience much better because you no longer need an entity result type.

@simeyla
Copy link

simeyla commented Mar 4, 2018

@anpete It's really great to finally see this working :-)

But oddly enough I still receive the error message telling me it cannot translate to GROUP BY, even though the generated SQL right underneath most certainly uses GROUP BY! I assume this must be a known issue?

Entity Framework Core 2.1.0-preview1-28290 initialized ......

warn: Microsoft.EntityFrameworkCore.Query[20500]
The LINQ expression 'GroupBy(new <>f__AnonymousType1`1(OrderDateDt = [x].OrderDateDt), [x])' could not be translated and will be evaluated locally.

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
Executed DbCommand (33ms) [Parameters=[], CommandType='Text', CommandTimeout='30']


      SELECT [x].[OrderDateDt], COUNT(*)
      FROM [Order] AS [x]
      WHERE [x].[CompletedOrderId] IS NOT NULL AND ([x].[OrderDateDt] > DATEADD(day, -30E0, GETDATE()))
      GROUP BY [x].[OrderDateDt]

This is from this source C#

        var groupedOrders = _context.Order
                    .Where(x => x.CompletedOrderId != null)
                    .Where(x => x.OrderDateDt > DateTime.Now.AddDays(-30))
                    .GroupBy(x => new { x.OrderDateDt })
                    .Select(x => new
                    {
                        Date = x.Key.OrderDateDt,
                        Count = x.Count()
                    })
                    .ToList();

It returns instantly and definitely works as expected so I'm happy for now - just wanted to point this out.

@smitpatel
Copy link
Contributor

@simeyla - Thanks for info. I have filed #11157 to track the issue and fix it.

smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
@smitpatel
Copy link
Contributor

For preview2 version, there are more patterns now being translated to server (including group by constant). I have updated first post to capture the details.

@artemvalmus
Copy link

Please let me know if I can solve the following in 2.1.1:

I need to take the latest item from each group, but I receive warnings saying the query cannot be translated. Is there a workaround and is it going to be possible in the foreseeable future?

Example:

    public class Author
    {
        public int AuthorId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class Book
    {
        public int BookId { get; set; }
        public int AuthorId { get; set; }
        public string Name { get; set; }
        public DateTime CreatedAt { get; set; }

        public virtual Author Author { get; set; }
    }

    public class EFCoreDemoContext : DbContext
    {
        public static readonly LoggerFactory MyLoggerFactory
            = new LoggerFactory(new[] { new ConsoleLoggerProvider((_, __) => true, true) });

        public DbSet<Author> Authors { get; set; }
        public DbSet<Book> Books { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseLoggerFactory(MyLoggerFactory).UseSqlServer("Server=(localdb)\\MSSQLLocalDB;Database=Test;Trusted_Connection=True");
        }
    }

    static void Main(string[] args)
    {
        using (var context = new EFCoreDemoContext())
        {
            context.Database.EnsureCreated();

            // works as expected
            var test1 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.Count()).ToList();

            // warn: The LINQ expression 'GroupBy([x].AuthorId, [x])' could not be translated and will be evaluated locally.
            // warn: The LINQ expression 'orderby [b].CreatedAt asc' could not be translated and will be evaluated locally.
            // warn: The LINQ expression 'Last()' could not be translated and will be evaluated locally.
            var test2 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.OrderBy(b => b.CreatedAt).Last()).ToList();
        }

        Console.ReadKey();
    }

@ajcvickers ajcvickers modified the milestones: 2.1.0-preview1, 2.1.0 Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests