Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: query with group by in subquery produces invalid SQL #15279

Closed
mdmoura opened this issue Apr 8, 2019 · 21 comments · Fixed by #17442
Closed

Query: query with group by in subquery produces invalid SQL #15279

mdmoura opened this issue Apr 8, 2019 · 21 comments · Fixed by #17442
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression type-bug
Milestone

Comments

@mdmoura
Copy link

mdmoura commented Apr 8, 2019

I am running a Query with a GroupBy and it evaluates on the client.

I get the exception because I am logging all queries that evaluates on the client.

Exception message:

InvalidOperationException: Error generated for warning 'Microsoft.EntityFrameworkCore.Query.QueryClientEvaluationWarning: The LINQ expression 'GroupBy(1, [y])' could not be translated and will be evaluated locally.'. This exception can be suppressed or logged by passing event ID 'RelationalEventId.QueryClientEvaluationWarning' to the 'ConfigureWarnings' method in 'DbContext.OnConfiguring' or 'AddDbContext'.

Stack trace:

Microsoft.EntityFrameworkCore.Diagnostics.EventDefinition<TParam>.Log<TLoggerCategory>(IDiagnosticsLogger<TLoggerCategory> logger, WarningBehavior warningBehavior, TParam arg, Exception exception)
Microsoft.EntityFrameworkCore.Internal.RelationalLoggerExtensions.QueryClientEvaluationWarning(IDiagnosticsLogger<Query> diagnostics, QueryModel queryModel, object queryModelElement)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.VisitResultOperator(ResultOperatorBase resultOperator, QueryModel queryModel, int index)
Remotion.Linq.Clauses.ResultOperatorBase.Accept(IQueryModelVisitor visitor, QueryModel queryModel, int index)
Remotion.Linq.QueryModelVisitorBase.VisitResultOperators(ObservableCollection<ResultOperatorBase> resultOperators, QueryModel queryModel)
Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.EntityQueryModelVisitor.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.LiftSubQuery(IQuerySource querySource, SubQueryExpression subQueryExpression)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.CompileMainFromClauseExpression(MainFromClause mainFromClause, QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.EntityQueryModelVisitor.VisitMainFromClause(MainFromClause fromClause, QueryModel queryModel)
Remotion.Linq.Clauses.MainFromClause.Accept(IQueryModelVisitor visitor, QueryModel queryModel)
Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.EntityQueryModelVisitor.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.ExpressionVisitors.ProjectionExpressionVisitor.VisitSubQuery(SubQueryExpression expression)
Remotion.Linq.Clauses.Expressions.SubQueryExpression.Accept(ExpressionVisitor visitor)
System.Linq.Expressions.ExpressionVisitor.VisitAndConvert<T>(ReadOnlyCollection<T> nodes, string callerName)
Remotion.Linq.Parsing.RelinqExpressionVisitor.VisitNew(NewExpression expression)
Microsoft.EntityFrameworkCore.Query.ExpressionVisitors.RelationalProjectionExpressionVisitor.VisitNew(NewExpression newExpression)
Microsoft.EntityFrameworkCore.Query.EntityQueryModelVisitor.VisitSelectClause(SelectClause selectClause, QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.VisitSelectClause(SelectClause selectClause, QueryModel queryModel)
Remotion.Linq.Clauses.SelectClause.Accept(IQueryModelVisitor visitor, QueryModel queryModel)
Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.EntityQueryModelVisitor.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.RelationalQueryModelVisitor.VisitQueryModel(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.EntityQueryModelVisitor.CreateAsyncQueryExecutor<TResult>(QueryModel queryModel)
Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQueryCore<TFunc>(object cacheKey, Func<Func<QueryContext, TFunc>> compiler)
Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.ExecuteAsync<TResult>(Expression query)
Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable<TResult>.System.Collections.Generic.IAsyncEnumerable<TResult>.GetEnumerator()
System.Linq.AsyncEnumerable.Aggregate_<TSource, TAccumulate, TResult>(IAsyncEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> accumulator, Func<TAccumulate, TResult> resultSelector, CancellationToken cancellationToken) 

Steps to reproduce

The original query is more complex but the following one replicates the error:

var result = await context.Packages.Select(x => new {
  Value = x.Products.GroupBy(y => 1).Select(y => new {
    Result = 20
  })
}).ToListAsync();

Further technical details

EF Core version: 2.2.0
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: macOs Mojave 10.14.4
IDE: Visual Studio Code 1.33.0

@ajcvickers ajcvickers added this to the Backlog milestone Apr 8, 2019
@NetTecture
Copy link

Have you considered reworking it? FIRST A group by, THEN a select based on the returned data into the new object?

@mdmoura
Copy link
Author

mdmoura commented Apr 9, 2019

@NetTecture What do you mean? My real query is something closer to the following:

var result = await context.Packages.Select(x => new {
  Value = x.Products
      .Select(y => new {
          Price = y.SalePrice * y.Coefficient / 2.4,
          Potential = y.SalePrice * y.Valorization / 4.8
      })
      .GroupBy(y => 1)
      .Select(y => new {
         Result = y.Sum(z => z.Price),
         Next = y.Sum(z => z.Potential)
      }).ToListAsync()
  })

Basically, I am doing the following:

1 - Use a first Select to calculate the Price and Potential with values of the Entity.

2 - Group By 1 to create only one group.

3 - Apply a second Select to calculate Result and Next
Result and Next are the SUM of the Price and Potential of all records.

I tried a few variations but I was never able to solve the client evaluation.

If you have an alternative please let me know ...

I would like to solve this before a fix is pushed.

Thank You

@smitpatel
Copy link
Contributor

The issue here is the nested collection. You are not running group by query on products directly. You are running it on packages. i.e. for each package run Group by query. It cannot be translated to SQL GROUP BY directly and requires outer apply at best case. (Currently EF Core fails to lift it in single query and generates N+1 queries).
Since you are grouping all products of given package into one, it is very easy to group by PackageId instead.
Following query will generate same result and does server evaluation. The shape of result is slightly different. In your case, it will be List<List<'a>> where each inner list is going to contain only 1 entry. In following query the result is List<'a> which is easy to convert to desired shape on client side.

var query = db.Set<Product>()
    .Select(y => new
    {
        PackageId = EF.Property<int?>(y, "PackageId"),
        Price = y.SalePrice * y.Coefficient / 2.4,
        Potential = y.SalePrice * y.Valorization / 4.8
    })
    .GroupBy(p => p.PackageId)
    .Select(y => new
    {
        Result = y.Sum(z => z.Price),
        Next = y.Sum(z => z.Potential)
    }).ToList();

Generated SQL

      SELECT SUM(([y].[SalePrice] * [y].[Coefficient]) / 2.3999999999999999E0) AS [Result], SUM(([y].[SalePrice] * [y].[Valorization]) / 4.7999999999999998E0) AS [Next]
      FROM [Product] AS [y]
      GROUP BY [y].[PackageId]

@mdmoura
Copy link
Author

mdmoura commented Apr 15, 2019

@smitpatel Thank you for the answer. I have been testing your suggestion in different scenarios and it seems to solve the problem.

@John0King
Copy link

John0King commented May 18, 2019

Will GroupBy with Sum and conditions be supported by EF Core 3.0 ?

SELECT Id, Count(Num) as  ICount, 
   CASE WEHN Count(Id) == 0 
    Then 1.0
    ELSE   SUM(CASE WHEN Num % 2 = 0 THEN 1.0 ELSE 0.0 END) / Count(Num)
    END as  IRate
FROM tb
GROUP BY Id

the linq should be

ctx.Tb.GroupBy(x=>x.Id)
    .Select(x=> new 
    { 
        Id = x.Key,   
        ICount = x.Count(),
        IRate = x.Count() == 0 ? 1.0 : g.Sum(y=>y.Num % 2 == 0 ? 1.0 : 0.0) / x.Count() 
        //OR
         IRate = x.Count() == 0 ? 1.0 : g.Count(y=>y.Num % 2 == 0) / x.Count() 
       // Count should also be translate to Sum when condition
     })

@smitpatel smitpatel added the verify-fixed This issue is likely fixed in new query pipeline. label Jul 2, 2019
@roji
Copy link
Member

roji commented Aug 27, 2019

Basic Sum in GroupBy projection is tested by GroupBy_Property_Select_Sum_Min_Key_Max_Avg. Conditionality in GroupBy is tested on key by GroupBy_aggregate_projecting_conditional_expression_based_on_group_key, #17442 also adds conditional on the aggregate function itself.

@roji roji closed this as completed Aug 27, 2019
@roji roji added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. and removed verify-fixed This issue is likely fixed in new query pipeline. labels Aug 27, 2019
@roji roji modified the milestones: Backlog, 3.0.0 Aug 27, 2019
@smitpatel smitpatel reopened this Aug 27, 2019
@smitpatel
Copy link
Contributor

The first query is still untested.

var result = await context.Packages.Select(x => new {
  Value = x.Products.GroupBy(y => 1).Select(y => new {
    Result = 20
  })
}).ToListAsync();

@smitpatel smitpatel removed this from the 3.0.0 milestone Aug 27, 2019
@roji roji removed their assignment Aug 30, 2019
@roji roji added verify-fixed This issue is likely fixed in new query pipeline. and removed closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. labels Aug 30, 2019
roji added a commit that referenced this issue Aug 30, 2019
@roji roji added this to the Backlog milestone Aug 30, 2019
@maumar
Copy link
Contributor

maumar commented Oct 8, 2019

@smitpatel wrt case 1. after making the change the scenario is still broken - we now produce group by clause, but still there are extra columns in the subquery projection:

SELECT [c].[CustomerID], [t].[c], [t].[OrderID]
FROM [Customers] AS [c]
LEFT JOIN (
    SELECT SUM([o].[OrderID]) AS [c], [o].[OrderID], [o].[CustomerID]
    FROM [Orders] AS [o]
    GROUP BY [o].[CustomerID]
) AS [t] ON [c].[CustomerID] = [t].[CustomerID]
ORDER BY [c].[CustomerID], [t].[OrderID]

@maumar
Copy link
Contributor

maumar commented Oct 8, 2019

related to/duplicate of #15873

@ajcvickers ajcvickers added this to the Backlog milestone Oct 10, 2019
@SebastiaanPolfliet
Copy link

I see that the milestone 3.1.0 got removed. Does this mean we shouldn't expect the issue to be fixed in 3.1.0?

Thank you!

@ajcvickers
Copy link
Contributor

@SebastiaanPolfliet Correct.

@prochnowc
Copy link

@ajcvickers Any chance to get a fix into 3.1.2? This bug blocks us writing some projections which are used by OData endpoints.

@ajcvickers
Copy link
Contributor

@prochnowc It's pretty unlikely that we will patch this in a 3.1.x release given the current information. Changes to the query pipeline are risky, so the customer impact of the issue would have to be very large to offset that.

@NetTecture
Copy link

NetTecture commented Jan 26, 2020

Well, here is the point of view of someone earning money with making software:

I HAVE TO SHIP. You obviously do not, at least not something working, but I do have to ship.

Yes, this whole post will come out as a rant - but please understand it is arant build on months of frustration with a team that seems to not understand at all that their product is USED in commercial software and contineus to not care about fixing bugs, constantly delaing them repeatedly to the next version.

You talk about breaking things, for me EfCore 3.1 is in the same state ANY VERSION WAS: UNUSABLE. Seriously. Unusable except for the most simplistic cases and please do not use any feature - up before 3.1 global query filters broke the whole stack. NICE. There is nothing to break.

I have projects moved to .NET Core (now 3.1) and the only way we could ever use EfCore was to load the whole table into memory on every request, then filter in memory. EfCore was fundamentally broken since I started using it in 2.1. Now you tell me that this will stay so for the foreseeable future - what are you afraid of breaking in a product that is unusable? What is the logic there? Why not make a beta release so people can try it out, like - modern software development?

3.1 solved most of the issues that forced my workaround - mostly the total inability to use global query filters by fixing the bug in there that was overlooked then deemed not worthy fixing. It introduced a "one step forward, 5 steps back" approach by ripping out core functionality and STILL not generating valid SQL and then telling people to wait. Well, I release next week.

So, here is what I will do. I will rip out EfCore, the sperior product (except: it is not) and replace it with Ef non core - because THAT ONE WORKS. It may be architecturally inferior (but working), it is slower (except it is not), it has less support for non sql sources (which we do not use), it has less options for migrations (which are an antipattern for enterprise software), it has less features (but they actually WORK, not like EfCore features that are broken and should not even count) and it lacks global query filters (except there is a third party branch EntityFramework Clasic that does have them) and it has less linq features (which actually is wrong beacuse try doing a union or intersect in efcore and you realize that it only supports a VERY small subset of LINQ operations). Generally it is now - with 3 MAJOR releases of EfCore, STILL the superior product and one actually usable.

You REALLY need to get a point of view outside of the "we are super, why do people complain, lets delay critical fixes" and start thinking with the view of someone who has to ship. I have tried to wait as long as I could - time to move back and accept that the best thing which happened to EfCore now is the ability to dump it.

ANY bad sql bug is critial because you totally invalidate the use case for an ORM mapper if you are unable to actually use it.

In any sensible world, you would put out a beta branch with fixes and see whether people complain. You would actually fix our product and not tell people ALL OVER FOR EVERY RELEASE TO WAIT FOR THE NEXT ONE. I waited for fixes to EfCore 2.1, then was told in 2.2 to wait for 3.0 then was told ithe fixes where delayed for 3.1 then NOW I read that yeah, maybe in November we fix our crappy SQL generation, please delay your products AGAIN. EfCore turned from a rework of the overengineered but working EF product into the joke of the whole stack. Maybe it turns usable before it reaches version 10 - but I actually have to ship.

For anyone else who is in the same situation:
in Dotnetcore 3.1 you CAN use Ef - non core - and that one actually works.
If you prefer to have global query filters and a lot more functionality you can head over to https://entityframework-classic.net/ which is a branch that works in netstandard 2.0 and incorporates global query filters.

I personally - i have a test project now and for every major release that is released I will check whether it FINALLY is usable there, if not - no need for me to waste my time even opening bug reports.

@fschlaef
Copy link

This is probably not be the best place to discuss EF Core policy, but I want to answer to some points @NetTecture made that I consider valid :

  • Releases are too few and far between. For a highly-used product like this (NuGet says 3.1.0 was downloaded over 1.3 million times), having to wait over a month for a critical fix https://github.com/dotnet/efcore/milestone/78 doesn't cut it. I don't expect a release every day, but every other week would be much more acceptable.
  • Too many bugs get punted to oblivion. As someone who has been using .Net Core and EF Core every day since 1.0.0-rc1-update1 in 2016, and who opened a lot of issues here, more often than not we are told to wait until the next major release. This was especially a problem in the transition from 2.0 to 2.1 and 2.2 where some issues were stuck for months. If there is no workaround, you gotta re-engineer everything and wait 6 months.
  • The policy is unclear at best. 3.1.0 is supposed to be LTS, yet many bug fixes are punted to 5.0.0, which will no doubt bring a whole lot of new bugs, and the cycle will restart again. This already happened with the 2.1 LTS branch where many issues were punted to 3.0, which defeats the purpose of having LTS.
  • Rewriting rewriting rewriting. 1.0 was the "beta test", that's OK. Then, everything got rewritten for 2.0, and then again for 3.0, each time breaking stuff that had to be fixed after the fact. 5.0 will probably be the same.

All of that being said, for me, EF Core works. I've been using it continuously since 2016 and I'm very happy about it. Most of the time, it's transparent to use and doesn't get in the way. When it does, usually there's a fix, and unless you get some edge case (which happened to me) you can get your code working fast.

I wouldn't go back to .Net 4 to save my life. The EF and .Net Core team are doing a tremendous job and I hope that this will continue improving going forward 👍

Feel free to move this post to a more appropriate place if needed.

@smitpatel
Copy link
Contributor

@maumar - Can you clean this up when you get time? We need to figure out action item here.

@ajcvickers ajcvickers modified the milestones: Backlog, 6.0.0 Nov 5, 2020
@smitpatel
Copy link
Contributor

@maumar -ping

@Nathan-Ryan
Copy link

This issue's been here for over two years now, any progress on a fix?
Seems like quite a major as it breaks all sorts of sub-select with grouping logic, e.g.

                var query = (
                    from wp in database.WarehouseProducts
                    where wp.WarehouseLocation.Warehouse.OrganisationId == organisationId
                        // Omitted
                    group wp by new
                    {
                        // Omitted
                    } into wpGroup
                    select new ReportStockOnHandViewModel.ResultModel
                    {
                        // Omitted
                        QuantityOnHand = wpGroup.Sum(wp => wp.QuantityAvailable + wp.QuantityHeld),
                        VolumeM3OnHand = wpGroup.Sum(wp => wp.VolumeM3Available + wp.VolumeM3Held),
                        WeightKGOnHand = wpGroup.Sum(wp => wp.WeightKGAvailable + wp.WeightKGHeld),
                        Attributes = (
                            from wpa in database.WarehouseProductAttributes
                            where wpa.WarehouseProductId == wpGroup.Min(wp => wp.Id) // <-- This breaks the query
                            orderby wpa.ProductAttribute.Name
                            select new ProductAttributeItemModel
                            {
                                Name = wpa.ProductAttribute.Name,
                                Description = wpa.Description
                            })
                            .ToArray()
                    });

                return await query.ToArrayAsync();

This produces an error:

Column 'WarehouseProducts.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

@maumar
Copy link
Contributor

maumar commented Sep 16, 2021

I have verified that all queries mentioned in this thread now work correctly

@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Sep 16, 2021
maumar added a commit that referenced this issue Sep 16, 2021
We already have regression tests for most of the scenarios mentioned in the issue report. Adding the one scenario that was missing.

Resolves #15279
maumar added a commit that referenced this issue Sep 16, 2021
We already have regression tests for most of the scenarios mentioned in the issue report. Adding the one scenario that was missing.

Resolves #15279
maumar added a commit that referenced this issue Sep 16, 2021
We already have regression tests for most of the scenarios mentioned in the issue report. Adding the one scenario that was missing.

Resolves #15279
@maumar maumar closed this as completed Sep 16, 2021
@ajcvickers ajcvickers modified the milestones: 6.0.0, 6.0.0-rc2 Sep 17, 2021
@ajcvickers ajcvickers modified the milestones: 6.0.0-rc2, 6.0.0 Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.