Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Translate GroupBy constant value with aggregate #9969

Closed
smitpatel opened this issue Oct 4, 2017 · 15 comments
Closed

Query: Translate GroupBy constant value with aggregate #9969

smitpatel opened this issue Oct 4, 2017 · 15 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@smitpatel
Copy link
Contributor

Based on #2341 (comment) & #2341 (comment)

var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => 0)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

Queries like above cannot be translated to relational group by since group by constant is invalid SQL

select CustomerID, count(*) from orders
group by 0;

throws

Msg 164, Level 15, State 1, Line 2
Each GROUP BY expression must contain at least one column that is not an outer reference.

Since the purpose of such queries is just to get full aggregate of whole table rather than group-wise, we can perhaps rewrite QueryModel to simplify the task beforehand so our translation pipeline can work seamlessly.

As indicated by @bricelam , some of the frameworks are generating such queries (he can fill in more details), hence we should separately prioritize that work, hence filing this issue.

@dpsenner
Copy link

dpsenner commented Oct 16, 2017

I would expect that the given linq query:

var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => 0)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

would group by the value '0' iff the database can do that and not by the first column as suggested by the author. So the linq query given could translate to the following sql when using postgres as the database:

WITH TMP AS (
   SELECT * FROM (VALUES (0)) AS t (key)
)
SELECT t1.KEY, COUNT(*), SUM(t2.Freight), MAX(t2.ShipName), MIN(t2.ShipAddress) FROM TMP t1
INNER JOIN "TABLE" AS t2 ON 1 = 1
GROUP BY t1.KEY

The proposal is rather a hack and thus in my opinion this enhancement should be closed as invalid since a working implementation of the linq query could be:

var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => Param_0.CustomerId)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});

@smitpatel
Copy link
Contributor Author

@dpsenner - GROUP BY 0 is not supported by every database. Creating temporary table and joining with it, may not be supported by all database and that is unnecessary calculation. Regardless, for a given query, if EF is going to give data return in the form user wanted, how EF arrives at the result should be irrelevant to user as long as it is evaluated on server side.

The second linq query posted already translate correctly to server.

@dpsenner
Copy link

So we both agree that this issue is invalid because linq does not support column references by ordinal (i.e. .GroupBy(Param_0 => 0)) but rather expects explicit columns (i.e. .GroupBy(Param_0 => Param_0.CustomerId)) instead?

@smitpatel
Copy link
Contributor Author

would group by the value '0' iff the database can do that and not by the first column as suggested by the author.

The Linq & SQL query in first post are not same. The SQL is not the translation of the Linq, it is just an example of that group by 0 is invalid SQL.

GroupBy(c => c.CustomerID) in linq translates to GROUP BY c.CustomerID in database which is direct mapping. Same cannot be applied if linq has GroupBy(c => 0) because of database limitation.

GroupBy constant is perfectly valid linq query and it is being generated by various frameworks already.

> var a = new[] { 1, 2, 3 }.AsQueryable().GroupBy(t => 0).ToList();
> a
List<IGrouping<int, int>>(1) { Lookup<int, int>.Grouping { 1, 2, 3 } }

This issue specifically to translate GroupBy constant in LINQ to something which is valid in database and gives same results.

@asfernandes
Copy link

Has .OrderBy any problem? I'm asking this because ORDER BY (SQL) is simpler than GROUP BY and may be problematic too.

@smitpatel
Copy link
Contributor Author

@asfernandes - OrderBy(c => 0) (which is like no-op on linq) gets translated to ORDER BY (SELECT 1) on server.

@asfernandes
Copy link

Ok, so it does not treat 0 as an position. So translating to GROUP BY, a GROUP BY (SELECT 1) (or something simple as GROUP BY CAST(1 as INT) is a valid expression but should not be valid with everything.

It should not be valid to do a equivalent to SELECT FIELD FROM TABLE GROUP BY CAST(1 as INT) but SELECT 1 FROM TABLE GROUP BY CAST(1 as INT) or SELECT COUNT(FIELD) FROM TABLE GROUP BY CAST(1 as INT) are valid expressions.

I'm not understanding why this should be complicate in LINQ.

@ajcvickers ajcvickers modified the milestones: 2.1.0-preview1, 2.1.0 Jan 17, 2018
@recaro
Copy link

recaro commented Jan 24, 2018

Group by constant works in EF 6 by using constant as the key column, similar to:

SELECT 
   0 as K0,
   COUNT(Field) AS P0
FROM Table
GROUP BY K0

This seems like the logical translation of group by constant anyway.

@ajcvickers ajcvickers modified the milestones: 2.1.0, Backlog Jan 26, 2018
@smitpatel smitpatel modified the milestones: Backlog, 2.1.0 Mar 14, 2018
@smitpatel
Copy link
Contributor Author

Pulling this back into 2.1 milestone. I got it working.

@smitpatel smitpatel added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. and removed punted-for-2.1 labels Mar 14, 2018
smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when key/element/result selector is DTO instead of anonymous type Resolves #11176
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
smitpatel added a commit that referenced this issue Mar 14, 2018
- Add support for translating OrderBy after GroupBy operator
- Add support for `HAVING` clause in SQL which would be generated when translating predicate after GroupByAggregate Resolves #10870
- Make sure client eval warning is not issued when translating GroupByAggregate Resolves #11157
- GroupBy Aggregate works when element/result selector is DTO instead of anonymous type Resolves #11176 (KeySelector has to be client evaluated)
- Make sure that SQL added to GROUP BY clause is not aliased Resolves #11218
- Translate GroupBy Constant/Parameter with aggregates Resolves #9969

Part of #10012
Part of #2341
@smitpatel
Copy link
Contributor Author

We translate

context.Orders.GroupBy(o => 2).Select(
    g =>
        new
        {
            Sum = g.Sum(o => o.OrderID),
            Min = g.Min(o => o.OrderID),
            g.Key,
            Max = g.Max(o => o.OrderID),
            Avg = g.Average(o => o.OrderID)
        })
SELECT SUM([t].[OrderID]) AS [Sum], MIN([t].[OrderID]) AS [Min], [t].[Key], MAX([t].[OrderID]) AS [Max], AVG(CAST([t].[OrderID] AS float)) AS [Avg]
FROM (
    SELECT [o].*, 2 AS [Key]
    FROM [Orders] AS [o]
) AS [t]
GROUP BY [t].[Key]

Same happens when grouping is by a variable from closure.

@smitpatel smitpatel changed the title Query: Translate GroupBy constant value followed by aggregate as table aggregate Query: Translate GroupBy constant value with aggregate Apr 11, 2018
@SamVanheer
Copy link

We are experiencing this issue using EF Core 2.1:

var conversationTime = await _myContext.Conversations
    .GroupBy(c => 1)
    .Select(g =>
        new BotStatistic
        {
            Name = "ConversationTime",
            AverageTime = g.Average(c => EF.Functions.DateDiffMillisecond(c.StartDate, c.EndDate) / 1000.0),
            ShortestTime = g.Min(c => EF.Functions.DateDiffMillisecond(c.StartDate, c.EndDate) / 1000.0),
            LongestTime = g.Max(c => EF.Functions.DateDiffMillisecond(c.StartDate, c.EndDate) / 1000.0)
        }
    ).SingleAsync();

Using SQL Server 2012.

StartDate and EndDate are datetime columns.

The intention is to return one result containing aggregate date, which is why GroupBy(c => 1) is used.
I have not been able to find any other way to return one result, and this works in-memory, but the intention is to run this as an SQL query.

According to information i've found in some issues this should be possible since 2.1, so is there a bug here or are we doing something wrong? Or is there another way to return one result?

@smitpatel
Copy link
Contributor Author

@SamVanheer - That is #11976 Currently, we still do streaming group by for such queries.

@SamVanheer
Copy link

Thanks for the quick reply, we'll use separate queries for the time being until this can be done directly.

@smitpatel
Copy link
Contributor Author

@SamVanheer - You can rewrite your query to use element selector and get it to server eval

                var conversationTime = db.Conversations
                    .GroupBy(c => 1,
                        e =>
                            EF.Functions.DateDiffMillisecond(e.StartDate, e.EndDate) / 1000.0)
                    .Select(g =>
                    new
                    {
                        Name = "ConversationTime",
                        AverageTime = g.Average(c => c),
                        ShortestTime = g.Min(c => c),
                        LongestTime = g.Max(c => c)
                    }).ToList();

Produces

      SELECT AVG(CAST([t].[c] AS float)) AS [AverageTime], MIN([t].[c]) AS [ShortestTime], MAX([t].[c]) AS [LongestTime]
      FROM (
          SELECT DATEDIFF(MILLISECOND, [c].[StartDate], [c].[EndDate]) / 1000E0 AS [c], 1 AS [Key]
          FROM [Conversations] AS [c]
      ) AS [t]
      GROUP BY [t].[Key]

@SamVanheer
Copy link

Thanks for the help, this works.

Here's the query we're using, slightly modified:

var conversationTime = await _myContext.Conversations
                .GroupBy(c => 1, e => EF.Functions.DateDiffMillisecond(e.StartDate, e.EndDate) / 1000.0)
                .Select(g =>
                    new BotStatistic
                    {
                        Name = "ConversationTime",
                        AverageTime = g.Average(c =>c),
                        ShortestTime = g.Min(c => c),
                        LongestTime = g.Max(c => c)
                    })
                .SingleAsync();

@ajcvickers ajcvickers modified the milestones: 2.1.0-preview2, 2.1.0 Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests

6 participants