# Support SQL translation for .NET 6 LINQ's MinBy/MaxBy methods #25566
Relevant runtime issue: dotnet/runtime#27687

We could translate like this:

```sql
SELECT * FROM employees WHERE salary = (SELECT MAX(salary) FROM employees);
```

Need to look into behavior when multiple source items have the same maximum value (we could use LIMIT 1/TOP 1 to get an arbitrary row, if that's the right behavior).

Opened #25570 to track support for all new LINQ features introduced in .NET 6.0.
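For context, the LINQ shape under discussion is roughly the following (a minimal sketch; the `context` variable and the `Employees`/`Salary` names are placeholders, not from the issue):

```csharp
// Hypothetical entity set for illustration.
// Enumerable.MaxBy returns the first element with the maximum key,
// which is why tie-breaking behavior matters for the SQL translation.
var topEarner = context.Employees.MaxBy(e => e.Salary);
```

---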
There are possibly other ways to write the same query as a workaround, since these operators cannot be translated to SQL in a direct, native way.

---
Note from triage: putting this in 7.0 to investigate reasonable translations.

---
I think the following are worth taking into consideration for an implementation:

For illustration, assuming a UNIQUE index on the relevant columns:

```sql
SELECT i.*
FROM
(
SELECT ParentId, MAX(CreationDateTime) AS MaxCreationDateTime
FROM Items
GROUP BY ParentId
) AS Groups
INNER JOIN Items i
ON i.ParentId = Groups.ParentId
AND i.CreationDateTime = Groups.MaxCreationDateTime

-- Without a UNIQUE index, we would partition the result by ParentId and eliminate all non-first rows per partition
```
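For comparison, the partition-based deduplication mentioned in the closing comment could be sketched with a window function (assuming the database supports ROW_NUMBER; the table and column names reuse the example above):

```sql
-- Sketch: rank rows within each ParentId partition, newest first,
-- then keep only the first row per partition
SELECT ranked.*
FROM
(
SELECT i.*, ROW_NUMBER() OVER (PARTITION BY i.ParentId ORDER BY i.CreationDateTime DESC) AS RowNumber
FROM Items i
) AS ranked
WHERE ranked.RowNumber = 1
```

---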
@roji Ah, yes, you had already proposed the dependent subquery. And to account for duplicate winners, we could adjust it to the following:

```sql
SELECT * FROM employees WHERE id = (SELECT TOP(1) id FROM employees ORDER BY salary DESC)
```

When writing this manually, I would deduplicate more deliberately, such as by changing the ordering. This proposal is asymptotically optimal, but dependent subqueries do have a significant constant overhead per row they are applied to, which becomes very noticeable on larger data sets. I just did a whole bunch of experiments, and I think we can do better.

I'm going to use a slightly more complex example, where we want the best-earning employee per department:

```sql
SELECT *
FROM employees e
GROUP BY e.department
HAVING e.id = (SELECT TOP(1) id FROM employees e2 WHERE e2.department = e.department ORDER BY e2.salary DESC)
```

**GROUP BY + INNER JOIN to the rescue**

Now let's see how we can get speedy joins instead of that pesky dependent subquery. With a UNIQUE index, the solution is straightforward:

```sql
SELECT e.*
FROM
(
SELECT department, MAX(salary) AS maxSalary
FROM employees
GROUP BY department
) AS groups
INNER JOIN employees e ON e.department = groups.department AND e.salary = groups.maxSalary -- Unique
```

With a non-UNIQUE index (or no index 😱), we could obviously materialize the results and deduplicate in code, but that seems too limiting. To deduplicate efficiently in SQL, we can group by [the thing we wish to deduplicate on], take the first primary key value of each group, and INNER JOIN the target table onto that.

```sql
SELECT eUnique.*
FROM
(
SELECT MAX(e.Id) AS Id -- Can do MAX for MaxBy(), MIN for MinBy()
FROM
(
SELECT department, MAX(salary) AS maxSalary
FROM employees
GROUP BY department
) AS groups
INNER JOIN employees e ON e.department = groups.department AND e.salary = groups.maxSalary -- Non-unique
GROUP BY e.department, e.salary -- SELECT MAX/MIN removes duplicates here
) AS deduplicated
INNER JOIN employees eUnique ON eUnique.Id = deduplicated.Id
```

In fact, EF Core 6 just generated precisely such a query for me when I worked out the equivalent in LINQ:

```csharp
this.DbContext.Employees
// Get the top Salary for each Department
.GroupBy(instance => instance.Department)
.Select(group => new { Department = group.Key, Salary = group.Max(instance => instance.Salary) })
// Get the top Id for each { Department, TopSalary } pair
.Join(this.DbContext.Employees, groupMax => groupMax, instance => new { instance.Department, instance.Salary }, (groupMax, instance) => instance)
.GroupBy(instance => new { instance.Department, instance.Salary })
.Select(group => group.Max(instance => instance.Id))
// Get the Employees by Id
.Join(this.DbContext.Employees, groupMax => groupMax, instance => instance.Id, (groupMax, instance) => instance);
```

When you try the solutions on a data set with a few thousand matching groups, you should see the dependent subquery take significantly more time.

Points of attention:
---
Can you please share your experiments? For a query such as the following:

```sql
SELECT * FROM employees WHERE id = (SELECT TOP(1) id FROM employees ORDER BY salary DESC)
```

... I'd expect the subquery to evaluate exactly once - there's no reference to any outer column, so no reason to run it per outer row or anything.

The query you wrote has HAVING, which filters out groups (departments), so I don't think it does what you want (FWIW you probably can't refer to employee ID in the HAVING clause either, since that's not available after aggregating the employees). Am I misunderstanding your sample? In any case, the precise translation of an operator after GroupBy or without GroupBy may differ, so these are possibly two slightly different conversations.

Re multiple winners, we generally follow the LINQ to Objects behavior (i.e. whatever the enumerable version of MaxBy does), but within the limits of pragmatism (sometimes replicating that behavior in SQL isn't feasible, or is very slow); we'd have to investigate this. I'm imagining that the implementation probably returns the first item with the maximum value based on the existing ordering.

---
Or here you were simply referring to the time taken to execute that subquery exactly once? If there's an index, then that should be negligible (for either the ordering version or the simpler MAX version). I'm also not sure why any other JOIN-based solution would necessarily be cheaper. I'd suggest concentrating on the non-GroupBy version to start with, just so we're sure we're aligned on the basics.
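To make the distinction concrete, compare the two shapes from this thread (both adapted from the examples above):

```sql
-- Uncorrelated: references no outer column, so it can be evaluated exactly once
SELECT * FROM employees
WHERE id = (SELECT TOP(1) id FROM employees ORDER BY salary DESC);

-- Correlated: references e.department, so it is conceptually evaluated per outer row
-- (the optimizer may still rewrite it as a join)
SELECT * FROM employees e
WHERE e.id = (SELECT TOP(1) id FROM employees e2 WHERE e2.department = e.department ORDER BY e2.salary DESC);
```

---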
I couldn't put my finger on precisely what the distinction was, but this is it. Indeed, everything I mentioned was from the perspective of MinBy/MaxBy after GroupBy.
Good point. For the non-GroupBy version, there is only a single min/max, and the subquery should be fine.
Oops! I had moved away from the subquery approach quite early on. Hence, the subquery example was off the top of my head, and it was way off. The correct one is more complex. I don't have an example handy, but I saw yesterday that EF writes GROUP BY followed by an OUTER APPLY that uses a subquery to find the min/max for each group.
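For illustration, the GROUP BY + OUTER APPLY shape described here might look roughly like this (a sketch reusing the employees example from above, not EF's actual output):

```sql
-- Sketch: one row per department, each joined to its top earner via OUTER APPLY
SELECT [t0].*
FROM (
SELECT [e].[department]
FROM [employees] AS [e]
GROUP BY [e].[department]
) AS [t]
OUTER APPLY (
SELECT TOP(1) [e0].*
FROM [employees] AS [e0]
WHERE [e0].[department] = [t].[department]
ORDER BY [e0].[salary] DESC
) AS [t0]
```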
With LINQ being stable, I have the same expectation. Correct me if I'm wrong, but I also believe that the promise of being stable is not reflected in LINQ-to-SQL, since the "preexisting order" there normally depends on the index chosen. As such, picking an arbitrary winner is likely fine here, and saves complexity.
I don't fully understand why, but a subquery applied per row seems to be somehow treated in a "one by one" fashion, whereas a join seems to get executed in more of a batched fashion. So even though both the join and the subquery are O(log(N)) per row they are applied to (which is asymptotically optimal given a B-tree index), the subquery has a much larger constant overhead. It's as if the database is executing the subquery from the top of the tree each time the subquery is applied, while using a single set of hops through the tree to join the entire sequence. Although my explanation is pure interpretation, you should be able to easily observe the performance difference when returning a few thousand results from the respective queries. It's a factor of about 2 to 10, if I recall correctly.
😅 My suggestions are probably irrelevant until the GroupBy version comes into view.

---
When we say preexisting order, that usually only refers to an ordering explicitly established with ORDER BY; so there must be an OrderBy LINQ operator before the MaxBy to establish the order. If there isn't one, the order is unspecified and the query is non-deterministic; it's true that the index plays a role in the actual ordering of the rows, but that's something that isn't guaranteed and cannot be relied upon. To summarize, when there's no OrderBy in LINQ, I think we can definitely pick any arbitrary thing. If there's an OrderBy before the MaxBy, then we can optionally propagate it into the subquery, if we come to the conclusion that it's part of MaxBy's contract.
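In LINQ terms, the two cases being distinguished would be (a minimal sketch; `context` and the employee shape are assumed for illustration):

```csharp
// No OrderBy: ties among equal salaries may be broken arbitrarily
var anyTopEarner = context.Employees.MaxBy(e => e.Salary);

// OrderBy before MaxBy: if the preexisting order is part of MaxBy's contract,
// the ordering would need to be propagated into the generated subquery
var firstTopEarner = context.Employees.OrderBy(e => e.Id).MaxBy(e => e.Salary);
```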
That sounds odd; I'd carefully test this and share concrete, comparative queries and their plans.

---
You're right. Based on further testing, I can now say that my earlier claims regarding subqueries having a greater constant overhead certainly do not apply to SQL Server. It works as you might expect: it interprets the query to understand what you want and produces a plan. Whether you expressed the query as a join or a dependent subquery makes little difference to it. I was originally trained on MySQL 5.6, so it is very possible that what I claimed applies to that alone. I have not yet checked the behavior on MySQL 8, although I'm quite curious.

---
I did some further research and discovered that it is fairly challenging to get SQL Server to use an optimal plan for this. This work was inspired by this post, which shares some great insights into what's really going on and how to deal with it.

Consider the following straightforward solution:

```csharp
this.DbContext.Orders
this.DbContext.Orders
//.Where(x => customerIds.Contains(x.CustomerId)) // Optional
// Find the latest order per distinct customer
.GroupBy(x => x.CustomerId)
.Select(group => group.Max(x => x.Id))
// Join the entity
.Join(this.DbContext.Orders, id => id, instance => instance.Id, (id, instance) => instance);
```

```sql
SET STATISTICS PROFILE ON -- Help check index usage

SELECT [o].*
FROM (
SELECT MAX(Id) AS [Id]
FROM [Orders]
GROUP BY [CustomerId]
) AS [max]
INNER JOIN [Orders] AS [o] ON [o].[Id] = [max].Id
```

Looks fine, right? Well, it's fine for the groupwise-min case. Using only basic building blocks, we can more clearly tell the optimizer what we want, even enforcing the expected index without actually referencing it by name:

```csharp
this.DbContext.Orders
//.Where(x => customerIds.Contains(x.CustomerId)) // Optional
// Seek to each distinct customer
.Select(x => x.CustomerId)
.Distinct()
// For each customer, seek to the top row
.Select(customerId => this.DbContext.Orders
.Where(x => x.CustomerId == customerId)
.OrderByDescending(x => x.CustomerId).ThenByDescending(x => x.Id) // Enforce index and direction
.First().Id)
// Join the entity
.Join(this.DbContext.Orders, id => id, instance => instance.Id, (id, instance) => instance);
```

EF produces roughly the following T-SQL:

```sql
SET STATISTICS PROFILE ON -- Help check index usage

SELECT [o].*
FROM (
SELECT DISTINCT [CustomerId]
FROM [Orders]
) AS [max]
INNER JOIN [Orders] AS [o] ON [o].[Id] =
(
SELECT TOP(1) [o1].[Id]
FROM [Orders] AS [o1]
WHERE [o1].[CustomerId] = [max].[CustomerId]
ORDER BY [o1].[CustomerId] DESC, [o1].[Id] DESC
)
```

**Demo**
I know the least about PostgreSQL, so I'm the least confident about its execution plan.

**Observations**
For completion, here is an example using a composite key:

```csharp
this.DbContext.Orders
//.Where(x => customerIds.Contains(x.CustomerId)) // Optional
// Seek to each distinct customer-website pair
.Select(x => new { x.CustomerId, x.WebsiteId })
.Distinct()
// For each customer-website pair, seek to the top row
.Select(tuple => new
{
CustomerId = tuple.CustomerId,
WebsiteId = tuple.WebsiteId,
Sequence = this.DbContext.Orders
.Where(x => x.CustomerId == tuple.CustomerId && x.WebsiteId == tuple.WebsiteId)
.OrderByDescending(x => x.CustomerId).ThenByDescending(x => x.WebsiteId).ThenByDescending(x => x.Sequence) // Enforce index and direction
.First().Sequence,
})
// Join the entity
.Join(
this.DbContext.Orders,
groupMax => groupMax,
instance => new { instance.CustomerId, instance.WebsiteId, instance.Sequence, },
(groupMax, instance) => instance);
```

EF produces roughly the following T-SQL:

```sql
SET STATISTICS PROFILE ON -- Help check index usage

SELECT [o].*
FROM (
SELECT DISTINCT [CustomerId], [WebsiteId]
FROM [Orders]
) AS [max]
INNER JOIN [Orders] AS [o]
ON [o].[CustomerId] = [max].[CustomerId]
AND [o].[WebsiteId] = [max].[WebsiteId]
AND [o].[Sequence] =
(
SELECT TOP(1) [o1].[Sequence]
FROM [Orders] AS [o1]
WHERE [o1].[CustomerId] = [max].[CustomerId]
AND [o1].[WebsiteId] = [max].[WebsiteId]
ORDER BY [o1].[CustomerId] DESC, [o1].[WebsiteId] DESC, [o1].[Sequence] DESC
)
```

---
@Timovzl I can't reproduce what you're saying here - maybe there's some missing information (in this kind of issue it's always good to post the exact schema, exact LINQ queries and SQLs, the exact data you seeded with, and the exact SQL plan that comes out). Using the console program below to create the schema and seed (5 customers with 10000 orders each), I got the plan for your first SQL:

```sql
SET STATISTICS PROFILE ON;

SELECT [o].*
FROM (
SELECT MIN(Id) AS [Id]
FROM [Orders]
GROUP BY [CustomerId]
) AS [max]
INNER JOIN [Orders] AS [o] ON [o].[Id] = [max].Id
```

This gives me the following, which indeed goes through 50000 order rows:
Changing MAX to MIN gives me the same plan, still with 50000 rows and still with ORDERED FORWARD. The total subtree cost is 0.22964673 for both MAX and MIN variants. Stepping back, if we concentrate on your grouping subquery:

```sql
SELECT MAX([o].[Id]) AS [c]
FROM [Orders] AS [o]
GROUP BY [o].[CustomerId]
```

Doing the same on the 2nd SQL you posted yields the following:

```sql
SELECT [o].*
FROM (
SELECT DISTINCT [CustomerId]
FROM [Orders]
) AS [max]
INNER JOIN [Orders] AS [o] ON [o].[Id] =
(
SELECT TOP(1) [o1].[Id]
FROM [Orders] AS [o1]
WHERE [o1].[CustomerId] = [max].[CustomerId]
ORDER BY [o1].[CustomerId] DESC, [o1].[Id] DESC
);
```

This still uses the same index, ORDERED FORWARD, and processes 50000 rows. The total subtree cost is 0.21931484, which is indeed better than the above, but doesn't seem to be significant in the way you said above (processing all rows vs. only one per customer). I'm guessing I've somehow set things up differently than you, so if you'd like to provide the full details I can look again.

**Repro program**

```csharp
await using var ctx = new BlogContext();

// Create and seed
// await ctx.Database.EnsureDeletedAsync();
// await ctx.Database.EnsureCreatedAsync();
//
// for (var i = 0; i < 5; i++)
// {
// var customer = new Customer
// {
// Orders = new List<Order>(Enumerable.Range(0, 10_000).Select(_ => new Order()))
// };
// ctx.Customers.Add(customer);
// }
//
// ctx.SaveChanges();
// Query1
_ = ctx.Orders
.GroupBy(x => x.CustomerId)
.Select(group => group.Max(x => x.Id))
.Join(ctx.Orders, id => id, instance => instance.Id, (id, instance) => instance)
.ToArray();
// Query2
_ = ctx.Orders
.Select(x => x.CustomerId)
.Distinct()
.Select(customerId => ctx.Orders
.Where(x => x.CustomerId == customerId)
.OrderByDescending(x => x.CustomerId).ThenByDescending(x => x.Id)
.First().Id)
.Join(ctx.Orders, id => id, instance => instance.Id, (id, instance) => instance)
.ToArray();
public class BlogContext : DbContext
{
public DbSet<Order> Orders { get; set; }
public DbSet<Customer> Customers { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder
.UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
// .UseNpgsql(@"Host=localhost;Username=test;Password=test")
.LogTo(Console.WriteLine, LogLevel.Information)
.EnableSensitiveDataLogging();
}
public class Order
{
public int Id { get; set; }
public string? Name { get; set; }
public int CustomerId { get; set; }
public Customer Customer { get; set; }
}
public class Customer
{
public int Id { get; set; }
public string? Name { get; set; }
public List<Order> Orders { get; set; }
}
```

---
I think the SQL Fiddle for SQL Server I provided before should have been enough to reproduce things, right? OK, except for the LINQ queries, I admit. In any case, your repro is perfectly suitable.

In your first posted execution plan, this line is the potential problem: it will do a forward scan on the index. If one or many of the Customers whose Orders we're acquiring have loads of Orders, then we will be scanning all of those Orders until we find the max. At 10,000 Orders per Customer, we're traversing 10,000 rows per Customer instead of 1. This prevents the query from being linearithmic in the number of Customers queried (our asymptotic optimum).

Your second posted execution plan actually solved that problem! 😄 For the pain point discussed above, we are now doing an index seek. So why, you may ask, is there still an index scan? That one comes from obtaining the distinct CustomerIds.

As a side note, when querying Orders for all Customers, we could have obtained the distinct CustomerIds more easily, by simply scanning the PK of the Customers table (which is likely far more compact than the Orders table). But I digress - that was not the scan that was affecting the time complexity.
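As a sketch, the side note's cheaper source of distinct keys would be something like this (using the Customers table from the repro above):

```sql
-- Sketch: read distinct customer IDs from the compact Customers PK
-- instead of scanning the much larger Orders index
SELECT [c].[Id] AS [CustomerId]
FROM [Customers] AS [c]
```

---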
I've furthered my research on the above proposal for solving the groupwise min/max problem. I've managed to simplify it and confirm it to be efficient (even with composite keys and non-unique min/max values) on not just SQL Server, but also MySQL.

**Building Blocks**

An RDBMS need only support the following (fairly basic) combination of features:

**Example**

To make things concrete, let's say we're finding the max (i.e. greatest/latest) Order per Customer. We choose groupwise-max because it tends to be harder than groupwise-min, needing reverse seeks to guarantee efficiency.

```sql
SELECT (
-- Find ID of latest order per group
-- Using index { CustomerId, CreationDateTime, [Id] }
SELECT TOP 1 o.Id
FROM Orders o
WHERE o.CustomerId = groups.CustomerId -- Match the group
AND o.CreationDateTime < '2023-01-01' -- Optional condition (indexed)
AND o.IsRareExclusion = 0 -- Optional condition (non-indexed, i.e. scan until hit)
ORDER BY o.CustomerId DESC, o.CreationDateTime DESC, o.Id DESC
) AS MaxOrderId
-- Find relevant group keys
-- Using index { CustomerId, [...] }
FROM Orders groups
WHERE groups.CustomerId >= 500 -- Optional condition (indexed)
GROUP BY groups.CustomerId
-- From here, we can use a simple join to get the entire entity
-- This is advisable, so that any followup syntax does not intermingle with the above
;
```

Or expressed in LINQ, and available to us already:

```csharp
this.DbContext.Orders
// Find each distinct customer with ID >= 500
.Where(x => x.CustomerId >= 500) // Optional condition (indexed)
.GroupBy(x => x.CustomerId)
// For each customer, find its latest order of before 2023
.Select(group => new
{
CustomerId = group.Key,
MaxOrderId = this.DbContext.Orders
.Where(x => x.CustomerId == group.Key) // Match the group
.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01)) // Optional condition (indexed)
.Where(x => !x.IsRareExclusion) // Optional condition (non-indexed, i.e. scan until hit)
// Using index { CustomerId, CreationDateTime, [Id] }
.OrderByDescending(x => x.CustomerId)
.ThenByDescending(x => x.CreationDateTime)
.ThenByDescending(x => x.Id)
.First().Id,
})
// From here, we can use a simple join to get the entire entity
// This is advisable, so that any followup syntax does not intermingle with the above
```

The LINQ equivalent with MaxBy:

```csharp
this.DbContext.Orders
// Find each distinct customer with ID >= 500
.Where(x => x.CustomerId >= 500) // Optional condition (indexed)
.GroupBy(x => x.CustomerId)
// For each customer, find its latest order of before 2023
.Select(group => group
.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01)) // Optional condition (indexed)
.Where(x => !x.IsRareExclusion) // Optional condition (non-indexed, i.e. scan until hit)
.MaxBy(x => x.CreationDateTime)); // Or explicitly: new { x.CreationDateTime, x.Id }
```

Or, in a minimal example:

```csharp
var latestOrderPerCustomer = await this.DbContext.Orders
.GroupBy(x => x.CustomerId)
.Select(group => group.MaxBy(x => x.CreationDateTime))
.ToListAsync();
```

**Performance**

As indicated by the execution plan explanations and confirmed on large data sets, the resulting plans are asymptotically optimal. A reverse seek (for groupwise-max) starts at the end of each group, whereas a forward seek (for groupwise-min) starts at the front. We always find the target (or lack thereof) without scanning irrelevant data. This is important, because groups may be very large, such as when a customer can have millions of orders. Note that MySQL 8 correctly indicates a backward index scan in its plan.

**Explanation**
**MinBy/MaxBy Without GroupBy**

**Further Research**

I have not yet investigated how to query the top N (instead of top 1) items per group. It's probably beyond the scope of the current issue. For future reference, one approach that comes to mind is using two dependent subqueries to add two columns to the row: one to get the first matching item per group and one to get the last one. Then, a join could fetch all rows between each min-max pair.

Let me know if anything else is missing.

---
@Timovzl thanks for the above research, it looks very helpful! To set expectations, we're currently heads-down in an intensive EF 8.0 work cycle, finishing feature work for rc1 and starting stabilization. It will be some time before we're able to look at this.

What would be really helpful is if you could share your exact testing/benchmarking scenarios above - a snippet creating the database schema and seeding the database with data, as well as the concrete query SQL you're proposing here as the MinBy/MaxBy translations.

---
Sure, no expectations about the timeline! As for testing/benchmarking code, I'll do my best. The concrete usage and queries I can already propose.

---
**Implementation Proposal**

Ideally, we will translate the new operators as proposed below.

**Scenario 1: Ungrouped MaxBy**

Translate the MaxBy into a descending ordering that takes the first row.

**Simplest form**

Using index { CreationDateTime }:

```csharp
// Usage
this.DbContext.Orders
.MaxBy(x => x.CreationDateTime);

// Proposed translation to lower expressions
this.DbContext.Orders
.OrderByDescending(x => x.CreationDateTime)
.Take(1);
```

```sql
-- Approximation of expected SQL
SELECT TOP(1) o.*
FROM Orders o
ORDER BY o.CreationDateTime DESC
```

**With complexities**

The actual translation still only has to deal with the tuple passed to MaxBy.

Using index { IsDeleted, CreationDateTime, [Id] }:

```csharp
// Usage
this.DbContext.Orders
.Where(x => !x.IsDeleted)
.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01))
.MaxBy(x => new { x.IsDeleted, x.CreationDateTime, x.Id }) // Tip for user: Include x.IsDeleted to match index explicitly
.Select(x => x.Id);

// Proposed translation to lower expressions
this.DbContext.Orders
.Where(x => !x.IsDeleted)
.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01))
.OrderByDescending(x => x.IsDeleted)
.ThenByDescending(x => x.CreationDateTime)
.ThenByDescending(x => x.Id)
.Take(1)
.Select(x => x.Id);
```

```sql
-- Approximation of expected SQL
SELECT TOP(1) o.Id
FROM Orders o
WHERE o.IsDeleted = 0
AND o.CreationDateTime < '2023-01-01'
ORDER BY o.IsDeleted DESC, o.CreationDateTime DESC, o.Id DESC
```

**Scenario 2: Grouped MaxBy**

This one is better understood from the code, but here is the theory: recognize a MaxBy applied to a group, find each group's max row with a correlated subquery over the group key, and join the entity back in by its primary key.

**Simplest form**

Using index { CustomerId, CreationDateTime, [Id] }:

```csharp
// Usage
this.DbContext.Orders
.GroupBy(x => x.CustomerId)
.Select(group => group.MaxBy(x => x.CreationDateTime));

// Proposed translation to lower expressions
this.DbContext.Orders
.GroupBy(x => x.CustomerId)
.Select(group => this.DbContext.Orders
.Where(x => x.CustomerId == group.Key)
.OrderByDescending(x => x.CustomerId)
.ThenByDescending(x => x.CreationDateTime)
.First().Id)
.Join(this.DbContext.Orders, id => id, instance => instance.Id, (id, instance) => instance);
```

```sql
-- Approximation of expected SQL
SELECT o.*
FROM (
SELECT (
SELECT TOP(1) o.Id
FROM Orders o
WHERE o.CustomerId = groups.CustomerId
ORDER BY o.CustomerId DESC, o.CreationDateTime DESC
) AS MaxId
FROM Orders groups
GROUP BY groups.CustomerId
) AS Maxes
INNER JOIN Orders o ON o.Id = Maxes.MaxId
;
```

Having the translation always emit a left-complete ordering (including the group key columns) enforces the index and seek direction.

**With complexities**

To maximize complexity, we'll use a composite group key ({ CustomerId, ShopId }), indexed and non-indexed user conditions, and a composite max key.

Using index { CustomerId, ShopId, IsDeleted, CreationDateTime, [Id] }:

```csharp
// Usage
this.DbContext.Orders
.Where(x => x.CustomerId > 1000)
.GroupBy(x => new { x.CustomerId, x.ShopId })
.Select(group => group
.Where(x => !x.IsDeleted)
.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01))
.Where(x => !x.IsRareExclusion) // Scan over rare exclusions when finding group max (non-indexed)
.MaxBy(x => new { x.IsDeleted, x.CreationDateTime }));

// Proposed translation to lower expressions
this.DbContext.Orders
.Where(x => x.CustomerId > 1000)
.GroupBy(x => new { x.CustomerId, x.ShopId })
.Select(group => this.DbContext.Orders
.Where(x => !x.IsDeleted) // User condition (indexed)
.Where(x => x.CreationDateTime < new DateTime(2023, 01, 01)) // User condition (indexed)
.Where(x => !x.IsRareExclusion) // User condition (non-indexed)
.Where(x => x.CustomerId == group.Key.CustomerId && x.ShopId == group.Key.ShopId) // Group condition
.OrderByDescending(x => x.CustomerId)
.ThenByDescending(x => x.ShopId)
.ThenByDescending(x => x.IsDeleted)
.ThenByDescending(x => x.CreationDateTime)
.First().Id)
.Join(this.DbContext.Orders, id => id, instance => instance.Id, (id, instance) => instance);
```

```sql
-- Approximation of expected SQL
SELECT o.*
FROM (
SELECT (
SELECT TOP(1) o.Id
FROM Orders o
WHERE o.IsDeleted = 0
AND o.CreationDateTime < '2023-01-01'
AND o.IsRareExclusion = 0
AND o.CustomerId = groups.CustomerId AND o.ShopId = groups.ShopId
ORDER BY o.CustomerId DESC, o.ShopId DESC, o.IsDeleted DESC, o.CreationDateTime DESC
) AS MaxId
FROM Orders groups
WHERE groups.CustomerId > 1000
GROUP BY groups.CustomerId, groups.ShopId
) AS Maxes
INNER JOIN Orders o ON o.Id = Maxes.MaxId
;
```

---
Thanks @Timovzl!

---
I'm using .NET 6 and the new MinBy/MaxBy methods are great. However, there's no default translation to SQL for them - despite this being seemingly a very natural thing to be able to do.

I can do this:

```csharp
db.SomeDbSet.ToList().MinBy(x => x.SomeField)
```

but obviously the problem with that is that it's going to have to load the entire set into memory, and won't use any of the DB indexes etc., so it will be slow. It would be great if this could be added before EF Core 6 is released (or as an early preview for EF Core 7).

I'm using the SQLite provider, if it makes a difference.
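In the meantime, a server-side workaround that EF can already translate is to express the same intent with an ordering (a sketch; tie-breaking may differ from LINQ to Objects' MinBy, as discussed in this thread):

```csharp
// Sketch: let the database order by the key and return the first row.
// Ascending for a MinBy equivalent; use OrderByDescending for MaxBy.
var min = db.SomeDbSet
    .OrderBy(x => x.SomeField)
    .FirstOrDefault();
```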