Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EF Core 5.0 - FROM clause with Include + Skip/Take #24964

Open
spazioporto opened this issue May 24, 2021 · 5 comments
Open

EF Core 5.0 - FROM clause with Include + Skip/Take #24964

spazioporto opened this issue May 24, 2021 · 5 comments

Comments

@spazioporto
Copy link

Hi everyone,
I'm upgrading my ASP NET Core web application from .NET Core 2.1 to .NET Core 5.0.
I noticed a little difference in how my SQL queries are being generated from LINQ statements when I use Include() and Skip()/Take() together.

I have two tables: Shops and Companies - Every Shop is assigned to a single Company.
Those are the two table definitions:

public class Shop
{
   public int ShopID { get; set; }
   public string Code { get; set; }
   public string Name { get; set; }
  
   public int CompanyID { get; set; }
   public Company Company { get; set; }
}

public class Company
{
   public int CompanyID { get; set; }
   public string Code { get; set; }
   public string Name { get; set; }
   public string VATCode { get; set; }
}

... and this is the LINQ expression I am referring to:

var shopIQ = _context.Shops.Include(x => x.Company)
                           .OrderBy(x => x.Code)
                           .Skip(0).Take(20);

With EFCore 2.1 this was translated to this SQL Server query:

SELECT [x].[ShopID], [x].[Code], [x].[Name], [x].[CompanyID], [x.Company].[Code] AS [Code0], [x.Company].[Name] AS [Name0], [x.Company].[VATCode] AS [VATCode]
FROM [Shops] AS [x]
INNER JOIN [Companies] AS [x.Company] ON [x].[CompanyID] = [x.Company].[CompanyID]
ORDER BY [x].[Code]
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY

... but with EFCore 5.0 this is translated to:

SELECT [t].[ShopID], [t].[Code], [t].[Name], [t].[CompanyID], [c].[CompanyID], [c].[Code], [c].[Name], [c].[VATCode]
FROM (
   SELECT [s].[ShopID], [s].[Code], [s].[Name], [s].[CompanyID]
   FROM [Shops] AS [s]
   ORDER BY [s].[Code]
   OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY
) AS [t]
INNER JOIN [Companies] AS [c] ON [t].[CompanyID] = [c].[CompanyID]
ORDER BY [t].[Code]

which is indeed correct, but kinda weird to me.

Anyone knows why is this happening?

Thanks.

Provider and version information

EF Core version: 5.0.6
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 5.0
Operating system: Windows 10
IDE: Visual Studio 2019 16.9.4

@ErikEJ
Copy link
Contributor

ErikEJ commented May 24, 2021

The query pipeline was redesigned in release 3.

@spazioporto
Copy link
Author

Thanks for your quick response!
So, as far as I understand, this is an expected behaviour and I don't have to worry about it. Am I right?

@ajcvickers
Copy link
Member

Note from triage: moving this to the backlog. This is an optimization that can only be applied in specific cases. We will consider this, but only if the performance benefit is significant.

@felipesdias
Copy link

I had another problem involving this behavior. When dividing queries they are generating different results due to the ordering of the columns.

The innermost query did not sort the primary columns by default, resulting in different results due to SQL Server optimizations

// Default Behavior is SplitQuery

var query = _context.FromScenario<DbScheduledStopOccupation>()
                .Skip(0)
                .Take(30)
                .Include(o => o.LocationRailroad)
                .Include(o => o.ScheduledStopActivity).ThenInclude(o => o.ScheduledStopActivityTranslations)
                .OrderBy(o => o.ScheduledStopActivity.Code);

var result = query .ToList();
var result2 = query .AsSingleQuery().ToList();

result.Sum(x =>x.ScheduledStopActivity.ScheduledStopActivityTranslations.Count()) // 28
result2.Sum(x =>x.ScheduledStopActivity.ScheduledStopActivityTranslations.Count()) // 60

I analyzed the queries generated with splitquery and found the problem.

First Query:

SELECT [t].[YRZSSOID], [t].[YRZLORID], [t].[YRZOCCUP], [t].[YRZSCEID], [t].[YRZSSAID], [t].[YRZCTUSR], [t].[YRZCRTDT], [t].[YRZCTGMT], [t].[YRZLMUSR], [t].[YRZLMODT], [t].[YRZLMGMT], [y1].[YRDLORID], [y1].[YRDACTIV], [y1].[YRDAXRID], [y1].[YRDLCODE], [y1].[YRDGPVID], [y1].[YRDLTYPE], [y1].[YRDMPSGF], [y1].[YRDMPSGV], [y1].[YRDSCEID], [y1].[YRDCTUSR], [y1].[YRDCRTDT], [y1].[YRDCTGMT], [y1].[YRDLMUSR], [y1].[YRDLMODT], [y1].[YRDLMGMT], [t].[YRWSSAID], [t].[YRWSCODE]
FROM (
      SELECT [y].[YRZSSOID], [y].[YRZLORID], [y].[YRZOCCUP], [y].[YRZSCEID], [y].[YRZSSAID], [y0].[YRWSSAID], [y0].[YRWSCODE], [y].[YRZCTUSR], [y].[YRZCRTDT], [y].[YRZCTGMT], [y].[YRZLMUSR], [y].[YRZLMODT], [y].[YRZLMGMT]
      FROM [YRZSSOCT] AS [y]
         INNER JOIN [YRWSSACT] AS [y0] ON [y].[YRZSSAID] = [y0].[YRWSSAID]
      WHERE [y].[YRZSCEID] = 0
      ORDER BY [y0].[YRWSCODE] DESC OFFSET 0 FETCH NEXT 30 ROWS ONLY
   ) AS [t]
   LEFT JOIN [YRDLORAT] AS [y1] ON [t].[YRZLORID] = [y1].[YRDLORID]
ORDER BY [t].[YRWSCODE] DESC,
   [t].[YRZSSOID],
   [t].[YRWSSAID],
   [y1].[YRDLORID]

Second Query:

SELECT [y2].[YSWLANGU], [y2].[YSWSSAID], [y2].[YSWDESCR], [t].[YRZSSOID], [t].[YRWSSAID], [y1].[YRDLORID]
FROM (
      SELECT [y].[YRZSSOID], [y].[YRZLORID], [y0].[YRWSSAID], [y0].[YRWSCODE]
      FROM [YRZSSOCT] AS [y]
         INNER JOIN [YRWSSACT] AS [y0] ON [y].[YRZSSAID] = [y0].[YRWSSAID]
      WHERE [y].[YRZSCEID] = 1
      ORDER BY [y0].[YRWSCODE] DESC OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
   ) AS [t]
   LEFT JOIN [YRDLORAT] AS [y1] ON [t].[YRZLORID] = [y1].[YRDLORID]
   INNER JOIN [YSWSTRAT] AS [y2] ON [t].[YRWSSAID] = [y2].[YSWSSAID]
ORDER BY [t].[YRWSCODE] DESC,
   [t].[YRZSSOID],
   [t].[YRWSSAID],
   [y1].[YRDLORID]

The problem is because the column "ScheduledStopActivity.Code" (YRWSCODE) repeats several times due to left join, as the SELECT's of the first and second query are different, SQLServer may query more or less data and return different results due to the ordering of the YRWSCODE column , in this case the orderBy of the primary key columns should be included in the innermost query, as in the outermost queries.

Like this:

     ORDER BY [y0].[YRWSCODE] DESC, [y].[YRZSSOID] OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY

Complete query:

SELECT [y2].[YSWLANGU], [y2].[YSWSSAID], [y2].[YSWDESCR], [t].[YRZSSOID], [t].[YRWSSAID], [y1].[YRDLORID]
FROM (
      SELECT [y].[YRZSSOID], [y].[YRZLORID], [y0].[YRWSSAID], [y0].[YRWSCODE]
      FROM [YRZSSOCT] AS [y]
         INNER JOIN [YRWSSACT] AS [y0] ON [y].[YRZSSAID] = [y0].[YRWSSAID]
      WHERE [y].[YRZSCEID] = 1
      ORDER BY [y0].[YRWSCODE] DESC, [y].[YRZSSOID] OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
   ) AS [t]
   LEFT JOIN [YRDLORAT] AS [y1] ON [t].[YRZLORID] = [y1].[YRDLORID]
   INNER JOIN [YSWSTRAT] AS [y2] ON [t].[YRWSSAID] = [y2].[YSWSSAID]
ORDER BY [t].[YRWSCODE] DESC,
   [t].[YRZSSOID],
   [t].[YRWSSAID],
   [y1].[YRDLORID]

@smitpatel
Copy link
Contributor

@felipesdias - First the issue you are running into is not same as OP. The OP's scenario has same results in both cases and translation in single query mode only.

Secondly, the way you are describing mismatch in results it is likely dotnet/EntityFramework.Docs#3242. An unstable ordering with generate wrong results when using split query. If the ordering on Code is stable i.e. each row has unique code then file a new issue with runnable repro code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants