Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant inner join when I'm using GroupBy() #28622

Open
Tracked by #30173
verdysh opened this issue Aug 8, 2022 · 2 comments
Open
Tracked by #30173

Redundant inner join when I'm using GroupBy() #28622

verdysh opened this issue Aug 8, 2022 · 2 comments

Comments

@verdysh
Copy link

verdysh commented Aug 8, 2022

Hi!
Microsoft.EntityFrameworkCore.SqlServer, 6.0.5

Here is my EF query:

var query = db
	.ApplicationsControlMonitorEntriesAggregates.Where(
		a => a.TenantId == _tenantId && 
			a.ApplicationsControlAggregates.ActionDate >= request.StartPeriod &&
			a.ApplicationsControlAggregates.ActionDate <= request.FinishPeriod
	)
	.SelectMany(
		a => db.ProcessStatisticRules.Where(p => p.ProcessName == a.ApplicationsControlAggregates.ProcessCaption && p.TenantId == null).Select(psr => new { ProcessType = (int?)psr.ProcessType }).DefaultIfEmpty(),
		(a, psr) => new
		{
			ApplicationsControlMonitorEntriesAggregate = a,
			ProcessStatisticRuleGeneral = psr
		}
	)
	.SelectMany(
		a => db.ProcessStatisticRules.Where(p => p.ProcessName == a.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.ProcessCaption && p.TenantId == _tenantId).Select(psr => new { ProcessType = (int?)psr.ProcessType }).DefaultIfEmpty(),
		(a, psr) => new
		{
			ApplicationsControlMonitorEntriesAggregate = a.ApplicationsControlMonitorEntriesAggregate,
			ProcessStatisticRuleGeneral = a.ProcessStatisticRuleGeneral,
			ProcessStatisticRuleByTenant = psr
		}
	)
	.GroupBy(r => new
	{
		DeviceId = r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.Device.Id,
		DeviceName = r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.Device.Name,                    
		ProcessCaption = r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.ProcessCaption,
		FilterCaption = r.ApplicationsControlMonitorEntriesAggregate.Filter.Caption,
		FilterCostUsd = r.ApplicationsControlMonitorEntriesAggregate.Filter.CostUsd,
		FilterId = r.ApplicationsControlMonitorEntriesAggregate.Filter.Id,
		FilterIsSensitive = r.ApplicationsControlMonitorEntriesAggregate.Filter.SensitiveType == 1,
		FilterName = r.ApplicationsControlMonitorEntriesAggregate.Filter.Name,
		Belongings = r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.Belongings,
		WebApp = r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.WebApp,
		ProcessType =
					r.ProcessStatisticRuleByTenant.ProcessType != null ?
						r.ProcessStatisticRuleByTenant.ProcessType :
						r.ProcessStatisticRuleGeneral.ProcessType != null ?
							r.ProcessStatisticRuleGeneral.ProcessType :
							null
	})
	.Select(g => new ApplicationRiskRowDto
	{
		FilesCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.FilesCount),
		FilesEncryptedCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.FilesEncryptedCount),
		Filter = new ApplicationRiskContentFilter
		{
			Caption = g.Key.FilterCaption,
			CostUsd = g.Key.FilterCostUsd,
			Id = g.Key.FilterId,
			IsSensitive = g.Key.FilterIsSensitive,
			Name = g.Key.FilterName,
		},
		ApplicationsControlAggregate = new ApplicationsControlAggregateRow
		{
			FilesCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.FilesCount),
			FilesEncryptedCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.FilesEncryptedCount),
			SensitiveFilesCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.SensitiveFilesCount),
			ProcessCaption = g.Key.ProcessCaption,
			WebApp = g.Key.WebApp,
			Belongings = g.Key.Belongings,
			Device = new ApplicationRiskDevice
			{
				Id = g.Key.DeviceId,
				Name = g.Key.DeviceName
			},
			ProcessStatisticRule = new ApplicationRiskProcessStatisticRules
			{
				ProcessType = g.Key.ProcessType
			}
		}
	})
;

It turns into the next SQL query:

SELECT
   COALESCE(SUM([a].[FilesCount]), 0) AS [FilesCount],
   COALESCE(SUM([a].[FilesEncryptedCount]), 0) AS [FilesEncryptedCount],
   [c].[Caption],
   [c].[CostUSD] AS [CostUsd],
   [c].[Id],
   CASE
      WHEN
         [c].[SensitiveType] = 1 
      THEN
         CAST(1 AS bit) 
      ELSE
         CAST(0 AS bit) 
   END
   AS [IsSensitive], [c].[Name], COALESCE(SUM([a1].[FilesCount]), 0) AS [FilesCount], COALESCE(SUM([a1].[FilesEncryptedCount]), 0) AS [FilesEncryptedCount], COALESCE(SUM([a1].[SensitiveFilesCount]), 0) AS [SensitiveFilesCount], [a0].[ProcessCaption], [a0].[WebApp], [a0].[Belongings], [d].[Id], [d].[Name], 
   CASE
      WHEN
         [t0].[ProcessType] IS NOT NULL 
      THEN
         [t0].[ProcessType] 
      WHEN
         [t].[ProcessType] IS NOT NULL 
      THEN
         [t].[ProcessType] 
      ELSE
         NULL 
   END
   AS [ProcessType] 
FROM
   [ApplicationsControlMonitorEntriesAggregates] AS [a] 
   INNER JOIN
      [ApplicationsControlAggregates] AS [a0] 
      ON [a].[ApplicationsControlAggregatesId] = [a0].[Id] 
   LEFT JOIN
      (
         SELECT
            [p].[ProcessType],
            [p].[ProcessName] 
         FROM
            [ProcessStatisticRules] AS [p] 
         WHERE
            [p].[TenantId] IS NULL 
      )
      AS [t] 
      ON [a0].[ProcessCaption] = [t].[ProcessName] 
   LEFT JOIN
      (
         SELECT
            [p0].[ProcessType],
            [p0].[ProcessName] 
         FROM
            [ProcessStatisticRules] AS [p0] 
         WHERE
            [p0].[TenantId] = @___tenantId_3 
      )
      AS [t0] 
      ON [a0].[ProcessCaption] = [t0].[ProcessName] 
   INNER JOIN
      [Devices] AS [d] 
      ON [a0].[DeviceId] = [d].[Id] 
   INNER JOIN
      [ContentFilters] AS [c] 
      ON [a].[FilterId] = [c].[Id] 
   INNER JOIN
      [ApplicationsControlAggregates] AS [a1] 
      ON [a].[ApplicationsControlAggregatesId] = [a1].[Id] 
WHERE
   (
	([a].[TenantId] = @___tenantId_0) AND 
    ([a0].[ActionDate] >= @__request_StartPeriod_1)
   )
   AND 
   ( [a0].[ActionDate] <= @__request_FinishPeriod_2 )
GROUP BY
   [d].[Id],
   [d].[Name],
   [a0].[ProcessCaption],
   [c].[Caption],
   [c].[CostUSD],
   [c].[Id],
   CASE
      WHEN
         [c].[SensitiveType] = 1 
      THEN
         CAST(1 AS bit) 
      ELSE
         CAST(0 AS bit) 
   END
, [c].[Name], [a0].[Belongings], [a0].[WebApp], 
   CASE
      WHEN
         [t0].[ProcessType] IS NOT NULL 
      THEN
         [t0].[ProcessType] 
      WHEN
         [t].[ProcessType] IS NOT NULL 
      THEN
         [t].[ProcessType] 
      ELSE
         NULL 
   END

Query almost fully corresponds to my expectation except double join of [ApplicationsControlAggregates] table.

It seems this part of EF query:

FilesCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.FilesCount),
FilesEncryptedCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.FilesEncryptedCount),
SensitiveFilesCount = g.Sum(r => r.ApplicationsControlMonitorEntriesAggregate.ApplicationsControlAggregates.SensitiveFilesCount),

leads to second join. Because exactly these fields in SQL query are taken from this joined table:
COALESCE(SUM([a1].[FilesCount]), 0) AS [FilesCount], COALESCE(SUM([a1].[FilesEncryptedCount]), 0) AS [FilesEncryptedCount], COALESCE(SUM([a1].[SensitiveFilesCount]), 0) AS [SensitiveFilesCount]

Can I avoid this second join?

@ajcvickers
Copy link
Member

@verdysh Can you test this with EF Core 7.0 Preview 7? Significant changes have been made in this area.

@verdysh
Copy link
Author

verdysh commented Aug 15, 2022

@ajcvickers, I can't say for sure but it seems to me that something is broken.

I'm not sure my current EF query is the same as in start post, so I want to show it one more time to be more accuracy. Here is it.

Version 6

<PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="6.0.5">
<PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="6.0.5" />
<PackageReference Include="Microsoft.EntityFrameworkCore.Tools" Version="6.0.5">

Produced SQL query is here.

Version 7

<PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="7.0.0-preview.7.22376.2">
<PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="7.0.0-preview.7.22376.2" />
<PackageReference Include="Microsoft.EntityFrameworkCore.Tools" Version="7.0.0-preview.7.22376.2">

Produced SQL query is here.

What is got in version 7 looks a little scary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants