Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupBy generates invalid SQL when using custom database function #29638

Closed
andreikarkkanen opened this issue Nov 21, 2022 · 5 comments · Fixed by #30617
Closed

GroupBy generates invalid SQL when using custom database function #29638

andreikarkkanen opened this issue Nov 21, 2022 · 5 comments · Fixed by #30617
Assignees
Labels
area-groupby area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported easy-for-smit Easy query bugs Servicing-approved type-bug
Milestone

Comments

@andreikarkkanen
Copy link

When executing the following query an exception is thrown

var orderIds = new [] { Guid.NewGuid(), Guid.NewGuid() };

var orders = dbContext.Orders
    .Where(c => dbContext.AsQueryable(orderIds).Contains(c.OrderId))
    .Select(c => new
    {
        c.Project.Code,
        c.Project.Revenue
    })
    .GroupBy(c => new
    {
        c.Code
    })
    .Select(c => new
    {
        c.Key.Code,
        Sum = c.Sum(e => e.Revenue)
    }).ToList();

Microsoft.Data.SqlClient.SqlException (0x80131904): The multi-part identifier "s.Value" could not be bound.

exec sp_executesql N'SELECT [p].[Code], (
    SELECT COALESCE(SUM([p1].[Revenue]), 0.0)
    FROM [Orders] AS [o0]
    INNER JOIN [Projects] AS [p0] ON [o0].[ProjectId] = [p0].[ProjectId]
    INNER JOIN [Projects] AS [p1] ON [o0].[ProjectId] = [p1].[ProjectId]
    WHERE EXISTS (
        SELECT 1
        FROM STRING_SPLIT(@__source_1, @__separator_2) AS [s0]
        WHERE CONVERT(UNIQUEIDENTIFIER, [s0].[Value]) = [o0].[OrderId]) AND [p].[Code] = [p0].[Code]) AS [Sum]
FROM [Orders] AS [o]
INNER JOIN [Projects] AS [p] ON [o].[ProjectId] = [p].[ProjectId]
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@__source_1, @__separator_2) AS [s0]
    WHERE CONVERT(UNIQUEIDENTIFIER, [s].[Value]) = [o].[OrderId])
GROUP BY [p].[Code]',N'@__source_1 nvarchar(4000),@__separator_2 nvarchar(4000)',@__source_1=N'7bd14254-eac8-413d-9328-2fda4725b690,e624c495-55dd-436a-8c62-6a7e49c8435d',@__separator_2=N','

The incorrect part is

WHERE CONVERT(UNIQUEIDENTIFIER, [s].[Value]) = [o].[OrderId]

The .AsQueriable() solves a cache plan pollution problem. The full solution is in the attachment.

ConsoleApp1.zip

Or

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query.SqlExpressions;

var optionsBuilder = new DbContextOptionsBuilder<TestContext>();
optionsBuilder.UseSqlServer("Server=TestDb;Database=CustomerDb;Trusted_Connection=True;Encrypt=False;");

using var dbContext = new TestContext(optionsBuilder.Options);
dbContext.Database.EnsureDeleted();
dbContext.Database.EnsureCreated();

var orderIds = new [] { Guid.NewGuid(), Guid.NewGuid() };

var orders = dbContext.Orders
    .Where(c => dbContext.AsQueryable(orderIds).Contains(c.OrderId))
    .Select(c => new
    {
        c.Project.Code,
        c.Project.Revenue
    })
    .GroupBy(c => new
    {
        c.Code
    })
    .Select(c => new
    {
        c.Key.Code,
        Sum = c.Sum(e => e.Revenue)
    }).ToList();

public class Project
{
    public Guid ProjectId { get; set; }
    public decimal Revenue { get; set; }
    public string Code { get; set; }
}

public class Order
{
    public Guid OrderId { get; set; }
    public Guid ProjectId { get; set; }

    public Project Project { get; set; }
}

public class TestContext : DbContext
{
    public TestContext(DbContextOptions options)
        : base(options)
    {
    }

    public DbSet<Project> Projects { get; set; }
    public DbSet<Order> Orders { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.ConfigureDbFunctions();
    }

    [Keyless]
    private class StringSplitResult
    {
        public string Value { get; set; }
    }
        
    [DbFunction(IsBuiltIn = true, Name = "STRING_SPLIT")]
    private IQueryable<StringSplitResult> Split(string source, string separator)
        => FromExpression(() => Split(source, separator));

    public IQueryable<Guid> AsQueryable(IEnumerable<Guid> source)
        => Split(string.Join(",", source.Select(x => Convert.ToString(x))), ",")
            .Select(s => DbFunctionExtensions.ToGuid(s.Value)!.Value);
}

public static class DbFunctionExtensions
{
    public static Guid? ToGuid(string value) => throw new NotImplementedException();

    public static void ConfigureDbFunctions(this ModelBuilder modelBuilder)
	{
		var type = typeof(DbFunctionExtensions);

        modelBuilder.HasDbFunction(type.GetMethod(nameof(ToGuid))!)
			.HasTranslation(t => new SqlFunctionExpression(
				functionName: "CONVERT",
				arguments: new[] { new SqlFragmentExpression("UNIQUEIDENTIFIER"), t[0] },
				nullable: true,
				argumentsPropagateNullability: new[] { false, true },
				typeof(Guid),
				typeMapping: null))
			.IsBuiltIn();
    }
}

Environment details:
EF Core: 7.0.0
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 7.0
Operating system: Windows 10
IDE: Visual Studio 2022 17.4.1

@roji
Copy link
Member

roji commented Nov 21, 2022

Confirmed that this repros; not a regression (failed in EF Core 6.0 as well).

Somewhat simplified repro:

await using var ctx = new BlogContext();
await ctx.Database.EnsureDeletedAsync();
await ctx.Database.EnsureCreatedAsync();

var orderIds = new [] { "id1", "id2" };

_ = ctx.Orders
    .Where(c => ctx
        .Split(string.Join(",", orderIds), ",")
        .Select(s => s.Value)
        .Contains(c.OrderId))
    .Select(c => new
    {
        c.Project.Code,
        c.Project.Revenue
    })
    .GroupBy(c => new { c.Code })
    .Select(c => new
    {
        c.Key.Code,
        Sum = c.Sum(e => e.Revenue)
    }).ToList();

public class BlogContext : DbContext
{
    public DbSet<Project> Projects { get; set; }
    public DbSet<Order> Orders { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    [Keyless]
    public class StringSplitResult
    {
        public string Value { get; set; }
    }

    [DbFunction(IsBuiltIn = true, Name = "STRING_SPLIT")]
    public IQueryable<StringSplitResult> Split(string source, string separator)
        => FromExpression(() => Split(source, separator));
}

public class Project
{
    public Guid ProjectId { get; set; }
    public decimal Revenue { get; set; }
    public string Code { get; set; }
}

public class Order
{
    public string OrderId { get; set; }
    public Guid ProjectId { get; set; }

    public Project Project { get; set; }
}

SQL:

      SELECT [p].[Code], (
          SELECT COALESCE(SUM([p1].[Revenue]), 0.0)
          FROM [Orders] AS [o0]
          INNER JOIN [Projects] AS [p0] ON [o0].[ProjectId] = [p0].[ProjectId]
          INNER JOIN [Projects] AS [p1] ON [o0].[ProjectId] = [p1].[ProjectId]
          WHERE EXISTS (
              SELECT 1
              FROM STRING_SPLIT(@__Join_1, N',') AS [s0]
              WHERE [s0].[Value] = [o0].[OrderId]) AND [p].[Code] = [p0].[Code]) AS [Sum]
      FROM [Orders] AS [o]
      INNER JOIN [Projects] AS [p] ON [o].[ProjectId] = [p].[ProjectId]
      WHERE EXISTS (
          SELECT 1
          FROM STRING_SPLIT(@__Join_1, N',') AS [s0]
          WHERE [s].[Value] = [o].[OrderId]) -- [s] looks like it should be [s0
      GROUP BY [p].[Code]

We seem to be generating the incorrect alias in the WHERE subquery ([s] instead of [s0]).

As a side-note, it seems like the whole subquery shouldn't be there in the projection, since the TVF check is only a filter (so should be in WHERE only)?

@superjulius
Copy link

@andreikarkkanen We are facing the same issue on one query with GroupBy. Others do not seem affected.

While not being ideal, @maumar provided a workaround by adding a Distinct. As he said, this is hacky but it is fixing our query and we may go that route for the time being.

@maumar
Copy link
Contributor

maumar commented Mar 31, 2023

STRING_SPLIT(@__Join_1, N',') AS [s] from the projection and the predicate are the same object. Discrepancy happens in the ApplyProjection.

We have this query expression:

        Projection Mapping:
            Code -> p.Code
            Sum -> (
                SELECT SUM(p0.Revenue)
                FROM Orders AS o
                INNER JOIN Projects AS p ON o.ProjectId == p.ProjectId
                INNER JOIN Projects AS p0 ON o.ProjectId == p0.ProjectId
                WHERE EXISTS (
                    SELECT 1
                    FROM STRING_SPLIT(@__Join_1, N',') AS s
                    WHERE s.Value == o.Order29638Id) && (p.Code == p.Code))
        SELECT 1
        FROM Orders AS o
        INNER JOIN Projects AS p ON o.ProjectId == p.ProjectId
        WHERE EXISTS (
            SELECT 1
            FROM STRING_SPLIT(@__Join_1, N',') AS s
            WHERE s.Value == o.Order29638Id)
        GROUP BY p.Code

so we have this SelectExpression

        SELECT 1
        FROM Orders AS o
        INNER JOIN Projects AS p ON o.ProjectId == p.ProjectId
        WHERE EXISTS (
            SELECT 1
            FROM STRING_SPLIT(@__Join_1, N',') AS s
            WHERE s.Value == o.Order29638Id)
        GROUP BY p.Code

and try to apply this into the projection:

SELECT SUM(p0.Revenue)
                FROM Orders AS o
                INNER JOIN Projects AS p ON o.ProjectId == p.ProjectId
                INNER JOIN Projects AS p0 ON o.ProjectId == p0.ProjectId
                WHERE EXISTS (
                    SELECT 1
                    FROM STRING_SPLIT(@__Join_1, N',') AS s
                    WHERE s.Value == o.Order29638Id) && (p.Code == p.Code))

since the alias s is already used in the outer expression (by the STRING_SPLIT also) we try to uniquify aliases in the SelectExpression we are about to apply. But since STRING_SPLIT(@__Join_1, N',') AS s are the same instance in the projection and the outer select, by changing alias in the projection we also change alias in the outer select.

@maumar
Copy link
Contributor

maumar commented Apr 4, 2023

problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.

maumar added a commit that referenced this issue Apr 4, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Apr 4, 2023
maumar added a commit that referenced this issue Apr 4, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
maumar added a commit that referenced this issue Apr 5, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
maumar added a commit that referenced this issue Apr 5, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
maumar added a commit that referenced this issue Apr 12, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
roji pushed a commit to roji/efcore that referenced this issue Apr 13, 2023
… database function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes dotnet#29638
maumar added a commit that referenced this issue Apr 14, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy. Also added defensive check that throws if we encounter TableExpressionBase that doesn't implement the cloning logic, so that this never happens again.

Fixes #29638
maumar added a commit that referenced this issue Apr 14, 2023
…ase function (#30617)

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem.
Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy. Also added defensive check that throws if we encounter TableExpressionBase that doesn't implement the cloning logic, so that this never happens again.

Fixes #29638
@ajcvickers ajcvickers modified the milestones: 8.0.0, 8.0.0-preview4 Apr 20, 2023
@davidnemeti
Copy link

davidnemeti commented Aug 31, 2023

@ajcvickers, @maumar: Is there any chance that this bug will be fixed in EF Core 7.0 too?

@ajcvickers ajcvickers reopened this Sep 5, 2023
@ajcvickers ajcvickers removed this from the 8.0.0-preview4 milestone Sep 5, 2023
maumar added a commit that referenced this issue Sep 14, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem. Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
maumar added a commit that referenced this issue Sep 14, 2023
…ase function

Problem is that CloningExpressionVisitor doesn't have proper handling for TableValuedFunctionExpression, and therefore goes through default expression visitor pattern (visit all children, check if there are any changes, if there are return new, if not return the same). Since there are no changes, the same instance is returned from cloning, and causes the problem. Fix is to add proper handling of TVFExpression in the CloningExpressionVisitor so that it produces a proper copy.

Fixes #29638
@ajcvickers ajcvickers added this to the 7.0.12 milestone Sep 20, 2023
@ajcvickers ajcvickers modified the milestones: 7.0.12, 7.0.x, 7.0.13 Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-groupby area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported easy-for-smit Easy query bugs Servicing-approved type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants