Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinct string.Join or string.Concat causing SQL Syntax error #29845

Open
2767mr opened this issue Dec 13, 2022 · 4 comments
Open

Distinct string.Join or string.Concat causing SQL Syntax error #29845

2767mr opened this issue Dec 13, 2022 · 4 comments

Comments

@2767mr
Copy link

2767mr commented Dec 13, 2022

While trying to work around #29200 using groupBy we discovered that the combination of .Distinct() and string.Join or string.Concat causes a SQL Syntax error. This issue seems similar to #29638.

Example Query

Full project: https://github.com/2767mr/EFAggregateExample

db.Posts
    .GroupBy(p => p.Blog)
    .Select(g => new
    {
        Url = g.Key.Url,
        PostTitles = string.Join(", ", g.Select(p => p.Title).Distinct())
    })

Generated SQL:

SELECT [b].[Url], COALESCE(STRING_AGG(DISTINCT ([p].[Title]), N', '), N'') AS [PostTitles]
      FROM [Posts] AS [p]
      INNER JOIN [Blogs] AS [b] ON [p].[BlogId] = [b].[BlogId]
      GROUP BY [b].[BlogId], [b].[Url]

Stacktrace:

An exception occurred while iterating over the results of a query for context type 'BloggingContext'.
    Microsoft.Data.SqlClient.SqlException (0x80131904): Incorrect syntax near ','.
       at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__208_0(Task`1 result)
       at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
       at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
    --- End of stack trace from previous location ---
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
    --- End of stack trace from previous location ---
       at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
    ClientConnectionId:04983856-a507-4efe-a0ef-59bed7eff375
    Error Number:102,State:1,Class:15
    Microsoft.Data.SqlClient.SqlException (0x80131904): Incorrect syntax near ','.
       at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__208_0(Task`1 result)
       at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
       at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
    --- End of stack trace from previous location ---
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
    --- End of stack trace from previous location ---
       at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
       at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
    ClientConnectionId:04983856-a507-4efe-a0ef-59bed7eff375
    Error Number:102,State:1,Class:15

Provider and version information

EF Core version: 7.0.1
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 6.0
Operating system: Windows 11
IDE: Visual Studio 2022

@roji
Copy link
Member

roji commented Dec 14, 2022

Unfortunately, SQL Server's STRING_AGG doesn't seem to support DISTINCT (most other aggregate functions do):

DROP TABLE IF EXISTS [Blogs];
CREATE TABLE [Blogs] (
    [Id] int NOT NULL IDENTITY,
    [Category] int,
    [Name] nvarchar(max)
);
INSERT INTO Blogs (Category, Name) VALUES (1, 'foo'), (1, 'bar');

SELECT STRING_AGG(DISTINCT (b.Name), N', ')
FROM Blogs AS b
GROUP BY b.Category;

We should ideally fail translation rather than generate invalid SQL, but otherwise there's not much for us to do here...

@2767mr
Copy link
Author

2767mr commented Dec 15, 2022

It is possible to do this with OUTER APPLY though I am not sure if that's a good idea:

DROP TABLE IF EXISTS [Blogs];
CREATE TABLE [Blogs] (
    [Id] int NOT NULL IDENTITY,
    [Category] int,
    [Name] nvarchar(max)
);
INSERT INTO Blogs (Category, Name) VALUES (1, 'foo'), (1, 'bar'), (2, 'baz');

SELECT STRING_AGG(tmp.Name, N', ')
FROM Blogs AS b
OUTER APPLY (
	SELECT DISTINCT b2.Name AS Name 
  	FROM Blogs b2 
  	WHERE b2.Id = b.Id
) tmp
GROUP BY b.Category;

@roji
Copy link
Member

roji commented Dec 15, 2022

@2767mr yeah, that's definitely possible - but we generally don't do such complicated things in method translations.

Note that you can achieve a very similar thing via the following:

_ = ctx.Blogs
    .Select(b => new
    {
        b.Url,
        PostTitles = string.Join(", ", b.Posts.Select(p => p.Title).Distinct())
    })
    .ToList();

This starts with the Blogs instead of the Posts, thus avoiding the need for a GroupBy altogether. The SQL it generates indeed uses an OUTER APPLY:

SELECT [b].[Url], [b].[Id], [t].[Title]
FROM [Blogs] AS [b]
OUTER APPLY (
    SELECT DISTINCT [p].[Title]
    FROM [Posts] AS [p]
    WHERE [b].[Id] = [p].[BlogId]
) AS [t]
ORDER BY [b].[Id]

Importantly, EF doesn't translate to STRING_AGG in this context (aggregate function without groupby, #29200), so this is client-evaluated instead: the titles are pulled back from the database and joined via .NET string.Join. This isn't a problem in this particularly query (note that no extra data is being transferred as a result of the client eval), but does mean that you can't further compose LINQ operators on top of the query for execution in the database.

@ajcvickers
Copy link
Contributor

Note from triage: we should throw a better exception here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants