Query: Sargability of string literals in combination with non-Unicode columns #4686

divega · 2016-03-02T21:38:09Z

PR #4667 "fixed" issue #4622 by making all string literals Unicode in our SQL generation.

But it has been claimed that SQL Server cannot leverage indexes when comparing a Unicode literal or parameter against a non-Unicode column. For an example, see http://stackoverflow.com/questions/5828621/.

Rather than reopening #4622, I am creating a new issue to discuss the priority of sargability in this scenario on its own.

Whenever we decide to improve this, we should first verify in which scenarios the claim holds, and then avoid creating Unicode literals in those cases.

mikes-gh · 2016-03-11T15:54:27Z

Just came across this with the latest vNext RC2 bits and wondered why my query was suddenly so slow.
In the following example, CustSuppRef is varchar and indexed in SQL Server 2008 R2.

The cause was the generated SQL:

SELECT [c].[TransRef], [c].[CommentLineNo], [c].[Comments], [c].[TimeStamp]
FROM [CommentDetails] AS [c]
INNER JOIN (
    SELECT DISTINCT [t].[TransRef]
    FROM [Transactions] AS [t]
    WHERE [t].[CustSuppRef] IN (N'5802685') AND (([t].[TransCode] = 0) OR ([t].[TransCode] = 1))
) AS [t2] ON [c].[TransRef] = [t2].[TransRef]
ORDER BY [t2].[TransRef]

This takes 2 seconds.

In SSMS, I altered the query to:

SELECT [c].[TransRef], [c].[CommentLineNo], [c].[Comments], [c].[TimeStamp]
FROM [CommentDetails] AS [c]
INNER JOIN (
    SELECT DISTINCT [t].[TransRef]
    FROM [Transactions] AS [t]
    WHERE [t].[CustSuppRef] IN ('5802685') AND (([t].[TransCode] = 0) OR ([t].[TransCode] = 1))
) AS [t2] ON [c].[TransRef] = [t2].[TransRef]
ORDER BY [t2].[TransRef]

This takes a few milliseconds. (Note: no N prefix on the literal '5802685'.)

So it looks like treating all string literals as Unicode hurts performance dramatically in this scenario, because CustSuppRef is varchar and indexed.
Please change this from an enhancement to a bug!

Any workarounds in the meantime?

ErikEJ · 2016-03-11T15:59:45Z

Is CustSuppRef mapped as varchar in your model?

mikes-gh · 2016-03-11T16:25:15Z

entity.Property(e => e.CustSuppRef)
    .HasColumnType("varchar(15)");

Yes.

mikes-gh · 2016-03-11T16:52:57Z

Here's the code:

public ICollection<Invoice> GetInvoices(int[] invoiceNumber)
{
    if (invoiceNumber == null) return null;

    string[] invoiceNoStrings = invoiceNumber.Select(x => x.ToString()).ToArray();
    ICollection<Models.Transactions> invoiceTransactions = _tradingCompanyDbContext.Transactions
        .Include(t => t.AccountNoNavigation).ThenInclude(a => a.InvTypeNavigation)
        .Include(t => t.AccountNoNavigation).ThenInclude(a => a.PaymentFreqNavigation)
        .Include(t => t.InvoiceDetails).ThenInclude(i => i.StockNoNavigation)
        .Include(t => t.DiscountDetails)
        .Include(t => t.CommentDetails)
        // Contains() over the string array becomes [t].[CustSuppRef] IN (N'...') in the generated SQL
        .Where(t => invoiceNoStrings.Contains(t.CustSuppRef)
            && (t.TransCode == 0 || t.TransCode == 1))
        .AsNoTracking()
        .ToList();

    if (invoiceTransactions.Count != 0)
    {
        Models.CompanyDetails companyDetail = _tradingCompanyDbContext.CompanyDetails
            .AsNoTracking()
            .FirstOrDefault();

        return GetInvoiceViewModel(companyDetail, invoiceTransactions);
    }
    else
    {
        return null;
    }
}

divega · 2016-03-11T18:38:25Z

Clearing the milestone based on the new data.

mikes-gh · 2016-03-11T18:59:59Z

BTW, the generated SQL I posted is just one of 4 queries generated by the code. They all exhibit the same issue,
so code that used to be almost instant now takes over 8 seconds.

mikes-gh · 2016-03-14T07:16:16Z

Will this fix make it into RC2?
If I can be of any further help, let me know.

mikes-gh · 2016-03-16T15:22:41Z

@divega @ErikEJ Are there any workarounds?
I'm currently blocked because of this issue.

ErikEJ · 2016-03-16T18:21:01Z

A custom version of EFCore.Relational, or use FromSql?

gdoron · 2016-03-16T21:45:40Z

Can't EF check the DB type of the string column being filtered, and use a Unicode literal if it's NVARCHAR and a non-Unicode literal if it's VARCHAR?

And there should be something similar to EntityFunctions.AsNonUnicode in EF Core that people can use to customize it for weird cases where this rule of thumb isn't good because the index is (for God knows why) VARCHAR and the column in the table is NVARCHAR.
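The rule suggested above can be sketched as follows, assuming the column's store type is already known as a string such as `varchar(15)`. `LiteralSketch`, `IsNonUnicodeStoreType` and `ToSqlLiteral` are hypothetical names for illustration only, not EF Core APIs, and only SQL Server's `varchar`, `char` and `text` types are treated as non-Unicode here.

```csharp
using System;

// Hypothetical helper (not an EF Core API): choose the literal form from the column's store type.
public static class LiteralSketch
{
    // SQL Server's non-Unicode character types; the n-prefixed types are Unicode.
    public static bool IsNonUnicodeStoreType(string storeType)
    {
        // Drop any length facet, e.g. "varchar(15)" -> "varchar".
        string baseType = storeType.Split('(')[0].Trim().ToLowerInvariant();
        return baseType == "varchar" || baseType == "char" || baseType == "text";
    }

    // Formats a value as a T-SQL string literal, doubling embedded single quotes.
    public static string ToSqlLiteral(string value, string storeType)
    {
        string escaped = value.Replace("'", "''");
        return IsNonUnicodeStoreType(storeType)
            ? "'" + escaped + "'"    // same type as a varchar column: the column is not implicitly converted
            : "N'" + escaped + "'";  // Unicode literal for nchar/nvarchar/ntext or any unrecognized type
    }

    public static void Main()
    {
        Console.WriteLine(ToSqlLiteral("5802685", "varchar(15)"));  // '5802685'
        Console.WriteLine(ToSqlLiteral("O'Neil", "nvarchar(50)"));  // N'O''Neil'
    }
}
```

Falling back to `N'...'` for anything not recognized keeps Unicode as the default: a non-Unicode literal can silently replace characters its code page cannot represent, which, judging from the discussion of #4667 in this thread, is why the sketch only drops the `N` prefix when the column is known to be non-Unicode.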

mikes-gh · 2016-03-16T22:02:37Z

@rowanmiller
Are you saying (from the labels) that there will be no support for indexed varchar columns until RTM? #4667 broke this scenario only very recently.
Varchar is still commonplace in database-first scenarios.

divega · 2016-03-16T22:29:03Z

> Can't EF check the DB type of the string column being filtered, and use a Unicode literal if it's NVARCHAR and a non-Unicode literal if it's VARCHAR?

Yes, that is a good description of the fix that we want to implement.

Ideally we would recognize all C# relational operators, as well as usages of Enumerable.Contains() (as we already saw in @mikes-gh's example) in both positive and negated form, whenever they compare a property (which we may know to be non-Unicode) with a literal, a parameter, or an arbitrary expression containing those (which we only know to be a string).

In practice I would expect the solution to make a reasonable effort to recognize those patterns and to leave the rest to some explicit way of specifying that a string should be treated as non-Unicode (in EF 6 we had DbFunctions.AsNonUnicode() for that).
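The inference described above can be illustrated with a toy expression tree. This is a hypothetical, simplified model written for C# 9 or later (the record types and `UnicodeInference` are invented for illustration and are not EF Core's query pipeline): when one side of an equality or the target of an `IN` list is a column, its Unicode-ness is copied onto the string literals or parameters it is compared with, negation is handled by recursing into the operand, and anything without a column to infer from falls back to a Unicode literal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy SQL expression model; NOT EF Core's internal types.
public abstract record SqlExpr;
public sealed record ColumnExpr(string Name, bool IsUnicode) : SqlExpr;
// A string literal or parameter; IsUnicode stays null until inferred from a column.
public sealed record StringValueExpr(string Value, bool? IsUnicode = null) : SqlExpr;
public sealed record EqualExpr(SqlExpr Left, SqlExpr Right) : SqlExpr;
public sealed record InExpr(ColumnExpr Column, IReadOnlyList<StringValueExpr> Values) : SqlExpr;
public sealed record NotExpr(SqlExpr Operand) : SqlExpr;

public static class UnicodeInference
{
    // Copies a column's Unicode-ness onto the string values it is compared with.
    public static SqlExpr Infer(SqlExpr expr) => expr switch
    {
        EqualExpr(ColumnExpr c, StringValueExpr v) => new EqualExpr(c, v with { IsUnicode = c.IsUnicode }),
        EqualExpr(StringValueExpr v, ColumnExpr c) => new EqualExpr(v with { IsUnicode = c.IsUnicode }, c),
        InExpr(var c, var values) => new InExpr(c, values.Select(v => v with { IsUnicode = c.IsUnicode }).ToList()),
        NotExpr(var operand) => new NotExpr(Infer(operand)),
        _ => expr // nothing to infer from: values keep IsUnicode == null
    };

    // Renders SQL; values whose Unicode-ness is unknown default to N'...'.
    public static string Render(SqlExpr expr) => expr switch
    {
        ColumnExpr c => "[" + c.Name + "]",
        StringValueExpr v => ((v.IsUnicode ?? true) ? "N'" : "'") + v.Value.Replace("'", "''") + "'",
        EqualExpr e => Render(e.Left) + " = " + Render(e.Right),
        InExpr i => Render(i.Column) + " IN (" + string.Join(", ", i.Values.Select(v => Render(v))) + ")",
        NotExpr n => "NOT (" + Render(n.Operand) + ")",
        _ => throw new NotSupportedException(expr.GetType().Name)
    };
}

public static class Program
{
    public static void Main()
    {
        var custSuppRef = new ColumnExpr("CustSuppRef", IsUnicode: false);
        var inList = new InExpr(custSuppRef, new[] { new StringValueExpr("5802685") });
        Console.WriteLine(UnicodeInference.Render(UnicodeInference.Infer(inList)));
        // [CustSuppRef] IN ('5802685')

        var name = new ColumnExpr("Name", IsUnicode: true);
        var negated = new NotExpr(new EqualExpr(new StringValueExpr("x"), name));
        Console.WriteLine(UnicodeInference.Render(UnicodeInference.Infer(negated)));
        // NOT (N'x' = [Name])
    }
}
```

Cases this toy walker does not cover, such as a string produced by a function call, are exactly where an explicit AsNonUnicode-style marker would still be needed.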

We can try to get something in soon for this, but I doubt it will be ready for RC2 (which is rather close).

ajcvickers · 2016-03-17T16:57:17Z

@divega @rowanmiller Maybe we could do just the AsNonUnicode part for RC2 so that there is a reasonable workaround?

mikes-gh · 2016-03-17T23:01:49Z

@ajcvickers that would be great.
This is the first bug to block me with no easy workaround.

mikes-gh · 2016-03-17T23:41:00Z

This is interesting.
http://sqlblog.com/blogs/marco_russo/archive/2006/10/21/unicode-varchar-nvarchar-and-index-usage-in-sql-server.aspx

It seems we can supply a non-Unicode literal or parameter as a predicate against an nvarchar column and the index can still be used.
It just can't use the index when a Unicode literal is applied as a predicate to an indexed varchar column.

Sounds like a strong argument for reverting #4667.

divega · 2016-03-17T23:50:27Z

> It seems we can supply a non-Unicode literal or parameter as a predicate against an nvarchar column and the index can still be used.

Even if that is the case, I don't think reverting #4667 is the right solution. That change actually fixed a bug. See #4781.

@ajcvickers Sounds good to me if that is viable.

mikes-gh · 2016-03-17T23:58:29Z

Ah, sorry, yes. I sometimes forget the world is not all on the same alphabet.

mikes-gh · 2016-03-21T09:30:57Z

This makes interesting reading:

http://www.olcot.co.uk/sql-blogs/revised-difference-between-collation-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as

It appears that if we use a Windows collation (Latin1_General_CI_AS) on the database, an index seek can still be used when comparing an nvarchar predicate to a varchar column.
My current collation is SQL_Latin1_General_CP1_CI, which causes an index scan on my large table, hence the dramatic performance drop.

Why is this? The reason behind all of this is down to the actual differences between the two collations. The SQL_Latin1_General_CP1_CI_AS collation is a SQL collation and the rules around sorting data for unicode and non-unicode data are different. The Latin1_General_CI_AS collation is a Windows collation and the rules around sorting unicode and non-unicode data are the same. A Windows collation as per this example can still use an index if comparing unicode and non-unicode data albeit with a slight performance hit. A SQL collation cannot do this as shown above and comparing nvarchar data to varchar removes the ability to perform an index seek.
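The collation behavior quoted above can be summarized as a simplified decision function. It encodes only what is stated in this thread and the linked posts, plus SQL Server's data type precedence rule that `nvarchar` outranks `varchar` (so the `varchar` side is the one implicitly converted). Real execution plans depend on more factors; `CharType`, `SeekModel` and `SeekPossible` are hypothetical names, and treating every collation whose name starts with `SQL_` as a SQL collation is an assumption of the sketch.

```csharp
using System;

// Simplified model of the behavior discussed in this thread; real plans depend on more factors.
public enum CharType { Varchar, Nvarchar }

public static class SeekModel
{
    // nvarchar has higher data type precedence than varchar, so a mixed comparison
    // converts the varchar operand. Only a varchar column compared with an nvarchar
    // value gets the column itself converted.
    public static bool ColumnIsConverted(CharType column, CharType value) =>
        column == CharType.Varchar && value == CharType.Nvarchar;

    // Assumption of this sketch: collation names starting with "SQL_" are SQL collations.
    public static bool IsSqlCollation(string collation) =>
        collation.StartsWith("SQL_", StringComparison.OrdinalIgnoreCase);

    // True when an index seek on the column should still be possible.
    public static bool SeekPossible(CharType column, CharType value, string collation)
    {
        if (!ColumnIsConverted(column, value))
            return true;                    // the column is compared as-is
        return !IsSqlCollation(collation);  // Windows collation: seek still possible, with some overhead
    }

    public static void Main()
    {
        // The original report: varchar column, N'...' literal, SQL collation.
        Console.WriteLine(SeekPossible(CharType.Varchar, CharType.Nvarchar, "SQL_Latin1_General_CP1_CI_AS")); // False
        // After switching to a Windows collation.
        Console.WriteLine(SeekPossible(CharType.Varchar, CharType.Nvarchar, "Latin1_General_CI_AS"));         // True
        // Non-Unicode value against an nvarchar column: the value is converted, not the column.
        Console.WriteLine(SeekPossible(CharType.Nvarchar, CharType.Varchar, "SQL_Latin1_General_CP1_CI_AS")); // True
    }
}
```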

Of course this is not something to be done lightly, but in my case I think it will be a good option, as I see no downsides to moving from the SQL collation to the Windows collation, which seems to be the recommended approach now anyway.

I will report back.

mikes-gh · 2016-03-21T15:10:55Z

OK, good news to report:

SELECT [c].[TransRef], [c].[CommentLineNo], [c].[Comments], [c].[TimeStamp]
FROM [CommentDetails] AS [c]
INNER JOIN (
    SELECT DISTINCT [t].[TransRef]
    FROM [Transactions] AS [t]
    WHERE [t].[CustSuppRef] IN (N'4802685') AND (([t].[TransCode] = 0) OR ([t].[TransCode] = 1))
) AS [t2] ON [c].[TransRef] = [t2].[TransRef]
ORDER BY [t2].[TransRef]

This now runs in similar times for N'4802685' and '4802685'.

So basically, changing the database, table, and column collation from SQL_Latin1_General_CP1_CI to Latin1_General_CI_AS helps with this issue (although generating the correct SQL depending on the data type is the ultimate solution).

BTW, my SQL Server installation is set to Latin1_General_CI_AS, so any new databases would get this collation anyway.
The databases in question are quite old, so the SQL collation has followed them from a previous SQL Server install.

Changing the collation is a bit time-consuming, but for me it's worth it, as I now have a more future-proof database. I can also still benefit from the smaller index size of varchar on a large table of 3 million plus rows, and SQL Server now knows how to compare nvarchar to varchar efficiently.
EFCore now works for me again 😃

Hope this helps someone else.

smitpatel · 2016-03-31T22:43:06Z

#4937 added functionality to infer the Unicode-ness of string literals and parameters for common cases (when one side of the operator is a column/property).

divega · 2016-03-31T23:22:35Z

@smitpatel should we close this issue now? From my understanding of what you checked in, it doesn't seem that the explicit method is compelling anymore.

smitpatel · 2016-03-31T23:26:34Z

There are complex cases that cannot be inferred easily and may still need the explicit method.
One example is this disabled test:
https://github.com/aspnet/EntityFramework/blob/dev/test/Microsoft.EntityFrameworkCore.FunctionalTests/GearsOfWarQueryTestBase.cs#L1014
The issue is that we can infer the Unicode-ness when one side is a property and traverse the other side accordingly, but we still do not have a concept of the Unicode-ness of a return value. There can be other complex cases like that which may need the explicit method.

We can track it in a new issue or here. Either way is fine with me.

mikes-gh · 2016-04-01T21:44:16Z

General question: how do I know when this fix reaches cidev?
I would like to test my scenario.

smitpatel · 2016-04-01T22:41:41Z

@mikes-gh - 20401 is the successful build number that picked up this commit. It should be in the cidev packages now.

smitpatel · 2016-04-04T23:26:44Z

Filed #4978 for the complex cases. Closing this.

mikes-gh · 2016-04-05T21:46:22Z

@smitpatel Thanks! I'm on holiday at the moment but will report back in this thread in a couple of days.

mikes-gh · 2016-04-11T14:37:12Z

@smitpatel
Confirmed: the T-SQL is now generated without the N prefix, giving an optimal execution plan for my example above:

.Where(t => invoiceNoStrings.Contains(t.CustSuppRef)

Many thanks

mk9753 · 2017-05-03T16:28:59Z

I think I'm coming across something related to this issue...

There is a column set up as varchar:
modelBuilder.Entity<Quotes>().Property(s => s.StripInsuredCorporationName).HasColumnType("varchar(500)");

Query:
var tmp = (from q in context.Quotes where (q.StripInsuredCorporationName.Contains(strippedName)) select new { q.QuoteID });

The resulting SQL:
SELECT [q].[QuoteID] FROM [tblQuotes] AS [q] WHERE CHARINDEX(N'Test', [q].[StripInsuredCorporationName]) > 0

Since the column is defined as varchar, I'm not sure what is causing the N to be there.

Thanks

divega · 2017-05-03T16:57:33Z

@mk9753 could you create a new issue, complete with information about the EF Core version you are using?

rowanmiller added this to the Backlog milestone Mar 7, 2016

rowanmiller added the type-enhancement label Mar 7, 2016

rowanmiller changed the title ~~Sargability of string literals in combination with non-Unicode columns~~ Query: Sargability of string literals in combination with non-Unicode columns Mar 7, 2016

divega removed this from the Backlog milestone Mar 11, 2016

rowanmiller added area-perf pri0 labels Mar 11, 2016

rowanmiller added this to the 1.0.0 milestone Mar 11, 2016

rowanmiller assigned smitpatel Mar 11, 2016

smitpatel mentioned this issue Mar 30, 2016

Infer literal/parameter's unicodeness from property #4937

Merged

divega mentioned this issue Mar 31, 2016

Update pipeline: Add tests for Sargability of non-Unicode string keys #4949

Closed

smitpatel closed this as completed Apr 4, 2016

smitpatel added the 2 - Done label Apr 4, 2016

smitpatel modified the milestones: 1.0.0-rc2, 1.0.0 Apr 4, 2016

azhoshkin mentioned this issue Apr 22, 2016

Contains method for strings does not add N before string literal in SQL query #5139

Closed

mk9753 mentioned this issue May 3, 2017

Varchar column sending unicode #8370

Closed

kdblocher mentioned this issue Nov 7, 2018

String.CompareTo emits unicode literal against varchar column #13906

Closed

shvmgpt116 mentioned this issue Jan 4, 2022

EF Core 3.1- Parameter does not follow the model definition used for the entity property #27106

Closed

ajcvickers modified the milestones: 1.0.0-rc2, 1.0.0 Oct 15, 2022

ajcvickers added the closed-fixed label Oct 15, 2022
