Description
PROVIDERS BEWARE:
The LINQ translation for the methods Contains, EndsWith and StartsWith that we have in the Relational package uses the LIKE operator, which may return incorrect results if the value parameter (what we are searching for) contains wildcard characters, e.g. '%' or '_'.
This issue addresses the SqlServer and Sqlite providers, but all other providers will still use the old translation. Each provider that can be affected by this should implement its own MethodCallTranslators for Contains, EndsWith and StartsWith.
Currently in EF7 a LINQ query like this:
var things = Things.Where(t => t.Name.StartsWith("a"));
Gets translated to SQL like this (note that I am simplifying the query and expanding parameter values for clarity):
SELECT * FROM Things WHERE Name LIKE 'a%' ;
However, in order to return correct results, a LINQ query like this:
var underscoreAThings = Things.Where(t => t.Name.StartsWith("_a"));
Should be translated to SQL like this:
SELECT * FROM Things WHERE Name LIKE '~_a%' ESCAPE '~';
The escaping accounts for SQL wildcard characters in the input string which should not be treated as wildcards (we can add a separate Like() method for passing patterns, but that belongs in a separate work item).
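To make the difference concrete, here is a minimal sketch that runs both translations against an in-memory SQLite database using Python's standard sqlite3 module. The helper names (escape_like, starts_with_naive, starts_with_escaped) and the sample rows are made up for illustration and are not part of EF. It only escapes '%', '_' and the escape character itself; SQL Server's LIKE also treats '[' specially, which this sketch does not handle.

```python
import sqlite3

ESCAPE_CHAR = "~"  # assumption: the same escape character as in the SQL above


def escape_like(value: str, escape: str = ESCAPE_CHAR) -> str:
    """Escape the escape character first, then the LIKE wildcards % and _."""
    escaped = value.replace(escape, escape + escape)
    for wildcard in ("%", "_"):
        escaped = escaped.replace(wildcard, escape + wildcard)
    return escaped


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory Things table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Things (Name TEXT)")
    conn.executemany(
        "INSERT INTO Things VALUES (?)",
        [("_apple",), ("banana",), ("xavier",)],
    )
    return conn


def starts_with_naive(conn: sqlite3.Connection, prefix: str) -> list:
    """Old translation: the prefix goes into the pattern unescaped."""
    rows = conn.execute(
        "SELECT Name FROM Things WHERE Name LIKE ? ORDER BY Name",
        (prefix + "%",),
    )
    return [name for (name,) in rows]


def starts_with_escaped(conn: sqlite3.Connection, prefix: str) -> list:
    """Proposed translation: escape wildcards and add an ESCAPE clause."""
    rows = conn.execute(
        "SELECT Name FROM Things WHERE Name LIKE ? ESCAPE '~' ORDER BY Name",
        (escape_like(prefix) + "%",),
    )
    return [name for (name,) in rows]


conn = make_db()
print(starts_with_naive(conn, "_a"))    # '_' acts as a wildcard here
print(starts_with_escaped(conn, "_a"))  # '_' is matched literally here
```

With this data the unescaped pattern also returns banana and xavier, because '_' matches any single character, while the escaped pattern returns only _apple.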
When the input string is store correlated (e.g. it is another column in the database instead of a parameter or a literal in the query), using LIKE correctly in the translation becomes more difficult, e.g. it would be hard to perform the required escaping in SQL.
In general, for cases in which LIKE doesn't work well we can fall back to alternative translations that don't rely on LIKE, e.g. for String.StartsWith():
var underscoreAThings = Things.Where(t => t.Name.StartsWith(t.Prefix));
SELECT * FROM Things WHERE CHARINDEX(Prefix, Name) = 1 OR Prefix='';
Note that CHARINDEX() won't match an empty string but String.StartsWith("") always returns true, which is why we add the Prefix = '' condition.
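The behavior of this translation can be checked with a small sketch on SQLite, again through Python's sqlite3 module. Two assumptions to flag: SQLite has no CHARINDEX(), so instr(Name, Prefix) stands in for CHARINDEX(Prefix, Name) (note the reversed argument order), and the table contents are invented. The Prefix = '' clause is kept so the result does not depend on what the engine's search function returns for an empty search string.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory Things table with a per-row Prefix column (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Things (Name TEXT, Prefix TEXT)")
    conn.executemany(
        "INSERT INTO Things VALUES (?, ?)",
        [
            ("apple", "ap"),    # genuine prefix
            ("banana", "na"),   # occurs in Name, but not at position 1
            ("cherry", ""),     # empty prefix: StartsWith("") is true
            ("_data", "_d"),    # '_' must be matched literally
            ("xdata", "_d"),    # would be a false positive under LIKE
        ],
    )
    return conn


# instr(haystack, needle) plays the role of CHARINDEX(needle, haystack).
STARTS_WITH_COLUMN = """
SELECT Name FROM Things
WHERE instr(Name, Prefix) = 1 OR Prefix = ''
ORDER BY Name
"""


def names_starting_with_prefix(conn: sqlite3.Connection) -> list:
    """Return the names for which Name.StartsWith(Prefix) should hold."""
    return [name for (name,) in conn.execute(STARTS_WITH_COLUMN)]


print(names_starting_with_prefix(make_db()))
```

Only apple, cherry and _data are returned: banana is rejected because the match is not at position 1, and xdata is rejected because '_' is compared literally.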
The main disadvantage of this translation is that it is not sargable. That can be addressed with a hybrid translation, e.g.:
SELECT * FROM Things WHERE Name LIKE Prefix+'%' AND (CHARINDEX(Prefix, Name) = 1 OR Prefix = '');
This should be quick to evaluate using an index because the LIKE condition should be able to take advantage of the index to produce fairly selective results and the second condition will filter out false positives returned by LIKE.
Notice that this alternative removes the need to fiddle with the input value: we no longer need to escape wildcards because in the worst case they will produce false positive matches, which the CHARINDEX()-based condition will still be able to filter out.
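The filtering effect of the hybrid form can be sketched on SQLite as follows. As assumptions: instr() stands in for CHARINDEX(), '||' is used for string concatenation instead of '+', and the rows are invented. The sketch only compares result sets; it says nothing about whether a given engine actually uses an index for a LIKE whose pattern comes from a column.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory Things table whose Prefix values include wildcards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Things (Name TEXT, Prefix TEXT)")
    conn.executemany(
        "INSERT INTO Things VALUES (?, ?)",
        [
            ("apple", "ap"),
            ("banana", "na"),
            ("cherry", ""),
            ("_data", "_d"),
            ("xdata", "_d"),         # LIKE '_d%' matches: false positive
            ("100 percent", "%c"),   # LIKE '%c%' matches: false positive
        ],
    )
    return conn


LIKE_ONLY = """
SELECT Name FROM Things
WHERE Name LIKE Prefix || '%'
ORDER BY Name
"""

HYBRID = """
SELECT Name FROM Things
WHERE Name LIKE Prefix || '%'
  AND (instr(Name, Prefix) = 1 OR Prefix = '')
ORDER BY Name
"""


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a single-column query and return its values as a list."""
    return [name for (name,) in conn.execute(sql)]


conn = make_db()
print(run(conn, LIKE_ONLY))  # includes the wildcard false positives
print(run(conn, HYBRID))     # false positives removed by the second condition
```

The unescaped LIKE alone lets xdata and 100 percent through; the second condition removes them, leaving the same rows a literal prefix comparison would give.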
Also notice that based on the current query caching design we wouldn't need to always produce this more complex translation. Instead, we could sniff into the argument of String.StartsWith() and pivot on it to produce different translations, e.g.:
- If the value is opaque (i.e. it comes from the store) or if it contains a wildcard character, then produce the condition based on CHARINDEX()
- If the value does not contain a wildcard character in the first position then we can emit the condition based on LIKE
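One possible shape for this pivot is sketched below in Python. Everything here is hypothetical: the Literal and ColumnRef types, the translate_starts_with function and the @p0 parameter name are invented for illustration and do not correspond to any EF API. The sketch also takes the stricter reading of the two rules, sending a value with a wildcard anywhere in it to the CHARINDEX() form.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

WILDCARDS = ("%", "_")


@dataclass(frozen=True)
class Literal:
    """A value known at translation time (constant or sniffed parameter)."""
    value: str


@dataclass(frozen=True)
class ColumnRef:
    """An opaque, store-side value such as another column."""
    name: str


def translate_starts_with(
    column: str, arg: Union[Literal, ColumnRef]
) -> Tuple[str, List[str]]:
    """Return a (SQL predicate, parameter values) pair for column.StartsWith(arg)."""
    if isinstance(arg, ColumnRef):
        # Opaque value: we cannot inspect or escape it, so avoid LIKE.
        return f"(CHARINDEX({arg.name}, {column}) = 1 OR {arg.name} = '')", []
    if any(w in arg.value for w in WILDCARDS):
        # Known value containing a wildcard: compare literally.
        return f"(CHARINDEX(@p0, {column}) = 1 OR @p0 = '')", [arg.value]
    # Known value without wildcards: plain LIKE is safe.
    return f"{column} LIKE @p0", [arg.value + "%"]


print(translate_starts_with("Name", Literal("a")))
print(translate_starts_with("Name", Literal("_a")))
print(translate_starts_with("Name", ColumnRef("Prefix")))
```

Because the choice depends on the sniffed value, a cached translation would have to be keyed on which branch was taken, not only on the shape of the LINQ query.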
Similar approaches can be used for String.EndsWith() and String.Contains(). However, for these methods LIKE does not really contribute to the performance since the beginning of the input value cannot be used to perform index lookups, so it should be ok to produce a translation that doesn't use LIKE at all.
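As a rough illustration of LIKE-free translations for these two methods, the sketch below uses SQLite's instr() and substr() through Python's sqlite3 module. The specific predicates are assumptions made for this example, not translations taken from the issue; on SQL Server the analogous building blocks would presumably be CHARINDEX() and RIGHT(). The Value = '' clause mirrors the empty-string handling used for StartsWith().

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory Things table pairing each Name with a search Value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Things (Name TEXT, Value TEXT)")
    conn.executemany(
        "INSERT INTO Things VALUES (?, ?)",
        [
            ("a_b", "_"),       # contains '_' but does not end with it
            ("axb", "_"),       # LIKE '%_%' would match; literal search must not
            ("report_", "t_"),  # contains and ends with 't_'
            ("100%", "%"),      # contains and ends with '%'
            ("plain", ""),      # empty value: both methods return true
        ],
    )
    return conn


# Contains: the value occurs anywhere in Name.
CONTAINS = """
SELECT Name FROM Things
WHERE instr(Name, Value) > 0 OR Value = ''
ORDER BY Name
"""

# EndsWith: the last length(Value) characters of Name equal Value.
ENDS_WITH = """
SELECT Name FROM Things
WHERE substr(Name, -length(Value)) = Value OR Value = ''
ORDER BY Name
"""


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a single-column query and return its values as a list."""
    return [name for (name,) in conn.execute(sql)]


conn = make_db()
print(run(conn, CONTAINS))
print(run(conn, ENDS_WITH))
```

In both queries the wildcard characters in Value are compared literally, so axb is never returned for the value '_'.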