-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query: Consider translating String.Equals(String, StringComparison) for selected values of StringComparison #1222
Comments
Actually I just got a warning about String.Equals (without the third parameter) not being able to be translated and being evaluated locally ... |
After upgrade to 1.1.0, I use
|
@jeffiy can you please create a new issue and provide a repro? If this is a regression in 1.1 we may want to fix it in 1.1.1. |
Could this be implemented using Collation on SQL Server? Using We wouldn't know this is happening since this would be translated by EF internally and would get us by surprise. Since I would have no warning about "local evaluation" on this case, I would expect that my ORM is doing the right thing and not creating a performance problem. Doing a |
@divega Please, it is a must have. |
Still does not work in 2.0 |
I used |
Any update on this? |
Migrating to EFCore is becoming a show stopper for us because this issue. Any update @divega @rowanmiller ? |
@mgolois This issue is about |
@mgolois to be more precise:
|
Does equals usage with StringComparison at least produces a warning? |
@xperiandri - It does evaluation on client to produce correct results. |
@xperiandri and it produces a warning like this in the log:
Which can optionally be turned into an error as explained in the documentation. |
Maybe you would add Roslyn analyzer that will display info about that in design time? |
It is very good! |
so what exactly happens right now if I use Equals with OrdinalIgnoreCase ? |
Is there any plan to do this in the near future? I wasn't aware of the fact that it's not supported, saw the warning when I already used this pattern in several queries. Do you recommend changing it to |
@smitpatel let’s bring this to a design meeting. I would be ok with the overload with StringComparison started to throw in the default implementation in absence of client eval, but I would also be open to grandfathering it (via ignoring the StringComparison argument) if the general feeling is that it is preferable to mitigate the impact of the breaking change. I think it is preferable not to do the ToLower trick behind the users’ backs. They can still do that themselves explicitly, and there is still a chance that some of them will miss the fact that it isn’t sargable, but at least we didn’t do it without them asking. |
@smitpatel Besides, I believe many databases will default to case insensitive comparisons anyway, so what is harder to achieve is a case sensitive one. |
@BalassaMarton if you control the database schema it is preferable to specify a column collation that will compare string according to your preference. If you have a case insensitive collation and can’t change it but need a comparison that is case sensitive, you can do a regular ( I have tried other workarounds using functions that encapsulate comparisons with explicit collations, but SQL Server actually doesn’t support inline scalar functions, and non-inline functions aren’t sargable, so you end up needing to write a TVF. It is possible but complicated. |
Workaround: Instead of: Warning: I know this not contains invariant culture, but this is better than waiting for EF Core 6 or 7, staying on old EF 6 or rewriting all API consumers (if we gets rid of comparison and use only |
@Saibamen it's recommended to look into column collations rather than calling ToLower. With collations, an index defined over your column can be used. |
I think Pomelo.EntityFrameworkCore.MySql has already do it. |
In some provider such as PostgreSQL, I think |
@PMExtra this issue is about String.Equals, where PostgreSQL ILIKE does pattern matching with wildcards (like LIKE), so that doesn't seem right. You can already use collations with EF Core 5.0 to make PostgreSQL insensitive equality operations. |
@roji But we can use |
@PMExtra I'm not aware of a way to use LIKE/ILIKE without wildcards. If the pattern is constant, we can indeed escape all the wildcards (this is what we currently do for other cases), but not when the pattern is a parameter or column. More generally, I suggest following the discussing from the beginning to understand why I think the recently-introduced collations functionality is the better way to go. Regardless, if users prefer either TOLOWER or ILIKE, they can already express that themselves. |
The whole .NET And it doesn't solve the problem that users can write EF code that uses At the very least the EF Core team should add a Roslyn analyzer for |
FWIW this isn't about databases "being behind", but about these operations execute in a very different way in the database and in a regular program. The important point here is that databases natively use indexes, which is why the collation must be set up in advance in your schema, and queries need to match that collation; whereas a regular .NET program simply goes over a list of strings in memory, and can do whatever type of comparison it wants. This is precisely why StringComparison makes sense in a .NET program, but not when translating a query to SQL. Re the rest, there's absolutely no guarantee that queries which work on InMemory will work against a real database - this is very much by-design. The specific constructs and functions that are supported vary across databases (SQL Server, PostgreSQL and Sqlite support very different things), and InMemory simply cannot mimic a real database. There have been various requests in the past to have an analyzer which detects queries which would fail at runtime - unfortunately this simply isn't feasible for most cases. Specifically for your proposal, consider that StringComparison does make sense on some providers even if it doesn't on most relational databases: here's a proposal to translate it for Cosmos. In other words, the analyzer would have to somehow know - at compile-time - which provider is being targeted, and that's not really possible. At the end of the day, a robust test suite is the only way to be sure that your queries execute properly, and continue executing properly. |
Unnecessary. Simply write an analyzer that assumes that people are using a database that supports nothing and should therefore be warned about everything, make it an add-on NuGet package, and allow end-users to ignore the warnings that don't fit their DBMS and/or use-case using the mechanisms that exist in .NET today.
Most people would consider a test suite using the in-memory provider to be sufficiently robust, and yet it demonstrably isn't in this case. The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via |
That sounds like a pretty bad user experience, and would also not work for applications using multiple providers (which have different capabilities). And once again, predicting at compile-time whether a query will execute successfully is simply not feasible; for the specific case of StringComparison it does seem easy to detect the specific invocation, but in many other cases translatibility depends on more complex factors (what exact parameter is being used, where in the query is the construct/function being used...). And since translatibility can't be successfully predicted in the general case, users have to write tests anyway to ensure their code works. Stepping back, I understand the initial frustration of writing a query using StringComparison and getting an exception - but that's part of learning how an API works when first starting to use it. LINQ and queries are maybe a bit tougher, since they're an open space where query shapes are basically infinite. But if you try using StringComparison, we already include a very specific message pointing you to the doc page about collations - that seems reasonable enough. Not every issue in programming can be flagged at compile-time.
We very explicitly discourage testing EF Core applications with InMemory, for precisely this kind of thing. InMemory cannot and will never behave like your real database; either create a repository layer above EF and mock that (in which case you provide results directly and don't need InMemory), or test against your production database system (in which case you get full fidelity between test and production).
The difference is that the InMemory provider can support ToLower/ToUpper/StringComparison, whereas it cannot support transactions or raw SQL. And once again, ToLower/ToUpper/StringComparison may make sense in certain other production providers. |
But this particular one can, and relatively easily as you yourself have already agreed. "We can't do it for every case" should not imply "we won't do it for a single simple case", in the same way that perfect should not be the enemy of good enough.
Which is why you should be able to configure it. Default behaviour:
Someone like me using MSSQL in prod:
|
I disagree - this would give the illusion that InMemory does approximate the SQL Server behavior; if it catches this specific SQL Server incompatibility, the expectation/assumption would be that it catches in general. It's better to say loud and clear that "this provider does not behave like SQL Server", then to make it match SQL Server on one very small bit. Once again, users must properly test their queries on their production database system, so it's better not to give any illusions to the contrary.
This is not scalable, given the number of other little behavior differences between InMemory and any/all other provider. There would have to be hundreds/thousands of flags controlling every possible imaginable language construct and function. It also does not help users, since it puts the onus on them to tweak their InMemory configuration to match the SQL Server behavior. A solution that would require every user to fully research which function is supported on their provider, and tweak InMemory accordingly doesn't make sense to me. |
Some folks here may have thoughts on dotnet/roslyn-analyzers#6743 |
@dahlbyk thanks for the point, wrote a response there. |
Pulling over the most relevant part of @roji's comment:
Wondering if this is a problem for EF to solve vs individual providers. Or if EF provides the base Analyzer and providers can register their limitations somehow? |
Yeah, I'm not sure how this would look like. There's definitely a common functionality there across providers, e.g. to detect that something is within an expression in the first place. I suspect we'll have some common core/relational analyzer functionality which individual providers can leverage or something similar. |
I don't want this illusion. What I want is that when I write the C# expression that exactly matches the database behavior, it is translated, rather than triggering an error. Otherwise, even when what you are doing is possible with all the providers, it is impossible† to share a filter expression between logic running against databases, web APIs, and lists. I'm fine with making this a requirement on providers rather than translating it in EF Core itself, if EF Core doesn't have enough information to know what the C# equivalent of the current collation is. Perhaps the provider would be responsible for registering which StringComparison or CompareInfo, if any, matches each database collation. I would think the (EF or dotnet) platform should provide the shared functionality of recognizing which comparison an expression is using and/or translating the matching expression(s) to a simple Equals. † without spending a bunch of time writing custom expression visitors and properly hooking them up as a wrapper around the provider's built-in translation. |
In various cases, that's simply not possible. For example, you may be running your EF application against some database which EF did not create; we do not know the collation of the column you're querying, and so cannot know whether the comparison will be case-sensitive or not. EF cannot inspect the schema of each and every column in your database just in order to know what the collation is and how comparisons against that column will operate.
This is the (possibly unfortunate) reality: LINQ is a nice abstraction, but it is necessarily leaky because the query is executed differently based on where it's executed. There are many behavioral discrepancies between where queries execute, around case-sensitivity, null semantics, which functions can be translated, and many other factors. By design, EF does not attempt to gloss over and make queries execute the same wherever executed; that is mostly impossible, and where possible, can lead to significant perf degradation.
This is not just a provider problem, it's a collation problem, and different collations can be defined on different columns. For example, SQL Server by default uses a case-insensitive collation, but that can be overridden both at the database level and at the column level. |
I was under the impression EF already could know the collation of each and every column, because you can define it as shown in collations and case sensitivity. If that is not the case, I still think the platform should provide the functionality for identifying the "collation" used by a comparison expression, rather than having to rewrite that in each provider, some providers only support StringComparison, others only support CompareInfo from the locale, etc. Then I would send the request to allow matching collations to the team for the provider for MS SQL Server (and any other databases I'm using). If it is possible to specify collation in the model, but the developer has not, just throw an error that gives enough info for the developer to realize they need to specify it.
Yes, but it is also preventing the developer from doing it when it is possible, and at least in the case of specified collation, if only the one matching the model is allowed, it should not cause perf degradation. It's preventing the simplest, most common case too, when the only difference is default case-sensitivity, and even the simplest C# expression for case-insensitive compare is not allowed. I agree that unspecified/default collation ( |
As I wrote above, that's only if you're using EF to create the database. But you could be using EF to connect to an existing database that's managed externally. It's not practical for various reasons to ask EF to discover schema details about all columns you'll potentially be querying. An additional friction point here is that database collations can be very elaborate/complex, and don't necessarily correspond to .NET's StringComparison enum; to get an idea, see the SQL Server collations page (and remember that's just one database in what we have to support). Since StringComparison cannot capture the richness of the database collation, we really cannot say that the comparison will occur in the query as it would have occurred in .NET; this is frequently something that English-only devs miss (case sentivity just isn't enough: what about accent sensitivity, kana sensitivity...). And at the end of the day, say we implemented this; this would mean that you can start using e.g. EF translates from one domain (.NET/LINQ) to another (relational database/SQL); there are mismatches along the way, both between the source and target domains and between the different target domains. This is unfortunately something that devs must accept and embrace. |
Put another way, if a particular column is meant to match case-sensitively or not then very often the optimal way to specify that is in the database rather than scattered in the EF code. Unless you really like table scans. Thus my request to turn off the compiler warnings suggesting the |
Does adding On projects I've been working on, we've usually used the command line tool to scaffold the context, since we were not usually using migrations. That already exports information about collations, at least for some providers.
Yes, I expected that for some collations there would be no match for the database collation in .Net. I don't have statistics on global collation use, but I bet a very large fraction use one of
The dev really wouldn't have to go make sure of the column collation in many cases... if you know all your non-legacy databases are configured by default with a particular collation, you can just use that one and only worry about the specifics in special circumstances, with the added safety that you'll get an exception if you forget you are in a special circumstance.
It's not just discoverability of getting it wrong, it is the ability to even run the code that does it right. Such as writing a comparison expression that will work the same across databases and LINQ to objects. (Though perhaps only if you're able to change all of them to similar enough collations.) Would you say it would be an objectionable feature for EF to have? Or a feature that would be ok, but not important enough to allocate resources to? |
Adding UseCollation doesn't force you to use a migration - EF generally doesn't force you to migrations; you give it a connection string and it connects to that database. If you're managing your database externally, it simply doesn't take effect on the column collation. UseCollation may still affect certain things in the query pipeline, but the column collation is what you defined it to be when creating the database.
Not everyone does that.
The point is that if your databases are set up correctly, it makes zero difference whether you're using the StringComparison overload or the overload without StringComparison - you get the exact same behavior. The only thing that your proposal brings is for EF to throw when the database column is not set up correctly, in which case the dev would go and set it up correctly. That's why I can't see this being anything beyond a misconfiguration discoverability feature.
First of all, EF never does the sort of "automatic schema inspection" this feature would require, i.e. to know the collations of all columns in your database in advance in case they are queried (or to get that information on-demand when some column is queried). Aside from the considerable work this would require, this will very likely be a problematic approach for many reasons - so I'd object to doing it, especially for the very little value this brings. And second, as I wrote above, since things really do not correspond between .NET and the database on this, I really do think it's a better design for the LINQ query to be completely agnostic of collation details. IMO the right way here is for the user to express that "the column should be compared to something" without specifying anything about the how. This promotes understanding of how things are actually working under the hood, i.e. that the how is fully specified on the database-side in the column collation. |
Currently we support the
==
operator and theString.Equals(String)
methods which both have the same behavior in-memory asString.Equals(String, StringComparison.Ordinal)
. We could also support the latter without making significant changes and possibility other values of StringComparsion such as OrdinalIgnoreCase by applyingLOWER()
on both operands.The text was updated successfully, but these errors were encountered: