-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic OR predicate on multiple columns #32092
Comments
I was also running into this the other day, I think it's a missing piece in Linq. It would be nice to have some sort of composable Any so we could so something like the following pseudo code: var query = something... As of today we're forced to take some workarounds to build such queries like for example manually constructing expression trees. The code above doesn't work today because Maybe I'm missing something obvious but I'm not aware of how we can express a list of OR conditions in Linq today. |
I've seen people mention this PredicateBuilder but it's not clear if it's supported. |
Duplicate of #11799 |
We've made various progress around Contains that goes in this direction, but full support for tuples (which the above requires) isn't yet supported.
@clement911 what SQL are you currently using for the above? @ilmax I'm not sure what's missing in LINQ here. First, the above - Contains with tuple - is already the right LINQ expression for the operation; the problem is that EF doesn't yet support translating it (because support is lacking for tuples in general). When we implement support for this, the array of tuples parameter ( |
@roji I think I described my wish in the comment above. I'd like the ability to construct multiple conditions and then OR- them together, same thing we can do with the The contains with tuple to me just doesn't feels good. It's difficult to read compared with my pseudo proposal above (but this is my personal opinion) and it's also more limited than what I was proposing above. |
@ilmax it sounds like you're asking for #15670. If so, that approach generates different SQLs for arrays of different size (since there are more ORs); this is bad, since it defeats query plan caching at various levels and hurts performance considerably (see #15670 (comment) for more on this). For that reason, I definitely think that's not the right solution here. Packing the parameter as a JSON parameter and expanding it to a table in the database is the better way to do this.
Well, "doesn't feel good" isn't a very strong argument... The contains with tuple works exactly the same way as the contains without a tuple, so I'm not sure why anyone would find it difficult to read. More importantly, your proposal doesn't show what the query would actually look like, in code. If you're asking for us to add some new, unknown LINQ construct for this, I definitely don't see us doing that when a perfectly suitable LINQ construct already exists. |
I agree, that's why I said it's just my personal opinion, the second point about limitation is more important and it should have been added as first. Having said that and ignoring the perf side of thing for now, I was saying that to me looks like a Linq limitation because I feel (but I may be wrong) that's not easy with Linq today to compose several query with OR because So what I feel like is missing is the ability to compose multiple conditions together with an OR. this is possible today in Linq to object but not always properly translated by various providers (not talking only about EF core here and that's my point about exposing a simpler method in Linq and risking to go O.T. here). Take for example this trivial linq 2 object code that's just for illustration purposes: var customers = new List<Customer>();
for (int i = 0; i < 20; i++)
{
customers.Add(new Customer(i, i % 5, i*2));
}
var customersToFind = new List<(int Id, int TenantId)>
{
{(8, 4)},
{(9, 4)},
{(14, 4)}
};
var found = customers.Where(i => customersToFind.Any(x => i.TenantId == x.TenantId && (x.Id == i.Id || x.Id == i.AlternateId))).ToArray();
public record Customer(int Id, int TenantId, int AlternateId); If we want to translate the same query to EF or other Linq provider (looking at you, Cosmos SDK) out there, we're mostly out of luck. EF today (tested with the latest 8.0-RC2 version available, see gist here) can't translate a similar query. var customerQuery = customers.AsQueryable();
foreach (var custId in customersToFind)
{
customerQuery = customerQuery.Or(c => c.TenantId == custId.TenantId && (c.Id == custId.Id || c.AlternateId == custId.Id));
} In order to express such functionality today and not pulling in 3rd parties libraries (unless I'm missing something) we have to take a stroll into manually creating expression trees and that's not trivial and, if memory serves me right, it was discouraged by the EF team itself. We can argue that's probably not the best way to query perf wise and that you should have a better db design to avoid such queries and everything else but, every now and then, those query came up and today I always end up with manually created expression trees. I hope I expressed myself in a clearer way now :) |
So again, I think you're really conflating two things:
I'm seeing nothing in this issue that can't be expressed with LINQ - via Contains and tuples; this works perfectly with LINQ to Objects, and IMHO is also the ideal and compact way to express the operation. Then there's the question of specific LINQ providers (and/or EF providers) actually translating that construct; that's really a separate question. Note that there are various other unrelated LINQ constructs that EF can't translate to SQL. I've already written above why generating dynamic SQL is a bad idea (bad perf because of how SQL plan caching works). I suggest fully understanding that; it should make clear why your proposal wouldn't be a good way forward. Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product). |
I'm well aware of how Linq and EF works, that was why I was thinking about exposing it in Linq itself (even though I'm aware that the Linq library has a very high bar to accept any changes so I knew my idea would go nowhere, I just wanted to see if there's a need of such functionality)
I'm not sure I agree on the ideal part. What I propose has far superior readability (subjective point) and can be composed easily, I also think that's probably easier for a linq provider to translate since it doesn't have to understand the reference a collection and translating the Any()
I've read the comments and understand the drawbacks, but we don't live in a perfect world and sometimes those kind of trade-offs have to be made
I agree, everything can be done today but it's impractical. What I was hoping was to simplify what to me seems like a common enough case to deserve to be treated as a first class citizen. Given that I've failed to prove my case, I will live with my manually constructed expression trees :) |
@roji We use the following SQL work around. (long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
string orPredicate = orderIds.Select(i => $"({nameof(Order.TenantId)} = {i.tenantId} AND {nameof(Order.Id)} = {i.orderId})")
.Join(" OR ");
var orders = await ctx.Set<Order>()
.FromSqlRaw($"""
SELECT *
FROM [{SCHEMA_PUBLIC}].[{nameof(Order)}]
WHERE ({orPredicate})
""")
.ToArrayAsync(); But of course we don't like because it's brittle, verbose, and we lose many of the benefits of querying with EF. Using a linq query like below would be ideal.
My understanding is that Postgres natively supports tuple comparison. select * from books WHERE
(title, id) > ('KVFNdl5F', 994364) Unfortunately, we use sql server and sql server doesn't have native support for tuples. You mentioned that EF would likely encode the array of IDs as a JSON string and send to the database, to be unpacked with OPENJSON? But to take a step back, we don't care too much about the actual generated SQL, as long as it performs well. It hadn't occur to me to use OPENJSON. For example, imagine that the user provides a list of prefixes and the query needs to return all orders for which the product name starts with any of the supplied prefixes. var orPredicate = EF.Or<Order>();
foreach (string prefix in prefixes)
orPredicate = orPredicate.Or(o => o.ProductName.StartsWith(prefix);
var orders = ctx.Orders.Where(o => o.TenantId = tenantId && orPredicate).ToArrayAsync(); I know that we can build expression trees by hand but who does? If I understand correctly, the EF sql generation pipeline wouldn't even need to change, since it wouldn't care that the expression tree was built using a utility rather than with a 'normal' query. So it seems like it would be a quick win. While I agree that |
The Contains behavior is changing in 8.0, precisely because of the query plan issues inherent in varying SQL, see this blog post for more details. I think that in principle, it's best if EF never generate varying SQLs (like it does for Contains over a parameterized list prior to 8.0). Regard whether it's more or less complicated, I don't think that should be something the user cares much about... As long as the query works correctly and has good perf, the underlying mechanisms used to translate the LINQ query are implementation details.
There's no need for dynamic ORs here either; in fact, EF Core 8.0 supports this kind of query using the same OPENJSON technique used for Contains. The following LINQ query: var prefixes = new[] { "foo", "bar" };
_ = await ctx.Blogs.Where(b => prefixes.Any(p => b.Name.StartsWith(p))).ToListAsync(); Translates in 8.0 as follows: SELECT [b].[Id], [b].[Flag], [b].[Name]
FROM [Blogs] AS [b]
WHERE EXISTS (
SELECT 1
FROM OPENJSON(@__prefixes_0) WITH ([value] nvarchar(max) '$') AS [p]
WHERE [p].[value] IS NOT NULL AND LEFT([b].[Name], LEN([p].[value])) = [p].[value]) In other words, OPENJSON (or the equivalent in other databases) is used to unpack a JSON list into a table, at which point the regular relational tools can be used. This is extremely powerful/flexible and allows translating a wide array of LINQ constructs which weren't translatable prior to 8.0.
If PredicateBuilder works and satisfies your needs, then why does it need to be a built-in utility in EF Core? We very much don't believe that all possible functionality needs to be in-the-box inside EF, and are always happy to see 3rd-party utilities integrate with EF. It's simply not practical to expect the EF team to do everything. |
Thanks @roji
That's really cool. Works: long[] orderIds= ...;
await ctx.Orders.Where(o => orderIds.Any(id => o.Id == id).ToListAsync(); Doesn't work: (long tenantId, long orderId)[] orderIds = ...;
await ctx.Orders.Where(o => orderIds.Any(id => o.TenantId == id.tenantId && o.Id == id.orerId).ToListAsync(); Right? Also I'm curious if using OPENJSON works properly when using various collations? I understand that EF cannot realistically implement and support every possible feature. |
Take a look at this comment and at that issue in general. I haven't benchmarked dynamic ORs specifically, but I have compared with the previous Contains behavior (i.e. embed constants in the SQL). The bottom line is that if you concentrate solely on the specific query performance, we've seen the OPENJSON-based technique perform slightly worse than constant embedding; however, once you consider the overall perf impact on other queries (i.e. other query plans getting evicted because of varying SQLs), paying that relatively small constant perf hit should make sense.
Yes - arrays of tuples aren't yet supported (as opposed to arrays of e.g. ints). This is something I hope we'll be able to improve in the near future.
AFAIK that's not how it works - have you tested this? Parameters - just like nvarchar constants - do not have a collation (by default); when you compare them to a column, the column's collation is used for the comparison operation. Similarly, the OPENJSON-generated table has a column with no collation, so OPENJSON shouldn't make any difference compared to sending a simple string parameter. If you're seeing another behavior please open a new issue with some code so we can investigate. |
I think that the discussion here has run its course. To summarize my point of view:
Given the above I'll go ahead and close this issue, but of course feel free to post back here with any other thoughts and continue the conversation. |
@roji I appreciate your fast and clear comments. However, I did some quick testing and I think that the collation thing will be an issue that needs to be addressed otherwise it'll break queries with a .Contains that operate on char columns with a non default collation. |
@clement911 can you post your quick testing so we can investigate? |
@roji Yes, I will. |
@clement911 you need the preview SDK. Then you can use VS Code to write the code or get the right VS version. |
I raised a bug issue here: #32147 |
For anyone who have to do #32092 (comment)
and copy-pasted so many /// <see>https://github.com/dotnet/efcore/issues/32092#issuecomment-2221633692</see>
/// <see>https://stackoverflow.com/questions/70744232/query-where-multiple-columns-have-to-match-a-value-set-simultaneously-in-ef-core/78732959#78732959</see>
public static IQueryable<TEntity> WhereOrContainsValues<TEntity, TToCompare>(
this IQueryable<TEntity> queryable,
IEnumerable<TToCompare> valuesToCompare,
IEnumerable<Func<TToCompare, Expression<Func<TEntity, bool>>>> comparatorExpressionFactories) =>
queryable.Where(valuesToCompare.Aggregate(
LinqKit.PredicateBuilder.New<TEntity>(),
(outerPredicate, valueToCompare) => outerPredicate.Or(
comparatorExpressionFactories.Aggregate(
LinqKit.PredicateBuilder.New<TEntity>(),
(innerPredicate, expressionFactory) =>
innerPredicate.And(expressionFactory(valueToCompare)))))); Taking the example from #32092 (comment) public class Order
{
[Key]
public long TenantId {get;set;}
[Key]
public long Id {get;set;}
//other properties...
}
(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync(); //this doesn't work it now would be: var orders = await ctx.Orders.WhereOrContainsValues(orderIds,
[
orderId => order => orderId.tenantId == order.TenantId,
orderId => order => orderId.orderId == order.Id
]).ToArrayAsync(); and translated to SQL like: SELECT fields FROM Orders
WHERE (TenantId = $1 AND Id = $2)
OR (TenantId = $3 AND Id = $4)
-- keep go on for each item in orderIds |
…mment) to simplify multiple usages of `LinqKit.PredicateBuilder` @ ExtensionMethods.cs @ c#/crawler
Also try out a more generic one: https://stackoverflow.com/questions/67934374/ef-core-5-contains-with-several-columns/67935339#67935339 /// <see>https://stackoverflow.com/questions/67666649/lambda-linq-with-contains-criteria-for-multiple-keywords/67666993#67666993</see>
public static IQueryable<T> FilterByItems<T, TItem>(
this IQueryable<T> query,
IEnumerable<TItem> items,
Expression<Func<T, TItem, bool>> filterPattern,
bool isOr = true)
{
var predicate = items.Aggregate<TItem, Expression?>(null, (current, item) =>
{
var itemExpr = Expression.Constant(item);
var itemCondition = FilterByItemsExpressionReplacer
.Replace(filterPattern.Body, filterPattern.Parameters[1], itemExpr);
return current == null
? itemCondition
: Expression.MakeBinary(isOr ? ExpressionType.OrElse : ExpressionType.AndAlso,
current,
itemCondition);
}) ?? Expression.Constant(false);
var filterLambda = Expression.Lambda<Func<T, bool>>(predicate, filterPattern.Parameters[0]);
return query.Where(filterLambda);
}
private sealed class FilterByItemsExpressionReplacer(IDictionary<Expression, Expression> replaceMap) : ExpressionVisitor
{
private readonly IDictionary<Expression, Expression> _replaceMap
= replaceMap ?? throw new ArgumentNullException(nameof(replaceMap));
public static Expression Replace(Expression expr, Expression toReplace, Expression toExpr) =>
new FilterByItemsExpressionReplacer(new Dictionary<Expression, Expression> {{toReplace, toExpr}})
.Visit(expr);
[return: NotNullIfNotNull(nameof(node))]
public override Expression? Visit(Expression? node) =>
node != null && _replaceMap.TryGetValue(node, out var replacement)
? replacement
: base.Visit(node);
} var orders = await ctx.Orders.FilterByItems(orderIds, (ids, o) => ids.Contains((o.TenantId, o.Id)).ToArrayAsync(); |
In EF core, is it possible to have WHERE with a multi-column OR predicate that is based on a dynamic list?
Let me explain with some simple pseudo code.
Assume I need to fetch a list of orders, based of a list of order ids.
I can do as follows:
That works great.
Now, imagine that the order class has a composite key made of multiple columns.
I would like to write something like this:
However, this doesn't work.
Is there an alternative?
As of now, we generate sql manually and use
FromSqlRaw
. That's not ideal though.The text was updated successfully, but these errors were encountered: