Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Array contains element" translation #460

Open
havotto opened this issue Jun 6, 2018 · 18 comments
Open

"Array contains element" translation #460

havotto opened this issue Jun 6, 2018 · 18 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@havotto
Copy link

havotto commented Jun 6, 2018

According to the documentation array element containment is translated like this:

.Where(c => c.SomeArray.Contains(3))

to

WHERE 3 = ANY("c"."SomeArray")

As I understand this filter cannot use GIN index created on the array column.
So what about changing the translation to:

WHERE "c"."SomeArray" @> ARRAY[3]

or

WHERE "c"."SomeArray" @> '{3}'

There is a difference with NULL element handling though.

@roji
Copy link
Member

roji commented Jun 6, 2018

Can you please point to the documentation about this? Translating array.Contains(scalar) to a PostgreSQL array-contains-array set operation is quite strange, I'm hoping there's a way to keep the array/scalar types without compromising on the index.

Also, it's important for SQL translations to be as close as possible to their C# source, so a difference in behavior (with regards to NULL for example) is likely not acceptable.

@havotto
Copy link
Author

havotto commented Jun 7, 2018

Yeah, I agree that this NULL behavior difference is problematic.
Anyway, here are the operator classes that GIN indexes support: GIN built-in operator classes
And these are the array operators: Array operators
And here is a discussion (the second answer is relevant): ANY vs @>

What kind of approach do you think would be good to take advantage of GIN indexes on arrays?

@roji
Copy link
Member

roji commented Jun 8, 2018

I'm not an expert on PostgreSQL indices, but isn't it possible to use an expression index to speed up ANY searches?

@havotto
Copy link
Author

havotto commented Jun 13, 2018

I don't this that expression based indexes can make a difference here.
I am thinking about supporting these operators in a different, and more direct way.
Is it possible to translate custom static functions (like EF.Functions.Like) to SQL in EF Core?
If it is, then static such static functions could be defined for these operators (@>, &&, and others).
What do think about this?

@roji
Copy link
Member

roji commented Jun 14, 2018

I don't this that expression based indexes can make a difference here.

Why not? Do you have anything to back this up? The whole point of expression-based indexes is, well, that you can use them for any expression...

Is it possible to translate custom static functions (like EF.Functions.Like) to SQL in EF Core?
If it is, then static such static functions could be defined for these operators (@>, &&, and others).
What do think about this?

Of course it's possible to translate custom static functions to SQL - this is exactly what happens with EF.Functions.Like. But how would that help here? The question isn't how the C# code looks like - is it if (arr.Contains(x)) or is it if (EF.Functions.Foo(arr, x)) - but rather what the generated SQL looks like (does it use ANY, @>...).

@havotto
Copy link
Author

havotto commented Jun 14, 2018

Expression-based indexes can only be used if that exact same expression is present in the SQL. For example 'lower(name)'. But here we have value = ANY(array_column). There is no expression here that can be used in an index expression.

As for the static functions, these would be mapped directly to these special PostgreSQL operators, so the user would not confuse Array.Contains(x) with these. @> is one of these operators, it can be used with several PostgreSQL types, so there could be several overloads of this static function:

EF.PgOperator.Contains<T>(T[] container, T[] contained)
EF.PgOperator.Contains<T>(NpgsqlRange<T> container, NpgsqlRange<T> contained)
EF.PgOperator.Contains<T>(NpgsqlRange<T> range, T value)

There are other operators, like overlaps (&&). This could also used in a similar way:

EF.PgOperator.Overlaps<T>(T[] array1, T[] array2) //true if have common value
EF.PgOperator.Overlaps<T>(NpgsqlRange<T> range1, NpgsqlRange<T> range2)

And with the appropriate index these expressions would use that index.

@roji
Copy link
Member

roji commented Jun 17, 2018

OK, thanks for the links and the explanation - it's clear to me why value = ANY(array) is insufficient.

It still seems that ANY should be the translation for .NET array.Contains(value), rather than PostgreSQL @>. Aside from the behavior difference around NULL, value = ANY(array) does work well if value is the one being indexed and array is a constant or parameter, and is a more natural translation for "scalar in array" (whereas @> is does "array in array").

But it does seem clear that some way is needed to generate a @> expression instead, and adding an EF.Functions.Contains<T>(T[] container, T[] contained) seems like the right way to go, and indeed it's a good idea to add all the array operators supported by PostgreSQL. This should actually be quite simple to do, do you intend to submit a PR for this?

Note that the range operators have already been added in version 2.1.0 (see #311). You can look at that implementation as the basis for your own PR for array operations.

@roji roji added this to the 2.2.0 milestone Jun 17, 2018
@roji roji added the enhancement New feature or request label Jun 17, 2018
@austindrenski
Copy link
Contributor

I have some work in progress on general translation support for array operators in #431, which currently supports this type of translation:

src.Where(x => x.SomeList.All(y => x.SomeArray.Contains(y)));
WHERE x."SomeList" <@ x."SomeArray"

This should also work with an inline array:

src.Where(x => new int[] { 3 }.All(y => x.SomeArray.Contains(y)));
WHERE '{3}'::int[] <@ x."SomeArray"

@havotto Would this form of translation support your use case?

@roji
Copy link
Member

roji commented Jun 17, 2018

@austindrenski thanks, I forgot about your ongoing work, which I'll try to review ASAP :) Do you plan to provide translations for all of the array operations?

@havotto
Copy link
Author

havotto commented Jun 17, 2018

@austindrenski Yes, this looks great. But I think that with array input syntax SQL parameters cannot be supported, like this:

src.Where(x => new int[] { myIntValue }.All(y => x.SomeArray.Contains(y)));

There is an alternative syntax for arrays, but I am not sure if it supports SQL parameters for elements:

WHERE ARRAY[3]::int[] <@ x."SomeArray"

@austindrenski
Copy link
Contributor

@roji The big goal is to provide translations for the major ones by 2.2.0. But if there's time, I would like to add support for all of them.

I would recommend reviewing with #444 before #431, as it splits out some of the generalized components that could be used by related features.

@havotto You are correct that we translate arrays with the constructor syntax, rather than the literal syntax:

-- What I wrote:
WHERE '{3}'::int[] <@ x."SomeArray"

-- What I should have written:
WHERE ARRAY[3]::int[] <@ x."SomeArray"

I believe that we handle new int[] { myIntValue, 3, 4 } by generating ARRAY[myIntValue, 3, 4].

But, I'll add a test to confirm.

@roji
Copy link
Member

roji commented Jun 18, 2018

@austindrenski and @havotto, note the specific behavior around NULL which the PostgreSQL array operators have (e.g. @>): SELECT ARRAY[1,NULL] @> ARRAY[1,NULL]; will return false although its C# counterpart will return true. We need to decide whether this discrepancy is enough to not translate C# native constructs (whether it's array.Contains(value) or array1.All(x => array2.Contains(x)) and have a dedicated function under EF.Functions instead.

I don't think there should be an issue with supporting SQL parameters for array elements - we should definitely be able to translate that correctly.

@havotto
Copy link
Author

havotto commented Jun 18, 2018

I would prefer the dedicated function approach. It's meaning is more straightforward for the user.

@austindrenski
Copy link
Contributor

austindrenski commented Jun 22, 2018

note the specific behavior around NULL which the PostgreSQL array operators have (e.g. @>): SELECT ARRAY[1,NULL] @> ARRAY[1,NULL]; will return false although its C# counterpart will return true.

@havotto, @roji Thank you both for pointing out the NULL discrepancy. I've spent a few days thinking through the different options and I'm on board with adding extensions on EF.Functions. At a minimum, it'll provide an option for users who want to explicitly influence the SQL generation.

But, in general, I do prefer to see translations of standard LINQ patterns, rather than provider specific extensions. This is partly motivated by portability, but also by considerations for the consumer experience. If someone has array.Contains(value) in a query today, I don't think they should have to rewrite it to take advantage of a new translation on our part.

In this case, I think there's room for both the extension methods and pattern translation under specific conditions.

Indeed, part of what makes this operator set complicated is that the NULL behavior doesn't always differ from C#. Consider the following where example 2 and 3 wouldn't differ from their C# equivalents:

-- Example 1:
SELECT ARRAY[1,NULL] @> ARRAY[1,NULL];
?column?
----------
f
(1 row)

-- Example 2:
SELECT ARRAY[1,NULL] @> ARRAY[1];
?column?
----------
t
(1 row)

-- Example 3
SELECT ARRAY[1,NULL] @> ARRAY[]::int[];
?column?
----------
t
(1 row)

From the C# side, we should be able to translate parameters of value type arrays. If the parameter array can't contain NULL, then there's nothing to worry about.

int[] myArray = new int[] { 0, 1, 2 };

src.Select(x => myArray.All(y => x.SomeArray.Contains(y)));
// or
src.Select(x => EF.Functions.Contains(x.SomeArray, myArray));
SELECT x."SomeArray" @> "myArray";

When a parameter array contains nullable values and those values can be verified to be non-null, then pattern translation should also apply:

// We can verify that the elements are not null
string[] myArray = new string[] { "aaa", "bbb", "ccc" };

src.Select(x => myArray.All(y => x.SomeArray.Contains(y)));
// or
src.Select(x => EF.Functions.Contains(x.SomeArray, myArray));
SELECT x."SomeArray" @> "myArray";

There are a lot of cases to consider, but I think my approach for #431 moving forward will be:

  1. Provide extension methods if NULL behavior varies.
  2. Support a pattern translation if NULL behavior can be shown to not apply under specific conditions.
  3. Avoid creating murky conditions for consumers.
    • An average user ought to be able to infer whether a given translation will happen.
      • "This is an array of value types, it'll translate."
      • "This is an array of nullable types, it may or may not translate."

@roji
Copy link
Member

roji commented Jun 22, 2018

@roji Thanks for pointing out the NULL discrepancy. I've spent a few days thinking through the different options and I'm on board with adding extensions on EF.Functions. At a minimum, it'll provide an option for users who want to explicitly influence the SQL generation.

You should thank @havotto instead :)

But, in general, I do prefer to see translations of standard LINQ patterns, rather than provider specific extensions. This is partly motivated by portability, but also by considerations for the consumer experience. If someone has array.Contains(value) in a query today, I don't think they should have to rewrite it to take advantage of a new translation on our part.

Of course we always prefer to translate standard .NET expressions rather than introducing new and specific functions. The guiding principle is simply to translate when the server-side behavior corresponds exactly to the client behavior, and not to do it otherwise.

I think that portability here is a somewhat complex idea. In my opinion, whether an expression is executed client-side or server-side is also part of portability: imagine a theoretical application working well on PostgreSQL where an expression is server-evaluated, and when ported to MySQL breaks down completely because the expression is client-evaluated and breaks performance. In that sense, translating ordinary .NET expressions is actually a bit more dangerous, since it hides things: when you execute a special function from the Npgsql provider assembly, it's at least pretty clear that this function will only be server-evaluated in PostgreSQL. This doesn't mean we shouldn't translate ordinary .NET expressions (client evaluation warnings do exist for this), but it's primarily an experience/convenience thing.

Regarding the idea of translating only value types, I'm a bit skeptical. It seems pretty confusing (and a little dangerous) for the exact same code to start being client-evaluated only because the type was switched from int to string (again don't forget the potentially huge impact on performance) - this is exactly the kind of murky behavior you referred to above. I'd rather have code that if it works, works in all cases - that means less subtlety and logic to understand on the user side. And don't forget that the ordinary .NET construct (arr1.All(x => arr.Contains(x))) currently doesn't get translated in any other provider, including Npgsql - so this is about adding a new feature, and that feature would only work in some cases (i.e. value types).

I'd also add that translating these complex LINQ expressions is more complicated than the usual simple function translation we do - it requires translating fragments and other things (#444) we haven't done yet, making the provider generally more complex and somewhat prone to break moving forward as EFCore evolves (as I wrote in #444, the services being replaced are currently internal in EF Core). To be clear, I'm OK in general with translating complex expressions/fragments, but there just doesn't seem to be enough value in what is currently on the table...

Let me know your thoughts, and thanks as always for the great conversations and exchanges of ideas.

@mikes-gh
Copy link

I think I just hit this

string[] invoiceNoStrings = invoiceNumber.Select(x => x.ToString()).ToArray();
            ICollection<Models.Transactions> invoiceTransactions = _tradingCompanyDbContext.Transactions
                .Include(t => t.AccountNoNavigation).ThenInclude(a => a.InvTypeNavigation)
                .Include(t => t.AccountNoNavigation).ThenInclude(a => a.PaymentFreqNavigation)
                .Include(t => t.InvoiceDetails).ThenInclude(i => i.StockNoNavigation)
                .Include(t => t.DiscountDetails)
                .Include(t => t.CommentDetails)
                .Where(t => invoiceNoStrings.Contains(t.CustSuppRef)
                    && (t.TransCode == 0 || t.TransCode == 1))
                .AsNoTracking()
                .ToList();

Im migrating from SQL server and their provider translates this.
Since I have a large database this kills the database.
When I was using the SQL provider it errors if client side evaluation is going to happen.
Is that the responsibility of the provider?

@mikes-gh
Copy link

Is there another way I can write the where clause here

@mikes-gh
Copy link

OK I think im hitting #1152

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants