Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow an empty collection_valued_input_parameter in an "IN" expression #100

Closed
lukasj opened this issue Feb 21, 2015 · 39 comments · Fixed by #571
Closed

Allow an empty collection_valued_input_parameter in an "IN" expression #100

lukasj opened this issue Feb 21, 2015 · 39 comments · Fixed by #571

Comments

@lukasj
Copy link
Contributor

lukasj commented Feb 21, 2015

In section 4.6.9, In Expressions, it should be specified that an empty collection_valued_input_parameter must not result in an exception. More specifically:

the expression
o.country IN (:countries)
must be treated as false

and the expression
o.country NOT IN (:countries)
must be treated as true

This should be trivial for JPA implementations to implement, since they can simply replace the expression with an expression like "1 = 0" or "1 = 1".

Currently, JPA implementations throw an exception in this case, which makes it cumbersome to use collection valued input parameters.

@lukasj
Copy link
Contributor Author

lukasj commented Feb 21, 2015

@glassfishrobot Commented
Reported by anthony_ve

@lukasj
Copy link
Contributor Author

lukasj commented Dec 2, 2016

@glassfishrobot Commented
jgigov said:
Having no values between the parentheses causes an SQL syntax error in MySQL and PostgreSQL. I don't have other databases to test it with.
A way to simulate no results is to replace the parameter within parentheses with a subquery like this: select 1 where false. This workaround worked for both with integer fields, but PostgreSQL will never accept it if it can't resolve the types and find a conversion function. The fact that there is nothing to convert is irrelevant to it.

@lukasj
Copy link
Contributor Author

lukasj commented Dec 5, 2016

@glassfishrobot Commented
datanucleus1 said:
DataNucleus JPA has always supported checking for empty input lists and generating the SQL as you show.
If your JPA provider doesn't then raise a bug on them (though the JPA spec should be explicit about what should happen in that situation as per your description)

@lukasj
Copy link
Contributor Author

lukasj commented May 5, 2017

@glassfishrobot Commented
This issue was imported from java.net JIRA JPA_SPEC-100

@lukasj
Copy link
Contributor Author

lukasj commented Aug 31, 2018

@m-reza-rahman
Copy link

This looks like something that could be easily looked at and resolved. Perhaps it makes sense to triage this for consideration to be worked on during the next release? I would still mark this low priority. Perhaps it can simply be independently/fairly evaluated and closed? This seems like an issue about finer grained SQL generation, which JPA generally tries to stay out of...

Reza Rahman
Jakarta EE Ambassador, Author, Blogger, Speaker

Please note views expressed here are my own as an individual community member and do not reflect the views of my employer.

@gavinking
Copy link
Contributor

This looks like something that could be easily looked at and resolved.

I disagree. JPA implementations usually cache the generated SQL and try to leave it alone at query execution time. This is a new feature that requires runtime processing of the cached SQL.

Therefore, this change would impose a surprisingly large burden on implementers. I'm not against it per se, but it doesn't seem like a priority.

@m-reza-rahman
Copy link

m-reza-rahman commented Apr 30, 2021

Just for clarity, simply closing/triaging some of this (keeping fairness in mind of course) as too low priority, too vague, too far fetched, too forward looking, out-of-date, out-of-scope, etc I think is a perfectly fine resolution. Whatever the factors, that does not seem to have been done for some time. It hopefully leads to focusing on making some tangible progress towards getting Jakarta Persistence moving again and focusing on items that make more practical sense in the near term.

Reza Rahman
Jakarta EE Ambassador, Author, Blogger, Speaker

Please note views expressed here are my own as an individual community member and do not reflect the views of my employer.

@arjantijms
Copy link

I've encountered this issue quite often in actual projects. Collections are passed-in, and people either forget that collections can be empty (resulting in unneeded exceptions), or there are rather ugly if/else statements.

I agree it's a burden on Jakarta Persistence implementations to implement this, but not doing so shifts that burden to user applications which have to handle the empty/not-empty case.

@gavinking
Copy link
Contributor

So I just tried these lines in Hibernate:

em.createQuery("select name from Author where id in :ids", String.class)
        .setParameter("ids", List.of())
        .getResultList()
em.createQuery("select name from Author where id not in :ids", String.class)
        .setParameter("ids", List.of())
        .getResultList()

and both resulted in the correct SQL.

So there's no objection on our side to requiring this. @lukasj does EclipseLink also support it?

@gavinking
Copy link
Contributor

To be clear, this would be done as part of #570, since "collection-valued input parameters" are currently underspecified.

@gavinking
Copy link
Contributor

Since this issue was closed by the PR, I should clarify that we ended up writing this:

A list of arguments may be assigned to a collection-valued parameter of a JPQL query by packaging the arguments in a non-null instance of java.util.List and passing the list as an argument to the second parameter of setParameter(). The list should contain at least one element. If the list is empty the behavior is undefined. Portable applications should not pass an empty list to a collection-valued parameter.

Which is to say, @lukasj decided that since SQL itself does not allow an empty in list, that JPQL should not require that it be allowed.

This is I guess not the resolution some people were hoping for, but it's a resolution.

[We have also done a lot to clarify the semantics in this area.]

@gavinking gavinking closed this as not planned Won't fix, can't repro, duplicate, stale Feb 26, 2024
@arjantijms
Copy link

since SQL itself does not allow an empty in list, that JPQL should not require that it be allowed.

Well, actually the idea was that it should be allowed. As JPQL is higher level, shouldn't it be possible to generate the lower level legal SQL?

@lukasj
Copy link
Contributor Author

lukasj commented Mar 4, 2024

since SQL itself does not allow an empty in list, that JPQL should not require that it be allowed.

Well, actually the idea was that it should be allowed. As JPQL is higher level, shouldn't it be possible to generate the lower level legal SQL?

It is not a problem of the generation of valid SQL, it is about which valid SQL to generate in this particular case - should an empty list become an SQL NULL? Should it be a list with a NULL item? Or should it be some magic constant based on the query being evaluated so the NULL returned by the DB for sth like '2' IN (NULL) evaluates to true/false based on the current context?

@gavinking
Copy link
Contributor

Yeah so this is something I just really didn't appreciate before, but Lukas is correct.

  • if you translate foo in () to foo in (null), then you get the wrong behavior when foo is non-null
  • if you translate foo in () to 1=2, like our product does, then you get the wrong behavior when foo is null
  • if you translate it to foo in (-66666666) then you get the wrong behavior when foo=-66666666.

I can't think of a completely-correct solution here, can you?

@gavinking
Copy link
Contributor

I can't think of a completely-correct solution here, can you?

Actually I this works, I believe:

case when foo is null then null else 1=2 end

@lukasj
Copy link
Contributor Author

lukasj commented Mar 4, 2024

Also note that two NULL values are not equal by definition - see https://jakarta.ee/specifications/persistence/3.1/jakarta-persistence-spec-3.1#a5676

@arjantijms
Copy link

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:

<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>

The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.

It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

@gavinking
Copy link
Contributor

@arjantijms That's completely the opposite of what the semantics should logically be!

Quite clearly 2=2 and 1 in () should evaluate to false. You have it evaluating to true.

@gavinking
Copy link
Contributor

(So if we did this, it would do the exact opposite of what you're asking for.)

@quaff
Copy link

quaff commented Mar 5, 2024

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:

<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>

The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.

It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

you should use CriteriaQuery instead to construct dynamic query.

@gavinking
Copy link
Contributor

I can't think of a completely-correct solution here, can you?

Actually I this works, I believe:

case when foo is null then null else 1=2 end

What I found that works on all major databases is to translate x in () to (1 = case x is not null then 0 end).

@beikov
Copy link

beikov commented Mar 5, 2024

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:

<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>

The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.
It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

you should use CriteriaQuery instead to construct dynamic query.

💯

If that predicate depends on a join that is only needed for that one predicate, the JPA provider would still have to do the join, but you can be smarter about this when you construct the query dynamically and avoid the join altogether if the input list is empty.

@arjantijms
Copy link

you should use CriteriaQuery instead to construct dynamic query.

Well, yes and no.

Especially for somewhat more complicated queries, JPQL is much easier to read and maintain.

Exactly for this reason I've proposed more than a decade ago to have the ability to transform JPQL into Criteria. That way I can write a base query in JPQL, and then use Criteria to add or remove certain parts to it.

@gavinking
Copy link
Contributor

Exactly for this reason I've proposed more than a decade ago to have the ability to transform JPQL into Criteria. That way I can write a base query in JPQL, and then use Criteria to add or remove certain parts to it.

FTR our implementation now supports this. But I'm not sure about other implementations.

@arjantijms
Copy link

arjantijms commented Mar 5, 2024

That's completely the opposite of what the semantics should logically be!

I hear you, and I had been thinking about that a lot. It's not completely straightforward what is logical.

If you see the list as constraints and there simply are no constraints, evaluating to true might be logical. This often translates to how a choice from a UI works. If the user doesn't select any explicit country from a list, a table or whatever searches in all countries. If the user specifies one or more countries, the search is only for those specified countries.

However, if the list is seen as requirements, then evaluating to false is more logical. If my memory serves me correctly, around a decade ago exactly this was discussed on the JPA JIRA or list.

Maybe something like this could work?

<named-query name="Website.getByUserIdAndCountries" empty-collection-semantics="[true|false]">

There's maybe alternative ways to specific this, but the idea would be for the user to specify whether an empty collection should evaluate to effectively true or false.

@arjantijms
Copy link

FTR our implementation now supports this. But I'm not sure about other implementations.

That's extremely cool. Would be awesome to have this in the Jakarta spec. I'm sure there were various issues for that.

@gavinking
Copy link
Contributor

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

@gavinking
Copy link
Contributor

Would be awesome to have this in the Jakarta spec.

Sure, but it can only go in the spec if it's implementable by others.

@lukasj
Copy link
Contributor Author

lukasj commented Mar 5, 2024

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:

<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>

The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.

It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

Why exactly are two queries required? Isn't sth like

<named-query name="Website.getByUserIdAndCountries">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    case when (:countryIds is not null) then website.countryId in (:countryIds) 
             else true
  </query>
 </named-query>

equivalent? Also be aware that every IN can be rewritten to JOIN (but not vice-versa).

However, if the list is seen as requirements, then evaluating to false is more logical.

Once nullable columns and/or NULLs come to play, what seems logical is not what boolean algebra defines.

@lukasj
Copy link
Contributor Author

lukasj commented Mar 5, 2024

This often translates to how a choice from a UI works. If the user doesn't select any explicit country from a list, a table or whatever searches in all countries. If the user specifies one or more countries, the search is only for those specified countries.

the developer is always able to define his own, specific null-element to base his conditions in queries on

@gavinking
Copy link
Contributor

case when (:countryIds is not null) then website.countryId in (:countryIds) else true

I think we disallow binding null to a collection-valued input param.

@arjantijms
Copy link

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

@lukasj
Copy link
Contributor Author

lukasj commented Mar 5, 2024

case when (:countryIds is not null) then website.countryId in (:countryIds) else true

I think we disallow binding null to a collection-valued input param.

what about empty collection? that should not be null and is not null should return true only if it is not null and not empty... anyway, be it is not null or is not empty - my point is about defining the query correctly

@gavinking
Copy link
Contributor

gavinking commented Mar 5, 2024

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

I have no idea. I can, however, promise you that in set theory x ∈ {} evaluates to false for every x.

@lukasj
Copy link
Contributor Author

lukasj commented Mar 5, 2024

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

it is undefined

mysql> select 1 where 1 in ();
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1

@arjantijms
Copy link

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

I have no idea. I can, however, promise you that in set theory x ∈ {} evaluates to false for every x.

Yes, and there is no element of {} for which any property P holds.

Then again, are SQL / relational databases strictly based on set theory, or only partially? Should especially JPQL provide conveniences in addition to math adherence, or should it ask its users to implement said conveniences?

Maybe omitting the part of the where clause entirely when the collection is empty should be provided by another mechanism than the empty collection semantics. I don't claim to have the answers here, just thinking out loud.

@arjantijms
Copy link

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

it is undefined

Indeed, it's the same on pretty much every dialect I'ver ever worked with (Mostly MySQL, Postgres, H2 and Derby). The question is mostly here: why is it undefined if set theory so clearly states the outcome should be false?

@lukasj
Copy link
Contributor Author

lukasj commented Mar 5, 2024

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

it is undefined

Indeed, it's the same on pretty much every dialect I'ver ever worked with (Mostly MySQL, Postgres, H2 and Derby). The question is mostly here: why is it undefined if set theory so clearly states the outcome should be false?

no, I wasn't completely accurate - formally, it is defined to be UNKNOWN - not true, not false (like Boolean, which can be true, false and also null)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants