Allow an empty collection_valued_input_parameter in an "IN" expression #100

lukasj · 2015-02-21T15:36:18Z

In section 4.6.9, In Expressions, it should be specified that an empty collection_valued_input_parameter must not result in an exception. More specifically:

the expression
o.country IN (:countries)
must be treated as false

and the expression
o.country NOT IN (:countries)
must be treated as true

This should be trivial for JPA implementations to implement, since they can simply replace the expression with an expression like "1 = 0" or "1 = 1".

Currently, JPA implementations throw an exception in this case, which makes it cumbersome to use collection valued input parameters.

lukasj · 2015-02-21T15:36:18Z

@glassfishrobot Commented
Reported by anthony_ve

lukasj · 2016-12-02T18:12:00Z

@glassfishrobot Commented
jgigov said:
Having no values between the parentheses causes an SQL syntax error in MySQL and PostgreSQL. I don't have other databases to test it with.
A way to simulate no results is to replace the parameter within parentheses with a subquery like this: select 1 where false. This workaround worked for both with integer fields, but PostgreSQL will never accept it if it can't resolve the types and find a conversion function. The fact that there is nothing to convert is irrelevant to it.

lukasj · 2016-12-05T15:20:41Z

@glassfishrobot Commented
datanucleus1 said:
DataNucleus JPA has always supported checking for empty input lists and generating the SQL as you show.
If your JPA provider doesn't then raise a bug on them (though the JPA spec should be explicit about what should happen in that situation as per your description)

lukasj · 2017-05-05T06:05:40Z

@glassfishrobot Commented
This issue was imported from java.net JIRA JPA_SPEC-100

lukasj · 2018-08-31T16:49:25Z

Issue Imported From: https://github.com/javaee/jpa-spec/issues/100
Original Issue Raised By:@glassfishrobot
Original Issue Assigned To: @ldemichiel

m-reza-rahman · 2021-04-30T13:23:41Z

This looks like something that could be easily looked at and resolved. Perhaps it makes sense to triage this for consideration to be worked on during the next release? I would still mark this low priority. Perhaps it can simply be independently/fairly evaluated and closed? This seems like an issue about finer grained SQL generation, which JPA generally tries to stay out of...

Reza Rahman
Jakarta EE Ambassador, Author, Blogger, Speaker

Please note views expressed here are my own as an individual community member and do not reflect the views of my employer.

gavinking · 2021-04-30T14:11:12Z

This looks like something that could be easily looked at and resolved.

I disagree. JPA implementations usually cache the generated SQL and try to leave it alone at query execution time. This is a new feature that requires runtime processing of the cached SQL.

Therefore, this change would impose a surprisingly large burden on implementers. I'm not against it per se, but it doesn't seem like a priority.

m-reza-rahman · 2021-04-30T14:56:54Z

Just for clarity, simply closing/triaging some of this (keeping fairness in mind of course) as too low priority, too vague, too far fetched, too forward looking, out-of-date, out-of-scope, etc I think is a perfectly fine resolution. Whatever the factors, that does not seem to have been done for some time. It hopefully leads to focusing on making some tangible progress towards getting Jakarta Persistence moving again and focusing on items that make more practical sense in the near term.

Reza Rahman
Jakarta EE Ambassador, Author, Blogger, Speaker

Please note views expressed here are my own as an individual community member and do not reflect the views of my employer.

arjantijms · 2022-10-04T10:22:22Z

I've encountered this issue quite often in actual projects. Collections are passed-in, and people either forget that collections can be empty (resulting in unneeded exceptions), or there are rather ugly if/else statements.

I agree it's a burden on Jakarta Persistence implementations to implement this, but not doing so shifts that burden to user applications which have to handle the empty/not-empty case.

gavinking · 2024-01-26T12:19:31Z

So I just tried these lines in Hibernate:

em.createQuery("select name from Author where id in :ids", String.class)
        .setParameter("ids", List.of())
        .getResultList()
em.createQuery("select name from Author where id not in :ids", String.class)
        .setParameter("ids", List.of())
        .getResultList()

and both resulted in the correct SQL.

So there's no objection on our side to requiring this. @lukasj does EclipseLink also support it?

gavinking · 2024-01-26T12:21:09Z

To be clear, this would be done as part of #570, since "collection-valued input parameters" are currently underspecified.

gavinking · 2024-02-26T20:21:57Z

Since this issue was closed by the PR, I should clarify that we ended up writing this:

A list of arguments may be assigned to a collection-valued parameter of a JPQL query by packaging the arguments in a non-null instance of java.util.List and passing the list as an argument to the second parameter of setParameter(). The list should contain at least one element. If the list is empty the behavior is undefined. Portable applications should not pass an empty list to a collection-valued parameter.

Which is to say, @lukasj decided that since SQL itself does not allow an empty in list, that JPQL should not require that it be allowed.

This is I guess not the resolution some people were hoping for, but it's a resolution.

[We have also done a lot to clarify the semantics in this area.]

arjantijms · 2024-03-04T21:41:07Z

since SQL itself does not allow an empty in list, that JPQL should not require that it be allowed.

Well, actually the idea was that it should be allowed. As JPQL is higher level, shouldn't it be possible to generate the lower level legal SQL?

lukasj · 2024-03-04T22:26:18Z

since SQL itself does not allow an empty in list, that JPQL should not require that it be allowed.

Well, actually the idea was that it should be allowed. As JPQL is higher level, shouldn't it be possible to generate the lower level legal SQL?

It is not a problem of the generation of valid SQL, it is about which valid SQL to generate in this particular case - should an empty list become an SQL NULL? Should it be a list with a NULL item? Or should it be some magic constant based on the query being evaluated so the NULL returned by the DB for sth like '2' IN (NULL) evaluates to true/false based on the current context?

gavinking · 2024-03-04T22:47:50Z

Yeah so this is something I just really didn't appreciate before, but Lukas is correct.

if you translate foo in () to foo in (null), then you get the wrong behavior when foo is non-null
if you translate foo in () to 1=2, like our product does, then you get the wrong behavior when foo is null
if you translate it to foo in (-66666666) then you get the wrong behavior when foo=-66666666.

I can't think of a completely-correct solution here, can you?

gavinking · 2024-03-04T22:54:13Z

I can't think of a completely-correct solution here, can you?

Actually I this works, I believe:

case when foo is null then null else 1=2 end

lukasj · 2024-03-04T23:03:27Z

Also note that two NULL values are not equal by definition - see https://jakarta.ee/specifications/persistence/3.1/jakarta-persistence-spec-3.1#a5676

arjantijms · 2024-03-04T23:13:53Z

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:

<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>

<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>

The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.

It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

gavinking · 2024-03-04T23:18:36Z

@arjantijms That's completely the opposite of what the semantics should logically be!

Quite clearly 2=2 and 1 in () should evaluate to false. You have it evaluating to true.

gavinking · 2024-03-04T23:19:42Z

(So if we did this, it would do the exact opposite of what you're asking for.)

quaff · 2024-03-05T02:23:14Z

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:
<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>
The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.

It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

you should use CriteriaQuery instead to construct dynamic query.

gavinking · 2024-03-05T07:12:16Z

I can't think of a completely-correct solution here, can you?

Actually I this works, I believe:
case when foo is null then null else 1=2 end

What I found that works on all major databases is to translate x in () to (1 = case x is not null then 0 end).

beikov · 2024-03-05T08:05:49Z

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:
<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>
The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.
It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.
you should use CriteriaQuery instead to construct dynamic query.

💯

If that predicate depends on a join that is only needed for that one predicate, the JPA provider would still have to do the join, but you can be smarter about this when you construct the query dynamically and avoid the join altogether if the input list is empty.

arjantijms · 2024-03-05T12:37:16Z

you should use CriteriaQuery instead to construct dynamic query.

Well, yes and no.

Especially for somewhat more complicated queries, JPQL is much easier to read and maintain.

Exactly for this reason I've proposed more than a decade ago to have the ability to transform JPQL into Criteria. That way I can write a base query in JPQL, and then use Criteria to add or remove certain parts to it.

gavinking · 2024-03-05T12:42:31Z

Exactly for this reason I've proposed more than a decade ago to have the ability to transform JPQL into Criteria. That way I can write a base query in JPQL, and then use Criteria to add or remove certain parts to it.

FTR our implementation now supports this. But I'm not sure about other implementations.

arjantijms · 2024-03-05T12:46:11Z

That's completely the opposite of what the semantics should logically be!

I hear you, and I had been thinking about that a lot. It's not completely straightforward what is logical.

If you see the list as constraints and there simply are no constraints, evaluating to true might be logical. This often translates to how a choice from a UI works. If the user doesn't select any explicit country from a list, a table or whatever searches in all countries. If the user specifies one or more countries, the search is only for those specified countries.

However, if the list is seen as requirements, then evaluating to false is more logical. If my memory serves me correctly, around a decade ago exactly this was discussed on the JPA JIRA or list.

Maybe something like this could work?

<named-query name="Website.getByUserIdAndCountries" empty-collection-semantics="[true|false]">

There's maybe alternative ways to specific this, but the idea would be for the user to specify whether an empty collection should evaluate to effectively true or false.

arjantijms · 2024-03-05T12:47:54Z

FTR our implementation now supports this. But I'm not sure about other implementations.

That's extremely cool. Would be awesome to have this in the Jakarta spec. I'm sure there were various issues for that.

gavinking · 2024-03-05T12:51:24Z

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

gavinking · 2024-03-05T12:52:08Z

Would be awesome to have this in the Jakarta spec.

Sure, but it can only go in the spec if it's implementable by others.

lukasj · 2024-03-05T13:10:58Z

Maybe I'm simplifying it too much, but in practice I always write two queries now when there's 1 collection valued input. I then check if that that collection is not empty, and then I invoke query 1, otherwise query 2:
<named-query name="Website.getByUserIdAndCountries1">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    website.countryId in (:countryIds)
  </query>
 </named-query>
<named-query name="Website.getByUserIdAndCountries2">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId
  </query>
 </named-query>
The naming here is a little silly, but it's just for the example. I'd wish Jakarta Persistence would automatically omit the AND website.countryId in (:countryIds) when countryIds is empty.

It gets complicated and tedious to do manually in bigger queries and when there is more than 1 collection valued input parameter.

Why exactly are two queries required? Isn't sth like

<named-query name="Website.getByUserIdAndCountries">
  <query>
   SELECT 
    website
   FROM 
    Website website
   WHERE
    website.userId = :userId AND
    case when (:countryIds is not null) then website.countryId in (:countryIds) 
             else true
  </query>
 </named-query>

equivalent? Also be aware that every IN can be rewritten to JOIN (but not vice-versa).

However, if the list is seen as requirements, then evaluating to false is more logical.

Once nullable columns and/or NULLs come to play, what seems logical is not what boolean algebra defines.

lukasj · 2024-03-05T13:14:58Z

This often translates to how a choice from a UI works. If the user doesn't select any explicit country from a list, a table or whatever searches in all countries. If the user specifies one or more countries, the search is only for those specified countries.

the developer is always able to define his own, specific null-element to base his conditions in queries on

gavinking · 2024-03-05T13:15:36Z

case when (:countryIds is not null) then website.countryId in (:countryIds) else true

I think we disallow binding null to a collection-valued input param.

arjantijms · 2024-03-05T13:19:25Z

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

lukasj · 2024-03-05T13:21:10Z

case when (:countryIds is not null) then website.countryId in (:countryIds) else true

I think we disallow binding null to a collection-valued input param.

what about empty collection? that should not be null and is not null should return true only if it is not null and not empty... anyway, be it is not null or is not empty - my point is about defining the query correctly

gavinking · 2024-03-05T13:21:44Z

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

I have no idea. I can, however, promise you that in set theory x ∈ {} evaluates to false for every x.

lukasj · 2024-03-05T13:22:19Z

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

it is undefined

mysql> select 1 where 1 in ();
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1

arjantijms · 2024-03-05T13:31:30Z

@arjantijms This isn't just some arbitrary thing we get to pick and choose the semantics of based on convenience. The semantics are determined by logical consistency. That is, by math.

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

I have no idea. I can, however, promise you that in set theory x ∈ {} evaluates to false for every x.

Yes, and there is no element of {} for which any property P holds.

Then again, are SQL / relational databases strictly based on set theory, or only partially? Should especially JPQL provide conveniences in addition to math adherence, or should it ask its users to implement said conveniences?

Maybe omitting the part of the where clause entirely when the collection is empty should be provided by another mechanism than the empty collection semantics. I don't claim to have the answers here, just thinking out loud.

arjantijms · 2024-03-05T13:33:23Z

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

it is undefined

Indeed, it's the same on pretty much every dialect I'ver ever worked with (Mostly MySQL, Postgres, H2 and Derby). The question is mostly here: why is it undefined if set theory so clearly states the outcome should be false?

lukasj · 2024-03-05T13:38:55Z

I'm just wondering if it was so logical, why SQL dialects themselves have never defined foo in () as being false and not foo in () as being true.

it is undefined

Indeed, it's the same on pretty much every dialect I'ver ever worked with (Mostly MySQL, Postgres, H2 and Derby). The question is mostly here: why is it undefined if set theory so clearly states the outcome should be false?

no, I wasn't completely accurate - formally, it is defined to be UNKNOWN - not true, not false (like Boolean, which can be true, false and also null)

lukasj added Priority: Minor Type: Improvement labels Aug 31, 2018

lukasj mentioned this issue Jan 26, 2024

parametrized 'in' predicate lists #570

Closed

gavinking linked a pull request Jan 26, 2024 that will close this issue

nail down semantics of collection-valued query parameters #571

Merged

rfelcman mentioned this issue Feb 1, 2024

[master] Allow an empty collection or null as a parameter in an "IN" expression eclipse-ee4j/eclipselink#2054

Merged

lukasj closed this as completed in #571 Feb 26, 2024

gavinking closed this as not planned Won't fix, can't repro, duplicate, stale Feb 26, 2024

github-project-automation bot added this to 3.2.0 Aug 29, 2024

github-project-automation bot moved this to Done in 3.2.0 Aug 29, 2024

Allow an empty collection_valued_input_parameter in an "IN" expression #100

Allow an empty collection_valued_input_parameter in an "IN" expression #100

Comments

lukasj commented Feb 21, 2015

lukasj commented Feb 21, 2015

lukasj commented Dec 2, 2016

lukasj commented Dec 5, 2016

lukasj commented May 5, 2017

lukasj commented Aug 31, 2018

m-reza-rahman commented Apr 30, 2021

gavinking commented Apr 30, 2021

m-reza-rahman commented Apr 30, 2021 • edited Loading

arjantijms commented Oct 4, 2022

gavinking commented Jan 26, 2024

gavinking commented Jan 26, 2024

gavinking commented Feb 26, 2024

arjantijms commented Mar 4, 2024

lukasj commented Mar 4, 2024 • edited Loading

gavinking commented Mar 4, 2024

gavinking commented Mar 4, 2024

lukasj commented Mar 4, 2024

arjantijms commented Mar 4, 2024

gavinking commented Mar 4, 2024

gavinking commented Mar 4, 2024

quaff commented Mar 5, 2024

gavinking commented Mar 5, 2024

beikov commented Mar 5, 2024

arjantijms commented Mar 5, 2024

gavinking commented Mar 5, 2024

arjantijms commented Mar 5, 2024 • edited Loading

arjantijms commented Mar 5, 2024

gavinking commented Mar 5, 2024

gavinking commented Mar 5, 2024

lukasj commented Mar 5, 2024 • edited Loading

lukasj commented Mar 5, 2024

gavinking commented Mar 5, 2024

arjantijms commented Mar 5, 2024

lukasj commented Mar 5, 2024 • edited Loading

gavinking commented Mar 5, 2024 • edited Loading

lukasj commented Mar 5, 2024 • edited Loading

arjantijms commented Mar 5, 2024

arjantijms commented Mar 5, 2024

lukasj commented Mar 5, 2024

m-reza-rahman commented Apr 30, 2021 •

edited

Loading

lukasj commented Mar 4, 2024 •

edited

Loading

arjantijms commented Mar 5, 2024 •

edited

Loading

lukasj commented Mar 5, 2024 •

edited

Loading

lukasj commented Mar 5, 2024 •

edited

Loading

gavinking commented Mar 5, 2024 •

edited

Loading

lukasj commented Mar 5, 2024 •

edited

Loading