Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Language to Jakarta Data Proposal [Vote] #458

Closed
2 of 4 tasks
otaviojava opened this issue Feb 20, 2024 · 51 comments
Closed
2 of 4 tasks

Query Language to Jakarta Data Proposal [Vote] #458

otaviojava opened this issue Feb 20, 2024 · 51 comments
Labels
enhancement New feature or request vote Something to vote on

Comments

@otaviojava
Copy link
Contributor

otaviojava commented Feb 20, 2024

As a ...

  • Application user/user of the configuration itself
  • API user (application developer)
  • SPI user (container or runtime developer)
  • Specification implementer

I need to be able to ...

The Jakarta Data project aims to introduce a unified Query Language specification to facilitate seamless data querying across various databases, primarily focusing on SQL and NoSQL. The recent delay in the release of Jakarta presents an opportunity to include more features in the specification before the official release.

Which enables me to ...

The primary goal of this proposal is to initiate a discussion and a vote regarding the inclusion of the Jakarta Data project in the Jakarta EE 1.0 release.

Key Features:

  1. Query Language Specification: The Jakarta Data project will introduce a Query Language specification tailored to work across different database systems, focusing on supporting SQL and NoSQL databases.

  2. Supported Operations: Initially, the specification will support essential query operations such as selecting, deleting, inserting, and updating data. Subsequent versions may incorporate additional query functionalities.

  3. Compatibility: A significant challenge is ensuring compatibility between SQL and NoSQL databases. For instance, while certain operations like joins may not be universally supported across all databases, efforts will be made to accommodate such operations optionally or provide alternative approaches.

  4. Annotation Support: The project will include annotation support, such as the @Query annotation, allowing developers to specify whether to use native queries from the provider or the Jakarta Data Query Language.

Additional information

@Repository
public interface Garage {

  @Query("FOR car IN cars FILTER car._key == key RETURN car", native=true)
  Optional<Car> findNative(String key);
  // Running a native query language using ArangoDB query language

  // Running a Jakarta Data query
  @Query("select * from Car where key = @key")
  Optional<Car> findDataQuery(String key);
}

Scope for Discussion:

  1. Inclusion in Jakarta EE 1.0: Discuss whether it is appropriate to include the Jakarta Data project in the Jakarta EE 1.0 release, considering its potential benefits and impact on the ecosystem.

  2. Compatibility and Extensibility: Evaluate the proposed Query Language specification's compatibility with different database systems and its extensibility for future enhancements.

BNF to Start:

<Query> ::= <Select> | <Delete> | <Insert> | <Update>

<Select> ::= 'select' <Fields> 'from' <Entity> <Where>? <Skip>? <Limit>? <Order>? EOF
<Delete> ::= 'delete' <DeleteFields>? 'from' <Entity> <Where>? EOF
<Insert> ::= 'insert' <Entity> (<Conditions> | <JSON>) <TTL>? EOF
<Update> ::= 'update' <Entity> (<Conditions> | <JSON>) EOF

<Fields> ::= <Star> | <Name> (',' <Name>)*
<DeleteFields> ::= <Name> (',' <Name>)*
<Conditions> ::= '(' <Changes> ')'
<Order> ::= 'order' 'by' <OrderName> (<OrderName>)*
<Where> ::= 'where' <Condition> (<And> <Condition> | <Or> <Condition>)*

<Change> ::= <Name> '=' <Value>
<Changes> ::= <Change> (',' <Change>)*
<Key> ::= <Value>
<Keys> ::= <Value> (',' <Value>)*

<Value> ::= (<Number> | <String> | <Bool> | <Array> | <Function> | <Parameter> | <JSON>)
<ValueString> ::= <String> | <Parameter>
<Name> ::= ANY_NAME
<Entity> ::= ANY_NAME

<OrderName> ::= <Name> | <Name> <Asc> | <Name> <Desc>

<Asc> ::= 'asc'
<Desc> ::= 'desc'
<And> ::= 'and'
<Or> ::= 'or'
<TTL> ::= INT <Unit>
<Unit> ::= 'day' | 'hour' | 'minute' | 'second' | 'millisecond' | 'nanosecond'

<Condition> ::= <EQ> | <GT> | <GTE> | <LT> | <LTE> | <Between> | <In> | <Like>
<EQ> ::= <Not>? <Name> '=' <Value>
<GT> ::= <Not>? <Name> '>' <Value>
<GTE> ::= <Not>? <Name> '>=' <Value>
<LT> ::= <Not>? <Name> '<' <Value>
<LTE> ::= <Not>? <Name> '<=' <Value>
<Between> ::= <Not>? <Name> 'between' <Value> 'and' <Value>
<In> ::= <Name> <Not>? 'in' '(' <Value> (',' <Value>)* ')'
<Like> ::= <Name> <Not>? 'like' <ValueString>

<Not> ::= 'not'

<JSON> ::= <ObjectJSON> | <ArrayJSON>
<ObjectJSON> ::= '{' <PairJSON> (',' <PairJSON>)* '}' | '{' '}'
<PairJSON> ::= <String> ':' <ValueJSON>
<ValueJSON> ::= <String> | <Bool> | <Number> | <ObjectJSON> | <ArrayJSON> | 'null'
<ArrayJSON> ::= '[' <ValueJSON> (',' <ValueJSON>)* ']' | '[' ']'

<Function> ::= 'convert'
<Convert> ::= 'convert(' <Element> ',' <Name> ')'

<Element> ::= <Number> | <String>
<Number> ::= '-'? (<NUMBER> | <INT>)
<String> ::= STRING
<Bool> ::= BOOLEAN

<Array> ::= '{' <Element> (',' <Element>)* '}'
<Parameter> ::= PARAMETER

<STRING> ::=  '\'' ( <ESC> | ~('\\'|'\'') )* '\'' |'"' ( <ESC> | ~('\\'|'"') )* '"'
<INT> ::= [0-9]+
<NUMBER> ::= <INT> [.]? <INT>?
<BOOLEAN> ::= 'true' | 'false'
<ANY_NAME> ::= [a-zA-Z_.][a-zA-Z_.0-9-]*
<PARAMETER> ::= '@' <ANY_NAME>
<WS> ::= [ \t\r\n]+
<SL_COMMENT> ::= '//' .*? '\n'

<ESC> ::= '\\' (["\\/bfnrt] | <UNICODE>)
<UNICODE> ::= 'u' <HEX> <HEX> <HEX> <HEX>
<HEX> ::= [0-9a-fA-F]
@gavinking
Copy link
Contributor

Excellent, this is definitely going in the direction I have been hoping!

I have some feedback on detail, but I'll keep that to myself for now, so as not to poison the well while others look over what you've proposed. Looking forward to discussing it on the call tomorrow.

@njr-11
Copy link
Contributor

njr-11 commented Feb 20, 2024

If we are going to add standardized query language in version 1.0, I would recommend that it be defined as a subset of JPQL from Jakarta Persistence (whichever subset is sufficiently common with NoSQL). JPQL is a well established standard and would make sense to reuse from it, otherwise I think it would be too much to get right in version 1.0. Whatever we define, we will need to remain compatible with going forward to avoid breaking changes. I don't think that would be a concern with a JPQL subset, and it looks like that would mostly align with the capability that you have in your example BNF.

@otaviojava
Copy link
Contributor Author

If we are going to add standardized query language in version 1.0, I would recommend that it be defined as a subset of JPQL from Jakarta Persistence (whichever subset is sufficiently common with NoSQL). JPQL is a well established standard and would make sense to reuse from it, otherwise I think it would be too much to get right in version 1.0. Whatever we define, we will need to remain compatible with going forward to avoid breaking changes. I don't think that would be a concern with a JPQL subset, and it looks like that would mostly align with the capability that you have in your example BNF.

I am fine, I just propose something to start. If it makes sense we can go further with it.

@gavinking
Copy link
Contributor

gavinking commented Feb 21, 2024

OK, just to give some specific feedback on the proposal here, after our discussion today:

  1. It seems to me that the inclusion of JSON literals goes too far for what we should do int 1.0.
  2. Since we already have an API way to do pagination, I'm not sure we need skip/limit in the query language (at least for now).
  3. A minor point: the @name parameter syntax diverges from JPA, which uses :name and ?1. [I'm not especially attached to any particular character here, but I would prefer things to be uniform unless there's a good reason.]
  4. What I really do want to see is a list of functions that can be supported portably.
  5. And I would definitely include arithmetic operators as well.
  6. I'm not sure what the <TTL> rule does.
  7. But on the other hand I'm not sure we need insert at this stage.
  8. I don't think I understand 'delete' <DeleteFields>? 'from' <Entity>.
  9. We seem to be missing is null.
  10. Not sure we need comments, but if we do, I would go with /* ... */ instead of line-end.
  11. We do need path expressions, in my opinion.
  12. Similarly, I would totally include grouping parentheses.

@gavinking
Copy link
Contributor

So, @njr-11 suggested starting from a cut-down version of JPQL.

Here's what that might look like. Note that I have diverged from JPA in at least two respects:

  1. No identification variables for entities (I've already proposed making them optional in JPQL.)
  2. All clauses of the select statement are optional! Believe it or not, this makes sense, since we can infer things from the query method signature.

This is going to look a bit scary, because the JPQL grammar is defined in a very verbose way, mixing in typing rules with parsing rules.

QL_statement ::= select_statement | update_statement | delete_statement
select_statement ::= [select_clause] [from_clause] [where_clause] [orderby_clause]
update_statement ::= update_clause [where_clause]
delete_statement ::= delete_clause [where_clause]

from_clause ::= FROM entity_name

update_clause ::= UPDATE entity_name SET update_item {, update_item}*
update_item ::=
    {single_valued_embeddable_object_field.}*
    {state_field | single_valued_object_field}
    = {scalar_expression | NULL}

delete_clause ::= DELETE FROM entity_name

select_clause ::= SELECT select_item {, select_item}*
select_item ::= select_expression
select_expression ::= state_field_path_expression | scalar_expression

where_clause ::= WHERE conditional_expression

orderby_clause ::= ORDER BY orderby_item {, orderby_item}*
orderby_item ::= state_field_path_expression [ASC | DESC]

scalar_expression ::=
    arithmetic_expression |
    string_expression |
    enum_expression |
    datetime_expression |
    boolean_expression |
    case_expression

conditional_expression ::= conditional_term | conditional_expression OR conditional_term
conditional_term ::= conditional_factor | conditional_term AND conditional_factor
conditional_factor ::= [NOT] conditional_primary
conditional_primary ::= simple_cond_expression | (conditional_expression)
simple_cond_expression ::=
    comparison_expression |
    between_expression |
    in_expression |
    like_expression |
    null_comparison_expression

between_expression ::=
    arithmetic_expression [NOT] BETWEEN arithmetic_expression AND arithmetic_expression |
    string_expression [NOT] BETWEEN string_expression AND string_expression |
    datetime_expression [NOT] BETWEEN datetime_expression AND datetime_expression

in_expression ::= state_field_path_expression [NOT] IN (in_item{, in_item}*)
in_item ::= literal | input_parameter

like_expression ::= string_expression [NOT] LIKE pattern_value

null_comparison_expression ::=
    {state_field_path_expression | input_parameter} IS [NOT] NULL

comparison_expression ::=
    string_expression {= | > | >= | < | <= | <>} string_expression |
    boolean_expression {= | <>} boolean_expression |
    enum_expression {= | <>} enum_expression |
    datetime_expression {= | > | >= | < | <= | <>} datetime_expression |
    entity_expression {= | <>} entity_expression |
    arithmetic_expression {= | > | >= | < | <= | <>} arithmetic_expression

arithmetic_expression ::=
    arithmetic_term | arithmetic_expression {+ | -} arithmetic_term
arithmetic_term ::= arithmetic_factor | arithmetic_term {* | /} arithmetic_factor
arithmetic_factor ::= [+ | -] arithmetic_primary
arithmetic_primary ::=
    state_field_path_expression |
    numeric_literal |
    (arithmetic_expression) |
    input_parameter |
    functions_returning_numerics |
    case_expression

string_expression ::=
    state_field_path_expression |
    string_literal |
    input_parameter |
    functions_returning_strings |
    case_expression |
    string_expression || string_expression

datetime_expression ::=
    state_field_path_expression |
    input_parameter |
    functions_returning_datetime |
    case_expression |
    date_time_timestamp_literal

boolean_expression ::=
    state_field_path_expression |
    boolean_literal |
    input_parameter |
    case_expression

enum_expression ::=
    state_field_path_expression |
    enum_literal |
    input_parameter |
    case_expression

entity_expression ::=
    single_valued_object_path_expression |
    input_parameter

single_valued_object_path_expression ::= single_valued_object_field{.single_valued_object_field}*
state_field_path_expression ::= single_valued_object_path_expression.state_field

boolean_literal = TRUE | FALSE

case_expression ::= general_case_expression | simple_case_expression
general_case_expression::= CASE when_clause {when_clause}* ELSE scalar_expression END
when_clause ::= WHEN conditional_expression THEN scalar_expression
simple_case_expression ::=
    CASE state_field_path_expression simple_when_clause {simple_when_clause}*
    ELSE scalar_expression
    END
simple_when_clause ::= WHEN scalar_expression THEN scalar_expression

Note that this grammar does not include lexical rules.

@otaviojava
Copy link
Contributor Author

OK, just to give some specific feedback on the proposal here, after our discussion today:

  1. It seems to me that the inclusion of JSON literals goes too far for what we should do int 1.0.
  2. Since we already have an API way to do pagination, I'm not sure we need skip/limit in the query language (at least for now).
  3. A minor point: the @name parameter syntax diverges from JPA, which uses :name and ?1. [I'm not especially attached to any particular character here, but I would prefer things to be uniform unless there's a good reason.]
  4. What I really do want to see is a list of functions that can be supported portably.
  5. And I would definitely include arithmetic operators as well.
  6. I'm not sure what the <TTL> rule does.
  7. But on the other hand I'm not sure we need insert at this stage.
  8. I don't think I understand 'delete' <DeleteFields>? 'from' <Entity>.
  9. We seem to be missing is null.
  10. Not sure we need comments, but if we do, I would go with /* ... */ instead of line-end.
  11. We do need path expressions, in my opinion.
  12. Similarly, I would totally include grouping parentheses.

I agree!
As I mentioned today, it was the best that I could do it in a couple of minutes to express the idea. Once we agree to move it forward. we can refine it.

@gavinking
Copy link
Contributor

Note also that:

  1. I didn't include functions, and
  2. you might prefer to leave out case expressions in a first release.

@njr-11
Copy link
Contributor

njr-11 commented Feb 21, 2024

  1. No identification variables for entities (I've already proposed making them optional in JPQL.)

That will be nice if your proposal for identification variables makes it into Jakarta Persistence 3.2. Do you know when they will decide on it? Otherwise we woiuld need to choose between including identification variables for now and making optional later to coincide with Jakarta Persistence in the future, or wait on the query language until next version to line up with Jakarta Persistence.

  1. All clauses of the select statement are optional! Believe it or not, this makes sense, since we can infer things from the query method signature.

At first I thought that was a mistake, but it does make sense it can be inferred. It will depend on being able to remove identification variables. If it turns out we cannot, then I think it would be too much to ask of Jakarta Data providers to need to figure out the identification variable from the remainder of the where_clause or orderby_clause. Ideally, the query language should just be passed through to the Jakarta Persistence (or NoSQL/other) provider and not need to be parsed by the Jakarta Data provider.

Overall this looks great and headed in the right direction. I spotted a few minor things, probably typos: string_expression || string_expression and
boolean_literal = TRUE | FALSE which should be boolean_literal ::= TRUE | FALSE

It doesn't make any difference to me whether to include case in version 1. I'll let Otavio comment on whether it makes sense for NoSQL compatibility.

@gavinking
Copy link
Contributor

That will be nice if your proposal for identification variables makes it into Jakarta Persistence 3.2. Do you know when they will decide on it?

I think we should assume it won't be for JPA 3.2, and we will have to wait for JPA 4.

I mean, I guess could push for it i.e. nag @lukasj about it....

Otherwise we would need to choose between including identification variables for now and making optional later to coincide with Jakarta Persistence in the future, or wait on the query language until next version to line up with Jakarta Persistence.

I guess I don't really see it in quite those terms. It's a sort of "trivial" transformation to go adding an identification var at the start of each path, given how simplistic the above grammar is. That's something that's even pretty straightforward to do at annotation-processing time.

But on the other hand note that Hibernate already supports this so I know very well that I won't have to do this transformation. [So I guess I'm biased.]

Of course we can put the identification variable back in to the grammar, and make it optional later, but it's pretty redundant here unless we're going to allow multiple entities in the from clause, which I guess I assumed we were not going to allow.

Ideally, the query language should just be passed through to the Jakarta Persistence (or NoSQL/other) provider and not need to be parsed by the Jakarta Data provider.

Yes, for sure that's the easiest way. It's certainly how I'm going to do it.

But IMO, it's a nongoal at this stage to provide for interop between Jakarta Data providers and Jakarta Persistence providers. So the repository implementation is not really limited to calling the JPA provider via JPA-standard APIs. So, for example, our implementation of Jakarta Data is going to support Hibernate, but probably not EclipseLink, at least not initially. And Hibernate already accepts this syntax for years. If someone wants to make an EclipseLink-based Jakarta Data provider, then they can build support for identification variable-free queries directly into EclipseLink, assuming it doesn't already tolerate that, without waiting for permission from the JPA spec.

So what my contention is that it's OK if the JPA spec trails a bit behind the implementations in this respect.

@gavinking
Copy link
Contributor

That's something that's even pretty straightforward to do at annotation-processing time.

Note, FTR, that I do waaaaaay more magical things than this in our annotation processor: I not only parse JPQL, but I completely type check the JPQL against the entity types in the compilation unit and spit out appropriate compilation errors.

@gavinking
Copy link
Contributor

string_expression || string_expression

Heh, no, actually, that's the standard concatenation operator ||.

Yeah the syntax used in the JPA spec is a bit ambiguous.

@gavinking
Copy link
Contributor

That will be nice if your proposal for identification variables makes it into Jakarta Persistence 3.2.

For the record, the relevant issue is: jakartaee/persistence#452

@njr-11
Copy link
Contributor

njr-11 commented Feb 22, 2024

string_expression || string_expression

Heh, no, actually, that's the standard concatenation operator ||.

Yeah the syntax used in the JPA spec is a bit ambiguous.

Nice - I was comparing with the BNF in the Jakarta Persistence 3.1 spec which didn't have ||. But I do see that 3.2 states that it "Adds || string concatenation operator to Jakarta Persistence QL". No issue there then.

@gavinking
Copy link
Contributor

gavinking commented Feb 25, 2024

So I wanted to keep aggregate functions out of this for this release, but I now realize that there's precisely one aggregate function we are going to need, and it's count(*).

This is a hard requirement in order to be able to fix the following abomination in BasicRepository:

long countBy();

When I saw first this I thought it was a mistake. And then I realized what was really going on. It's because it's a magical method name query. 😒

So once we have a proper query language, it can be redefined as:

@Query("select count(*)")
long countAll();

Similarly, "exists" queries can be written in terms of "select count(*)>0". This would surely let us come up with a better name than existsById().

I note, for the record, that even for the very first, most primitive repository, magical method name queries already lead to unnatural naming. We gotta get away from that stuff.

@gavinking
Copy link
Contributor

gavinking commented Feb 25, 2024

In terms of regular (i.e. non-aggregate) functions, the ones I would propose are:

  • length(), upper(), lower(), left(), right() for strings,
  • abs() for numeric values (here I'm uncertain ... there's more functions we could include here)
  • local date and local datetime
  • coalesce() is very useful but can it be easily implemented on all kinds of database???

Those, together with count(*) are probably enough for a first release.

@gavinking
Copy link
Contributor

gavinking commented Feb 25, 2024

Finally, I wonder what further simplifications we could reasonably make to the grammar I posted above, just to make it as easy as possible to implement for a first release:

  • as I mentioned, case expressions aren't absolutely necessary
  • we could (and probably should) leave out date/time literals
  • we could limit the select clause to a single select item, so just select_clause ::= SELECT select_item
  • we could leave out update queries in the first release, and add them back in later, along with insert ... select ... statements*

On the other hand, I just realized that even though I stripped MEMBER OF and IS [NOT] EMPTY out of the grammar I posted above, the current spec does talk about these operations, so if you guys think they're needed I can put them back in.


* Note: the spec currently mentions that magical method name queries can have <action> = "update", but then there's no more mention of this anywhere that I can find, and I don't see how this would actually work without some sort of set operator, so I don't think update queries actually exist in a well-defined form as of today.

@gavinking
Copy link
Contributor

Here's an update with:

  • the functions mentioned above added in, and
  • case expressions, date/time literals, and entity-valued expressions removed.

I have not restored member of or empty expressions because Jakarta Data simply doesn't have collection-valued members as of today, and so the Contains and Empty operators in 4.5.2 of the spec just aren't well-defined.

So the following is a well-defined and strict superset of what is current possible with method name queries.

QL_statement ::= select_statement | update_statement | delete_statement

select_statement ::= [select_clause] [from_clause] [where_clause] [orderby_clause]
update_statement ::= update_clause [where_clause]
delete_statement ::= delete_clause [where_clause]

from_clause ::= FROM entity_name

update_clause ::= UPDATE entity_name SET update_item {, update_item}*
update_item ::=
    {single_valued_embeddable_object_field.}* state_field
    = {scalar_expression | NULL}

delete_clause ::= DELETE FROM entity_name

select_clause ::= SELECT select_item {, select_item}*
select_item ::= select_expression
select_expression ::=
    state_field_path_expression |
    scalar_expression |
    aggregate_expression
aggregate_expression ::= COUNT(*)

where_clause ::= WHERE conditional_expression

orderby_clause ::= ORDER BY orderby_item {, orderby_item}*
orderby_item ::= state_field_path_expression [ASC | DESC]

state_field_path_expression ::= {single_valued_object_field.}* state_field

conditional_expression ::= conditional_term | conditional_expression OR conditional_term
conditional_term ::= conditional_factor | conditional_term AND conditional_factor
conditional_factor ::= [NOT] conditional_primary
conditional_primary ::= simple_cond_expression | (conditional_expression)

simple_cond_expression ::=
    comparison_expression |
    between_expression |
    in_expression |
    like_expression |
    null_comparison_expression

comparison_expression ::=
    arithmetic_expression {= | > | >= | < | <= | <>} arithmetic_expression |
    string_expression {= | > | >= | < | <= | <>} string_expression |
    datetime_expression {= | > | >= | < | <= | <>} datetime_expression |
    boolean_expression {= | <>} boolean_expression |
    enum_expression {= | <>} enum_expression

between_expression ::=
    arithmetic_expression [NOT] BETWEEN arithmetic_expression AND arithmetic_expression |
    string_expression [NOT] BETWEEN string_expression AND string_expression |
    datetime_expression [NOT] BETWEEN datetime_expression AND datetime_expression

in_expression ::= state_field_path_expression [NOT] IN (in_item {, in_item}*)
in_item ::= literal | input_parameter

like_expression ::= string_expression [NOT] LIKE pattern_value

null_comparison_expression ::= state_field_path_expression IS [NOT] NULL

scalar_expression ::=
    arithmetic_expression |
    string_expression |
    datetime_expression |
    boolean_expression |
    enum_expression

arithmetic_expression ::=
    arithmetic_term | arithmetic_expression {+ | -} arithmetic_term
arithmetic_term ::= arithmetic_factor | arithmetic_term {* | /} arithmetic_factor
arithmetic_factor ::= [+ | -] arithmetic_primary
arithmetic_primary ::=
    state_field_path_expression |
    numeric_literal |
    (arithmetic_expression) |
    input_parameter |
    functions_returning_numerics
functions_returning_numerics ::=
    ABS(arithmetic_expression) |
    LENGTH(string_expression) |

string_expression ::=
    state_field_path_expression |
    string_literal |
    input_parameter |
    functions_returning_strings |
    string_expression || string_expression
functions_returning_strings ::=
    LOWER(string_expression) |
    UPPER(string_expression) |
    LEFT(string_expression, arithmetic_expression) |
    RIGHT(string_expression, arithmetic_expression)

datetime_expression ::=
    state_field_path_expression |
    input_parameter |
    functions_returning_datetime
functions_returning_datetime ::=
    LOCAL DATE |
    LOCAL DATETIME

boolean_expression ::=
    state_field_path_expression |
    boolean_literal |
    input_parameter
boolean_literal ::= TRUE | FALSE

enum_expression ::=
    state_field_path_expression |
    enum_literal |
    input_parameter

Even if this grammar looks a bit verbose, it's really very straightforward and super-easy to implement.

@gavinking
Copy link
Contributor

A word about the left() and right() functions. I added these in when I saw that section 4.5.2 has StartsWith and EndsWith predicates. But then I remembered that you can emulate those with like.

Still, we probably would want to have a way to chop up strings. JPQL has a substring() function that diverges from the syntax of ANSI SQL, but is accepted natively by some databases. I tend to find left() and right() a bit more convenient most of the time, but either option is fine.

@njr-11
Copy link
Contributor

njr-11 commented Feb 26, 2024

So once we have a proper query language, it can be redefined as:

@Query("select count(*)")
long countAll();

The above is one way to fix the countBy() method name. Another way would be to have @Count that would behave the same as @Find except that it returns a count rather than query results. Then you could do,

@Count
long countAll();

@njr-11
Copy link
Contributor

njr-11 commented Feb 26, 2024

To clarify, don't interpret my prior comment as an argument against query language. I just wanted to point out one of the other possible approaches. It would be nice to eventually end up with both.

@gavinking
Copy link
Contributor

gavinking commented Feb 26, 2024

. Another way would be to have @Count that would behave the same as @Find except that it returns a count rather than query results.

Yeah, I've toyed with that idea a bit myself. The reasons I'm not really a fan of it is that:

  1. it only works when the repository has a primary entity type, and
  2. IMO, count(*) queries are not really so common that they really deserve this special treatment.

Of course, on 2, YMMV.

Note that there is one case where count(*) is useful, and that's for pagination. But Jakarta Data already has a good way to deal with that: Page.totalElements(). That eliminates a major motivation for needing to write such queries explicitly.

@njr-11
Copy link
Contributor

njr-11 commented Feb 26, 2024

Note that there is one case where count(*) is useful, and that's for pagination. But Jakarta Data already has a good way to deal with that: Page.totalElements(). That eliminates a major motivation for needing to write such queries explicitly.

That brings up another, even simpler option. Just remove countBy() and don't replace it in version 1.0. I agree it seems like something you would rarely use, and if a user does want it, they can write a query or if they really want use the awkward Query by Method Name approach.

@gavinking
Copy link
Contributor

That brings up another, even simpler option. Just remove countBy() and don't replace it in version 1.0.

That would be great.

@gavinking
Copy link
Contributor

gavinking commented Mar 4, 2024

Rough draft of a section on expressions

Expressions

An expression is a sequence of tokens to which a Java type can be assigned, and which evaluates to a well-defined value when the query is executed. In JDQL, expressions may be categorized as:

  • literals,
  • special values,
  • parameters,
  • enum literals,
  • paths,
  • function calls, and
  • operator expressions.

A string, integer, or decimal literal is assigned the type it would be assigned in Java. So, for example, 'Hello' is assigned the type java.lang.String, 123 is assigned the type int, 1e4 is assigned the type double, and 1.23f is assigned the type float. The syntax for literal expressions is given by the literal grammar rule, and in the previous section titled Lexical structure.

When executed, a literal expression evaluates to its literal value.

The special values true and false are assigned the type boolean, and evaluate to their literal values. The special values local date and local datetime are assigned the types java.time.LocalDate and java.time.LocalDateTime, and evaluate to the current date and current datetime of the database server, respectively. The syntax for special values is given by the special_expression grammar rule.

A parameter expression, with syntax given by input_parameter, is assigned the type of the repository method parameter it matches. For example, the parameter :titlePattern is assigned the type java.lang.String:

@Query("where title like :titlePattern")
List<Book> booksMatchingTitle(String titlePattern);

When executed, a parameter expression evaluates to the argument supplied to the parameter of the repository method.

An enum literal expression is a Java identifier, with syntax specified by enum_literal, and may only occur as the right operand of a set assignment or =/<> equality comparison. It is assigned the type of the left operand of the assignment or comparison. The type must be a Java enum type, and the identifier must be the name of an enumerated value of the enum type. For example, day <> MONDAY is a legal comparison expression.

When executed, an enum expression evaluates to the named member of the Java enum type.

A path expression is a period-separated list of Java identifiers, with syntax specified by state_field_path_expression. Each identifier is interpreted as the name of a field of an entity or embeddable class. Each prefix of the list is assigned a Java type:

  • the first element of the list is assigned the type of the named field of the entity being queried, and
  • each subsequent element is assigned the type of the named field of the type assigned to the previous element.

The type of the whole path expression is the type of the last element of the list. For example, pages is assigned the type int, address is assigned the type org.example.Address, and address.street is assigned the type java.lang.String.

When executed, a path expression is evaluated in the context of a given record of the queried entity type, and evaluates to the value of the entity field for the given record.

A function call is the name of a JDQL function, followed by a parenthesized list of argument expressions, with syntax given by function_expression.

  • The abs() function is assigned the type of its numeric argument, and evaluates to the absolute value of the numeric value to which its argument evaluates.

  • The length() function is assigned the type java.lang.Integer, and evaluates to the length of string to which its argument evaluates.

  • The lower() function is assigned the type java.lang.String, and evaluates to the lowercase form of the string to which its argument evaluates.

  • The upper() function is assigned the type java.lang.String, and evaluates to the uppercase form of the string to which its argument evaluates.

  • The left() function is assigned the type java.lang.String, and evaluates to a prefix of the string to which its first argument evaluates. The length of the prefix is given by the integer value to which its second argument evaluates.

  • The right() function is assigned the type java.lang.String, and evaluates to a suffix of the string to which its first argument evaluates. The length of the suffix is given by the integer value to which its second argument evaluates.

When any argument expression of any function call evaluates to a null value, the whole function call evaluates to null.

The syntax of an operator expression is given by the scalar_expression rule. Within an operator expression, parentheses indicate grouping.

The concatenation operator || is assigned the type java.lang.String, and may be applied to operand expressions of type java.lang.String. When executed, a concatenation operator expression evaluates to a new string concatenating the strings to which its arguments evaluate.

The numeric operators +, -, *, and / have the same meaning for primitive numeric types they have in Java, and operator expression involving these operators are assigned the types they would be assigned in Java.

NOTE: As an exception, when the operands of / are both integers, a JDQL implementation is not required to interpret the operator expression as integer division if that is not the native semantics of the database. However, portability is maximized when Jakarta Data providers do interpret such an expression as integer division.

The four numeric operators may also be applied to an operand of wrapper type, for example, to java.lang.Integer or java.lang.Double. In this case, the operator expression is assigned a wrapper type, and evaluates to a null value when either of its operands evaluates to a null value. When both operands are non-null, the semantics are identical to the semantics of an operator expression involving the corresponding primitive types.

The four numeric operators may also be applied to operands of type java.math.BigInteger or java.math.BigDecimal.

The type assigned to an operator expression depends on the types of its operand expression, which need not be identical. The rules for numeric promotion are given in section 4.7 of the Jakarta Persistence specification version 3.2:

  • If there is an operand of type Double or double, the expression is of type Double;
  • otherwise, if there is an operand of type Float or float, the expression is of type Float;
  • otherwise, if there is an operand of type BigDecimal, the expression is of type BigDecimal;
  • otherwise, if there is an operand of type BigInteger, the expression is of type BigInteger, unless the operator is / (division), in which case the expression type is not defined here;
  • otherwise, if there is an operand of type Long or long, the expression is of type Long, unless the operator is / (division), in which case the expression type is not defined here;
  • otherwise, if there is an operand of integral type, the expression is of type Integer, unless the operator is / (division), in which case the expression type is not defined here.

A numeric operator expression is evaluated according to the native semantics of the database. In translating an operator expression to the native query language of the database, a Jakarta Data provider is encouraged, but not required, to apply reasonable transformations so that evaluation of the expression more closely mimics the semantics of the Java language.

@lukasj
Copy link

lukasj commented Mar 4, 2024

would it make sense to take JPQL out from the persistence spec and split it into 2+ parts ("core", extensions for RDBS, extensions for NoSQL,...) instead? That would allow consistency and interoperability of the QL across Jakarta specs as well as open an option to having more independent implementations of the parser itself (...and some of them possibly in non-Java language)

@gavinking
Copy link
Contributor

would it make sense to take JPQL out from the persistence spec and split it into 2+ parts ("core", extensions for RDBS, extensions for NoSQL,...) instead?

Yes it would totally make sense. I would love to see that, and I would love to work on it. BUT:

  1. Doing this would require a substantial rewrite of the spec for JPQL, where a lot of things are currently quite underspecified or "specified" by a semi-implicit handwave to SQL. If there were a standalone Jakarta Query spec, it would have to actually say what everything really means, and currently Jakarta Persistence doesn't completely do that. (And arguably it didn't really need to in the past because we were only targeting translation to SQL.) So there's quite a lot of work there.
  2. Jakarta Data needs to get something out on a very limited timeframe, and so we don't want to mess that up by opening up a coordination problem between specs.

On the other hand, with those caveats stated, the stuff we write down now could be used as a starting point for such a "substantial rewrite". That is to say, the JDQL spec we produce here could eventually be the "core" part of a new Jakarta Query spec.

Of course we need to make sure the two languages don't diverge. But I'm very confident that this is achievable. And @lukasj it would be awesome if you could keep your finger on what is going on here.

@gavinking
Copy link
Contributor

gavinking commented Mar 4, 2024

Strawman for conditional expressions, where I have taken care to not require an implementation based on ternary logic. Is that the right approach??

Conditional expressions

A conditional expression is a sequence of tokens which specifies a condition which, for a given record, might be satisfied or unsatisfied. Unlike the scalar Expressions defined in the previous section, a conditional expression is not considered to have a well-defined type.

NOTE: JPQL defines the result of a conditional expression in terms of ternary logic. JDQL does not specify that a conditional expression evaluates to well-defined value, only the effect of the conditional expression when it is used as a restriction. The "value" of a conditional expression is not considered observable by the application program.

Conditional expressions may be categorized as:

  • null comparisons,
  • in expressions,
  • between expressions,
  • like expressions,
  • equality and inequality operator expressions, and
  • logical operator expressions.

The syntax for conditional expressions is given by the conditional_expression rule. Within a conditional expression, parentheses indicate grouping.

A null comparison, with syntax given by null_comparison_expression is satisfied when:

  • the not keyword is missing, and its operand evaluates to a null value, or
  • the not keyword occurs, and its operand evaluates to any non-null value.

An in expression, with syntax given by in_expression is satisfied when its leftmost operand evaluates to a non-null value, and:

  • the not keyword is missing, and any one of its parenthesized operands evaluates to the same value as its leftmost operand, or
  • the not keyword occurs, and none of its parenthesized operands evaluate to the same value as its leftmost operand.

A between expression, with syntax given by between_expression is satisfied when its operands all evaluate to non-null values, and, if the not keyword is missing, its left operand evaluates to a value which is:

  • larger than or equal to the value take by its middle operand, and
  • smaller than or equal to the value taken by its right operand.

Or, if the not keyword occurs, the left operand must evaluate to a value which is:

  • strictly smaller than to the value take by its middle operand, or
  • strictly larger than the value taken by its right operand.

A like expression is satisfied when its left operand evaluates to a non-null value and:

  • the not keyword is missing, and this value matches the pattern, or
  • the not keyword occurs, and the value does not match the pattern.

Within the pattern, _ matches any single character, and % matches any sequence of characters.

The equality and inequality operators are =, <>, <, >, <=, >=.

  • For primitive types, these operators have the same meaning they have in Java, except for <> which has the same meaning that != has in Java. Such an operator expression is satisfied when the equivalent operator expression would evaluate to true in Java.
  • For wrapper types, these operators are satisfied if both operands evaluate to non-null values, and the equivalent operator expression involving primitives would be satisfied.
  • For other types, these operators are evaluated according to the native semantics of the database.

NOTE: Portability is maximized when Jakarta Data providers interpret equality and inequality operators in a manner consistent with the implementation of Object.equals() or Comparable.compareTo() for the assigned Java type.

NOTE: For string values, a database might have a different collation algorithm to Java. In evaluating an inequality involving string operands, an implementation of JDQL is not required to emulate Java collation.

The logical operators are and, or, and not.

  • An and operator expression is satisfied if and only if both its operands are satisfied.
  • An or operator expression is satisfied if and only if at least one of its operands are satisfied.
  • A not operator expression is never satisfied if its operand is satisfied.

This specification leaves undefined the interpretation of the not operator when its operand is not satisfied.

CAUTION: A compliant implementation of JDQL might feature SQL/JPQL-style ternary logic, where not n > 0 is an unsatisfied logical expression when n evaluates to null, or it might feature binary logic where the same expression is considered satisfied. Application programmers should take great care when using the not operator with scalar expressions involving null values.

@njr-11
Copy link
Contributor

njr-11 commented Mar 4, 2024

the stuff we write down now could be used as a starting point for such a "substantial rewrite". That is to say, the JDQL spec we produce here could eventually be the "core" part of a new Jakarta Query spec.

This is an excellent idea. I agree that given the timeframe, the only achievable approach for EE 11 will be to put the subset query language in Jakarta Data, but as long as we are careful to ensure it is a subset of JPQL, it should be possible to move it to a Jakarta Query spec for both specs to use in EE 12 without breaking compatibility.

@otaviojava
Copy link
Contributor Author

would it make sense to take JPQL out from the persistence spec and split it into 2+ parts ("core", extensions for RDBS, extensions for NoSQL,...) instead? That would allow consistency and interoperability of the QL across Jakarta specs as well as open an option to having more independent implementations of the parser itself (...and some of them possibly in non-Java language)

I love the idea; let's see the others.

@njr-11
Copy link
Contributor

njr-11 commented Mar 4, 2024

The above issue comments with draft specification text seem like they represent a level of capability that make sense for Jakarta Data to include in a subset of JPQL. I think the biggest concern will be ensuring all the details fully line up with Jakarta Persistence (hopefully we can get participants from Jakarta Persistence to help review/confirm that as well when we are further along) without defining anything incompatible. In general, I would say these look great and are very well written.

@gavinking
Copy link
Contributor

Going to need information on which parts of the expression language are not required on specific kinds of datastore technology. A lot of that information is in section 4.6.2 I suppose. But I'm a bit surprised by the extent of the limitations listed there. For example: is it really true that in can't be implemented on a document database? It seems to me that is you have and and = then you can emulate in.

@gavinking
Copy link
Contributor

gavinking commented Mar 5, 2024

With this last bit, I believe the language is close to fully-specified. (Though I believe I still need to add some more info on typing rules.)

Clauses

Each JDQL statement is built from a sequence of clauses. The beginning of a clause is identified by a keyword: from, where, select, set, or order.

There is a logical ordering of clauses, reflecting the order in which their effect must be computed by the datastore:

  1. from
  2. where,
  3. select or set,
  4. order

The interpretation and effect of each clause in this list is influenced by clauses occurring earlier in the list, but not by clauses occurring later in the list.

The from clause, with syntax given by from_clause, specifies an entity name which identifies the queried entity. Path expressions occurring in later clauses are interpreted with respect to this entity. That is, the first element of each path expression in the query must be a persistent field of the entity named in the from clause. The entity name is a Java identifier, usually the unqualified name of the entity class. [This needs to be clarified in section 3.1.]

The from clause is optional in select statements. When it is missing, the queried entity is determined by the return type of the repository method, or, if the return type is not an entity type, by the primary entity type of the repository.

NOTE: The syntax of the update statement is irregular, with the from keyword implied. That is, the syntax should be update from Entity, but for historical reasons it is simply update Entity.

The where clause, with syntax given by where_clause, specifies a conditional expression used to restrict the records returned, deleted, or updated by the query. Only records for which the conditional expression is satisfied are returned, deleted, or updated.

The where clause is always optional. When it is missing, there is no restriction, and, and all records of the queried entity type are returned, deleted, or updated.

The select clause, with syntax given by select_clause, specifies a list of expressions which are returned by the query. Each expression in the list is evaluated for each record which satisfies the restriction imposed by the where clause. Alternatively, the select clause may contain a single count(*) aggregate expression, which evaluates to the number of records which satisfy the restriction.

The select clause is optional in select statements. When it is missing, the query returns the queried entity.

The set clause, with syntax given by set_clause, specifies a list of updates to fields of the queried entity. For each record which satisfies the restriction imposed by the where clause, and for each element of the list, the scalar expression is evaluated and assigned to the entity field identified by the path expression.

The order clause, with syntax given by orderby_clause, specifies a lexicographic order for the query results, that is, a list of entity fields used to sort the records which satisfy the restriction imposed by the where clause. The keywords asc and desc specify that a given field should be sorted in ascending or descending order respectively; when neither is specified, ascending order is the default.

The order clause is always optional. When it is missing, the order of the query results is undefined, and may not be deterministic.

Statements

Finally, there are three kinds of statement:

  • select statements,
  • update statements, and
  • delete statements.

The clauses which can appear in a statement are given by the grammar for each kind of statement.

A select statement, with syntax given by select_statement, returns data to the client. For each record which satisfies the restriction imposed by the where clause, a result is returned containing the values obtained by evaluating the scalar expressions in the select clause. Alternatively, for the case of select count(*), the query returns the number of records which satisfied the restriction.

An update statement, with syntax given by update_statement, updates each record which satisfies the restriction imposed by the where clause, and returns the number of updated records to the client.

A delete statement, with syntax given by delete_statement, deletes each record which satisfies the restriction imposed by the where clause, and returns the number of deleted records to the client.

@gavinking
Copy link
Contributor

gavinking commented Mar 5, 2024

(Though I believe I still need to add some more info on typing rules.)

Also, if we really are supporting multiple items in the select list, we need to specify how they are returned. JPQL says they're returned as an array of type Object[]. On the other hand, there are quite strong reasons to allow this sort of thing:

record Summary(String isbn, String title, String author) {}

@Query("select isbn, title, author.name from Book")
List<Summary> summaries();

@otaviojava
Copy link
Contributor Author

Going to need information on which parts of the expression language are not required on specific kinds of datastore technology. A lot of that information is in section 4.6.2 I suppose. But I'm a bit surprised by the extent of the limitations listed there. For example: is it really true that in can't be implemented on a document database? It seems to me that is you have and and = then you can emulate in.

In general, a document can be implemented in the same way as a wide column; however, we don't guarantee that all the fields will be included.

For example, Cassandra supports it only on the key or indexed fields.

Amazon DynamoDB, as far as I know is a document that has not the in clausure.

In those cases, I would go to town an UnsupportedOperationException.

@gavinking
Copy link
Contributor

I have collected my proposals here https://github.com/jakartaee/data/pull/520/files.

@njr-11
Copy link
Contributor

njr-11 commented Mar 5, 2024

Also, if we really are supporting multiple items in the select list, we need to specify how they are returned. JPQL says they're returned as an array of type Object[]. On the other hand, there are quite strong reasons to allow this sort of thing:

record Summary(String isbn, String title, String author) {}

@Query("select isbn, title, author.name from Book")
List<Summary> summaries();

Supporting multiple items in the select list is nice for writing more efficient queries that don't need to fetch the entire content from the database. Object[] is not a type safe way to do it, but the record/constructor approach is type safe, has excellent usability, and is also supported by JPQL, so we can (and I think should) include it in Jakarta Data as a subset of JPQL. The example given above would look like the following,

record Summary(String isbn, String title, String author) {}

@Query("select new org.eclipse.example.Summary(b.isbn, b.title, b.author.name) from Book b")
List<Summary> summaries();

@gavinking
Copy link
Contributor

gavinking commented Mar 5, 2024

@njr-11 I actually hate the select new syntax now, for its verbosity, and I've proposed that it be made optional in JPA:

jakartaee/persistence#420

Remember that this syntax was from JPA 1.0, from before Java had generics, and has not really made a lot of sense since we introduced TypedQuery in JPA 2.0.

But here, the argument that we don't need it is even stronger, since we can always safely infer the return type of the query from looking at the return type of the repository method.

So I think it's reasonable to say that a repository method can just automatically repackage the Object[] array coming from JPA as an instance of its return type, assuming that the return type has an appropriate constructor.

@njr-11
Copy link
Contributor

njr-11 commented Mar 5, 2024

I've proposed that it be made optional in JPA:

jakartaee/persistence#420

Excellent - that would be great to see the more concise syntax go into JPQL. In that case, I think we should omit this from our version 1.0 and plan to add it in once JPQL adds the more concise form.

@gavinking
Copy link
Contributor

In that case, I think we should omit this from our version 1.0 and plan to add it in once JPQL adds the more concise form.

That's fine by me.

@gavinking
Copy link
Contributor

gavinking commented Mar 15, 2024

Excellent - that would be great to see the more concise syntax go into JPQL.

FTR, my proposal was just merged by @lukasj, and this is now a legal JPQL query:

FROM Order
WHERE customer.lastname = 'Smith'
  AND customer.firstname = 'John'

and is equivalent to:

SELECT this
FROM Order AS this
WHERE this.customer.lastname = 'Smith'
  AND this.customer.firstname = 'John'

@KyleAure KyleAure mentioned this issue Mar 19, 2024
19 tasks
@gavinking
Copy link
Contributor

Done!! 🎉🥳🎈🎉

@njr-11
Copy link
Contributor

njr-11 commented Mar 25, 2024

Reopening - we need to write TCK tests for this.

njr-11 added a commit to njr-11/data that referenced this issue Mar 25, 2024
Signed-off-by: Nathan Rauh <nathan.rauh@us.ibm.com>
njr-11 added a commit to njr-11/data that referenced this issue Mar 25, 2024
njr-11 added a commit to njr-11/data that referenced this issue Mar 25, 2024
njr-11 added a commit to njr-11/data that referenced this issue Mar 26, 2024
Signed-off-by: Nathan Rauh <nathan.rauh@us.ibm.com>
njr-11 added a commit to njr-11/data that referenced this issue Mar 26, 2024
Signed-off-by: Nathan Rauh <nathan.rauh@us.ibm.com>
njr-11 added a commit to njr-11/data that referenced this issue Mar 27, 2024
…resent

Signed-off-by: Nathan Rauh <nathan.rauh@us.ibm.com>
njr-11 added a commit to njr-11/data that referenced this issue Mar 29, 2024
Signed-off-by: Nathan Rauh <nathan.rauh@us.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vote Something to vote on
Projects
None yet
Development

No branches or pull requests

5 participants